A blog which discusses various GPU applications including visualization, GPGPU and games.

Monday, December 29, 2008

Draw Indirect

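Draw indirect is a cool new feature in Direct3D 11. It is basically a more general version of DrawAuto: instead of only replaying stream out results, it lets the GPU (a compute shader, for example) produce the arguments of a draw call.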

When using stream out with a geometry shader, it is common to want to take the results of one stream out pass and bind them as input to another pass. However, the geometry shader can emit a variable number of primitives, so the programmer would have to read the number of primitives written back from the GPU, causing a stall. To mitigate this problem, DrawAuto was introduced in Direct3D 10 and recently in OpenGL. DrawAuto keeps everything on the GPU: the stream out buffer is bound to an input slot and the draw call is issued with the appropriate count filled in by the GPU, with no CPU readback.

Draw indirect takes this one step further by allowing the arguments of a draw call to be read from a buffer that lives on the GPU. For example, consider the following compute shader.

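RWBuffer<uint> args : register(u0);

[numthreads(1, 1, 1)] // group size is an assumption; the original listing does not show it
void CS(uint3 id : SV_DispatchThreadID)
{
    /* Perform some computation here */

    // Only thread 0 writes the draw arguments.
    if (id.x == 0 && id.y == 0 && id.z == 0)
    {
        args[0] = 1000; // vertex count per instance
        args[1] = 1;    // instance count
        args[2] = 0;    // start vertex location
        args[3] = 0;    // start instance location
    }
}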

This compute shader writes out 4 unsigned integers to a buffer from thread 0. These values represent the arguments for our draw call.

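And on the CPU side of things, the draw is issued on the device context. The buffer has to be created with the D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS flag.

// Read the draw arguments from argsbuf, starting at byte offset 0.
g_pd3dDC->DrawInstancedIndirect(argsbuf, 0);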

The result is the same as if DrawInstanced(1000, 1, 0, 0) had been called. How cool is that?
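To make the mapping between buffer slots and draw arguments explicit, here is a minimal CPU-side sketch. The struct DrawInstancedArgs and the helper decodeArgs are hypothetical names made up for illustration and are not part of Direct3D; the field order simply follows the DrawInstanced parameter list, and the byte offset plays the role of the second parameter of DrawInstancedIndirect.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical CPU-side view of the four uints the compute shader writes.
// Field order follows the DrawInstanced parameter list.
struct DrawInstancedArgs {
    std::uint32_t vertexCountPerInstance; // args[0]
    std::uint32_t instanceCount;          // args[1]
    std::uint32_t startVertexLocation;    // args[2]
    std::uint32_t startInstanceLocation;  // args[3]
};

static_assert(sizeof(DrawInstancedArgs) == 4 * sizeof(std::uint32_t),
              "the arguments are four tightly packed 32-bit values");

// Hypothetical helper: interprets the buffer contents starting at byteOffset
// as one set of draw arguments.
DrawInstancedArgs decodeArgs(const std::uint32_t* buffer, std::size_t byteOffset) {
    assert(byteOffset % sizeof(std::uint32_t) == 0); // offset must be 4-byte aligned
    const std::uint32_t* p = buffer + byteOffset / sizeof(std::uint32_t);
    return DrawInstancedArgs{p[0], p[1], p[2], p[3]};
}
```

Because the offset is a parameter, one buffer can hold several argument sets back to back, each consumed by its own indirect call.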

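There are three indirect calls: DrawInstancedIndirect, DrawIndexedInstancedIndirect and DispatchIndirect. The last one launches a compute shader rather than a draw, with the thread group counts read from the buffer.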
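The other two calls read their arguments the same way, just with different layouts. The sketch below mirrors them on the CPU, assuming the fields follow the parameter lists of DrawIndexedInstanced and Dispatch. The struct names and the dispatchArgsFor1D helper are hypothetical and only serve as illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the five values read by DrawIndexedInstancedIndirect.
struct DrawIndexedInstancedArgs {
    std::uint32_t indexCountPerInstance;
    std::uint32_t instanceCount;
    std::uint32_t startIndexLocation;
    std::int32_t  baseVertexLocation;    // signed, unlike the other fields
    std::uint32_t startInstanceLocation;
};

// Hypothetical mirror of the three values read by DispatchIndirect.
struct DispatchArgs {
    std::uint32_t threadGroupCountX;
    std::uint32_t threadGroupCountY;
    std::uint32_t threadGroupCountZ;
};

static_assert(sizeof(DrawIndexedInstancedArgs) == 20, "five packed 32-bit values");
static_assert(sizeof(DispatchArgs) == 12, "three packed 32-bit values");

// Hypothetical helper: group counts needed to cover `items` work items
// with `threadsPerGroup` threads per group, rounding up.
DispatchArgs dispatchArgsFor1D(std::uint32_t items, std::uint32_t threadsPerGroup) {
    assert(threadsPerGroup > 0);
    std::uint32_t groups = items / threadsPerGroup
                         + (items % threadsPerGroup != 0 ? 1u : 0u);
    return DispatchArgs{groups, 1u, 1u};
}
```

A compute shader could write the same three values into a buffer so that a later pass is sized by the GPU itself.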

Wednesday, December 24, 2008


General-purpose computing on GPUs (GPGPU) means using a graphics card for general purpose computation. It used to be that one had to express the computation as a pixel shader and render geometry to execute it, which isn't the best abstraction. Recently, APIs have appeared that provide a proper abstraction: OpenCL, Direct3D 11 Compute, CUDA, etc.

In this example, I have chosen to focus on Direct3D 11 Compute; many of the same ideas apply to the other APIs as well. The compute shader is a new type of shader in D3D11 which allows for the explicit usage of shared memory, scattered writes, etc.

As a simple example, say we want to use the compute shader to produce a procedural texture.

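RWTexture2D<float4> texrw : register(u0);

[numthreads(1, 1, 1)] // group size is an assumption; the original listing does not show it
void CS(uint3 id : SV_DispatchThreadID)
{
    float4 color;

    // Derive each channel from the thread ID, which doubles as the texel coordinate.
    color.r = id.x / 255.0f;
    color.g = (id.x + id.y) / 510.0f;
    color.b = (sin(id.x * id.y) + 1.0f) / 2.0f;
    color.a = 1.0f;

    // Scattered write: store the color at an explicit texel.
    texrw[id.xy] = color;
}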

Let's take a look at the application-side code.

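// Bind the texture's unordered access view to slot u0 and run the shader.
g_pd3dDC->CSSetUnorderedAccessViews(0, 1, &texrwv, NULL);
g_pd3dDC->CSSetShader(cs, NULL, 0);
g_pd3dDC->Dispatch(256, 256, 1);

// Unbind the view so the texture can be used as input when rendering.
ID3D11UnorderedAccessView *nullcsview[] = { NULL };
g_pd3dDC->CSSetUnorderedAccessViews(0, 1, nullcsview, NULL);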

There isn't much going on here; I didn't even need to use a compute shader for this. However, it does illustrate one important concept: scattered writes. Notice how I am writing a value to an explicit texel in the texrw texture.
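For checking the output, the same computation can be sketched on the CPU. The code below emulates a dispatch by looping over groups and threads, computing each thread ID as group index times group size plus the thread index within the group, and writing one texel per thread. Float4, shadeTexel and runDispatch are hypothetical names, and the 1x1 group size used in the usage note is an assumption chosen so that 256x256 groups cover a 256x256 texture. GPU results for sin may differ slightly in precision.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Float4 { float r, g, b, a; };

// Same arithmetic as the compute shader, for one thread ID.
Float4 shadeTexel(std::uint32_t x, std::uint32_t y) {
    Float4 c;
    c.r = x / 255.0f;
    c.g = (x + y) / 510.0f;
    c.b = (std::sin(static_cast<float>(x * y)) + 1.0f) / 2.0f;
    c.a = 1.0f;
    return c;
}

// Emulates Dispatch(groupsX, groupsY, 1) with threadsX x threadsY threads per group.
// Returns a row-major texture of size (groupsX * threadsX) x (groupsY * threadsY).
std::vector<Float4> runDispatch(std::uint32_t groupsX, std::uint32_t groupsY,
                                std::uint32_t threadsX, std::uint32_t threadsY) {
    assert(threadsX > 0 && threadsY > 0);
    const std::size_t width  = static_cast<std::size_t>(groupsX) * threadsX;
    const std::size_t height = static_cast<std::size_t>(groupsY) * threadsY;
    std::vector<Float4> tex(width * height);
    for (std::uint32_t gy = 0; gy < groupsY; ++gy)
        for (std::uint32_t gx = 0; gx < groupsX; ++gx)
            for (std::uint32_t ty = 0; ty < threadsY; ++ty)
                for (std::uint32_t tx = 0; tx < threadsX; ++tx) {
                    // Dispatch thread ID = group ID * group size + thread ID in group.
                    const std::uint32_t x = gx * threadsX + tx;
                    const std::uint32_t y = gy * threadsY + ty;
                    tex[y * width + x] = shadeTexel(x, y); // scattered write
                }
    return tex;
}
```

Calling runDispatch(256, 256, 1, 1) corresponds to the Dispatch(256, 256, 1) call above under the 1x1 group size assumption; a larger group size with fewer groups would fill the same texture.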

Running this compute shader kernel with 256x256x1 groups and using a simple pixel shader to texture a quad, the results are as follows:

New blog

I've decided to start my own blog for my GPU experiments. Expect to see new posts soon!