The problem with using a pixel shader to perform convolutions is redundant texture fetching. Imagine sliding the convolution window one pixel to the right: the new window overlaps the old one in all but one column, so neighbouring pixels fetch almost exactly the same texels. Ideally, we would fetch each texel once and store it in a cache. This is where compute shaders come in.
Compute shaders allow access to "groupshared" memory: in other words, memory that is shared amongst all of the threads in a group. Essentially what we can do is fill up a group's shared memory with a chunk of the texture, synchronize the threads, and then continue with the convolution. Only this time, we reference the shared memory instead of the texture.
In a future post, I will provide a more complete example. But for now, I will outline the two methods:
Method A: Pixel shader
Texture2D<float> img;
float result = 0.0f;
int w2 = (w - 1) / 2;
int h2 = (h - 1) / 2;
for (int j = -h2; j <= h2; j++)
{
    for (int i = -w2; i <= w2; i++)
    {
        result += img[int2(x + i, y + j)] * kernel[w * (j + h2) + (i + w2)];
    }
}
return result;
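Above, x and y represent the position of the pixel being processed, w and h are the width and height of the convolution kernel, and kernel holds the w * h weights in row-major order. The window is centred on the pixel, so w and h are assumed to be odd.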
Method B: Compute shader
Texture2D<float> img;
RWTexture2D<float> outimg;
groupshared float smem[(BLOCKDIM + 2) * (BLOCKDIM + 2)];
// Read texture data into smem for this group
// Synchronize the threads
GroupMemoryBarrierWithGroupSync();
float result = 0.0f;
for (int j = 0; j < h; j++)
{
    for (int i = 0; i < w; i++)
    {
        result += smem[offset + (BLOCKDIM + 2) * j + i] * kernel[w * j + i];
    }
}
outimg[int2(x, y)] = result;
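Here, BLOCKDIM is the width (and height) of a thread group, measured in threads, and offset is an index into shared memory computed from the thread's ID within the group. The "+ 2" in the size of smem adds a one-texel border on each side of the group's block, which is enough for a 3 x 3 kernel; in general, a w x h kernel needs a tile of (BLOCKDIM + w - 1) x (BLOCKDIM + h - 1) texels.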
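The load step is only a comment in the outline above, so here is a minimal sketch of one way to fill it in. It is not the complete example promised for the future post. It assumes BLOCKDIM is 16, a fixed 3 x 3 kernel (a box blur whose weights are placeholders I made up), and an entry point called main. It also does no special border handling, so texels outside the image are whatever the API returns for an out-of-range load.

```hlsl
#define BLOCKDIM 16
#define TILEDIM (BLOCKDIM + 2)   // one-texel border per side: enough for 3 x 3 only

Texture2D<float> img;
RWTexture2D<float> outimg;

// Hypothetical 3 x 3 box-blur weights, row-major (an assumption for this sketch).
static const float kWeight = 1.0f / 9.0f;
static const float kernel[9] =
{
    kWeight, kWeight, kWeight,
    kWeight, kWeight, kWeight,
    kWeight, kWeight, kWeight
};

groupshared float smem[TILEDIM * TILEDIM];

[numthreads(BLOCKDIM, BLOCKDIM, 1)]
void main(uint3 groupID    : SV_GroupID,
          uint3 threadID   : SV_GroupThreadID,
          uint3 dispatchID : SV_DispatchThreadID,
          uint  flatIndex  : SV_GroupIndex)
{
    // Top-left texel of the tile: one texel up and left of the group's block.
    int2 tileOrigin = int2(groupID.xy) * BLOCKDIM - int2(1, 1);

    // The 256 threads cooperatively fill the 324 tile entries,
    // so some threads load two texels and the rest load one.
    for (int idx = (int)flatIndex; idx < TILEDIM * TILEDIM; idx += BLOCKDIM * BLOCKDIM)
    {
        int2 texel = tileOrigin + int2(idx % TILEDIM, idx / TILEDIM);
        smem[idx] = img[texel];
    }

    // Every thread must see the whole tile before reading from it.
    GroupMemoryBarrierWithGroupSync();

    // Top-left corner of this thread's 3 x 3 window inside the tile.
    int offset = (int)threadID.y * TILEDIM + (int)threadID.x;

    float result = 0.0f;
    for (int j = 0; j < 3; j++)
    {
        for (int i = 0; i < 3; i++)
        {
            result += smem[offset + TILEDIM * j + i] * kernel[3 * j + i];
        }
    }

    outimg[dispatchID.xy] = result;
}
```

The strided loop is one choice among several: it keeps the load code independent of where a thread sits in the group, at the cost of an uneven split of the work between threads.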
The compute shader method substantially reduces the number of redundant texture fetches compared to the pixel shader method, especially with a non-separable kernel, which cannot be split into two cheaper one-dimensional passes.
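To put rough numbers on this, here is a back-of-the-envelope count of texture fetches per thread group. It assumes a group of B x B threads (B = BLOCKDIM), a w x h kernel, and a shared-memory tile padded to fit that kernel. It counts fetch instructions only: the hardware texture cache absorbs some of the pixel shader's redundancy, so the real gain is likely smaller than the ratios suggest.

$$
N_{\text{pixel}} = B^2 \, w \, h,
\qquad
N_{\text{compute}} = (B + w - 1)(B + h - 1)
$$

$$
B = 16,\; w = h = 3: \quad N_{\text{pixel}} = 256 \cdot 9 = 2304, \quad N_{\text{compute}} = 18 \cdot 18 = 324, \quad \frac{N_{\text{pixel}}}{N_{\text{compute}}} \approx 7.1
$$

$$
B = 16,\; w = h = 7: \quad N_{\text{pixel}} = 256 \cdot 49 = 12544, \quad N_{\text{compute}} = 22 \cdot 22 = 484, \quad \frac{N_{\text{pixel}}}{N_{\text{compute}}} \approx 25.9
$$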
 