A blog which discusses various GPU applications including visualization, GPGPU and games.

Saturday, February 21, 2009

Image Convolution

One of the most commonly performed image post-processing effect is the image convolution. A number of tricks are employed to make convolutions more efficient on the GPU, such as using separable convolutions, upscaling a smaller image to fake a blur convolution, etc.

The problem with using the pixel shader to perform convolutions is the redundant texture fetching. Imagine the convolution window being slid to the right by one pixel: each time, there is a large overlap in texture fetches. Ideally, we should be able to fetch the information from the texture once, and store it into a cache. This is where compute shaders come in.

Compute shaders allow access to "groupshared" memory: in other words, memory that is shared amongst all of the threads in a group. Essentially what we can do is fill up a group's shared memory with a chunk of the texture, synchronize the threads, and then continue with the convolution. Only this time, we reference the shared memory instead of the texture.

In a future post, I will provide a more complete example. But for now, I will outline the two methods:

Method A: Pixel shader

Texture2D<float> img;

float result = 0.0f;

int w2 = (w - 1) / 2;
int h2 = (h - 1) / 2;

for (int j = -h2; j <= h2; j++)
for (int i = -w2; i <= w2; i++)
result += img[int2(x + i, y + j)] * kernel[w * (j + h2) + (i + w2)];

return result;

Above, x and y represent the position of the pixel being processed, while w and h are the width and height of the convolution kernel.

Method B: Compute shader

Texture2D<float> img;
RWTexture2D<float> outimg;

groupshared float smem[(BLOCKDIM + 2) * (BLOCKDIM + 2)];

// Read texture data into smem for this group

// Synchronize the threads

float result = 0.0f;

for (int j = 0; j < h; j++)
for (int i = 0; i < w; i++)
result += smem[offset + (BLOCKDIM + 2) * j + i] * kernel[w * j + i];

outimg[int2(x, y)] = result;

Here, BLOCKDIM is the width (and height) of threads in a group, and offset is an offset into shared memory, which is a function of the thread ID within a group.

The compute shader method substantially reduces the number of redundant fetches necessary compared to the pixel shader method, especially when using an inseparable kernel.

Friday, February 6, 2009

DirectWrite Text Layouts, Part 2

In my previous post, I briefly covered DirectWrite text layouts. In this post, I would like to go into greater depth.

A text layout essentially enables you to describe many aspects of the contents of a string - text size, text style, text weight, custom drawing effects, inline objects, etc. The methods provided by a text layout enables you to apply specific formatting to specific ranges of text.

To backtrack a little bit, there are multiple ways of rendering text with Direct2D and DirectWrite. The first way, a way which I consider to be at the highest level of abstraction, is the DrawText method provided by a Direct2D render target. This method can be used to draw simple text that requires no extensive formatting. This method does not take a text layout object at all, but instead a simpler text format object.

The second way, which I consider to be mid-level, is the DrawTextLayout method (again provided by a Direct2D render target). This method takes a text layout object and renders it.

The third way, which I consider to be the lowest level, is the Draw method provided by a DirectWrite text layout object. This Draw method takes a custom class (a class which implements the IDWriteTextRenderer class) and uses its callbacks to render. This may seem complex, but it is actually trivial to write a class which acts just like DrawTextLayout does in Direct2D.

I would first like to focus on the lowest level method, since it excites me the most. Using this method, the text rendering possibilities are truly endless. The IDWriteTextRenderer interface defines six functions. I am going to focus on the DrawGlyphRun method in this post.

When the Draw method is called on the text layout object with a custom class, it will call the class's DrawGlyphRun method for contiguous sets of glyphs that have similar formatting. You may be wondering how you are supposed to write a pass-through function that simply renders the glyph run it receives - simple! Direct2D provides a render target method also called "DrawGlyphRun" which is the absolute lowest level glyph rendering function that handles ClearType.

Obviously, this is not a very interesting thing to do; this is basically what Direct2D's DrawText and DrawTextLayout use. An example on MSDN illustrates a more interesting use of a custom renderer. What they have essentially done is retrieved the glyphs' geometric information, and used Direct2D's draw/fill geometry methods.

This brings me to another interesting use case: writing a custom rendering class to "suck out" the geometry from glyphs. This can be done to create vertex buffers for Direct3D for extruded text, as done in this example.

As I mentioned earlier, it is possible to apply custom drawing effects to ranges of text. The way this works is simple: an application-specific object and text range are provided to the SetDrawingEffect method of the text layout object. The object provided is passed to the application-defined DrawGlyphRun method. The data can then be used in any way imaginable. Think brushes, stroke styles, transformation matrix effects, etc.

You may be thinking: is it necessary to write a pass-through custom renderer just to use different brushes as drawing effects? The answer is no - the implementation provided by Direct2D's DrawTextLayout interprets drawing effects as brushes!

Sunday, February 1, 2009

DirectWrite Text Layouts

In experimenting with DirectWrite, I discovered how to apply specific formatting to substrings: DirectWrite text layout objects.

Consider the following code.


wstring text = L"This is a test of the text rendering services provided by Direct2D and DirectWrite. I am testing the quality and performance of these new APIs. So far, they are proving to be quite nice.";
m_spBackBufferRT->DrawText(text.c_str(), text.length(), m_spTextFormat, D2D1::RectF(0.0f, 0.0f, static_cast<float>(width), static_cast<float>(height)), m_spTextBrush, D2D1_DRAW_TEXT_OPTIONS_NO_CLIP);


The results are as expected.

Now, say I want to render the substrings "Direct2D" and "DirectWrite" in bold. One way would be to use the font metric methods of DirectWrite and render the paragraph in multiple pieces, but this feels a bit too tedious for what I want to do. A better approach would be to use a text layout object.

The following code does the trick.


wstring text = L"This is a test of the text rendering services provided by Direct2D and DirectWrite. I am testing the quality and performance of these new APIs. So far, they are proving to be quite nice.";

IDWriteTextLayout *m_spTextLayout;
m_spDWriteFactory->CreateTextLayout(text.c_str(), text.length(), m_spTextFormat, width, height, &m_spTextLayout);

DWRITE_TEXT_RANGE dtr = {58, 8};
m_spTextLayout->SetFontWeight(DWRITE_FONT_WEIGHT_BOLD, dtr);

DWRITE_TEXT_RANGE dtr = {71, 11};
m_spTextLayout->SetFontWeight(DWRITE_FONT_WEIGHT_BOLD, dtr);

m_spBackBufferRT->DrawTextLayout(D2D1::Point2F(0.0f, 0.0f), m_spTextLayout, m_spTextBrush);


It cannot get much simpler than that. All that is needed is to supply a substring range and method-specific argument and we are set.