A blog which discusses various GPU applications including visualization, GPGPU and games.

Wednesday, June 3, 2009

Unicode

I like to make my C++ applications Unicode-aware. What do I mean by this? I use UTF-16 where I can, and convert between UTF-16 and UTF-8 if necessary.

C++ has a wide character type, wchar_t, which is used for storing wide characters. The problem with wchar_t is that it its size is platform-specific; on Windows, wchar_t is 2 bytes while on most *nix-based machines it's 4 bytes. In other words, on Windows it would be used for storing UTF-16 and on *nix, it'd be used for storing UTF-32.

This complication is reason enough for many libraries to avoid wchar_t altogether and simply use UTF-8. However, I prefer UTF-16 as I find it to be a nice trade-off between UTF-8 and UTF-32: efficient and more compact than UTF-32 in most cases.

Luckily, C++0x adds two new character types: char16_t for storing UTF-16 characters and char32_t for storing UTF-32 characters. With these new types, it will be possible to write cleaner, portable, Unicode-aware C++0x code.

As an aside, Windows charmap cannot display characters past 0xFFFF, which I find to be annoying. So, I've begun writing my own Unicode character viewer using Direct2D and DirectWrite.



Followers

Contributors