Wide characters support in D

Tue Jun 8 12:57:38 PDT 2010

Walter Bright:
> The problem with dchar's is strings of them consume 
> memory at a prodigious rate.

Warning: lazy musings ahead.

I hope we'll soon have computers with 200+ GB of RAM, where using strings with chars narrower than 32 bits is in most cases a premature optimization (just as today it is often a silly optimization to use arrays of 16-bit ints instead of 32-bit or 64-bit ints. Only special situations found with the profiler can justify the use of arrays of shorts in a low-level language).

But even in PCs with 200 GB of RAM the first levels of the CPU cache can be very small (like 32 KB), and cache misses are costly, so even when a huge amount of RAM is present it can be useful to reduce the size of strings to increase performance.

A possible solution to this problem is some kind of real-time hardware compression/decompression between the CPU and the RAM. UTF-8 can be a good enough way to compress 32-bit strings. But then we are back to writing low-level programs that have to deal with UTF-8.
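
To give a rough idea of how much UTF-8 "compresses" 32-bit strings, here is a minimal Python sketch (Python only to keep it short; the function name encoded_sizes is made up for this example). It just measures the same text in the three encodings, without a byte order mark:

```python
def encoded_sizes(text):
    """Return the size in bytes of `text` in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


if __name__ == "__main__":
    # ASCII-only text versus CJK text: the saving depends on the content.
    for sample in ("hello world", "日本語"):
        print(repr(sample), len(sample), "code points:", encoded_sizes(sample))
```

For ASCII-only text UTF-8 takes a quarter of the space of 32-bit chars, while for CJK text it takes three bytes per char instead of four, so how good the compression is depends a lot on the text.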

To avoid this, the CPU and RAM could compress/decompress the text transparently to the programmer. Unfortunately UTF-8 is a variable-length encoding, so maybe it can't be done transparently enough. So a smarter and better compression algorithm could be used to keep all this transparent enough (not fully transparent: some low-level situations may still require code that deals with the compression).
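
Here is a small Python sketch of why the variable-length encoding gets in the way (byte_offset_of is a hypothetical helper written for this post, not a library function). With 32-bit chars, char number i always starts at byte 4*i; in UTF-8 you have to scan from the start and skip the continuation bytes (the ones of the form 10xxxxxx):

```python
def byte_offset_of(data: bytes, index: int) -> int:
    """Return the byte offset where code point number `index` starts.

    `data` is assumed to be valid UTF-8. The scan is linear in the length
    of the data, unlike the constant-time 4 * index of a UTF-32 string.
    """
    seen = 0
    for offset, byte in enumerate(data):
        # Bytes of the form 10xxxxxx continue a code point; any other
        # byte starts a new one.
        if (byte & 0xC0) != 0x80:
            if seen == index:
                return offset
            seen += 1
    raise IndexError("code point index out of range")


if __name__ == "__main__":
    text = "a\u00e9\u65e5z"  # 1-, 2-, 3- and 1-byte code points
    data = text.encode("utf-8")
    for i in range(len(text)):
        print(i, "UTF-8 offset:", byte_offset_of(data, i), "UTF-32 offset:", 4 * i)
```

A hardware layer that wants to hide this from the programmer would have to do some equivalent of this scan (or keep an index on the side) on every random access.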

Bye,
bearophile