Making all strings UTF ranges has some risk of WTF

Thu Feb 4 17:41:48 PST 2010

Andrei Alexandrescu wrote:
> One idea I've had for a while was to have a universal string type:
> 
> struct UString {
>     union {
>         char[] utf8;
>         wchar[] utf16;
>         dchar[] utf32;
>     }
>     enum Discriminator { utf8, utf16, utf32 };
>     Discriminator kind;
>     IntervalTree!(size_t) skip;
>     ...
> }
> 
> The IntervalTree stores the skip amounts that must be added for a given
> index in the string. For ASCII strings that would be null. Then its size
> grows with the number of multibyte characters. Beyond a threshold,
> representation is transparently switched to utf16 or utf32 as needed and
> the tree becomes smaller or null again.

Although I see some potential in a universal string type, I don't think
this is the right implementation strategy.  I'd rather have my short
strings in UTF-32 (optimized for speed) and my long strings in
UTF-8/UTF-16 (optimized for memory usage).
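
To make that trade-off concrete, here is a rough sketch in D of choosing
the representation by length rather than by multibyte count. The names
chooseKind, Kind and shortLimit are made up for illustration, and the
threshold of 32 code points is an arbitrary assumption, not a measured
value.

```d
import std.conv : to;
import std.utf : count;

// Hypothetical cutoff in code points; no actual number was proposed.
enum size_t shortLimit = 32;

enum Kind { utf8, utf32 }

// Short strings get UTF-32 (O(1) indexing by code point),
// long strings stay in UTF-8 (compact storage).
Kind chooseKind(const(char)[] s)
{
    return count(s) <= shortLimit ? Kind.utf32 : Kind.utf8;
}

unittest
{
    string s = "h\u00E9llo";          // U+00E9 takes two UTF-8 code units
    assert(s.length == 6);            // .length counts code units
    assert(count(s) == 5);            // five code points

    dstring d = to!dstring(s);
    assert(d.length == 5);            // one code unit per code point
    assert(d[1] == '\u00E9');         // direct indexing, no skip table

    assert(chooseKind(s) == Kind.utf32);
}
```

Compiling with "dmd -unittest -main" runs the checks. For a string this
short the UTF-32 copy costs only a few extra bytes, whereas indexing the
UTF-8 form by character would need either a scan or the skip tree.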

-- 
Rainer Deyke - rainerd at eldwood.com