Making all strings UTF ranges has some risk of WTF
grauzone
none at example.net
Thu Feb 4 17:15:29 PST 2010
Andrei Alexandrescu wrote:
> Rainer Deyke wrote:
>> Don wrote:
>>> I suspect that string, wstring should have been the primary types and
>>> had a .codepoints property, which returned a ubyte[] resp. ushort[]
>>> reference to the data. It's too late, of course. The extra value you get
>>> by having a specific type for 'this is a code point for a UTF8 string'
>>> seems to be very minor, compared to just using a ubyte.
>>
>> If it's not too late to completely change the semantics of char[], then
>> it's also not too late to dump 'char' completely. If it /is/ too late
>> to remove 'char', then 'char[]' should retain the current semantics and
>> a new string type should be added for the new semantics.
>
> One idea I've had for a while was to have a universal string type:
>
> struct UString {
>     union {
>         char[] utf8;
>         wchar[] utf16;
>         dchar[] utf32;
>     }
>     enum Discriminator { utf8, utf16, utf32 };
>     Discriminator kind;
>     IntervalTree!(size_t) skip;
>     ...
> }
You mean like this?
http://www.dprogramming.com/mtext.php
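
For anyone who wants to see the shape of the idea without following the link, here is a minimal sketch of the tagged-union string quoted above. It is only an illustration under assumptions: the names Kind and codePoints are hypothetical, the skip index from the quoted struct is left out, immutable element types are used so string literals can be stored directly, and nothing here is taken from mtext or from any planned Phobos type.

```d
import std.range : walkLength;

// Hypothetical sketch of a "universal string": one of three encodings,
// tagged by a discriminator. Not the mtext implementation.
struct UString
{
    enum Kind { utf8, utf16, utf32 }

    private union
    {
        immutable(char)[]  utf8;
        immutable(wchar)[] utf16;
        immutable(dchar)[] utf32;
    }
    private Kind kind_;

    // Each constructor stores the data as-is and records its encoding.
    this(string s)  { utf8  = s; kind_ = Kind.utf8;  }
    this(wstring s) { utf16 = s; kind_ = Kind.utf16; }
    this(dstring s) { utf32 = s; kind_ = Kind.utf32; }

    Kind kind() { return kind_; }

    // Number of code points, regardless of the stored encoding.
    // For UTF-8 and UTF-16 this decodes the whole string (O(n));
    // for UTF-32 it is just the array length.
    size_t codePoints()
    {
        final switch (kind_)
        {
            case Kind.utf8:  return utf8.walkLength;
            case Kind.utf16: return utf16.walkLength;
            case Kind.utf32: return utf32.length;
        }
    }
}
```

The linear-time count for the UTF-8 and UTF-16 cases is presumably what the skip index in the quoted struct is meant to speed up; this sketch does not attempt that.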
More information about the Digitalmars-d mailing list