Making all strings UTF ranges has some risk of WTF
Rainer Deyke
rainerd at eldwood.com
Wed Feb 3 22:07:03 PST 2010
Andrei Alexandrescu wrote:
> Arrays of char and wchar are not quite generic - they are definitely UTF
> strings.
A 'char' is a single UTF-8 code unit. A 'char[]' is (or should be) a
generic array of UTF-8 code units. Sometimes these code units line up
to form validly encoded Unicode code points, sometimes they don't.
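To make that concrete, here is a minimal sketch in C++ rather than D, treating std::string as a plain array of code units. The helper is_structurally_utf8 and the function demo_code_units are made-up names for illustration, and the check is deliberately simplified: it looks only at lead/continuation byte structure, not at overlong forms or surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not from any library. Checks only the
// lead-byte / continuation-byte structure of UTF-8; it does not
// reject overlong encodings or surrogates.
bool is_structurally_utf8(const std::string& units) {
    std::size_t i = 0;
    while (i < units.size()) {
        const unsigned char lead = static_cast<unsigned char>(units[i]);
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                extra = 0;
        else if ((lead & 0xE0) == 0xC0) extra = 1;
        else if ((lead & 0xF0) == 0xE0) extra = 2;
        else if ((lead & 0xF8) == 0xF0) extra = 3;
        else return false;  // continuation or invalid byte in lead position
        if (units.size() - i - 1 < extra) return false;  // sequence cut short
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(units[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
        }
        i += extra + 1;
    }
    return true;
}

void demo_code_units() {
    const std::string e_acute = "\xC3\xA9";  // U+00E9 as two code units
    assert(is_structurally_utf8(e_acute));                // whole array: valid
    assert(!is_structurally_utf8(e_acute.substr(0, 1)));  // lone lead byte
    assert(!is_structurally_utf8(e_acute.substr(1, 1)));  // lone continuation
}
```

Both slices are perfectly good arrays of code units; they just aren't valid strings. Calling demo_code_units() from main runs the three checks.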
If you want a data type that always contains a valid UTF-8 string, don't
call it 'char[]'. It's misleading, it breaks generic code, and it
renders built-in arrays useless when you actually want an array of
UTF-8 code units. It's the same mistake as std::vector<bool> in C++,
but much worse.
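For anyone who hasn't run into the std::vector<bool> problem, a minimal C++ sketch of how the specialization breaks generic code follows. The trait name index_yields_real_reference is made up for illustration; the only library facts relied on are that std::vector<T>::reference is T& for ordinary element types and a proxy class for bool.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical trait: true when indexing std::vector<T> hands back a
// real T&, which is what generic code written against "a vector of T"
// normally assumes.
template <typename T>
constexpr bool index_yields_real_reference() {
    return std::is_same<typename std::vector<T>::reference, T&>::value;
}

static_assert(index_yields_real_reference<int>(),
              "vector<int> yields int&");
static_assert(!index_yields_real_reference<bool>(),
              "vector<bool> yields a proxy object, not bool&");

// Consequence: generic code such as
//     T* p = &v[0];
// compiles for std::vector<int> but not for std::vector<bool>,
// because v[0] is a temporary proxy there, not a bool lvalue.
```

The name promises a generic container of bool and the type delivers something else, which is the same bait-and-switch as a 'char[]' that is secretly required to be a valid string.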
--
Rainer Deyke - rainerd at eldwood.com