String implementations
James Dennett
jdennett at acm.org
Sun Jan 20 11:07:32 PST 2008
Walter Bright wrote:
> James Dennett wrote:
>> (On the other hand, D is weak here
>> because it identifies UTF8 strings with arrays of char, but char
>> doesn't hold a UTF8 character. I can't imagine persuading Walter
>> that this is a horrible error is going to work though.)
>
> I've actually done considerable work with UTF-8, both in C++ and D.
Yes, by this stage most serious programmers have had to learn
in some detail how to work with UTF-8.
> D's
> method of dealing with it works out very well (and very naturally).
I've given specific problems with it, and I've heard no refutation
of them. D essentially models a UTF-8 string as a bunch of bytes
with smart iteration. C-based projects on which I worked in the
1990s did similarly, but with coding conventions that banned
direct access to the bytes.
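To make the "bunch of bytes with smart iteration" model concrete, here is a minimal C++ sketch. It is not D's implementation: decode_utf8 is a hypothetical helper, it assumes well-formed UTF-8 input, and it does no validation. It shows that the byte count and the character count differ, and that direct indexing yields a code unit rather than a character.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of D or the C++ standard library): decodes a
// std::string assumed to hold well-formed UTF-8 into code points.
// No validation is performed; malformed input gives meaningless results.
std::vector<std::uint32_t> decode_utf8(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp;
        std::size_t len;
        // The lead byte determines how many code units form this code point.
        if (lead < 0x80)      { cp = lead;        len = 1; }
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < len && i + k < bytes.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

For the string "na" "\xC3\xAF" "ve" (the word "naive" with U+00EF in the middle), size() reports 6 bytes while decode_utf8 returns 5 code points, and the element at index 2 is the lead byte 0xC3, not a character.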
> This is why you'll have a hard time persuading me otherwise <g>.
Because you assert that there's not a problem? ;)
> Note that C++0x is doing things similarly:
>
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2249.html
Looks very different to me. There's no conflation of char with a
code unit of UTF-8 (and indeed C++ deliberately supports the use of
varied encodings for multi-byte characters). Yes, C++ is adding
16- and 32-bit character types which are more akin to D's, but that
has little bearing on how differently it handles multi-byte (as
opposed to wide-character) strings.
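As a rough illustration of that distinction, the sketch below uses the character types as they were eventually standardized in C++11, which may differ in detail from the linked proposal. The variable names are made up for the example, and treating the plain char string as UTF-8 is an assumption of the example, not something the type says.

```cpp
#include <string>

// One code point, U+1D11E (MUSICAL SYMBOL G CLEF), in the new character types.
const std::u32string clef32 = U"\U0001D11E";  // one 32-bit code unit
const std::u16string clef16 = u"\U0001D11E";  // two 16-bit code units (a surrogate pair)

// A plain char string carries no encoding in its type. These four bytes are
// assumed here to be UTF-8; the language does not say so.
const std::string clef_bytes = "\xF0\x9D\x84\x9E";
```

Here clef32.size() is 1, clef16.size() is 2, and clef_bytes.size() is 4. Only the first two types tie the element type to a particular Unicode encoding form.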
-- James