String implementations

Sun Jan 20 11:07:32 PST 2008

Walter Bright wrote:
> James Dennett wrote:
>> (On the other hand, D is weak here
>> because it identifies UTF8 strings with arrays of char, but char
>> doesn't hold a UTF8 character.  I can't imagine persuading Walter
>> that this is a horrible error is going to work though.)
> 
> I've actually done considerable work with UTF-8, both in C++ and D.

Yes, by this stage most serious programmers have had to learn
in some detail how to work with UTF-8.

> D's 
> method of dealing with it works out very well (and very naturally).

I've given specific problems with it.  I've heard no refutation
of them.  D essentially models UTF-8 as a bunch of bytes with
smart iteration.  C-based projects on which I worked in the 1990s
did similarly, but with coding conventions that banned direct
access to the bytes.
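
To make the code unit versus character distinction concrete, here
is a minimal C++ sketch.  The helper name count_code_points is
hypothetical, and the code assumes well-formed UTF-8 input with no
validation; it only illustrates that the number of bytes in the
array and the number of characters can differ.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a UTF-8 string by
// skipping continuation bytes (those of the form 10xxxxxx).
// Assumes well-formed UTF-8; performs no validation.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;  // lead byte or ASCII byte starts a new code point
        }
    }
    return n;
}
```

For the two-byte sequence "\xC3\xA9" (U+00E9), size() reports 2
while the helper reports 1, so indexing the bytes directly can land
in the middle of a character.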

> This is why you'll have a hard time persuading me otherwise <g>.

Because you assert that there's not a problem? ;)

> Note that C++0x is doing things similarly:
> 
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2249.html

Looks very different to me.  There's no conflation of char with a
code unit of UTF-8 (and indeed C++ deliberately supports use of
varied encodings for multi-byte characters).  Yes, C++ is adding
16- and 32-bit character types which are more akin to D's, but that
has little bearing on how differently it handles multi-byte (as
opposed to wide-character) strings.

-- James