String implementations
James Dennett
jdennett at acm.org
Sun Jan 20 14:55:26 PST 2008
Janice Caron wrote:
> On 1/20/08, James Dennett <jdennett at acm.org> wrote:
>> Looks very different to me.
>
> I thought it looked very similar indeed to D, but there you go. Funny
> how two different people can read the same document and interpret it
> in two different ways.
The core issue here, to me, is D's half-hearted attempt to paint
char[] as a Unicode string type. C++ has nothing analogous.
>> There's no conflation of char with a
>> code unit of UTF8
>
> C has no ubyte type. Since time immemorial, C programmers have been
> using the char type to store every 8-bit wide data type under the sun
> simply because there's been no alternative (until recently, when
> int8_t showed up as a typedef for char).
int8_t is necessarily signed, a la "signed char", not a typedef
for "char", whose signedness varies (but, unfortunately, is often
signed in C and C++).
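
The distinction can be checked at compile time. This is a minimal C++ sketch rather than anything from the original exchange; plain_char_is_signed, widen_to_int and check_high_byte are hypothetical helper names, and it assumes an implementation that provides the optional int8_t type.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// int8_t, where the implementation provides it, is a signed integer type.
static_assert(std::is_signed<std::int8_t>::value, "int8_t is signed");

// char, signed char and unsigned char are three distinct types, even though
// plain char behaves like one of the other two.
static_assert(!std::is_same<char, signed char>::value, "distinct types");
static_assert(!std::is_same<char, unsigned char>::value, "distinct types");

// Consequently int8_t is not a typedef for plain char.
static_assert(!std::is_same<std::int8_t, char>::value, "int8_t is not char");

// Hypothetical helper: reports how this implementation treats plain char.
constexpr bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}

// Hypothetical helper: widens a char to int using the implementation's rules.
int widen_to_int(char c) { return c; }

// A byte with the top bit set widens to a negative int exactly when plain
// char is signed, which is why such bytes need care in text-handling code.
void check_high_byte() {
    const char high = '\xE9';
    assert((widen_to_int(high) < 0) == plain_char_is_signed());
}
```

Whether plain_char_is_signed() yields true depends on the compiler and target, which is the point: code that stores bytes in char cannot assume either answer.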
> That's not a big deal.
>
>
>> (and indeed C++ deliberately supports use of
>> varied encodings for multi-byte characters).
>
> I must have misread the heading that says "Require UTF", and whose
> text reads "The C TR makes the encoding of char16_t and char32_t
> implementation-defined. It also provides macros to indicate whether or
> not the encoding is UTF. In contrast, this proposal requires UTF
> encoding."
>
> Oh, I see what you're saying - C++ would require UTF for wchar and
> dchar, but not for char. Well, that's historical legacy for you.
And it's the real world; computer systems need to interface
with existing systems which use diverse encodings.
>> Yes, C++ is adding
>> 16- and 32-bit character types which are more akin to D's, but that
>> has little bearing on how differently it handles multi-byte (as
>> opposed to wide-character) strings.
>
> So it has a bunch of procedural functions instead of foreach. Apart
> from that, the approach seems the same as D. Where's the difference?
Philosophy: D pushes char[] as if it were a proper UTF8 facility,
and goes a small step towards adding language support for that.
C++ recognizes diversity in multi-byte character encodings, and
doesn't make the language promote one over any other. It admits
up-front that you're dealing with code units if you want to work
with multi-byte characters.
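
To illustrate what "dealing with code units" means in practice, here is a minimal C++ sketch. It is an illustration only: count_code_points_assuming_utf8 and naive_utf8 are hypothetical names, and the helper assumes its input is already valid UTF-8, which the language itself never guarantees for a std::string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in s, ASSUMING s is valid UTF-8.
// Continuation bytes have the bit pattern 10xxxxxx; every other byte starts
// a new code point.
std::size_t count_code_points_assuming_utf8(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// The word "naive" with a diaeresis on the i, spelled as explicit UTF-8 bytes
// (0xC3 0xAF encodes U+00EF) so the example does not depend on the encoding
// of the source file.
const std::string naive_utf8 = "na\xC3\xAF" "ve";
```

Here naive_utf8.size() is 6, because std::string counts char code units, while count_code_points_assuming_utf8(naive_utf8) is 5. Nothing in std::string records which encoding the bytes are in; that knowledge stays with the programmer.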
C++ is a long, long way from perfect when it comes to Unicode
support. Even C++0x will still be. But I'm hoping for more from D,
and what I see so far can stand some improvement.
-- James