string is rarely useful as a function argument

Walter Bright newshound2 at digitalmars.com
Sat Dec 31 00:04:50 PST 2011


On 12/30/2011 11:09 PM, Andrei Alexandrescu wrote:
> On 12/30/11 10:09 PM, Walter Bright wrote:
>> I'm not so sure about that. Timon Gehr's X macro tried to handle UTF-8
>> correctly, but it turned out that the naive version that used [i] and
>> .length worked correctly. This is typical, not exceptional.
>
> The lower frequency of bugs makes them that much more difficult to spot. This is
> essentially similar to the UTF16/UCS-2 morass: in a vast majority of the time
> the programmer may consider UTF16 a coding with one code unit per code point
> (which is what UCS-2 is). The existence of surrogates didn't make much of a
> difference because, again, very often the wrong assumption just worked. Well
> that all didn't go over all that well.

I'm not so sure it's quite the same. Java was designed before there were
surrogate pairs; they kinda got the rug pulled out from under them. So they
simply have no decent way to deal with it. There isn't even a notion of a dchar
character type. Java was designed with codeunit==codepoint; it is embedded in
the design of the language, library, and culture.

This is not true of D. It's designed from the ground up to deal properly with 
UTF. D has very simple language features to deal with it.
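
A minimal sketch of those language features, assuming a reasonably recent D
compiler. The helper names countCodePoints and indexOfAscii are made up for
illustration and are not from any library. It shows that .length and [i] work
on code units, that foreach with a dchar variable decodes code points, and that
a naive index-based scan for an ASCII delimiter still splits UTF-8 text
correctly, because bytes below 0x80 never occur inside a multi-byte sequence.

```d
// Hypothetical helper: count code points by letting foreach decode
// the UTF-8 code units of s into dchar values.
size_t countCodePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

// Hypothetical helper: naive code-unit scan for an ASCII delimiter.
// Returns the code-unit index of the first match, or -1 if absent.
ptrdiff_t indexOfAscii(string s, char delim)
{
    foreach (i; 0 .. s.length)
        if (s[i] == delim)
            return cast(ptrdiff_t) i;
    return -1;
}

void main()
{
    string s = "h\u00E9llo";           // U+00E9 takes two UTF-8 code units
    assert(s.length == 6);             // .length counts code units
    assert(countCodePoints(s) == 5);   // foreach (dchar ...) counts code points

    string csv = "caf\u00E9,na\u00EFve";
    auto i = indexOfAscii(csv, ',');
    assert(i == 5);                    // index is in code units, not code points
    assert(csv[0 .. i] == "caf\u00E9");      // slices still hold valid UTF-8
    assert(csv[i + 1 .. $] == "na\u00EFve");
}
```

The naive scan is only safe here because the delimiter is ASCII; searching for
a non-ASCII character by single code unit would need decoding.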

> We need .raw and we must abolish .length and [] for narrow strings.

I don't believe that fixes anything, and it breaks every D project out there.
We're chasing phantoms here, and I worry a lot about over-engineering trivia.

And we already have a type to deal with it: dstring.
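
As a rough sketch of that option, assuming std.conv.to from Phobos for the
conversion (toCodePoints is a hypothetical wrapper named for illustration):
with dstring, each element is one code point, so .length and [i] index code
points directly, at the cost of four bytes per element.

```d
import std.conv : to;

// Hypothetical wrapper: transcode UTF-8 to UTF-32 so that each
// array element holds exactly one code point.
dstring toCodePoints(string s)
{
    return s.to!dstring;
}

void main()
{
    string s = "h\u00E9llo";
    dstring d = toCodePoints(s);

    assert(s.length == 6);        // UTF-8 code units
    assert(d.length == 5);        // UTF-32 code units == code points
    assert(d[1] == '\u00E9');     // direct indexing by code point
    assert(d == "h\u00E9llo"d);   // same text as a dstring literal
}
```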

