Why the hell doesn't foreach decode strings

Steven Schveighoffer schveiguy at yahoo.com
Wed Oct 26 05:18:34 PDT 2011


On Mon, 24 Oct 2011 19:49:43 -0400, Simen Kjaeraas  
<simen.kjaras at gmail.com> wrote:

> On Mon, 24 Oct 2011 21:41:57 +0200, Steven Schveighoffer  
> <schveiguy at yahoo.com> wrote:
>
>> Plus, a combining character (such as an umlaut or accent) is part of a
>> character, but may be a separate code point.
>
> If this is correct (and it is), then decoding to dchar is simply not  
> enough.
> You seem to advocate decoding to graphemes, which is a whole different  
> matter.

I am advocating that.  And it's a matter of perception.  D can say "we  
only support code-point decoding", but what that means to a user is "we  
don't support language as you know it."  Sure, code points are part of  
Unicode, but it takes that extra piece (grouping code points into  
graphemes) to make it actually usable for people who need Unicode.

Even in English, fiancé has an accent.  Claiming D supports Unicode while  
it can't do a simple search of a file containing a perfectly *valid*  
encoding of that word (say, a plain 'e' followed by a combining acute  
accent) is disingenuous, to say the least.
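To make the fiancé case concrete, here is a minimal sketch in Python, used purely for illustration (it is not D and says nothing about how Phobos behaves). It shows that a code-point-level search misses the decomposed spelling, and even produces a false positive for the unaccented word, while normalizing both sides to NFC first makes the search succeed. The helper name `contains_normalized` is hypothetical.

```python
import unicodedata

# Two canonically equivalent encodings of the same word:
precomposed = "fianc\u00e9"   # 'é' as the single code point U+00E9
decomposed = "fiance\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Assume the file being searched happens to use the decomposed form.
text = "I met her fiance\u0301 yesterday."

def contains_normalized(haystack: str, needle: str) -> bool:
    """Search after normalizing both strings to NFC (canonical composition)."""
    return unicodedata.normalize("NFC", needle) in unicodedata.normalize("NFC", haystack)

if __name__ == "__main__":
    print(len(precomposed), len(decomposed))        # 6 7: same word, different code-point counts
    print(precomposed in text)                      # False: code-point search misses it
    print("fiance" in text)                         # True: a false positive, the accent is ignored
    print(contains_normalized(text, precomposed))   # True once both sides are normalized
```

Normalization alone fixes this particular search, but the false positive on "fiance" shows why code-point matching without grapheme boundaries is still not enough.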

D needs a fully Unicode-aware string type.  I think D should use it as  
the default string type, but whether or not it's the default, D needs one  
before it can claim to support Unicode.
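As a rough idea of what iterating by "character" rather than by code point means, here is a deliberately naive Python sketch (illustrative only, not a proposal for D's API). The function `naive_graphemes` is hypothetical, and it only attaches combining marks (Unicode general category M*) to the preceding code point. Real grapheme clusters, as defined in Unicode UAX #29, also cover Hangul syllables, emoji ZWJ sequences, regional-indicator flags and more.

```python
import unicodedata

def naive_graphemes(s: str) -> list[str]:
    """Group each code point with the combining marks that follow it.

    Crude approximation of grapheme clusters: any code point whose general
    category starts with 'M' (Mn, Mc, Me) is glued onto the previous cluster.
    """
    clusters: list[str] = []
    for ch in s:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch   # attach the mark to the preceding base character
        else:
            clusters.append(ch)  # start a new cluster
    return clusters

if __name__ == "__main__":
    word = "fiance\u0301"                           # decomposed "fiancé"
    print(len(word), len(naive_graphemes(word)))    # 7 6: code points vs. user-visible characters
    print(ascii(word[::-1]))                        # '\u0301ecnaif': accent left dangling
    print(ascii("".join(reversed(naive_graphemes(word)))))  # 'e\u0301cnaif': accent stays on the e
```

Reversing by code point detaches the accent from its letter, while reversing by cluster keeps "é" intact, which is the kind of behavior a user expects from a Unicode-aware string type.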

-Steve


More information about the Digitalmars-d mailing list