Why the hell doesn't foreach decode strings
Steven Schveighoffer
schveiguy at yahoo.com
Wed Oct 26 05:18:34 PDT 2011
On Mon, 24 Oct 2011 19:49:43 -0400, Simen Kjaeraas
<simen.kjaras at gmail.com> wrote:
> On Mon, 24 Oct 2011 21:41:57 +0200, Steven Schveighoffer
> <schveiguy at yahoo.com> wrote:
>
>> Plus, a combining character (such as an umlaut or accent) is part of a
>> character, but may be a separate code point.
>
> If this is correct (and it is), then decoding to dchar is simply not
> enough.
> You seem to advocate decoding to graphemes, which is a whole different
> matter.

I am advocating that. And it's a matter of perception. D can say "we
only support code-point decoding," and what that means to a user is "we
don't support language as you know it." Sure, code points are a part of
Unicode, but it takes that extra piece, grapheme handling, to make it
actually usable to people who require Unicode.

Even in English, fiancé has an accent. To say D supports Unicode, and
then fail a simple search on a file that contains a certain *valid*
encoding of that word, is disingenuous to say the least.
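
To make the fiancé case concrete, here is a minimal sketch of the
failure and one possible workaround. It assumes the byGrapheme and
normalize functions from std.uni in current Phobos, which are not
necessarily what was available when this was written. The helper names
(codePoints, graphemes, naiveContains, normalizedContains) are
hypothetical and exist only for illustration.

```d
import std.algorithm.searching : canFind;
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme, normalize, NFC;

// Two valid encodings of the same word (illustrative constants).
enum string precomposed = "fianc\u00E9";   // é as the single code point U+00E9
enum string decomposed  = "fiance\u0301";  // e followed by U+0301 COMBINING ACUTE ACCENT

/// Counts code points, which is what decoding to dchar yields.
size_t codePoints(string s)
{
    return s.walkLength;
}

/// Counts graphemes, i.e. what a reader perceives as characters.
size_t graphemes(string s)
{
    return s.byGrapheme.walkLength;
}

/// Substring search that compares the encoded text as-is.
bool naiveContains(string haystack, string needle)
{
    return haystack.canFind(needle);
}

/// Substring search after bringing both sides to NFC (assumed workaround).
bool normalizedContains(string haystack, string needle)
{
    return normalize!NFC(haystack).canFind(normalize!NFC(needle));
}

void main()
{
    writeln(codePoints(precomposed), " vs ", codePoints(decomposed)); // 6 vs 7
    writeln(graphemes(precomposed), " vs ", graphemes(decomposed));   // 6 vs 6
    writeln(naiveContains(decomposed, precomposed));      // false
    writeln(normalizedContains(decomposed, precomposed)); // true
}
```

Normalizing both sides is only one way to get the search to succeed, and
it still has to be requested explicitly by the caller. That is the gap
being described: code-point decoding alone does not make the two
spellings compare equal.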

D needs a fully Unicode-aware string type. I advocate that D use it as
the default string type, but it needs one whether it's the default or not
in order to say it supports Unicode.
-Steve