Why the hell doesn't foreach decode strings
Christophe
travert at phare.normalesup.org
Fri Oct 28 03:04:13 PDT 2011
Dmitry Olshansky, in message (digitalmars.D:147415), wrote:
> Assuming language support stays at the stage of "codepoint is a character",
> it's totally expected to ignore modifiers and compare identically
> normalized UTF without decoding. Yes, it risks hitting certain issues.
Seeing a string as a range of code points (dchar) is already awful
enough. Seeing strings as ranges of displayable characters just does not
make sense. Unicode is too complicated to allow this for general-purpose
string manipulation. The transformations to displayable
characters can only be done when displaying characters!
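To make the two views concrete, here is a minimal sketch of what foreach
yields over the same string depending on the declared element type. The
helper names countCodeUnits and countCodePoints are hypothetical, chosen
only for this illustration.

```d
import std.stdio : writeln;

// Hypothetical helper: counts foreach iterations with element type char.
// No decoding happens: one iteration per UTF-8 code unit.
size_t countCodeUnits(string s)
{
    size_t n = 0;
    foreach (char c; s)
        ++n;
    return n;
}

// Hypothetical helper: same loop with element type dchar. foreach decodes
// the UTF-8 on the fly, giving one iteration per code point.
size_t countCodePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    string s = "caf\u00E9"; // 'é' written as the single code point U+00E9
    writeln(countCodeUnits(s));  // 5: 'é' occupies two UTF-8 code units
    writeln(countCodePoints(s)); // 4
}
```

Neither count is a count of displayable characters; that would need
grapheme segmentation on top of decoding.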
For example, "fiancé" is hidden (from a search, say) if you write
"fiance" followed by the appropriate Unicode combining character that
places the accent over the 'e'. You can hide any
word by using delete characters. You have to make assumptions about the
input, and you have to put limitations on the algorithm, because in any
case you can get unexpected behavior. And I can assure you there is
less unexpected behavior if you treat strings as dchar ranges, or even as
char[], than if you treat them as displayable characters.
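The fiancé case can be sketched as follows, assuming the combining
character meant here is U+0301 (combining acute accent). The helper
containsWord is hypothetical; it just wraps canFind from std.algorithm.

```d
import std.algorithm : canFind;
import std.range : walkLength;
import std.stdio : writeln;

// Hypothetical helper: plain substring search, with no normalization.
bool containsWord(string text, string word)
{
    return text.canFind(word);
}

void main()
{
    // Precomposed: 'é' is the single code point U+00E9.
    string composed = "fianc\u00E9";
    // Decomposed: 'e' followed by U+0301 (assumed combining accent).
    string decomposed = "fiance\u0301";

    // The two spellings display alike but differ in both views.
    writeln(composed.length, " ", decomposed.length);         // 7 8 (code units)
    writeln(composed.walkLength, " ", decomposed.walkLength); // 6 7 (code points)
    writeln(composed == decomposed);                          // false

    // Searching for the precomposed word misses the decomposed spelling,
    // while a search for plain "fiance" matches it.
    writeln(containsWord(decomposed, composed)); // false
    writeln(containsWord(decomposed, "fiance")); // true
    writeln(containsWord(composed, "fiance"));   // false
}
```

Whether the search works on char or on dchar, the outcome is the same
here: only normalizing the input beforehand would make the two spellings
compare equal.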
> It's a complete mess even with proper decoding ;)
Sure, that's why we had better not decode.