Why the hell doesn't foreach decode strings

Christophe travert at phare.normalesup.org
Fri Oct 28 03:04:13 PDT 2011


Dmitry Olshansky, in message (digitalmars.D:147415), wrote:
> Assuming language support stays at the stage of "codepoint is a character", 
> it's totally expected to ignore modifiers and compare identically 
> normalized UTF without decoding. Yes, it risks hitting certain issues.

Seeing a string as a range of code points (dchar) is already awful 
enough. Seeing strings as ranges of displayable characters just does not 
make sense. Unicode is too complicated to allow this for general-purpose 
string manipulation. The transformation to displayable characters can 
only be done when actually displaying them!

Just as fiancé is hidden if you write fiance' (with the appropriate 
Unicode combining character so that the ' is placed over the 'e'), you 
can hide any word by using delete characters. You have to make 
assumptions about the input, and you have to put limitations on the 
algorithm, because in any case you can get unexpected behavior. And I 
can assure you there is less unexpected behavior if you treat strings 
as dchar ranges, or even as char[], than if you treat them as 
displayable characters.
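
To make the fiancé case concrete, here is a minimal sketch. It is in Python rather than D, purely because its standard unicodedata module is enough to show the effect. The constant and function names (COMPOSED, DECOMPOSED, nfc, describe) are made up for this illustration and do not come from any D library.

```python
import unicodedata

# Two spellings that display identically (illustrative values).
COMPOSED = "fianc\u00e9"      # U+00E9, precomposed e with acute
DECOMPOSED = "fiance\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT


def nfc(s: str) -> str:
    """Return the canonical composed (NFC) form of s."""
    return unicodedata.normalize("NFC", s)


def describe(s: str) -> dict:
    """Count the same string at the code-unit and code-point levels."""
    return {
        "utf8_code_units": len(s.encode("utf-8")),
        "code_points": len(s),
    }


if __name__ == "__main__":
    for name, s in (("composed", COMPOSED), ("decomposed", DECOMPOSED)):
        print(name, describe(s), "contains 'fiance':", "fiance" in s)
    print("equal as code points:", COMPOSED == DECOMPOSED)
    print("equal after NFC:", nfc(COMPOSED) == nfc(DECOMPOSED))
```

The two strings differ at both the code-unit and the code-point level, and a plain search for "fiance" matches only the decomposed spelling. They compare equal only once both sides are normalized to the same form, which is a separate, explicit step and not something that decoding to code points gives you.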

> It's a complete mess even with proper decoding ;)

Sure, that's why we had better not decode.
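
As a rough illustration of working without decoding (again in Python, with a hypothetical helper name byte_find): searching for a valid UTF-8 needle directly in valid UTF-8 bytes is safe, because UTF-8 is self-synchronizing and a match can only start on a code point boundary. The sketch assumes both inputs are well-formed and already in the same normalization form.

```python
def byte_find(haystack: str, needle: str) -> int:
    """Search the UTF-8 code units directly, with no decoding step.

    Returns the byte offset of the first match, or -1 if there is none.
    """
    return haystack.encode("utf-8").find(needle.encode("utf-8"))


if __name__ == "__main__":
    text = "un fianc\u00e9 heureux"
    # Byte offsets differ from code-point offsets after the two-byte 'é'.
    print("byte offset of 'heureux':", byte_find(text, "heureux"))
    print("code-point offset of 'heureux':", text.find("heureux"))
```

The offsets are in code units, so they differ from code-point indices once a multi-byte character has been passed, but the match itself is found without ever decoding.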

