Today's programming challenge - How's your Range-Fu ?
Chris via Digitalmars-d
digitalmars-d at puremagic.com
Mon Apr 20 07:57:59 PDT 2015
On Monday, 20 April 2015 at 11:04:58 UTC, Panke wrote:
>>
>> Yes, again and again I encountered length related bugs with
>> Unicode characters. Normalization is not 100% reliable.
>
> I think it is 100% reliable, it just doesn't make the problems
> go away. It just guarantees that two strings normalized to the
> same form are binary equal iff they are equal in the unicode
> sense. Nothing about columns or string length or grapheme count.
The problem is not normalization as such; the problem is with
string (as opposed to dstring), whose .length counts UTF-8 code
units rather than code points:
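import std.uni : normalize, NFC;

void main() {
    // dstring: .length counts UTF-32 code units, i.e. code points
    dstring de_one = "é";        // precomposed U+00E9
    dstring de_two = "e\u0301";  // 'e' + combining acute accent
    assert(de_one.length == 1);
    assert(de_two.length == 2);

    // string: .length counts UTF-8 code units (bytes)
    string e_one = "é";
    string e_two = "e\u0301";
    string random = "ab";
    assert(e_one.length == 2);
    assert(e_two.length == 3);
    assert(e_one.length == random.length);

    // NFC composes both forms to U+00E9, which is still 2 bytes
    assert(normalize!NFC(e_one).length == 2);
    assert(normalize!NFC(e_two).length == 2);
}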
This can lead to subtle bugs, cf. the lengths of random and
e_one, which are equal although random holds two characters and
e_one only one. You have to convert everything to dstring to get
the "expected" result. However, this is not always desirable.
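
For cases where converting to dstring is not desirable, here is a
minimal sketch of counting code points and graphemes directly on a
string. It assumes Phobos' std.range.walkLength and
std.uni.byGrapheme; the helper names codePointCount and
graphemeCount are made up for illustration and are not from the
code above.

```d
import std.range : walkLength;
import std.uni : byGrapheme, normalize, NFC;

// Hypothetical helper: counts code points by walking the string as
// a range of decoded characters instead of reading .length.
size_t codePointCount(string s)
{
    return s.walkLength;
}

// Hypothetical helper: counts user-perceived characters (grapheme
// clusters), so a base letter plus combining mark counts once.
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    string e_one = "\u00E9";   // precomposed é
    string e_two = "e\u0301";  // 'e' + combining acute accent

    // .length still reports UTF-8 code units
    assert(e_one.length == 2);
    assert(e_two.length == 3);

    // Code point counts differ until the string is normalized
    assert(codePointCount(e_one) == 1);
    assert(codePointCount(e_two) == 2);
    assert(codePointCount(normalize!NFC(e_two)) == 1);

    // Grapheme counts agree even without normalization
    assert(graphemeCount(e_one) == 1);
    assert(graphemeCount(e_two) == 1);
}
```

Both helpers walk the whole string, so they are O(n) where .length
is O(1); whether that cost is acceptable depends on the use case.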