Dicebot on leaving D: It is anarchy driven development in all its glory.

Chris wendlec at tcd.ie
Thu Sep 6 10:40:00 UTC 2018


On Thursday, 6 September 2018 at 10:22:22 UTC, ag0aep6g wrote:
> On 09/06/2018 09:23 AM, Chris wrote:
>> Python 3 gives me this:
>> 
>> print(len("á"))
>> 1
>
> Python 3 also gives you this:
>
> print(len("á"))
> 2
>
> (The example might not survive transfer from me to you if 
> Unicode normalization happens along the way.)
>
> That's when you enter the 'á' as 'a' followed by U+0301 
> (combining acute accent). So Python's `len` counts in code 
> points, like D's std.range does (auto-decoding).

To avoid this you have to normalize, i.e. recompose any decomposed 
characters. I remember that Mac OS X used (and still uses?) 
decomposed characters by default, so when you typed 'á' into your 
CLI, it would automatically be decomposed to 'a' + combining acute. 
D's `string`, however, gives a length of 2 even for the composed 
character, because `.length` counts UTF-8 code units. If you do a 
lot of string handling, this will come back to bite you sooner or 
later.
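
A quick sketch of the recomposition step in Python, using the 
standard `unicodedata` module. The helper name `nfc_len` is made up 
for illustration, and the escapes are spelled out so the example 
survives any normalization in transit. The last two lines print the 
UTF-8 byte counts, which correspond to the code units that D's 
`string.length` counts:

```python
import unicodedata

composed = "\u00e1"     # 'á' as a single precomposed code point
decomposed = "a\u0301"  # 'a' followed by U+0301 (combining acute accent)


def nfc_len(s):
    """Length in code points after recomposing with NFC (hypothetical helper)."""
    return len(unicodedata.normalize("NFC", s))


print(len(composed), len(decomposed))          # 1 2
print(nfc_len(composed), nfc_len(decomposed))  # 1 1

# UTF-8 code units (bytes) for each form
print(len(composed.encode("utf-8")))           # 2
print(len(decomposed.encode("utf-8")))         # 3
```

NFC only helps where a precomposed form exists; for sequences 
without one you would still have to count grapheme clusters.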

