Dicebot on leaving D: It is anarchy driven development in all its glory.
Chris
wendlec at tcd.ie
Thu Sep 6 10:40:00 UTC 2018
On Thursday, 6 September 2018 at 10:22:22 UTC, ag0aep6g wrote:
> On 09/06/2018 09:23 AM, Chris wrote:
>> Python 3 gives me this:
>>
>> print(len("á"))
>> 1
>
> Python 3 also gives you this:
>
> print(len("á"))
> 2
>
> (The example might not survive transfer from me to you if
> Unicode normalization happens along the way.)
>
> That's when you enter the 'á' as 'a' followed by U+0301
> (combining acute accent). So Python's `len` counts in code
> points, like D's std.range does (auto-decoding).
To avoid this you have to normalize, i.e. recompose any decomposed
characters (NFC). I remember that Mac OS X used (and still uses?)
decomposed characters by default, so when you typed 'á' into your
CLI, it would automatically be decomposed to 'a' + combining acute.
D's `string`, however, returns a length of 2 even for the composed
'á', because `.length` counts UTF-8 code units rather than
characters. If you do a lot of string handling, it will come back
to bite you sooner or later.
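
To make the point above concrete, here is a minimal sketch in Python 3
using only the standard `unicodedata` module. The variable names are
mine, and the escapes are spelled out so the example survives any
normalization in transit. The last two lines only approximate what D's
`string.length` would report by counting UTF-8 bytes on the Python
side; they are not D code.

```python
import unicodedata

composed = "\u00e1"      # 'á' as a single precomposed code point
decomposed = "a\u0301"   # 'a' followed by U+0301 (combining acute accent)

# Python's len counts code points, so the two spellings differ.
print(len(composed))     # 1
print(len(decomposed))   # 2

# NFC normalization recomposes the pair into the single code point.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == composed)  # True
print(len(recomposed))         # 1

# Counting UTF-8 code units instead (the quantity D's string.length
# counts) gives 2 even for the composed form.
print(len(composed.encode("utf-8")))    # 2
print(len(decomposed.encode("utf-8")))  # 3
```

So normalizing fixes the code point count, but a byte-level length
still will not match what a user sees as one character.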