Dicebot on leaving D: It is anarchy driven development in all its glory.

ag0aep6g anonymous at example.com
Thu Sep 6 11:43:31 UTC 2018


On 09/06/2018 12:40 PM, Chris wrote:
> To avoid this you have to normalize and recompose any decomposed 
> characters. I remember that Mac OS X used (and still uses?) decomposed 
> characters by default, so when you typed 'á' into your cli, it would 
> automatically decompose it to 'a' + acute. `string` however returns 
> len=2 for composed characters too. If you do a lot of string handling it 
> will come back to bite you sooner or later.

You say that D users shouldn't need a '"Unicode license" before they do 
anything with strings'. And you say that Python 3 gets it right (or 
maybe less wrong than D).

But here we see that Python requires a similar amount of Unicode 
knowledge. Without your Unicode license, you couldn't make sense of 
`len` giving different results for two strings that look the same (one 
precomposed, one decomposed).
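
To make that concrete, here is a minimal Python 3 sketch of the `len` 
discrepancy. It uses only the standard library; the variable names are 
made up for illustration, and 'á' is taken from the quoted example.

```python
import unicodedata

composed = "\u00e1"      # 'á' as one precomposed code point (NFC form)
decomposed = "a\u0301"   # 'a' followed by U+0301 COMBINING ACUTE ACCENT (NFD form)

# Both strings render as 'á', but they are different code point sequences.
lengths = (len(composed), len(decomposed))
print(lengths)                 # (1, 2)
print(composed == decomposed)  # False

# Normalizing to NFC recomposes the pair; afterwards the strings compare equal.
recomposed = unicodedata.normalize("NFC", decomposed)
print(len(recomposed), recomposed == composed)  # 1 True
```

Python's `len` counts code points, so nothing short of normalization 
makes the two spellings agree.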

So both D and Python require a Unicode license. But on top of that, D 
also requires an auto-decoding license. You need to know that `string` 
is both a range of code points and an array of code units. And you need 
to know that `.length` belongs to the array side, not the range side. 
Once you know that (and more), things start making sense in D.
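
The two views can be imitated in Python, purely as an analogy and not 
as D code. The sketch assumes that encoding to UTF-8 stands in for the 
code units a D `string` stores (what `.length` counts), and that 
iterating a Python `str` stands in for the code points that the 
auto-decoding range side yields.

```python
def both_views(s):
    """Return (code unit count, code point count) for s.

    Analogy only: UTF-8 bytes play the role of D's array side,
    Python's own code point iteration plays the role of the range side.
    """
    code_units = s.encode("utf-8")  # array-like view: UTF-8 code units
    code_points = list(s)           # range-like view: decoded code points
    return len(code_units), len(code_points)

composed_views = both_views("\u00e1")     # precomposed 'á'
decomposed_views = both_views("a\u0301")  # 'a' + combining acute accent

print(composed_views)    # (2, 1)
print(decomposed_views)  # (3, 2)
```

Even for the precomposed 'á' the two counts differ, which is the extra 
thing a D user has to keep in mind on top of normalization.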

My point is: D doesn't require more Unicode knowledge than Python. But 
D's auto-decoding gives `string` a dual nature, and that can certainly 
be confusing. It's part of why everybody dislikes auto-decoding.

(Not saying that Python is free from such pitfalls. I simply don't know 
the language well enough.)


More information about the Digitalmars-d mailing list