Dicebot on leaving D: It is anarchy driven development in all its glory.
ag0aep6g
anonymous at example.com
Thu Sep 6 11:43:31 UTC 2018
On 09/06/2018 12:40 PM, Chris wrote:
> To avoid this you have to normalize and recompose any decomposed
> characters. I remember that Mac OS X used (and still uses?) decomposed
> characters by default, so when you typed 'á' into your cli, it would
> automatically decompose it to 'a' + acute. `string` however returns
> len=2 for composed characters too. If you do a lot of string handling it
> will come back to bite you sooner or later.
You say that D users shouldn't need a '"Unicode license" before they do
anything with strings'. And you say that Python 3 gets it right (or
maybe less wrong than D).
But here we see that Python requires a similar amount of Unicode
knowledge. Without your Unicode license, you couldn't make sense of
`len` giving different results for two strings that look the same.
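A minimal Python 3 sketch of the effect Chris describes, assuming CPython and the standard `unicodedata` module. Two strings that both render as 'á' get different `len` values, and NFC normalization recomposes the decomposed form:

```python
import unicodedata

composed = "\u00e1"      # 'á' as a single code point (U+00E1)
decomposed = "a\u0301"   # 'a' followed by U+0301 COMBINING ACUTE ACCENT

# Both display as 'á', but len() counts code points, so the results differ.
print(len(composed), len(decomposed))   # 1 2
print(composed == decomposed)           # False

# NFC recomposes 'a' + combining acute into the single code point U+00E1.
nfc = unicodedata.normalize("NFC", decomposed)
print(len(nfc), nfc == composed)        # 1 True
```

Making sense of those numbers still takes knowing what a code point is and what normalization does.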
So both D and Python require a Unicode license. But on top of that, D
also requires an auto-decoding license. You need to know that `string`
is both a range of code points and an array of code units. And you need
to know that `.length` belongs to the array side, not the range side.
Once you know that (and more), things start making sense in D.
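To make the "array of code units vs. range of code points" split concrete, here is a rough Python analogy. It is not D code. The UTF-8 byte count stands in for `.length` on a D `string`, and `len()` on a Python `str` stands in for walking the auto-decoded range (roughly what `walkLength` reports in D). The helper names `code_units` and `code_points` are hypothetical, chosen only for illustration:

```python
def code_units(s: str) -> int:
    """UTF-8 code units (bytes): the analogue of D's string.length."""
    return len(s.encode("utf-8"))


def code_points(s: str) -> int:
    """Code points: the analogue of iterating a D string as an auto-decoded range."""
    return len(s)


samples = {
    "plain a": "a",                     # 1 unit, 1 point
    "composed a-acute": "\u00e1",       # 2 units (C3 A1), 1 point
    "decomposed a + acute": "a\u0301",  # 3 units (61 CC 81), 2 points
}

for label, s in samples.items():
    # Neither count matches the single character a reader sees for the
    # decomposed form; that would require grapheme segmentation.
    print(f"{label}: {code_units(s)} code units, {code_points(s)} code points")
```

The point of the analogy is that Python exposes only one of these counts through `len`, while a D `string` exposes both, depending on whether you treat it as an array or as a range.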
My point is: D doesn't require more Unicode knowledge than Python. But
D's auto-decoding gives `string` a dual nature, and that can certainly
be confusing. It's part of why everybody dislikes auto-decoding.
(Not saying that Python is free from such pitfalls. I simply don't know
the language well enough.)
More information about the Digitalmars-d mailing list