Dicebot on leaving D: It is anarchy driven development in all its glory.

Chris wendlec at tcd.ie
Thu Sep 6 08:20:16 UTC 2018


On Thursday, 6 September 2018 at 07:54:09 UTC, Joakim wrote:
> On Thursday, 6 September 2018 at 07:23:57 UTC, Chris wrote:
>> On Wednesday, 5 September 2018 at 22:00:27 UTC, H. S. Teoh 
>> wrote:
>>
>>>
>>> //
>>>
>>> Seriously, people need to get over the fantasy that they can 
>>> just use Unicode without understanding how Unicode works.  
>>> Most of the time, you can get the illusion that it's working, 
>>> but actually 99% of the time the code is actually wrong and 
>>> will do the wrong thing when given an unexpected (but still 
>>> valid) Unicode string.  You can't drive without a license, 
>>> and even if you try anyway, the chances of ending up in a 
>>> nasty accident is pretty high.  People *need* to learn how to 
>>> use Unicode properly before complaining about why this or 
>>> that doesn't work the way they thought it should work.
>>>
>>>
>>> T
>>
>> Python 3 gives me this:
>>
>> print(len("á"))
>> 1
>>
>> and so do other languages.
>
> The same Python 3 that people criticize for having unintuitive 
> unicode string handling?
>
> https://learnpythonthehardway.org/book/nopython3.html
>
>> Is it asking too much to ask for `string` (not `dstring` or 
>> `wstring`) to behave as most people would expect it to behave 
>> in 2018 - and not like Python 2 from days of yore? But of 
>> course, D users should have a "Unicode license" before they do 
>> anything with strings. (I wonder is there a different license 
>> for UTF8 and UTF16 and UTF32, Big / Little Endian, BOM? Just 
>> asking.)
>
> Yes and no, unicode is a clusterf***, so every programming 
> language is having problems with it.
>
>> So again, for the umpteenth time, it's the users' fault. I 
>> see. Ironically enough, it was the language developers' lack 
>> of understanding of Unicode that led to string handling being 
>> a nightmare in D in the first place. Oh lads, if you were 
>> politicians I'd say that with this attitude you're gonna lose the 
>> next election. I say this, because many times the posts by 
>> (core) developers remind me so much of politicians who are 
>> completely detached from the reality of the people. Right oh!
>
> You have a point that it was D devs' ignorance of unicode that 
> led to the current auto-decoding problem. But let's have some 
> nuance here, the problem ultimately is unicode.

Yes, Unicode is a beast that is hard to tame. But as far as I 
know there is not even a proper plan to tackle the whole thing in 
D, just patches. D has autodecoding, which slows things down and 
still doesn't work correctly: it decodes to code points, not to 
what users perceive as characters. However, it cannot be removed 
because that would break massive amounts of code. So you 
sacrifice speed for security (fine), but the security doesn't 
even exist. So what's the point? Also, there aren't any 
guidelines on how to use strings in different contexts. So after 
a while your code ends up being a mess of .byCodePoint, 
.byGrapheme, string, dstring and whatever else, and you never 
know whether you really got it right (performance-wise or 
otherwise).
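
To make the levels involved concrete, here is a minimal sketch in 
Python 3 (the language of the len() example quoted above). It 
counts the same visible character, "á", as UTF-8 code units, 
UTF-16 code units, code points and an approximate grapheme count. 
The counts() helper and its key names are made up for this 
illustration, and the grapheme figure is only a rough stand-in 
that skips combining marks; it is not the full segmentation 
algorithm from Unicode Standard Annex #29.

```python
import unicodedata


def counts(s):
    """Return the length of s at four different levels (hypothetical helper)."""
    return {
        # Number of bytes when encoded as UTF-8.
        "utf8_code_units": len(s.encode("utf-8")),
        # Number of 16-bit units when encoded as UTF-16 (no BOM).
        "utf16_code_units": len(s.encode("utf-16-le")) // 2,
        # Python 3 str length is the number of code points.
        "code_points": len(s),
        # Rough approximation only: count code points that are not
        # combining marks. Real grapheme segmentation has more rules.
        "approx_graphemes": sum(
            1 for ch in s if unicodedata.combining(ch) == 0
        ),
    }


precomposed = "\u00e1"   # "á" as the single code point U+00E1
decomposed = "a\u0301"   # "a" followed by combining acute accent U+0301

print(counts(precomposed))
print(counts(decomposed))

# The two strings render the same but compare unequal until normalized.
print(precomposed == decomposed)
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

With the precomposed form, len() gives 1, as in the quoted 
example; with the decomposed form it gives 2, although both 
render as one character. That is the same gap as in D, where 
`string` stores UTF-8 code units, autodecoding iterates code 
points, and only .byGrapheme works at the level of what a user 
sees.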

We're talking about basic functionality like string handling. 
String handling is very important these days (data harvesting, 
translation tools), and IT is used all over the world, where you 
have to deal with alphabets that are outside the ASCII range. 
And because it's such basic functionality, you don't want to 
waste time having to think about it.

