Dicebot on leaving D: It is anarchy driven development in all its glory.

aliak something at something.com
Thu Sep 6 21:15:59 UTC 2018


On Thursday, 6 September 2018 at 20:15:22 UTC, Jonathan M Davis 
wrote:
> On Thursday, September 6, 2018 1:04:45 PM MDT aliak via 
> Digitalmars-d wrote:
>> D makes the code-point case default and hence that becomes the
>> simplest to use. But unfortunately, the only thing I can think of
>> that requires code point representations is when dealing
>> specifically with unicode algorithms (normalization, etc). Here's
>> a good read on code points:
>> https://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/
>>
>> tl;dr: application logic does not need or want to deal with 
>> code points. For speed units work, and for correctness, 
>> graphemes work.
>
> I think that it's pretty clear that code points are objectively 
> the worst level to be the default. Unfortunately, changing it 
> to _anything_ else is not going to be an easy feat at this 
> point. But if we can first ensure that Phobos in general 
> doesn't rely on it (i.e. in general, it can deal with ranges of 
> char, wchar, dchar, or graphemes correctly rather than assuming 
> that all ranges of characters are ranges of dchar), then maybe 
> we can figure something out. Unfortunately, while some work has 
> been done towards that, what's mostly happened is that folks 
> have complained about auto-decoding without doing much to 
> improve the current situation. There's a lot more to this than 
> simply ripping out auto-decoding even if every D user on the 
> planet agreed that outright breaking almost every existing D 
> program to get rid of auto-decoding was worth it. But as with 
> too many things around here, there's a lot more talking than 
> working. And actually, as such, I should probably stop 
> discussing this and go do something useful.
>
> - Jonathan M Davis

Is there a unittest somewhere in Phobos that you know of, and 
could point to, that shows the handling of the 4 variations you 
say should be dealt with first? Or maybe a PR that did some of 
this work that one could investigate?

I ask so I can see in code what it means to make something not 
rely on autodecoding and deal with ranges of char, wchar, dchar 
or graphemes.
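
To make the question concrete, here is a minimal sketch of what I 
imagine such a unittest might look like. It is only my guess at the 
idea, not how Phobos actually tests this. countElements is a made-up 
helper (it is not in Phobos); byCodeUnit, byDchar and byGrapheme are 
the existing adaptors from std.utf and std.uni. The function never 
assumes its elements are dchar, so the caller picks the level.

```d
import std.range.primitives : empty, isInputRange, popFront;
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byDchar;

// Hypothetical helper: counts the elements of any input range without
// assuming what the element type is (char, wchar, dchar or Grapheme).
size_t countElements(R)(R r)
if (isInputRange!R)
{
    size_t n = 0;
    // Use the range primitives explicitly; a plain foreach over a
    // string would iterate code units and bypass the range interface.
    for (; !r.empty; r.popFront())
        ++n;
    return n;
}

unittest
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT:
    // one user-perceived character, two code points.
    string s = "e\u0301";

    assert(countElements(s.byCodeUnit) == 3); // UTF-8 code units
    assert(countElements(s.byDchar) == 2);    // code points, requested explicitly
    assert(countElements(s.byGrapheme) == 1); // graphemes

    // Passing the raw string goes through auto-decoding, so the
    // result is the code point count whether you wanted it or not.
    assert(countElements(s) == 2);

    // The same text as UTF-16 has two code units.
    wstring w = "e\u0301"w;
    assert(countElements(w.byCodeUnit) == 2);
}
```

If that is roughly the shape of it, then "not relying on 
auto-decoding" would mean the function body gives the right answer 
for each of those calls without special-casing strings.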

Or maybe a current "easy" Bugzilla issue that one could try 
their hand at?

