More fun with autodecoding

Chris wendlec at tcd.ie
Mon Sep 10 08:45:27 UTC 2018


On Saturday, 8 September 2018 at 15:36:25 UTC, Steven 
Schveighoffer wrote:
> On 8/9/18 2:44 AM, Walter Bright wrote:

>
> So it turns out that technically the problem here, even though 
> it seemed like an autodecoding problem, is a problem with 
> splitter.
>
> splitter doesn't deal with encodings of character ranges at all.
>
> For instance, when you have this:
>
> "abc 123".byCodeUnit.splitter;
>
> What happens is splitter only has one overload that takes one 
> parameter, and that requires a character *array*, not a range.
>
> So the byCodeUnit result is aliased-this to its original, and 
> surprise! the elements from that splitter are string.
>
> Next, I tried to use a parameter:
>
> "abc 123".byCodeUnit.splitter(" ");
>
> Nope, still devolves to string. It turns out it can't figure 
> out how to split character ranges using a character array as 
> input.
>
> The only thing that does seem to work is this:
>
> "abc 123".byCodeUnit.splitter(" ".byCodeUnit);
>
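
For anyone who wants to check this against their own compiler, here is a minimal sketch of the three calls quoted above. `pieceType` is a made-up helper, not part of Phobos. The printed types depend on the Phobos version (the alias-this behaviour Steven describes is from the 2018 releases), so I'm not claiming any particular output.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : splitter;
import std.range.primitives : ElementType;
import std.stdio : writeln;
import std.utf : byCodeUnit;

// Hypothetical helper: name of the type that the range's front yields.
string pieceType(R)(R r)
{
    return ElementType!R.stringof;
}

void main()
{
    auto s = "abc 123";

    // 1. No separator: reported above to fall back to the string overload.
    writeln(pieceType(s.byCodeUnit.splitter));

    // 2. String separator: reported above to devolve to string as well.
    writeln(pieceType(s.byCodeUnit.splitter(" ")));

    // 3. Code-unit separator: the one form reported to keep the range type.
    writeln(pieceType(s.byCodeUnit.splitter(" ".byCodeUnit)));

    // Whatever the piece type is, the split should give "abc" and "123".
    assert(equal!equal(s.byCodeUnit.splitter(" ".byCodeUnit), ["abc", "123"]));
}
```

If the first two lines print `string` while the third prints a code-unit range type, you are seeing the behaviour described in the quote.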

After a while your code will be cluttered with absurd stuff like 
this: `.byCodeUnit`, `.byGrapheme`, `.array` etc. Because of my 
experience with `splitter` et al., I tried to create my own 
parser to have better control over every step. After a few 
*minutes* of testing things I ran into this bug [1], which didn't 
get fixed till early 2018. I never did start writing my own 
step-by-step parser. I'm glad I didn't.

I wish people began to realize that string handling is a basic 
necessity and that the correct handling of strings is of utmost 
importance. Please keep us updated on how things work out (or 
not) for you.

[Please, nobody answer my post pointing out that a) we don't 
understand Unicode and b) that it's an insult to the Universe to 
draw attention to flaws that keep pestering us on an almost daily 
basis - without trying to fix them ourselves stante pede. As is 
clear from Steve's efforts, the Universe doesn't seem to care.]

[1] https://issues.dlang.org/show_bug.cgi?id=16739

[snip]


More information about the Digitalmars-d mailing list