Dicebot on leaving D: It is anarchy driven development in all its glory.

aliak something at something.com
Thu Sep 6 14:17:28 UTC 2018


On Wednesday, 5 September 2018 at 22:00:27 UTC, H. S. Teoh wrote:
> Because grapheme decoding is SLOW, and most of the time you 
> don't even need it anyway.  SLOW as in, it will easily add a 
> factor of 3-5 (if not worse!) to your string processing time, 
> which will make your natively-compiled D code a laughing stock 
> of interpreted languages like Python.  It will make 
> autodecoding look like an optimization(!).

Hehe, it's already a bit laughable that correctness is not 
preferred.

// Swift
let a = "\u{E1}"   // "á" precomposed: U+00E1
let b = "a\u{301}" // "á" decomposed: U+0061 + U+0301 (combining acute)
let c = "\u{200B}" // zero width space
let x = a + c + a
let y = b + c + b

print(a.count) // 1
print(b.count) // 1
print(x.count) // 3
print(y.count) // 3

print(a == b) // true
print("ááááááá".range(of: "á") != nil) // true

// D
auto a = "\u00E1";  // "á" precomposed: U+00E1
auto b = "a\u0301"; // "á" decomposed: U+0061 + U+0301 (combining acute)
auto c = "\u200B";
auto x = a ~ c ~ a;
auto y = b ~ c ~ b;

writeln(a.length); // 2 wtf
writeln(b.length); // 3 wtf
writeln(x.length); // 7 wtf
writeln(y.length); // 9 wtf

writeln(a == b); // false wtf
writeln("ááááááá".canFind("á")); // false wtf

Tell me which one would cause the giggles again?

If speed is the preference over correctness (which I very much 
disagree with, but for argument's sake...) then still code points 
are the wrong choice. So, speed was obviously (??) not the reason 
to prefer code points as the default.
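
To make the three levels concrete, here is a minimal D sketch that counts the same text as code units, code points and graphemes. The helper names are made up for illustration; the calls are `walkLength` from std.range and `byGrapheme` from std.uni.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

// Hypothetical helper names, for illustration only.
size_t codeUnits(string s)  { return s.length; }                // UTF-8 code units (bytes)
size_t codePoints(string s) { return s.walkLength; }            // autodecoded code points
size_t graphemes(string s)  { return s.byGrapheme.walkLength; } // user-perceived characters

void main()
{
    string precomposed = "\u00E1";  // á as a single code point
    string decomposed  = "a\u0301"; // a followed by a combining acute accent

    // Prints code units, code points, graphemes for each spelling.
    writeln(codeUnits(precomposed), " ", codePoints(precomposed), " ", graphemes(precomposed)); // 2 1 1
    writeln(codeUnits(decomposed), " ", codePoints(decomposed), " ", graphemes(decomposed));    // 3 2 1
}
```

Only the grapheme count agrees for both spellings. The code point count costs a decode and still says 2 for the decomposed form.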

Here's a read on how Swift 4 strings behave. Absolutely amazing 
work there: https://oleb.net/blog/2017/11/swift-4-strings/

>
> Grapheme decoding is really only necessary when (1) you're 
> typesetting a Unicode string, and (2) you're counting the 
> number of visual characters taken up by the string (though 
> grapheme counting even in this case may not give you what you 
> want, thanks to double-width characters, zero-width characters, 
> etc. -- though it can form the basis of correct counting code).

Yeah nah. Those are not the only 2 cases *ever* where grapheme 
decoding is correct. I don't think one can list all the cases 
where grapheme decoding is the correct behavior. Off the top of me 
head you've already forgotten comparisons. And on top of that, 
comparing and counting have a bajillion* use cases.

* number is an exaggeration.
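
For the comparison case, a rough sketch of what you have to do by hand in D today: normalize both sides before comparing or searching. `canonicallyEqual` is a made-up helper name; `normalize` and `NFC` come from std.uni. This is canonical-equivalence comparison via normalization, which is an assumption about what "correct" should mean here, not a full grapheme-aware comparison.

```d
import std.algorithm.searching : canFind;
import std.stdio : writeln;
import std.uni : NFC, normalize;

// Hypothetical helper: equality under canonical equivalence,
// done by bringing both strings to NFC first.
bool canonicallyEqual(string lhs, string rhs)
{
    return normalize!NFC(lhs) == normalize!NFC(rhs);
}

void main()
{
    string a = "\u00E1";  // precomposed á
    string b = "a\u0301"; // decomposed á

    writeln(a == b);                 // false: different code units
    writeln(canonicallyEqual(a, b)); // true

    string haystack = "xa\u0301y";   // contains the decomposed form
    writeln(haystack.canFind(a));    // false: needle is precomposed
    writeln(normalize!NFC(haystack).canFind(normalize!NFC(a))); // true
}
```

Swift's `==` compares by canonical equivalence out of the box, which is why `a == b` is true there without any of this.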

>
> For all other cases, you really don't need grapheme decoding, 
> and being forced to iterate over graphemes when unnecessary 
> will add a horrible overhead, worse than autodecoding does 
> today.

As opposed to being forced to iterate with incorrect results? I 
understand that it's slower. I just don't think that justifies 
incorrect output. I agree with everything you've said next 
though, that people should understand Unicode.

>
> //
>
> Seriously, people need to get over the fantasy that they can 
> just use Unicode without understanding how Unicode works.  Most 
> of the time, you can get the illusion that it's working, but 
> actually 99% of the time the code is actually wrong and will do 
> the wrong thing when given an unexpected (but still valid) 
> Unicode string.  You can't drive without a license, and even if 
> you try anyway, the chances of ending up in a nasty accident is 
> pretty high.  People *need* to learn how to use Unicode 
> properly before complaining about why this or that doesn't work 
> the way they thought it should work.

I agree that you should know about Unicode. And maybe you can't 
be correct 100% of the time but you can very well get much closer 
than where D is right now.

And yeah, you can't drive without a license, but most cars 
hopefully don't show you an incorrect speedometer reading because 
it produces faster drivers.

>
>
> T
> --
> Gone Chopin. Bach in a minuet.

Lol :D



More information about the Digitalmars-d mailing list