char array weirdness

Marco Leise via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Tue Mar 29 15:29:11 PDT 2016


On Mon, 28 Mar 2016 16:29:50 -0700,
"H. S. Teoh via Digitalmars-d-learn"
<digitalmars-d-learn at puremagic.com> wrote:

> […] your diacritics may get randomly reattached to
> stuff they weren't originally attached to, or you may end up with wrong
> sequences of Unicode code points (e.g. diacritics not attached to any
> grapheme). Using filter() on Korean text, even with autodecoding, will
> pretty much produce garbage. And so on.

I'm on the same page here. If it ain't ASCII-parsable, you
*have* to make a conscious decision about whether you need
code units or graphemes. I have yet to find a use case for
auto-decoded code points, though.
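To make the three levels concrete, here is a small sketch. It is a Python stand-in, not D: Python's str iterates by code point, roughly like D's auto-decoded dchar iteration, and len() of the UTF-8 bytes counts code units like D's char. The helper approx_graphemes is hypothetical and deliberately simplified (base code point plus following combining marks), not full UAX #29 segmentation; in D, std.uni.byGrapheme is the real tool. The filter step shows the reattachment problem from the quote: dropping 'e' per code point leaves its accent stuck on 'f'.

```python
import unicodedata


def approx_graphemes(text: str) -> list[str]:
    # Hypothetical, simplified clustering: one code point plus any following
    # code points with a nonzero canonical combining class. This is NOT full
    # UAX #29 segmentation (Hangul jamo, emoji ZWJ sequences, etc. are ignored).
    clusters: list[str] = []
    for c in text:
        if clusters and unicodedata.combining(c):
            clusters[-1] += c
        else:
            clusters.append(c)
    return clusters


# "café!" with a decomposed accent: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
s = "cafe\u0301!"
print(len(s.encode("utf-8")))    # UTF-8 code units (like D's char): 7
print(len(s))                    # code points (like auto-decoded dchar): 6
print(len(approx_graphemes(s)))  # user-perceived characters: 5

# Filtering per code point removes 'e' but keeps its accent,
# which now attaches to 'f' instead.
no_e = "".join(c for c in s if c != "e")
print(ascii(no_e))               # 'caf\u0301!'

# Filtering per (approximate) grapheme drops the accented letter as a unit.
no_e_cluster = "".join(g for g in approx_graphemes(s) if g[0] != "e")
print(no_e_cluster)              # caf!
```

Neither code units nor code points give the "character" a reader sees here; only the grapheme view filters correctly.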

> So in short, we're paying a performance cost for something that's only
> arguably better but still not quite there, and this cost is attached to
> almost *everything* you do with strings, regardless of whether you need
> to (e.g., when you know you're dealing with pure ASCII data).

An unconscious decision made by the library that yields the
least likely intended and expected result? Let me think ...
mhhh ... that's worse than iterating by char. No talking
back :p.
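On the pure-ASCII case from the quote: when the syntax you parse is ASCII, you can work on raw UTF-8 code units and never decode, because every byte of a multi-byte UTF-8 sequence is >= 0x80 and so can never be mistaken for an ASCII delimiter. A minimal Python sketch of that idea follows; split_fields is a hypothetical helper, and this illustrates correctness only, not a benchmark. In D, std.utf.byCodeUnit is the library's opt-out from auto-decoding.

```python
def split_fields(line: bytes, sep: int = ord(",")) -> list[bytes]:
    # Scans raw UTF-8 code units; nothing is decoded. Safe for an ASCII
    # separator because bytes of multi-byte UTF-8 sequences are all >= 0x80,
    # so they can never equal an ASCII byte (< 0x80).
    fields: list[bytes] = []
    start = 0
    for i, b in enumerate(line):  # iterating bytes yields ints
        if b == sep:
            fields.append(line[start:i])
            start = i + 1
    fields.append(line[start:])
    return fields


row = "naïve,한국어,plain".encode("utf-8")
fields = split_fields(row)
# Decode only at the end, and only if the caller actually needs text.
print([f.decode("utf-8") for f in fields])  # ['naïve', '한국어', 'plain']
```

The non-ASCII fields pass through untouched; decoding them would have been wasted work for the splitting itself.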

-- 
Marco


