Creeping Bloat in Phobos

Uranuz via Digitalmars-d digitalmars-d at puremagic.com
Sun Sep 28 12:44:39 PDT 2014


> I totally agree with all of that.
>
> It's one of those cases where correct by default is far too 
> slow (that would have to be graphemes) but fast by default is 
> far too broken. Better to force an explicit choice.
>
> There is no magic bullet for unicode in a systems language such 
> as D. The programmer must be aware of it and make choices about 
> how to treat it.

I see. I didn't know about the difference between byCodeUnit and
byGrapheme, because I speak Russian, which is close to English in
that it doesn't use diacritics. As far as I remember, German,
which I learned at school, does have diacritics. So you have opened
my eyes on this question. My position as an ordinary programmer is
that I speak a language whose letters are encoded as 2 bytes each
in UTF-8, so I always need to decode; otherwise my program will be
broken. The other possibility is to use wstring or dstring, but
that is less memory efficient. Also, UTF-8 is more commonly used on
the Internet, so I don't want to convert to UTF-32, for example.
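
To make the three levels concrete, here is a minimal sketch that counts
code units, code points and graphemes for the same text. It assumes a
Phobos version that provides std.utf.byCodeUnit and std.uni.byGrapheme;
the helper names codeUnitCount, codePointCount and graphemeCount are
made up for illustration and are not part of Phobos.

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

// Hypothetical helper: number of UTF-8 code units (bytes), no decoding.
size_t codeUnitCount(string s) { return s.byCodeUnit.walkLength; }

// Hypothetical helper: number of code points. Iterating a string as a
// range decodes it to dchar, so walkLength counts code points here.
size_t codePointCount(string s) { return s.walkLength; }

// Hypothetical helper: number of graphemes (user-perceived characters).
size_t graphemeCount(string s) { return s.byGrapheme.walkLength; }

void main()
{
    // Each Cyrillic letter below takes 2 bytes in UTF-8 but is a
    // single code point and a single grapheme.
    string russian = "Привет";
    assert(codeUnitCount(russian) == 12);
    assert(codePointCount(russian) == 6);
    assert(graphemeCount(russian) == 6);

    // 'e' followed by U+0301 (combining acute accent): two code
    // points that a reader sees as one character.
    string accented = "e\u0301";
    assert(codeUnitCount(accented) == 3);
    assert(codePointCount(accented) == 2);
    assert(graphemeCount(accented) == 1);
}
```

For text without combining marks, such as the Russian word above,
decoding to code points and iterating by grapheme give the same count;
the two differ only once combining sequences appear.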

Where I could read about byGrapheme? Isn't this approach
overcomplicated? I don't want to write Dostoevskiy's book "War
and Peace" in order to write some parser for simple DSL.

