Unicode problems?
Chris Nicholson-Sauls
ibisbasenji at gmail.com
Mon Feb 16 06:38:19 PST 2009
Daniel Keep wrote:
>
> Trass3r wrote:
>> Wikipedia states that D still has some Unicode problems:
>> "Operations on Unicode strings are unintuitive (compiler accepts Unicode
>> source code, standard library and foreach constructs operate on UTF-8,
>> but string slicing and length property operate on bytes rather than
>> characters)."
>>
>> Is this information correct?
>
> They're not bugs, if that's what you mean. It's just a side-effect of
> how Unicode works.
>
> http://www.prowiki.org/wiki4d/wiki.cgi?DanielKeep/TextInD
>
> Long story short: they operate on bytes because operating on actual code
> points can't be done efficiently [1].
>
> -- Daniel
>
> [1] Given that strings are implemented as arrays with a given,
> non-changing width and that you're not using UTF-32 which no one does
> because it's too big and that we don't add some fancy caching stuff to
> char[] arrays specifically, blah blah blah.
I do use UTF-32, at least occasionally. In cases where I specifically
expect or encourage multilingual text, it can simplify matters greatly,
because those otherwise inefficient operations become common.
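
A minimal sketch of the difference, assuming a current D2 compiler with
Phobos. The helper name codePointCount is made up for illustration, and
the sample word is just an example. Note that a code point is still not
always a user-perceived character (combining marks, for instance), so
this only addresses the bytes-versus-code-points issue discussed above.

```d
import std.stdio : writeln;
import std.utf : toUTF32;

/// Hypothetical helper: counts code points by decoding UTF-8, which is O(n).
size_t codePointCount(string s)
{
    size_t n = 0;
    foreach (dchar c; s) // foreach with a dchar variable decodes UTF-8 on the fly
        ++n;
    return n;
}

void main()
{
    string s = "na\u00EFve";            // U+00EF takes two UTF-8 code units
    assert(s.length == 6);              // .length counts code units (bytes)
    assert(codePointCount(s) == 5);     // decoding finds five code points

    dstring d = toUTF32(s);             // one 32-bit code unit per code point
    assert(d.length == 5);              // .length now equals the code point count
    assert(d[2] == '\u00EF');           // constant-time indexing by code point
    assert(d[0 .. 3] == "na\u00EF"d);   // slicing cannot land inside a code point

    writeln("all checks passed");
}
```

The trade-off is memory: a dstring spends four bytes on every code point,
which is why it tends to pay off only when indexing and slicing by code
point are frequent.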
-- Chris Nicholson-Sauls