Unicode problems?
Daniel Keep
daniel.keep.lists at gmail.com
Mon Feb 16 05:39:01 PST 2009
Trass3r wrote:
> Wikipedia states that D still has some Unicode problems:
> "Operations on Unicode strings are unintuitive (compiler accepts Unicode
> source code, standard library and foreach constructs operate on UTF-8,
> but string slicing and length property operate on bytes rather than
> characters)."
>
> Is this information correct?
They're not bugs, if that's what you mean. It's just a side-effect of
how UTF-8 works: a single character can take anywhere from one to four
bytes.
http://www.prowiki.org/wiki4d/wiki.cgi?DanielKeep/TextInD
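Long story short: slicing and .length operate on bytes (UTF-8 code
units), because operating on actual code points can't be done
efficiently [1]. Finding the Nth code point means decoding the string
from the start.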
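To make the difference concrete, here is a minimal sketch. The helper names byteCount and codePointCount are made up for illustration and are not part of any library; the sketch assumes the source file is saved as UTF-8 and that 'é' is the single precomposed code point U+00E9.

```d
// Number of UTF-8 code units (bytes); this is what .length reports.
size_t byteCount(string s)
{
    return s.length;
}

// Number of code points, counted by letting foreach decode the
// UTF-8 data into dchar values one at a time.
size_t codePointCount(string s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}
```

With these, byteCount("héllo") is 6 while codePointCount("héllo") is 5, because 'é' takes two bytes. Slicing follows the byte view as well: "héllo"[0 .. 3] is "hé", and a slice ending at index 2 would cut 'é' in half.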
-- Daniel
[1] That is, given that strings are implemented as arrays of
fixed-width elements, that you're not using UTF-32 (which no one does
because it's too big), and that we don't add some fancy caching to
char[] arrays specifically.
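A rough sketch of the trade-off in this footnote, under the same assumptions as before (UTF-8 source file, precomposed 'é'). The helpers nthCodePoint and utf32Bytes are hypothetical names invented for this example, not library functions.

```d
// Hypothetical helper: finds the code point at position `index` in a
// UTF-8 string. It has to decode from the start, so the cost grows
// with the index instead of being constant.
dchar nthCodePoint(string s, size_t index)
{
    size_t i = 0;
    foreach (dchar c; s)
    {
        if (i == index)
            return c;
        ++i;
    }
    assert(0, "index out of range");
}

// Hypothetical helper: storage used by a UTF-32 string, where every
// code point occupies one fixed-width dchar element.
size_t utf32Bytes(dstring d)
{
    return d.length * dchar.sizeof;
}
```

With a dstring such as "héllo"d, plain indexing (d[1]) gets the second code point directly, but utf32Bytes reports 20 bytes for text that needs only 6 in UTF-8.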