Notice/Warning on narrowStrings .length
Jonathan M Davis
jmdavisProg at gmx.com
Thu Apr 26 11:02:58 PDT 2012
On Thursday, April 26, 2012 13:51:17 Nick Sabalausky wrote:
> Also, keep in mind that (unless I'm mistaken) walkLength does *not* return
> the number of "characters" (i.e., graphemes), but merely the number of code
> points - which is not the same thing (due to the existence of the
> [confusingly-named] "combining characters").
You're not mistaken. Nothing in Phobos (save perhaps some of std.regex's
internals) deals with graphemes. It all operates on code points, and strings
are considered to be ranges of code points, not graphemes. So, as far as
ranges go, walkLength returns the actual length of the range. That's _usually_
the number of characters/graphemes as well, but it's certainly not 100%
correct. We'll need further Unicode facilities in Phobos to deal with that,
though, and I doubt that strings will ever change to be treated as ranges of
graphemes, since that would be incredibly expensive computationally. We have
enough performance problems with strings as it is. What we'll probably get is
extra functions to deal with normalization (and likely something to count
the number of graphemes), plus a wrapper type that does deal in
graphemes.
Regardless, you're right about walkLength returning the number of code points
rather than graphemes, because strings are considered to be ranges of dchar.
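To illustrate the distinction, here is a minimal sketch. The helper names
codeUnits and codePoints are made up for this example and are not Phobos
functions. It assumes a UTF-8 string holding 'e' followed by U+0301
(COMBINING ACUTE ACCENT), which is normally displayed as a single character:

```d
import std.range : walkLength;

// Hypothetical helpers, named only for this illustration.

// Number of UTF-8 code units (bytes) in the array.
size_t codeUnits(string s) { return s.length; }

// Number of code points: the string is walked as a range of dchar.
size_t codePoints(string s) { return s.walkLength; }

void main()
{
    // 'e' (1 code unit) + U+0301 (2 code units in UTF-8).
    string s = "e\u0301";

    assert(codeUnits(s) == 3);   // .length counts code units
    assert(codePoints(s) == 2);  // walkLength counts code points
    // A reader sees one character (one grapheme), which neither call reports.
}
```

So for this string .length gives 3 and walkLength gives 2, while the number of
graphemes is 1.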
- Jonathan M Davis