Notice/Warning on narrowStrings .length

Jonathan M Davis jmdavisProg at gmx.com
Thu Apr 26 11:02:58 PDT 2012


On Thursday, April 26, 2012 13:51:17 Nick Sabalausky wrote:
> Also, keep in mind that (unless I'm mistaken) walkLength does *not* return
> the number of "characters" (ie, graphemes), but merely the number of code
> points - which is not the same thing (due to existence of the
> [confusingly-named] "combining characters").

You're not mistaken. Nothing in Phobos (save perhaps some of std.regex's 
internals) deals with graphemes. It all operates on code points, and strings 
are considered to be ranges of code points, not graphemes. So, as far as 
ranges go, walkLength returns the actual length of the range. That's _usually_
the number of characters/graphemes as well, but not always: a base character
followed by a combining character is one grapheme but two code points. We'll
need further Unicode facilities in Phobos to deal with that, though, and I
doubt that strings will ever change to be treated as ranges of graphemes,
since that would be incredibly expensive computationally. We have enough
performance problems with strings as it is. What we'll probably get is extra
functions to deal with normalization (and probably something to count the
number of graphemes) and probably a wrapper type that does deal in graphemes.

Regardless, you're right about walkLength returning the number of code points 
rather than graphemes, because strings are considered to be ranges of dchar.
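
To make the distinction concrete, here is a minimal sketch. The sample string
is just an illustration that I picked (an 'e' followed by U+0301, the combining
acute accent), which a reader would see as a single character:

```d
import std.range : walkLength;
import std.stdio : writeln;

void main()
{
    // Illustrative input: 'e' followed by U+0301 (combining acute accent).
    // Rendered as one grapheme, but made up of two code points.
    string s = "e\u0301";

    // .length on a narrow string counts UTF-8 code units:
    // 1 for 'e' plus 2 for U+0301.
    writeln(s.length);      // 3

    // walkLength iterates the string as a range of dchar,
    // so it counts code points.
    writeln(s.walkLength);  // 2
}
```

Neither number is 1, which is what a grapheme count would give.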

- Jonathan M Davis