Notice/Warning on narrowStrings .length

Nick Sabalausky SeeWebsiteToContactMe at semitwist.com
Thu Apr 26 14:56:40 PDT 2012


"Jonathan M Davis" <jmdavisProg at gmx.com> wrote in message 
news:mailman.2166.1335463456.4860.digitalmars-d at puremagic.com...
> On Thursday, April 26, 2012 13:51:17 Nick Sabalausky wrote:
>> Also, keep in mind that (unless I'm mistaken) walkLength does *not* return
>> the number of "characters" (ie, graphemes), but merely the number of code
>> points - which is not the same thing (due to existence of the
>> [confusingly-named] "combining characters").
>
> You're not mistaken. Nothing in Phobos (save perhaps some of std.regex's
> internals) deals with graphemes. It all operates on code points, and strings
> are considered to be ranges of code points, not graphemes. So, as far as
> ranges go, walkLength returns the actual length of the range. That's _usually_
> the number of characters/graphemes as well, but it's certainly not 100%
> correct. We'll need further unicode facilities in Phobos to deal with that
> though, and I doubt that strings will ever change to be treated as ranges of
> graphemes, since that would be incredibly expensive computationally. We have
> enough performance problems with strings as it is. What we'll probably get is
> extra functions to deal with normalization (and probably something to count
> the number of graphemes) and probably a wrapper type that does deal in
> graphemes.
>

Yeah, I'm not saying that walkLength should deal with graphemes. Just that if 
someone wants the number of "characters", then neither length (code units) 
*nor* walkLength (code points) is guaranteed to be correct.
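
To make the difference concrete, here is a minimal sketch in D. It assumes a 
Phobos release newer than the one discussed in this thread, one in which 
std.uni provides byGrapheme. The names Counts and countAll are made up for 
illustration and are not part of Phobos.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

// Hypothetical helper type, for illustration only.
struct Counts
{
    size_t codeUnits;
    size_t codePoints;
    size_t graphemes;
}

// Hypothetical helper: measures the same string three different ways.
Counts countAll(string s)
{
    return Counts(
        s.length,                  // UTF-8 code units
        s.walkLength,              // code points (a string is a range of dchar)
        s.byGrapheme.walkLength);  // graphemes; assumes std.uni.byGrapheme exists
}

void main()
{
    // "e" followed by U+0301 COMBINING ACUTE ACCENT: one visible character,
    // but 3 code units, 2 code points, and 1 grapheme.
    writeln(countAll("e\u0301"));
}
```

For plain ASCII text all three numbers agree, which is why the mismatch is 
easy to miss until a combining character shows up.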
