Narrow string is not a random access range

Jonathan M Davis jmdavisProg at gmx.com
Wed Oct 24 14:58:11 PDT 2012


On Wednesday, October 24, 2012 12:53:23 H. S. Teoh wrote:
> For many algorithms, full decode is not necessary. This is something
> that Phobos should take advantage of (at least in theory; I'm not sure
> how practical this is with the current codebase).

It does take advantage of that in a number of cases, but not necessarily 
everywhere that it could. One major issue with ranges, though, is that once 
you've wrapped a string in another range (via map, filter, take, or whatever), 
the resulting range is forced to decode on every call to front and to do at 
least a partial decode on every call to popFront, whereas functions can 
special-case bare strings to avoid extraneous decoding. So, you can take a 
performance hit if you're operating on wrapped strings rather than on 
strings directly.
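
To illustrate what special-casing buys you, here is a minimal sketch in 
Python rather than D. It is not Phobos code, and both function names are 
made up for the example. It assumes well-formed UTF-8 input and a non-empty 
needle. Searching on the raw code units gives the same offset as decoding 
first, because a well-formed needle can only match at a code point boundary:

```python
def find_code_units(haystack: bytes, needle: bytes) -> int:
    """Hypothetical helper: byte offset of needle in haystack, or -1.

    Compares code units only; nothing is decoded.
    """
    return haystack.find(needle)


def find_decoding(haystack: bytes, needle: bytes) -> int:
    """Hypothetical helper: same result, but decodes every code point first."""
    h = haystack.decode("utf-8")
    n = needle.decode("utf-8")
    i = h.find(n)
    if i < 0:
        return -1
    # Convert the code point index back into a byte offset.
    return len(h[:i].encode("utf-8"))


if __name__ == "__main__":
    hay = "naïve café".encode("utf-8")
    print(find_code_units(hay, "café".encode("utf-8")))
    print(find_decoding(hay, "café".encode("utf-8")))
```

The second version stands in for what a wrapped range is stuck doing: it 
pays for decoding even though the answer never depends on it.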

> Actually, in the above case, *no* decode is necessary at all. UTF-8 was
> designed specifically for this: if you see a byte with its highest bits
> set to 0b10, that means you're in the middle of a code point. You can
> scan forwards or backwards until the first byte whose highest bits
> aren't 0b10; that's guaranteed to be the start of a code point (provided
> the original string is actually well-formed UTF-8). There is no need to
> keep track of length at all.

I wouldn't say that "no" decoding is necessary. Rather, I'd say that partial 
decoding is necessary. If you have to examine the code units to determine 
where the code points start or how long they are, then you're still doing 
part of what decode has to do, whereas a function like find can skip all of 
that and merely compare the values of the code units. _That_'s what I'd 
consider to be no decoding required, and commonPrefix is buggy precisely 
because it does no decoding where partial decoding is needed. But I suppose 
that that's arguing semantics.
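
As a rough sketch of the distinction (again in Python rather than D, with 
made-up function names, and assuming both inputs are well-formed UTF-8): a 
common-prefix function that only compares code units can stop in the middle 
of a code point, whereas one that also looks at the 0b10 continuation bits 
can back up to the nearest boundary. The failure mode shown here is my 
assumption about how such a bug would show up, not a reproduction of the 
Phobos implementation:

```python
def is_continuation(unit: int) -> bool:
    """True if the code unit's two highest bits are 0b10."""
    return (unit & 0xC0) == 0x80


def common_prefix_no_decode(a: bytes, b: bytes) -> bytes:
    """Hypothetical: compares code units only, so it may split a code point."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def common_prefix_partial_decode(a: bytes, b: bytes) -> bytes:
    """Hypothetical: same comparison, then backs up to a code point boundary."""
    n = len(common_prefix_no_decode(a, b))
    # If the first differing unit is a continuation byte, the prefix ends
    # inside a code point; step back to that code point's first byte.
    while 0 < n < len(a) and is_continuation(a[n]):
        n -= 1
    return a[:n]


if __name__ == "__main__":
    x = "abé".encode("utf-8")  # 61 62 C3 A9
    y = "abè".encode("utf-8")  # 61 62 C3 A8
    print(common_prefix_no_decode(x, y))       # ends with a stray lead byte
    print(common_prefix_partial_decode(x, y))  # stops before the split code point
```

Neither version produces a single dchar, but the second one still has to 
understand the encoding, which is why I'd call it partial decoding.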

- Jonathan M Davis

