VLERange: a range in between BidirectionalRange and RandomAccessRange

Mon Jan 17 19:48:59 PST 2011

On 2011-01-17 17:54:04 -0500, Michel Fortin <michel.fortin at michelf.com> said:

> More seriously, you have four choices:
> 
> 1. code unit
> 2. code point
> 3. grapheme
> 4. require the client to state explicitly which kind of 'character' he 
> wants; 'character' being an overloaded word, it's reasonable to ask for 
> disambiguation.

This makes me think of what I did with my XML parser after you made 
code points the element type for strings. Basically, the parser now 
uses 'front' and 'popFront' whenever it needs to get the next code 
point, but most of the time it uses 'frontUnit' and 'popFrontUnit' 
instead (which I had to add) when testing for or skipping an ASCII 
character is sufficient. This way I avoid a lot of unnecessary decoding 
of code points.

For this to work, the same range must let you skip either a unit or a 
code point. If I had used a separate range obtained through a call to 
toDchar or toCodeUnit (or toGrapheme if I needed to check graphemes), it 
wouldn't have helped much, because the new range would essentially be a 
new slice independent of the original, so you couldn't interleave "I 
want to advance by one unit" with "I want to advance by one code point".
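
To make the idea concrete, here is a minimal sketch of a single range 
offering both levels. It is not the parser's actual code: the type name 
UnitPointRange and the helper skipAsciiBlanks are made up for 
illustration, and only 'front', 'popFront', 'frontUnit' and 
'popFrontUnit' are names taken from the description above. It assumes 
UTF-8 input and uses std.utf.decode and std.utf.stride for the 
code-point level.

```d
import std.utf : decode, stride;

// Hypothetical wrapper: one slice, two levels of access.
struct UnitPointRange
{
    string data;

    @property bool empty() { return data.length == 0; }

    // Code-unit level: no decoding at all.
    @property char frontUnit() { return data[0]; }
    void popFrontUnit() { data = data[1 .. $]; }

    // Code-point level: decodes one UTF-8 sequence.
    @property dchar front()
    {
        size_t i = 0;
        return decode(data, i);
    }
    void popFront() { data = data[stride(data, 0) .. $]; }
}

// Skipping ASCII blanks only needs code units: every unit of a
// multi-byte UTF-8 sequence is >= 0x80, so it can never compare
// equal to an ASCII character.
void skipAsciiBlanks(ref UnitPointRange r)
{
    while (!r.empty && (r.frontUnit == ' ' || r.frontUnit == '\t'))
        r.popFrontUnit();
}

unittest
{
    auto r = UnitPointRange("  \t\u00E9!");
    skipAsciiBlanks(r);           // unit-level skipping
    assert(r.front == '\u00E9');  // code-point-level read
    r.popFront();                 // advances past the 2-unit sequence
    assert(r.frontUnit == '!');   // back to unit level on the same range
}
```

Because both sets of primitives act on the same slice, a unit-level 
skip and a code-point-level read can be freely interleaved, which is 
exactly what a separately constructed view does not allow.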

So perhaps the best interface for strings would be a single range that 
provides several range-like interfaces, one per level (code unit, code 
point, grapheme), so you can use whichever level you need at each step.

I'm not sure if this is a good idea, but I thought I should at least 
share my experience.

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/