VLERange: a range in between BidirectionalRange and RandomAccessRange

Tue Jan 18 23:43:38 PST 2011

Michel Fortin wrote:
 > On 2011-01-17 17:54:04 -0500, Michel Fortin <michel.fortin at michelf.com>
 > said:

 > So perhaps the best interface for strings would be to provide multiple
 > range-like interfaces that you can use at the level you want.

That's what I've been thinking. The users can choose whether they want 
random access or not. A grapheme-aware string can provide random access 
at a space cost, or no random access for efficient space use.

I see 5 layers in string processing. Layers 1 and 2 are currently 
handled by D, sometimes in an unclear way. For example, char[] may be 
used as an array of code units or an array of code points depending on 
the type of iteration.
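
To make the ambiguity concrete, here is a minimal sketch. The function 
names are made up for illustration. The same string yields a different 
number of elements depending on the element type that foreach is asked 
for:

```d
import std.stdio;

// Counts iterations when the string is walked by code unit (char).
size_t countCodeUnits(string s)
{
    size_t n;
    foreach (char c; s) // no decoding: one iteration per UTF-8 code unit
        ++n;
    return n;
}

// Counts iterations when the string is walked by code point (dchar).
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar d; s) // foreach decodes the UTF-8 sequence on the fly
        ++n;
    return n;
}

void main()
{
    string s = "a\u011F"; // 'a' followed by U+011F, which takes two UTF-8 code units
    writeln(countCodeUnits(s));
    writeln(countCodePoints(s));
}
```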

1) Code units: This is what D provides with its string types

This layer models RandomAccessRange

2) Code points: This is what D and Phobos provide for example with 
foreach(d; stride(s, 1))

dchar[] models RandomAccessRange at this layer

char[] and wchar[] model ForwardRange at this layer

(If I understand it correctly, Steven Schveighoffer is trying to provide 
a pseudo-RandomAccessRange to char[] and wchar[] with his string type.)
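
The range categories of layers 1 and 2 can be checked at compile time. 
This is only a sketch and it assumes a present-day Phobos: 
std.range.primitives and std.utf.byCodeUnit are what I am relying on 
here, and codeUnitAt is a made-up helper.

```d
import std.range.primitives : isForwardRange, isRandomAccessRange;
import std.utf : byCodeUnit;

// Layer 2: used as ranges, char[] and wchar[] are decoded to code points,
// so they can be walked forward but not indexed in constant time.
static assert( isForwardRange!(char[]));
static assert(!isRandomAccessRange!(char[]));
static assert( isForwardRange!(wchar[]));
static assert(!isRandomAccessRange!(wchar[]));

// dchar[] stores one code point per element, so it is random access.
static assert( isRandomAccessRange!(dchar[]));

// Layer 1: byCodeUnit exposes the raw code units and is random access.
static assert( isRandomAccessRange!(typeof(byCodeUnit("ab"))));

// Hypothetical helper: returns the i-th code unit of s without decoding.
char codeUnitAt(string s, size_t i)
{
    return s.byCodeUnit[i];
}

void main()
{
    // U+011F is encoded in UTF-8 as the two code units 0xC4 0x9F.
    assert(codeUnitAt("a\u011F", 1) == 0xC4);
    // In a dstring the same index lands on the whole code point.
    assert("a\u011F"d[1] == '\u011F');
}
```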

3) Graphemes: This is what the string type that spir is working on 
provides. There could be at least two types:

3a) RandomAccessGraphemeRange: Has random access but the data type is large

3b) ForwardGraphemeRange: space-efficient but does not provide random access

I think programmers would be happy to be able to choose between the two.
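
As a rough illustration of the 3a/3b trade-off (this is not spir's 
type), the sketch below assumes the byGrapheme and Grapheme facilities 
of present-day std.uni. The two function names are made up. The lazy 
version walks the text once without storing anything per grapheme; the 
array version pays for storage up front and gets indexing in return.

```d
import std.array : array;
import std.range : walkLength;
import std.stdio;
import std.uni : byGrapheme, Grapheme;

// 3b-style: lazy grapheme iteration, no per-grapheme storage.
size_t countGraphemes(string s)
{
    return s.byGrapheme.walkLength;
}

// 3a-style: materialize every grapheme once to get random access.
Grapheme[] toGraphemeArray(string s)
{
    return s.byGrapheme.array;
}

void main()
{
    // 'e' + U+0301 (combining acute accent) form one grapheme; 'a' is another.
    string s = "e\u0301a";

    assert(s.length == 4);          // layer 1: UTF-8 code units
    assert(s.walkLength == 3);      // layer 2: code points
    assert(countGraphemes(s) == 2); // layer 3: graphemes

    auto g = toGraphemeArray(s);
    assert(g[0].length == 2);       // the first grapheme holds two code points
    assert(g[1][0] == 'a');         // constant-time access to the second grapheme

    // The space cost of the random-access variant, per element:
    writeln("Grapheme.sizeof = ", Grapheme.sizeof);
}
```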

4) Letters: Uses either 3a or 3b. This is the layer where the idea of a 
writing system enters the picture: lower/upper case transformations and 
sorting happen at this layer. (I have a library that tries to handle 
this layer but is ignorant of graphemes; I am waiting for spir's string 
type. ;))

4a) Models RandomAccessRange if based on a RandomAccessGraphemeRange

4b) Models ForwardRange if based on a ForwardGraphemeRange

5) Text: Collection of Letters. This is where a name like "ali & tim" is 
correctly capitalized as "ALİ & TIM" because the text consists of two 
separate writing systems. (The same library that I mentioned in 4 tries 
to handle this layer as well.)
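
Here is a very small sketch of what layers 4 and 5 mean for the example 
above. It is not the library I mentioned: the Alphabet enum and both 
functions are made up for illustration, only two alphabets are covered, 
and it works on code points, so it is ignorant of graphemes as well. It 
assumes that "ali" is tagged as Turkish and "tim" as English.

```d
import std.stdio;
import std.uni : toUpper;

// Hypothetical tag for the writing system of a piece of text.
enum Alphabet { english, turkish }

// Layer 4: upper-cases one letter according to its alphabet.
dchar upperLetter(dchar c, Alphabet a)
{
    if (a == Alphabet.turkish)
    {
        if (c == 'i')      return '\u0130'; // dotted i -> dotted capital I
        if (c == '\u0131') return 'I';      // dotless i -> dotless capital I
    }
    return toUpper(c); // default Unicode simple case mapping
}

// Upper-cases a run of text that belongs to a single alphabet.
string upperText(string s, Alphabet a)
{
    string result;
    foreach (dchar c; s)
        result ~= upperLetter(c, a); // appending a dchar re-encodes it as UTF-8
    return result;
}

void main()
{
    // Layer 5: the text is a collection of runs, each with its own alphabet.
    auto text = upperText("ali", Alphabet.turkish)
              ~ " & "
              ~ upperText("tim", Alphabet.english);
    assert(text == "AL\u0130 & TIM");
    writeln(text);
}
```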

Ali