VLERange: a range in between BidirectionalRange and RandomAccessRange
Ali Çehreli
acehreli at yahoo.com
Tue Jan 18 23:43:38 PST 2011
Michel Fortin wrote:
> On 2011-01-17 17:54:04 -0500, Michel Fortin <michel.fortin at michelf.com> said:
> So perhaps the best interface for strings would be to provide multiple
> range-like interfaces that you can use at the level you want.
That's what I've been thinking. The users can choose whether they want
random access or not. A grapheme-aware string can provide random access
at a space cost, or no random access for efficient space use.
I see 5 layers in string processing. Layers 1 and 2 are currently
handled by D, sometimes in an unclear way. For example, char[] may be
used as an array of code units or as an array of code points, depending
on the type of iteration.
1) Code units: This is what D provides with its string types.
This layer models RandomAccessRange.
2) Code points: This is what D and Phobos provide for example with
foreach(d; stride(s, 1))
dchar[] models RandomAccessRange at this layer
char[] and wchar[] model ForwardRange at this layer
(If I understand it correctly, Steven Schveighoffer is trying to provide
a pseudo-RandomAccessRange to char[] and wchar[] with his string type.)
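A minimal D sketch of the difference between layers 1 and 2, assuming a current Phobos in which the range primitives auto-decode narrow strings. The sample text "çğü" is an arbitrary choice. Note that Phobos' traits actually classify char[] and wchar[] as bidirectional ranges, since UTF-8 and UTF-16 can also be decoded from the back.

```d
import std.range : isBidirectionalRange, isRandomAccessRange, stride, walkLength;
import std.stdio : writeln;

// Layer 2 as seen by Phobos' range traits: narrow strings are decoded
// into dchar and lose random access; dstring keeps it.
static assert(!isRandomAccessRange!string);
static assert(isBidirectionalRange!string);  // UTF-8 can be decoded from the back too
static assert(isRandomAccessRange!dstring);

void main()
{
    string  s = "çğü";   // UTF-8: each of these letters is 2 code units
    wstring w = "çğü"w;  // UTF-16: 1 code unit each
    dstring d = "çğü"d;  // UTF-32: always 1 code unit per code point

    // Layer 1: .length counts code units in O(1).
    writeln(s.length, " ", w.length, " ", d.length);               // 6 3 3

    // Layer 2: range primitives decode char[] and wchar[] into code points.
    writeln(s.walkLength, " ", w.walkLength, " ", d.walkLength);   // 3 3 3

    // The foreach form mentioned above: stride(s, 1) yields dchar.
    size_t points;
    foreach (c; stride(s, 1))
        ++points;

    // Plain foreach over a string, by contrast, visits code units.
    size_t units;
    foreach (c; s)
        ++units;

    writeln(points, " ", units);                                   // 3 6
}
```

The same string thus has two different lengths depending on which layer you look at it from, which is the ambiguity described above.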
3) Graphemes: This is what the string type that spir is working on provides.
There could be at least two types:
3a) RandomAccessGraphemeRange: Has random access but the data type is large
3b) ForwardGraphemeRange: space-efficient but does not provide random access
I think the programmers would be happy to be able to choose.
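Layers 3a and 3b could look roughly like the following sketch. It relies on std.uni.byGrapheme and std.uni.graphemeStride, which later versions of Phobos added. graphemeSlices is a hypothetical helper, not a Phobos function, and storing one slice per grapheme is only one possible way to pay for random access.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme, graphemeStride;

/// Hypothetical helper for layer 3a: slice the string once at grapheme
/// boundaries so that the i-th grapheme is available in O(1), at the cost
/// of storing one slice (pointer + length) per grapheme.
string[] graphemeSlices(string s)
{
    string[] result;
    size_t i = 0;
    while (i < s.length)
    {
        immutable len = graphemeStride(s, i); // code units in this grapheme
        result ~= s[i .. i + len];
        i += len;
    }
    return result;
}

void main()
{
    // "noël" written with a combining diaeresis: 'e' followed by U+0308
    string s = "noe\u0308l";

    writeln(s.length);                // 6 code units (U+0308 takes 2 in UTF-8)
    writeln(s.walkLength);            // 5 code points

    // Layer 3b: byGrapheme is lazy and needs no extra storage, but
    // reaching the i-th grapheme means walking from the start.
    writeln(s.byGrapheme.walkLength); // 4 graphemes

    // Layer 3a: random access after paying for the index up front.
    auto g = graphemeSlices(s);
    writeln(g.length);                // 4
    writeln(g[2]);                    // 'e' + U+0308: one grapheme, two code points
}
```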
4) Letters: Uses either 3a or 3b. This is the layer where the idea of a
writing system enters the picture: lower/upper case transformations and
sorting happen at this layer. (I have a library that tries to handle
this layer but is ignorant of graphemes; I am waiting for spir's string
type. ;))
4a) Models RandomAccessRange if based on a RandomAccessGraphemeRange
4b) Models ForwardRange if based on a ForwardGraphemeRange
5) Text: Collection of Letters. This is where a name like "ali & tim" is
correctly capitalized as "ALİ & TIM" because the text consists of two
separate writing systems. (The same library that I mentioned in 4 tries
to handle this layer as well.)
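For layers 4 and 5, a rough sketch of the "ali & tim" example. toUpperTurkish, Run and toUpperText are hypothetical names made up for illustration; the only Phobos calls are std.uni.toUpper and std.array.appender. Tagging runs of text with a single Turkish flag is a simplifying assumption; a real library would need per-language rules for casing and sorting.

```d
import std.array : appender;
import std.stdio : writeln;
import std.uni : toUpper;

/// Hypothetical layer-4 operation: upper-casing under Turkish rules,
/// where dotted i maps to İ (U+0130) and dotless ı (U+0131) maps to I.
dchar toUpperTurkish(dchar c)
{
    switch (c)
    {
        case 'i':      return '\u0130';
        case '\u0131': return 'I';
        default:       return toUpper(c);
    }
}

/// Hypothetical layer-5 model: a text is a sequence of runs, each tagged
/// with whether it belongs to the Turkish writing system.
struct Run
{
    string text;
    bool turkish;
}

string toUpperText(const Run[] runs)
{
    auto result = appender!string();
    foreach (run; runs)
        foreach (dchar c; run.text)   // decode each run into code points
            result.put(run.turkish ? toUpperTurkish(c) : toUpper(c));
    return result.data;
}

void main()
{
    // Locale-independent upper-casing gets the Turkish name wrong.
    writeln(toUpper("ali & tim"));   // ALI & TIM

    auto text = [Run("ali", true), Run(" & ", false), Run("tim", false)];
    writeln(toUpperText(text));      // ALİ & TIM
}
```

The point of the split is that toUpperTurkish only knows about one writing system (layer 4), while toUpperText is the layer that knows which parts of the text belong to which system (layer 5).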
Ali
More information about the Digitalmars-d mailing list