VLERange: a range in between BidirectionalRange and RandomAccessRange

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Mon Jan 10 19:57:36 PST 2011


I've been thinking about how to better deal with Unicode strings. Currently 
strings are formally bidirectional ranges with a surreptitious random 
access interface. That random access interface accesses the support of 
the string (the underlying array of code units), which is understood to 
hold data in a variable-length encoded format. As long as the programmer 
understands this relationship, code for string manipulation can be 
written with relative ease. However, there is still room for writing 
wrong code that looks legit.

Sometimes the best way to tackle a hairy reality is to invite it to the 
negotiation table and offer it promotion to first-class abstraction 
status. In that vein, I was thinking of defining a new range: VLERange, 
i.e. Variable Length Encoding Range. Such a range would have power 
somewhere in between that of a bidirectional range and that of a random 
access range.

The primitives offered would include empty, access to front and back, 
popFront and popBack (just like BidirectionalRange), and in addition 
properties typical of random access ranges: indexing, slicing, and 
length. Note that the result of the indexing operator is not the element 
type of the range: it represents only one unit of the encoding (for a 
UTF-8 string, a single code unit rather than a whole code point).
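To make the split between the two views concrete, here is a minimal sketch in Python rather than D, assuming a UTF-8 support and well-formed input. The class name `Utf8Range` and its method names are hypothetical, chosen only to mirror the primitives listed above; this is not Phobos code. Indexing, slicing, and `len` work on raw code units (bytes), while `front` and `back` decode whole code points, which are the range's elements.

```python
class Utf8Range:
    """Hypothetical VLERange over UTF-8 bytes (illustration only).

    Assumes the data is well-formed UTF-8.
    """

    def __init__(self, data: bytes):
        self.data = data  # the support: raw UTF-8 code units

    def empty(self) -> bool:
        return len(self.data) == 0

    def __len__(self) -> int:
        # length counts encoding units, not logical elements
        return len(self.data)

    def __getitem__(self, i):
        # a slice yields another range over the same kind of support;
        # an integer index yields one code unit (an int in 0..255)
        if isinstance(i, slice):
            return Utf8Range(self.data[i])
        return self.data[i]

    def front(self) -> str:
        # decode the first code point: the element type of the range
        lead = self.data[0]
        n = 1 if lead < 0x80 else 2 if lead < 0xE0 else 3 if lead < 0xF0 else 4
        return self.data[:n].decode("utf-8")

    def back(self) -> str:
        # walk back over continuation bytes (10xxxxxx) to the last lead byte
        i = len(self.data) - 1
        while self.data[i] & 0xC0 == 0x80:
            i -= 1
        return self.data[i:].decode("utf-8")
```

For example, `Utf8Range("héllo".encode("utf-8"))` has length 6, index 1 yields the code unit `0xC3`, and `r[1:].front()` yields `"é"`. Slicing here is by code units, so a slice taken in the middle of a code point would leave a malformed range; that is exactly the kind of wrong-but-legit-looking code the post is worried about.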

In addition to these (and connecting the two), a VLERange would offer 
two more primitives:

1. size_t stepSize(size_t offset) gives the length, in encoding units, 
of the step needed to skip to the next element.

2. size_t backstepSize(size_t offset) gives the length, in encoding 
units, of the _backward_ step that goes to the previous element.

In both cases, offset is assumed to be at the beginning of a logical 
element of the range.
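For UTF-8 the two primitives can be sketched as follows, again in Python and assuming well-formed input. The function names `step_size` and `backstep_size` are hypothetical stand-ins for stepSize and backstepSize. `step_size` reads the length from the lead byte; `backstep_size` walks back over continuation bytes. As an extra assumption beyond the post, `backstep_size` also accepts `offset == len(data)` (one past the end), so that iteration can start from the back.

```python
def step_size(data: bytes, offset: int) -> int:
    """Length in code units of the code point starting at `offset`.

    Assumes `offset` is at a lead byte of well-formed UTF-8.
    """
    lead = data[offset]
    if lead < 0x80:
        return 1  # 0xxxxxxx: ASCII
    if lead < 0xE0:
        return 2  # 110xxxxx
    if lead < 0xF0:
        return 3  # 1110xxxx
    return 4      # 11110xxx


def backstep_size(data: bytes, offset: int) -> int:
    """Length in code units of the code point ending just before `offset`.

    Assumes 0 < offset <= len(data) and that `offset` is an element boundary.
    """
    i = offset - 1
    while data[i] & 0xC0 == 0x80:  # 10xxxxxx: continuation byte
        i -= 1
    return offset - i


def code_points_forward(data: bytes) -> list:
    """popFront-style iteration driven only by step_size."""
    out, i = [], 0
    while i < len(data):
        n = step_size(data, i)
        out.append(data[i:i + n].decode("utf-8"))
        i += n
    return out


def code_points_backward(data: bytes) -> list:
    """popBack-style iteration driven only by backstep_size."""
    out, i = [], len(data)
    while i > 0:
        n = backstep_size(data, i)
        out.append(data[i - n:i].decode("utf-8"))
        i -= n
    return out
```

With `s = "aé€😀".encode("utf-8")` (1 + 2 + 3 + 4 = 10 code units), `step_size` at offsets 0, 1, 3, 6 gives 1, 2, 3, 4, and `backstep_size` at offsets 10, 6, 3, 1 gives 4, 3, 2, 1.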

I suspect that a lot of functions in std.string can be written without 
Unicode-specific knowledge just by relying on such an interface. 
Moreover, algorithms can be generalized to other structures that use 
variable-length encoding, such as those used in data compression. (In 
that case, the support would be a bit array and the encoded type would 
be ubyte.)
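A rough illustration of the generalization: the two algorithms below are written against only a length and a stepSize, with nothing Unicode-specific, and the same code then runs over UTF-8 and over a second variable-length encoding. For that second encoding the sketch swaps in unsigned LEB128 integers (7 payload bits per byte, high bit set when another byte follows) instead of the bit-array support mentioned above, simply to stay byte-oriented. All names here (`VLESupport`, `walk_length`, `element_offsets`, `Utf8`, `Leb128`) are hypothetical.

```python
from typing import Protocol


class VLESupport(Protocol):
    """Hypothetical minimal interface: length in encoding units plus stepSize."""

    def __len__(self) -> int: ...
    def step_size(self, offset: int) -> int: ...


def walk_length(r: VLESupport) -> int:
    """Count logical elements by stepping from one element to the next."""
    count, i = 0, 0
    while i < len(r):
        i += r.step_size(i)
        count += 1
    return count


def element_offsets(r: VLESupport) -> list:
    """Offset, in encoding units, at which each logical element starts."""
    offsets, i = [], 0
    while i < len(r):
        offsets.append(i)
        i += r.step_size(i)
    return offsets


class Utf8(bytes):
    """UTF-8 code units; assumes well-formed input."""

    def step_size(self, offset: int) -> int:
        lead = self[offset]
        return 1 if lead < 0x80 else 2 if lead < 0xE0 else 3 if lead < 0xF0 else 4


class Leb128(bytes):
    """Unsigned LEB128 integers; a set high bit means another byte follows."""

    def step_size(self, offset: int) -> int:
        n = 1
        while self[offset + n - 1] & 0x80:
            n += 1
        return n
```

For instance, `Utf8("aé€".encode("utf-8"))` has three elements starting at offsets 0, 1, 3; the LEB128 encodings of 1, 300, and 16384 (`01`, `AC 02`, `80 80 01`) also give three elements at offsets 0, 1, 3.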

Writing to such ranges is not addressed by this design. Ideas are welcome.

Adding VLERange would legitimize strings and clarify their handling, at 
the cost of one more concept that needs to be kept in mind. Is the 
trade-off worthwhile?


Andrei


More information about the Digitalmars-d mailing list