VLERange: a range in between BidirectionalRange and RandomAccessRange

Tomek Sowiński just at ask.me
Tue Jan 11 14:49:37 PST 2011


Andrei Alexandrescu napisał:

> I've been thinking on how to better deal with Unicode strings. Currently 
> strings are formally bidirectional ranges with a surreptitious random 
> access interface. The random access interface accesses the support of 
> the string, which is understood to hold data in a variable-encoded 
> format. For as long as the programmer understands this relationship, 
> code for string manipulation can be written with relative ease. However, 
> there is still room for writing wrong code that looks legit.
> 
> Sometimes the best way to tackle a hairy reality is to invite it to the 
> negotiation table and offer it promotion to first-class abstraction 
> status. Along that vein I was thinking of defining a new range: 
> VLERange, i.e. Variable Length Encoding Range. Such a range would have 
> the power somewhere in between bidirectional and random access.
> 
> The primitives offered would include empty, access to front and back, 
> popFront and popBack (just like BidirectionalRange), and in addition 
> properties typical of random access ranges: indexing, slicing, and 
> length.

For some compressions implementing *back is troublesome if not impossible...

> Note that the result of the indexing operator is not the same as 
> the element type of the range, as it only represents the unit of encoding.

It's worth to mention it explicitly -- a VLERange is dually typed. It's important for searching. Statically check if original and encoded match, if so, perform fast search on directly on encoded elements. I think an important feature of a VLERange should be dropping  itself down to a encoded-typed range, so that front and back return raw data.

Dual typing will also affect foreach -- in general case you'd want to choose whether to decode or not by typing the element.

I can't stop thinking that VLERange is a two-piece bikini making a bare random-access range safe to look at, and that you can take off when partners have confidence, not a limited random-access probing facility to span the void between front and back.

> In addition to these (and connecting the two), a VLERange would offer 
> two additional primitives:
> 
> 1. size_t stepSize(size_t offset) gives the length of the step needed to 
> skip to the next element.
> 
> 2. size_t backstepSize(size_t offset) gives the size of the _backward_ 
> step that goes to the previous element.
> 
> In both cases, offset is assumed to be at the beginning of a logical 
> element of the range.

So when I move the spinner in an iPod, I get catapulted in position with the raw data opIndex and from there I try to work my way to the next frame to start playback. Sounds promising.

> I suspect that a lot of functions in std.string can be written without 
> Unicode-specific knowledge just by relying on such an interface. 
> Moreover, algorithms can be generalized to other structures that use 
> variable-length encoding, such as those used in data compression. (In 
> that case, the support would be a bit array and the encoded type would 
> be ubyte.)

I agree, acknowledging encoding/compression as a general direction will bring substantial benefits.

> Writing to such ranges is not addressed by this design. Ideas are welcome.

Yeah, we can address outputting later, that's fair.

> Adding VLERange would legitimize strings and would clarify their 
> handling, at the cost of adding one additional concept that needs to be 
> minded. Is the trade-off worthwhile?

Well, the only way to find out is try it. My advice: VLERanges originated as a solution to the string problem, so start with a non-string incarnation. Having at least two (one, we know, is string) plugs that fit the same socket will spur confidence in the abstraction. 

-- 
Tomek



More information about the Digitalmars-d mailing list