VLERange: a range in between BidirectionalRange and RandomAccessRange

Thu Jan 13 12:51:00 PST 2011

On 1/13/11 11:35 AM, Steven Schveighoffer wrote:
> On Thu, 13 Jan 2011 14:08:36 -0500, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> wrote:
>> Let's take a look:
>>
>> // Incorrect string code
>> void fun(string s) {
>>     foreach (i; 0 .. s.length) {
>>         writeln("The character in position ", i, " is ", s[i]);
>>     }
>> }
>>
>> // Incorrect string_t code
>> void fun(string_t!char s) {
>>     foreach (i; 0 .. s.codeUnits) {
>>         writeln("The character in position ", i, " is ", s[i]);
>>     }
>> }
>>
>> Both functions are incorrect, albeit in different ways. The only
>> improvement I'm seeing is that the user needs to write codeUnits
>> instead of length, which may make her think twice. Clearly, however,
>> copiously incorrect code can be written with the proposed interface
>> because it tries to hide the reality that a variable-length encoding
>> is being used underneath, but doesn't hide it completely (for good
>> efficiency-related reasons).
>
> You might be looking at my previous version. The new version (recently
> posted) will throw an exception for that code if a multi-code-unit
> code-point is found.

I was looking at your latest. It's code that compiles and runs but 
fails dynamically on some inputs. I agree that it's often better to fail 
noisily than silently, but in a manner of speaking the string-based 
code doesn't fail at all: it correctly iterates over the code units of 
a string. That may not be what the user expected; most of the time 
they'd care about the code points.
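To make the code-unit versus code-point distinction concrete, here is a small D sketch. The helper names countCodeUnits and countCodePoints are made up for illustration; the sketch relies only on the language rule that foreach over a string decodes UTF-8 when the loop variable is declared dchar and does not decode when it is declared char.

```d
import std.stdio : writeln;

// Hypothetical helper: counts UTF-8 code units (what s.length reports).
size_t countCodeUnits(string s)
{
    size_t n = 0;
    foreach (char c; s) // char loop variable: no decoding, one step per code unit
        ++n;
    return n;
}

// Hypothetical helper: counts code points.
size_t countCodePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s) // dchar loop variable: foreach decodes UTF-8
        ++n;
    return n;
}

void main()
{
    string s = "h\u00E9llo"; // U+00E9 takes two UTF-8 code units
    writeln(countCodeUnits(s), " code units, ", countCodePoints(s), " code points");
    // prints: 6 code units, 5 code points
}
```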

> It also supports this:
>
> foreach (i, d; s)
> {
>     writeln("The character in position ", i, " is ", d);
> }
>
> where i is the index (which might not be sequential)

Well, string supports that too, with the nit that you need to 
specify dchar: foreach (i, dchar d; s).
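A sketch of the built-in string form, assuming a UTF-8 string. With the value declared as dchar, the key i is the code-unit index at which each code point starts, so the indexes skip over multi-unit characters, much like the non-sequential indexes in the string_t version. codePointOffsets is a hypothetical helper added only so the result can be checked.

```d
import std.stdio : writeln;

// Hypothetical helper: the code-unit offset where each code point starts.
size_t[] codePointOffsets(string s)
{
    size_t[] offsets;
    foreach (i, dchar d; s) // i: code-unit index, d: decoded code point
        offsets ~= i;
    return offsets;
}

void main()
{
    // Positions 0, 1 and 3 are printed: U+00E9 occupies code units 1 and 2.
    foreach (i, dchar d; "a\u00E9b")
        writeln("The character in position ", i, " is ", d);
}
```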

>> But wait, there's less. Functions for random-access ranges throughout
>> Phobos routinely assume fixed-length encoding, i.e. s[i + 1] lies next
>> to s[i]. From a cursory look at string_t, std.range will qualify it as
>> a RandomAccessRange without length. That's an odd beast but does not
>> change the fixed-length encoding assumption. So you'd need to
>> special-case algorithms for string_t, just like right now certain
>> algorithms are specialized for string.
>
> isRandomAccessRange requires hasLength (see here:
> http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/range.d#L532).
> This is not a random access range per that definition.

That's an interesting twist. By the way, I required length back then 
because I couldn't imagine having random access into something whose 
length I can't tell. Apparently I was wrong :o).

> But a string
> isn't a random access range anyway (it's specifically disallowed by
> std.range per that same reference).

It isn't and it isn't supposed to be.
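For reference, here is how that exclusion looks in present-day Phobos (std.range.primitives), which has moved on from the 2011 trunk linked above; all checks are compile-time only. Narrow strings are bidirectional ranges of dchar but deliberately fail both hasLength and isRandomAccessRange, while dstring and ordinary arrays pass.

```d
import std.range.primitives : hasLength, isBidirectionalRange,
    isRandomAccessRange;

// string indexes and measures code units, but its range primitives
// yield decoded code points, so it is excluded from random access.
static assert(isBidirectionalRange!string);
static assert(!isRandomAccessRange!string);
static assert(!hasLength!string);

// One element per code point (dstring) or a plain array qualifies.
static assert(isRandomAccessRange!dstring);
static assert(isRandomAccessRange!(int[]));

void main() {} // nothing to run: the checks above happen at compile time
```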

> The plan is that you would *not* have to special-case algorithms for string_t
> as you do currently for char[]. If that's not the case, then we haven't
> achieved much. Simply put, we are separating out the strange nature of
> strings from arrays, so the exceptional treatment of them is handled by
> the type itself, not the functions using it.

That sounds reasonable.
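A minimal sketch of the idea of pushing the strangeness into the type, assuming the wrapper exposes only code points. CodePoints is a hypothetical type, not the string_t from the proposal. Because it implements the bidirectional range primitives and nothing more, isRandomAccessRange rejects it, and generic algorithms such as retro and walkLength handle it without special-casing; the decoding lives inside the type.

```d
import std.array : array;
import std.range : retro;
import std.range.primitives : back, front, isBidirectionalRange,
    isRandomAccessRange, popBack, popFront, walkLength;
import std.stdio : writeln;

// Hypothetical wrapper: a UTF-8 string seen only as code points.
struct CodePoints
{
    private string data;

    @property bool empty() { return data.length == 0; }
    @property dchar front() { return data.front; } // decodes the first code point
    @property dchar back() { return data.back; }   // decodes the last code point
    void popFront() { data.popFront(); }           // skips one whole code point
    void popBack() { data.popBack(); }
    @property CodePoints save() { return this; }
    // No opIndex and no length: code units never leak out.
}

static assert(isBidirectionalRange!CodePoints);
static assert(!isRandomAccessRange!CodePoints);

void main()
{
    auto r = CodePoints("a\u00E9b");
    writeln(r.walkLength); // 3
}
```

Whether a real string type should also offer an explicit escape hatch to its code units (as string_t's codeUnits does) is a separate design choice this sketch leaves out.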

Andrei