VLERange: a range in between BidirectionalRange and RandomAccessRange

Steven Schveighoffer schveiguy at yahoo.com
Thu Jan 13 11:35:44 PST 2011


On Thu, 13 Jan 2011 14:08:36 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail at erdani.org> wrote:

> On 1/13/11 8:52 AM, Steven Schveighoffer wrote:
>> I see it as having two vast improvements:
>>
>> 1. If we replace char[] with a specific type for string, then char[] can
>> be considered a true array by phobos, and phobos can now deal with a
>> char[] array without the need to cast.
>> 2. It protects the casual user from incorrectly using a string by making
>> the default the correct API.
>>
>> Those to me are very important.
>
> Let's take a look:
>
> // Incorrect string code
> void fun(string s) {
>    foreach (i; 0 .. s.length) {
>      writeln("The character in position ", i, " is ", s[i]);
>    }
> }
>
> // Incorrect string_t code
> void fun(string_t!char s) {
>    foreach (i; 0 .. s.codeUnits) {
>      writeln("The character in position ", i, " is ", s[i]);
>    }
> }
>
> Both functions are incorrect, albeit in different ways. The only  
> improvement I'm seeing is that the user needs to write codeUnits instead  
> of length, which may make her think twice. Clearly, however, copiously  
> incorrect code can be written with the proposed interface because it  
> tries to hide the reality that underneath a variable-length encoding is  
> being used, but doesn't hide it completely (albeit for good  
> efficiency-related reasons).

You might be looking at my previous version.  The new version (recently  
posted) will throw an exception for that code if a multi-code-unit  
code-point is found.

It also supports this:

foreach(i, d; s)
{
    writeln("The character in position ", i, " is ", d);
}

where i is the index of d within s (consecutive values of i might not be
sequential, because a code point can span several code units).
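
For comparison, the built-in string type already shows the same index
behaviour when foreach is asked to decode to dchar. This is only a sketch
against the native string, not against string_t; codePointStarts is a
hypothetical helper name made up for the illustration.

```d
import std.stdio : writeln;

// Hypothetical helper: collects the code-unit index at which each
// decoded code point starts.
size_t[] codePointStarts(string s)
{
    size_t[] starts;
    // Asking for dchar makes foreach decode UTF-8; i is the code-unit
    // index of the first unit of each code point.
    foreach (i, dchar d; s)
        starts ~= i;
    return starts;
}

unittest
{
    // 'é' occupies two UTF-8 code units, so index 2 never appears.
    size_t[] expected = [0, 1, 3, 4, 5];
    assert(codePointStarts("héllo") == expected);

    foreach (i, dchar d; "héllo")
        writeln("The character in position ", i, " is ", d);
}
```

The indices skip wherever a code point takes more than one code unit, which
is what "might not be sequential" refers to.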

> But wait, there's less. Functions for random-access range throughout  
> Phobos routinely assume fixed-length encoding, i.e. s[i + 1] lies next  
> to s[i]. From a cursory look at string_t, std.range will qualify it as a  
> RandomAccessRange without length. That's an odd beast but does not  
> change the fixed-length encoding assumption. So you'd need to  
> special-case algorithms for string_t, just like right now certain  
> algorithms are specialized for string.

isRandomAccessRange requires hasLength (see here:
http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/range.d#L532).
string_t is not a random access range per that definition.  But a string
isn't a random access range anyway (it's specifically disallowed by
std.range per that same reference).

The plan is you would *not* have to special case algorithms for string_t  
as you do currently for char[].  If that's not the case, then we haven't  
achieved much.  Simply put, we are separating out the strange nature of  
strings from arrays, so the exceptional treatment of them is handled by  
the type itself, not the functions using it.
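
To illustrate the idea (and only that), here is a rough sketch of a type
that carries the string-specific behaviour itself. StringSketch and its
members are hypothetical names for this example, not the posted string_t
implementation; the only behaviours taken from the discussion are that
indexing throws on a multi-code-unit code point and that the type has no
length, so it does not qualify as a random access range.

```d
import std.exception : assertThrown, enforce;
import std.range : isBidirectionalRange, isRandomAccessRange;
import std.utf : decode, strideBack;

// Hypothetical stand-in for string_t, for illustration only.
struct StringSketch
{
    string data;

    @property bool empty() { return data.length == 0; }

    @property dchar front()
    {
        size_t i = 0;
        return decode(data, i);   // decode the first code point
    }

    void popFront()
    {
        size_t i = 0;
        decode(data, i);          // advances i past one code point
        data = data[i .. $];
    }

    @property dchar back()
    {
        size_t i = data.length - strideBack(data, data.length);
        return decode(data, i);   // decode the last code point
    }

    void popBack()
    {
        data = data[0 .. $ - strideBack(data, data.length)];
    }

    @property StringSketch save() { return this; }

    // Indexing by code unit: refuse anything that is part of a
    // multi-code-unit code point instead of returning a fragment.
    dchar opIndex(size_t i)
    {
        enforce(data[i] < 0x80, "index hits a multi-code-unit code point");
        return data[i];
    }
}

// Generic range code sees a bidirectional range; no length, so not random access.
static assert(isBidirectionalRange!StringSketch);
static assert(!isRandomAccessRange!StringSketch);

unittest
{
    auto s = StringSketch("aé");
    assert(s[0] == 'a');
    assertThrown(s[1]);           // 'é' is two code units
    assert(s.back == 'é');
}
```

With something along these lines, generic algorithms would go through
front/popFront/back/popBack and need no string-specific branches.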

-Steve

