VLERange: a range in between BidirectionalRange and RandomAccessRange
Steven Schveighoffer
schveiguy at yahoo.com
Thu Jan 13 14:01:35 PST 2011
On Thu, 13 Jan 2011 15:51:00 -0500, Andrei Alexandrescu
<SeeWebsiteForEmail at erdani.org> wrote:
> On 1/13/11 11:35 AM, Steven Schveighoffer wrote:
>> On Thu, 13 Jan 2011 14:08:36 -0500, Andrei Alexandrescu
>> <SeeWebsiteForEmail at erdani.org> wrote:
>>> Let's take a look:
>>>
>>> // Incorrect string code
>>> void fun(string s) {
>>>     foreach (i; 0 .. s.length) {
>>>         writeln("The character in position ", i, " is ", s[i]);
>>>     }
>>> }
>>>
>>> // Incorrect string_t code
>>> void fun(string_t!char s) {
>>>     foreach (i; 0 .. s.codeUnits) {
>>>         writeln("The character in position ", i, " is ", s[i]);
>>>     }
>>> }
>>>
>>> Both functions are incorrect, albeit in different ways. The only
>>> improvement I'm seeing is that the user needs to write codeUnits
>>> instead of length, which may make her think twice. Clearly, however,
>>> copiously incorrect code can be written with the proposed interface
>>> because it tries to hide the reality that underneath a variable-length
>>> encoding is being used, but doesn't hide it completely (albeit for
>>> good efficiency-related reasons).
>>
>> You might be looking at my previous version. The new version (recently
>> posted) will throw an exception for that code if a multi-code-unit
>> code-point is found.
>
> I was looking at your latest. It's code that compiles and runs, but
> dynamically fails on some inputs. I agree that it's often better to fail
> noisily instead of silently, but in a manner of speaking the
> string-based code doesn't fail at all - it correctly iterates the code
> units of a string. This may sometimes not be what the user expected;
> most of the time they'd care about the code points.
Iterating the code units is still possible by accessing the array data,
i.e. you could do:
foreach(i, c; s.data)
if you want the code units.
That is the point of having a separate type. Using string_t tells the
library "I'm using this data as a string". Using char[] tells the library
"I'm using this data as an array."
The difference here is that you have to *specifically* ask for the code
units; the default is code points. All it really does is switch the
default.
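
To make the "switch the default" point concrete, here is a minimal sketch.
StringT below is a hypothetical stand-in written only for this example; it
is not the code from my proposal, and its only assumptions are a public
data member holding the code units and a default iteration that decodes
code points.

```d
import std.stdio : writeln;
import std.utf : decode;

// Hypothetical stand-in for the proposed string_t (illustration only).
struct StringT
{
    immutable(char)[] data; // the raw code units

    // Default iteration: (starting index, code point) pairs.
    int opApply(scope int delegate(size_t, dchar) dg)
    {
        size_t i = 0;
        while (i < data.length)
        {
            immutable start = i;
            immutable dchar d = decode(data, i); // advances i past the code point
            if (auto r = dg(start, d))
                return r;
        }
        return 0;
    }
}

// Indices at which default (code-point) iteration lands.
size_t[] codePointStarts(StringT s)
{
    size_t[] starts;
    foreach (i, d; s)
        starts ~= i;
    return starts;
}

void main()
{
    auto s = StringT("a\u00E9!"); // U+00E9 takes two UTF-8 code units

    foreach (i, d; s)       // default: code points, at indices 0, 1, 3
        writeln("code point at ", i, ": ", d);

    foreach (i, c; s.data)  // explicit opt-in: four code units
        writeln("code unit at ", i, ": ", cast(ubyte) c);
}
```

Note that the indices in the first loop skip from 1 to 3, which is the
"might not be sequential" behavior mentioned elsewhere in this thread.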
>> It also supports this:
>>
>> foreach(i, d; s)
>> {
>>     writeln("The character in position ", i, " is ", d);
>> }
>>
>> where i is the index (might not be sequential)
>
> Well string supports that too, albeit with the nit that you need to
> specify dchar.
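This is not a small problem: leave out dchar and the loop still compiles,
but it silently iterates code units instead of code points.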
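
A small sketch of why the dchar nit matters with the built-in string type.
The function names are made up for the example; the only language behavior
relied on is that foreach over a string infers char for the element unless
dchar is written out.

```d
import std.stdio : writeln;

// Iterations when the element type is inferred (char, i.e. code units).
size_t countInferred(string s)
{
    size_t n = 0;
    foreach (i, c; s) // c is inferred as immutable(char)
        ++n;
    return n;
}

// Iterations when dchar is requested explicitly (code points).
size_t countDecoded(string s)
{
    size_t n = 0;
    foreach (i, dchar d; s) // decodes UTF-8 on the fly
        ++n;
    return n;
}

void main()
{
    string s = "a\u00E9!"; // U+00E9 takes two UTF-8 code units
    writeln(countInferred(s)); // 4
    writeln(countDecoded(s));  // 3
}
```

Both loops compile without complaint, so nothing tells the user which one
they actually wrote.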
>> isRandomAccessRange requires hasLength (see here:
>> http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/range.d#L532).
>> This is not a random access range per that definition.
>
> That's an interesting twist. By the way I specified length is required
> then because I couldn't imagine having random access into something that
> I can't tell the length of. Apparently I was wrong :o).
Yes, in fact, you could say that is exactly what defines a VLERange ;) But
actually, there are two types of VLE ranges: those which can be randomly
accessed (where it is possible to find the beginning of a code point given
a random index) and those which cannot (where decoding depends on the
exact order of the data). The latter would not be bidirectional
ranges anyway.
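
UTF-8 is an example of the first kind. A rough sketch of finding the start
of a code point from an arbitrary index, assuming valid UTF-8 input;
codePointStart is a name invented for this example, not a Phobos function.

```d
import std.stdio : writeln;

// Back up from index i to the first code unit of the code point containing it.
// Assumes data is valid UTF-8 and i < data.length.
size_t codePointStart(const(char)[] data, size_t i)
{
    assert(i < data.length);
    // UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    while (i > 0 && (data[i] & 0xC0) == 0x80)
        --i;
    return i;
}

void main()
{
    string s = "a\u00E9!"; // code units: 'a', 0xC3, 0xA9, '!'
    writeln(codePointStart(s, 2)); // 1: index 2 is inside U+00E9
    writeln(codePointStart(s, 3)); // 3: '!' starts its own code point
}
```

This only works because lead and continuation bytes are distinguishable in
isolation; an encoding without that property falls in the second category.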
>> But a string
>> isn't a random access range anyways (it's specifically disallowed by
>> std.range per that same reference).
>
> It isn't and it isn't supposed to be.
I agree with that assessment, which is why I omitted length.
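
For reference, the classification can be checked at compile time. This is
a sketch against the std.range traits as I understand them in Phobos;
results may differ between Phobos versions.

```d
import std.range;

// Narrow strings are iterated by code point, so they are bidirectional
// but deliberately not random access, and they do not report a length.
static assert(isBidirectionalRange!string);
static assert(!isRandomAccessRange!string);
static assert(!hasLength!string);

// The same bytes viewed as a plain array are random access.
static assert(isRandomAccessRange!(immutable(ubyte)[]));
static assert(hasLength!(immutable(ubyte)[]));

void main() {}
```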
-Steve