VLERange: a range in between BidirectionalRange and RandomAccessRange

Tue Jan 11 15:00:30 PST 2011

On 1/11/11 11:21 AM, Steven Schveighoffer wrote:
> On Tue, 11 Jan 2011 11:54:08 -0500, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> wrote:
>
>> On 1/11/11 5:30 AM, Steven Schveighoffer wrote:
>>> While this makes it possible to write algorithms that only accept
>>> VLERanges, I don't think it solves the major problem with strings --
>>> they are treated as arrays by the compiler.
>>
>> Except when they're not - foreach with dchar...
>
> This solitary difference is a very thin argument -- foreach(d;
> byDchar(str)) would be just as good without requiring compiler help.
>
>>
>>> I'd also rather see an indexing operation return the element type, and
>>> have a separate function to get the encoding unit. This makes more sense
>>> for generic code IMO.
>>
>> But that's neither here nor there. That would return the logical
>> element at a physical position. I am very doubtful that much generic
>> code could work without knowing it is in fact dealing with a
>> variable-length encoding.
>
> It depends on the function, and the way the indexing is implemented.
>
>>> I noticed you never commented on my proposed string type...
>>>
>>> That reminds me, I should update with suggested changes and re-post it.
>>
>> To be frank, I think it didn't mark a visible improvement. It solved
>> some problems and brought others. There was disagreement over the
>> offered primitives and their semantics.
>
> It is supposed to be simple, and provide the expected interface, without
> causing any undue performance degradation. That is, I should be able to
> do all the things with a replacement string type that I can with a char
> array today, as efficiently as I can today, except I should have to work
> to get at the code-units. The huge benefit is that I can say "I'm
> dealing with this as an array" when I know it's safe

Unfinished sentence? Anyway, for my money you just described what we 
have now.

> The disagreement will never be fully solved, as there is just as much
> disagreement about the current state of affairs ;) e.g. should foreach
> default to using dchar?

I disagree about the disagreement being unsolvable. I'm not rigid; if I 
saw a terrific abstraction in your string, I'd be all for it. It just 
shuffles some issues about, and although I agree it does one or two 
things better than char[], at the end of the day it doesn't carry its weight.

>> That being said, it's good you are doing this work. In the best case,
>> you could bring a compelling abstraction to the table. In the worst,
>> you'll become as happy about D's strings as I am :o).
>
> I don't think I'll ever be 'happy' with the way strings sit in Phobos
> currently. I typically deal in ASCII (i.e. code units), and Phobos works
> very hard to prevent that.

I wonder if we could and should extend some of the functions in 
std.string to work with ubyte[]. I did add a function called 
representation() that I haven't documented yet. Essentially, 
representation gives you the ubyte[], ushort[], or uint[] underneath a 
string, wstring, or dstring respectively, with the same qualifiers 
(e.g. immutable(char)[] becomes immutable(ubyte)[]). Whenever you want 
an algorithm to work on ASCII in earnest, you can pass 
representation(s) to it instead of s.
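A minimal sketch of the representation() idea, written against std.string.representation as it exists in current Phobos (the undocumented 2011 version may have differed in details). It shows that the result aliases the same memory with matching qualifiers, that its length counts code units while walkLength on the string counts decoded code points, and that byte-level algorithms then run without decoding. The countSpaces helper is hypothetical, for illustration only.

```d
import std.algorithm.searching : count;
import std.range.primitives : walkLength;
import std.string : representation;

/// Hypothetical helper (not in Phobos): counts ASCII spaces by scanning
/// raw UTF-8 code units, so no UTF decoding takes place.
size_t countSpaces(string s)
{
    return s.representation.count(cast(ubyte) ' ');
}

void main()
{
    // U+00EF and U+00E9 each occupy two UTF-8 code units.
    string s = "na\u00EFve caf\u00E9";

    // representation() reinterprets the same memory; qualifiers carry over.
    auto units = s.representation;
    static assert(is(typeof(units) == immutable(ubyte)[]));
    assert(cast(const(void)*) units.ptr == cast(const(void)*) s.ptr);

    // Code units vs. code points.
    assert(units.length == 12);
    assert(s.walkLength == 10);

    // Mutable char[] maps to ubyte[]; wstring maps to immutable(ushort)[].
    char[] buf = s.dup;
    static assert(is(typeof(buf.representation) == ubyte[]));
    static assert(is(typeof("x"w.representation) == immutable(ushort)[]));

    assert(countSpaces(s) == 1);
}
```

Because the view is a reinterpretation rather than a copy, switching between "text" and "bytes" costs nothing; the trade-off is that the caller takes responsibility for not splitting multi-byte sequences.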

If you work a lot with ASCII, an AsciiString abstraction may be a better 
string type, and one more likely to succeed. Better yet, you could 
simply focus on AsciiChar and then define ASCII strings as arrays of 
AsciiChar.
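A minimal sketch of the AsciiChar suggestion, assuming input is validated once at the boundary. AsciiChar, AsciiString, and toAscii are hypothetical names, not Phobos APIs. Because D auto-decodes only char and wchar arrays, an array of such a one-byte struct is an ordinary RandomAccessRange with O(1) length and indexing, rather than a variable-length-encoded range.

```d
import std.algorithm.searching : all;
import std.exception : enforce;
import std.range.primitives : hasLength, isRandomAccessRange;
import std.string : representation;

/// Hypothetical ASCII character type (not in Phobos): one byte, always < 0x80.
struct AsciiChar
{
    private ubyte value;

    this(char c)
    {
        // Throws an Exception for anything outside 7-bit ASCII.
        enforce(c < 0x80, "not an ASCII character");
        value = cast(ubyte) c;
    }

    char toChar() const { return cast(char) value; }
}

/// Hypothetical: an ASCII string is simply an array of AsciiChar.
alias AsciiString = immutable(AsciiChar)[];

/// Validates every code unit once, then reinterprets the bytes without copying.
AsciiString toAscii(string s)
{
    enforce(s.representation.all!(b => b < 0x80), "non-ASCII input");
    return cast(AsciiString) s;
}

// Not a narrow string, so no auto-decoding applies.
static assert(isRandomAccessRange!AsciiString);
static assert(hasLength!AsciiString);

void main()
{
    AsciiString a = toAscii("hello");
    assert(a.length == 5);
    assert(a[1].toChar == 'e');
}
```

Paying for validation up front is what lets indexing return a whole character at a physical position, which is exactly what a UTF-8 char[] cannot promise.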

Andrei