VLERange: a range in between BidirectionalRange and RandomAccessRange

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Tue Jan 11 15:00:30 PST 2011


On 1/11/11 11:21 AM, Steven Schveighoffer wrote:
> On Tue, 11 Jan 2011 11:54:08 -0500, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> wrote:
>
>> On 1/11/11 5:30 AM, Steven Schveighoffer wrote:
>>> While this makes it possible to write algorithms that only accept
>>> VLERanges, I don't think it solves the major problem with strings --
>>> they are treated as arrays by the compiler.
>>
>> Except when they're not - foreach with dchar...
>
> This solitary difference is a very thin argument -- foreach(d;
> byDchar(str)) would be just as good without requiring compiler help.
>
>>
>>> I'd also rather see an indexing operation return the element type, and
>>> have a separate function to get the encoding unit. This makes more sense
>>> for generic code IMO.
>>
>> But that's neither here nor there. That would return the logical
>> element at a physical position. I am very doubtful that much generic
>> code could work without knowing they are in fact dealing with a
>> variable-length encoding.
>
> It depends on the function, and the way the indexing is implemented.
>
>>> I noticed you never commented on my proposed string type...
>>>
>>> That reminds me, I should update with suggested changes and re-post it.
>>
>> To be frank, I think it didn't mark a visible improvement. It solved
>> some problems and brought others. There was disagreement over the
>> offered primitives and their semantics.
>
> It is supposed to be simple, and provide the expected interface, without
> causing any undue performance degradation. That is, I should be able to
> do all the things with a replacement string type that I can with a char
> array today, as efficiently as I can today, except I should have to work
> to get at the code-units. The huge benefit is that I can say "I'm
> dealing with this as an array" when I know it's safe

Unfinished sentence? Anyway, for my money you just described what we 
have now.

> The disagreement will never be fully solved, as there is just as much
> disagreement about the current state of affairs ;) e.g. should foreach
> default to using dchar?

I disagree about the disagreement being unsolvable. I'm not rigid; if I 
saw a terrific abstraction in your string, I'd be all for it. As it 
stands, it just shuffles some issues around, and although I agree it does 
one or two things better than char[], at the end of the day it doesn't 
carry its weight.

>> That being said, it's good you are doing this work. In the best case,
>> you could bring a compelling abstraction to the table. In the worst,
>> you'll become as happy about D's strings as I am :o).
>
> I don't think I'll ever be 'happy' with the way strings sit in phobos
> currently. I typically deal in ASCII (i.e. code units), and phobos works
> very hard to prevent that.

I wonder if we could and should extend some of the functions in 
std.string to work with ubyte[]. I have added a function called 
representation() that I haven't documented yet. Essentially, 
representation gives you the ubyte[], ushort[], or uint[] underneath a 
string, with the same qualifiers. Whenever you want an algorithm to work 
on ASCII in earnest, you can pass representation(s) to it instead of s.
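
A minimal sketch of that usage, assuming representation() is reachable 
from std.string (where it lives in later Phobos releases). The helper 
names codePointCount, codeUnitCount and countByte are made up for 
illustration and are not part of Phobos.

```d
import std.range : walkLength;
import std.string : representation;

// Hypothetical helper: walks the string as a range of dchar,
// so it counts decoded code points.
size_t codePointCount(string s)
{
    return s.walkLength;
}

// Hypothetical helper: looks at the immutable(ubyte)[] underneath
// the string, so it counts UTF-8 code units.
size_t codeUnitCount(string s)
{
    return s.representation.length;
}

// Hypothetical helper: counts occurrences of one ASCII byte
// without decoding anything.
size_t countByte(string s, char needle)
{
    size_t n = 0;
    foreach (ubyte b; s.representation)
    {
        if (b == needle)
            ++n;
    }
    return n;
}

// Run with: dmd -unittest -main
unittest
{
    string s = "h\u00E9llo";          // U+00E9 takes two UTF-8 code units
    assert(codePointCount(s) == 5);
    assert(codeUnitCount(s) == 6);
    assert(countByte(s, 'l') == 2);
}
```

The point of the sketch is only that the same data can be handed to an 
algorithm either as decoded characters or as raw code units, depending on 
which view is passed.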

If you work a lot with ASCII, an AsciiString abstraction may be a better 
string type, and one more likely to succeed. Better yet, you could simply 
focus on AsciiChar and then define ASCII strings as arrays of AsciiChar.
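
One way the AsciiChar idea might look, purely as a sketch. Nothing here 
exists in Phobos: AsciiChar, AsciiString and toAscii are hypothetical 
names, and rejecting non-ASCII input with an exception is an assumed 
design choice.

```d
import std.exception : enforce;

// Hypothetical element type: one byte that is guaranteed to be ASCII.
struct AsciiChar
{
    private ubyte value;

    this(char c)
    {
        // Assumption: non-ASCII code units are rejected at construction.
        enforce(c < 0x80, "not an ASCII character");
        value = cast(ubyte) c;
    }

    char toChar() const
    {
        return cast(char) value;
    }
}

// An ASCII string is then just an array of AsciiChar. Because the element
// type is not char, indexing and length refer to whole characters.
alias AsciiString = immutable(AsciiChar)[];

// Hypothetical conversion: validates each code unit of s.
AsciiString toAscii(string s)
{
    AsciiChar[] result;
    result.reserve(s.length);
    foreach (char c; s)               // iterate code units, no decoding
        result ~= AsciiChar(c);
    return result.idup;
}

// Run with: dmd -unittest -main
unittest
{
    import std.exception : assertThrown;

    auto a = toAscii("abc");
    assert(a.length == 3);
    assert(a[1].toChar == 'b');
    assertThrown(toAscii("h\u00E9llo"));
}
```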


Andrei

