std.algorithm.remove and principle of least astonishment

Steven Schveighoffer schveiguy at yahoo.com
Sat Oct 16 13:10:29 PDT 2010


On Sat, 16 Oct 2010 15:49:56 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail at erdani.org> wrote:

> On 10/16/2010 01:39 PM, Steven Schveighoffer wrote:
>> Andrei, I am increasingly seeing people struggling with the decision to
>> make strings bidirectional ranges of dchar instead of what the compiler
>> says they are. This needs a different solution. It's too
>> confusing/difficult to deal with.
>
> I'm not seeing that. I'm seeing strings working automagically with most  
> of std.algorithm without ever destroying a wide string.

I've seen several posts regarding char[] being considered differently by  
the compiler and std.algorithm.

The most prominent was the fact that:

foreach(x; str)

iterates over individual chars, not dchars.

While I agree that a bidirectional range is the only sane way to view  
utf-8 strings, a char[] is not necessarily a utf-8 string.  It's an array  
of utf-8 code units.  At least to the compiler.

You can interpret it as a utf-8 string, or as an array.  And the compiler  
allows both.  std.algorithm doesn't.  This half-ass attempt to make  
strings safe just fosters confusion.
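To make the mismatch concrete, here is a minimal sketch, assuming a Phobos version in which narrow strings are decoded by the range primitives. The helper names countCodeUnits and countCodePoints are made up for illustration; ElementType and walkLength are the Phobos names.

```d
import std.range : ElementType, walkLength;

// Hypothetical helper: foreach with an inferred loop variable walks the
// array element by element, i.e. one iteration per UTF-8 code unit.
size_t countCodeUnits(string s)
{
    size_t n;
    foreach (x; s)
        ++n;
    return n;
}

// Hypothetical helper: declaring the loop variable as dchar asks the
// compiler to decode, i.e. one iteration per code point.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar x; s)
        ++n;
    return n;
}

// The range primitives report dchar as the element type of a string,
// even though the array itself holds immutable(char).
static assert(is(ElementType!string == dchar));

void main()
{
    string s = "h\u00E9llo"; // U+00E9 takes two UTF-8 code units
    assert(s.length == 6);            // array view
    assert(countCodeUnits(s) == 6);   // plain foreach agrees with the array view
    assert(countCodePoints(s) == 5);  // decoding foreach
    assert(walkLength(s) == 5);       // range view agrees with decoding
}
```

The same variable therefore has two lengths depending on who is looking at it, which is the confusion I'm describing.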

My suggestion is to make a range that enforces the correct restrictions on  
strings.  The compiler should treat string literals as a polysemous type  
that is by default this new type, or could optionally be an array of  
immutable characters.

So for example if you define:

struct string(T) if (is(T == char) || is(T == wchar))
{
    private immutable(T)[] data;
    // range functions to ensure data is only accessed via dchar
    ...
}

If the compiler then uses this type to represent string literals, we  
have control over what a string literal allows without littering  
std.algorithm with special cases (and any external algorithms that might  
encounter strings).

So for example, I'd want something like this:

immutable(char)[] asciiarr = "abcdef";
auto str = "abcdef"; // typed as string

foreach(x; str)
{
    assert(is(typeof(x) == dchar));
}

foreach(ref x; str) // fails

foreach(ref x; asciiarr) // ok, x is of type immutable(char)

The truth is, 100% of the time for me, I want to use string literals to  
represent ASCII strings, not utf-8 strings (I speak English, so I care  
almost nothing for unicode).  And std.algorithm steadfastly refuses to  
treat them as such.  I think it's just too limited.  Yes, it would be nice  
if by default strings were bi-directional ranges of dchar, to be on the  
safe side, but I also want the ability to have an array of chars, which  
works as an array, even in std.algorithm, *and* is initializable via  
string literals.
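For comparison, a rough sketch of the workaround available without any language change, assuming the buffer really is ASCII: view the chars as their ubyte representation, which std.algorithm accepts as a plain random-access array. The function name sortAscii is invented for this example; std.string.representation and std.algorithm.sort are existing Phobos functions.

```d
import std.algorithm : sort;
import std.range : isRandomAccessRange;
import std.string : representation;

// A char[] is not a random-access range as far as Phobos is concerned,
// while its ubyte[] representation is.
static assert(!isRandomAccessRange!(char[]));
static assert(isRandomAccessRange!(ubyte[]));

// Hypothetical helper: sorts an ASCII buffer in place by sorting the
// ubyte[] view, which shares memory with the original char[].
char[] sortAscii(char[] buf)
{
    sort(buf.representation);
    return buf;
}

void main()
{
    char[] buf = "fedcba".dup; // still initialized from a string literal
    assert(sortAscii(buf) == "abcdef");
}
```

It works, but it is opt-in at every call site and the element type becomes ubyte rather than char, which is why I would rather have the array behavior available directly.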


My requirements for the string struct would be:

1. only access via dchar
2. prevent slicing in the middle of a code point
3. Indexing returns a dchar as well, which provides pseudo-random access  
(if you access an index that's in the middle of a code point, you get an  
exception).
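
The three requirements could be sketched roughly as follows for the char case. This is only an illustration under my own assumptions, not a finished design: the name Utf8String, the onBoundary helper, and the choice of a plain Exception are all made up here; decode, stride and strideBack are the existing std.utf functions.

```d
import std.exception : assertThrown;
import std.range : isBidirectionalRange;
import std.utf : decode, stride, strideBack;

// Hypothetical wrapper; the name and layout are assumptions.
struct Utf8String
{
    private immutable(char)[] data;

    this(immutable(char)[] s) { data = s; }

    // 1. Only access via dchar: the range primitives decode.
    @property bool empty() { return data.length == 0; }

    @property dchar front()
    {
        size_t i = 0;
        return decode(data, i);
    }

    void popFront() { data = data[stride(data, 0) .. $]; }

    @property dchar back()
    {
        size_t i = data.length - strideBack(data, data.length);
        return decode(data, i);
    }

    void popBack() { data = data[0 .. $ - strideBack(data, data.length)]; }

    @property Utf8String save() { return this; }

    // True if i is the end of the data or the first code unit of a
    // code point (UTF-8 continuation bytes look like 0b10xxxxxx).
    private bool onBoundary(size_t i)
    {
        return i == data.length
            || (i < data.length && (data[i] & 0xC0) != 0x80);
    }

    // 3. Indexing by code unit offset returns a dchar, or throws when
    // the offset falls inside a code point.
    dchar opIndex(size_t i)
    {
        if (i >= data.length || !onBoundary(i))
            throw new Exception("index is not at the start of a code point");
        return decode(data, i);
    }

    // 2. Slicing is only allowed on code point boundaries.
    Utf8String opSlice(size_t lo, size_t hi)
    {
        if (lo > hi || !onBoundary(lo) || !onBoundary(hi))
            throw new Exception("slice would split a code point");
        return Utf8String(data[lo .. hi]);
    }
}

static assert(isBidirectionalRange!Utf8String);

void main()
{
    auto s = Utf8String("h\u00E9llo"); // U+00E9 sits at offsets 1 and 2
    assert(s.front == 'h' && s.back == 'o');
    assert(s[1] == '\u00E9');
    assertThrown(s[2]);        // offset 2 is inside the code point
    assert(s[1 .. 3].front == '\u00E9');
    assertThrown(s[0 .. 2]);   // would cut the code point in half
}
```

Indices here are code unit offsets, which is what makes the access constant-time; that is the "pseudo-random access" I mean.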

-Steve

