std.algorithm.remove and principle of least astonishment
Steven Schveighoffer
schveiguy at yahoo.com
Sat Oct 16 13:10:29 PDT 2010
On Sat, 16 Oct 2010 15:49:56 -0400, Andrei Alexandrescu
<SeeWebsiteForEmail at erdani.org> wrote:
> On 10/16/2010 01:39 PM, Steven Schveighoffer wrote:
>> Andrei, I am increasingly seeing people struggling with the decision to
>> make strings bidirectional ranges of dchar instead of what the compiler
>> says they are. This needs a different solution. It's too
>> confusing/difficult to deal with.
>
> I'm not seeing that. I'm seeing strings working automagically with most
> of std.algorithm without ever destroying a wide string.
I've seen several posts regarding char[] being considered differently by
the compiler and std.algorithm.
The most prominent was the fact that:
foreach(x; str)
iterates over individual chars, not dchars.
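The split can be seen with a current DMD and Phobos. This is a sketch against today's toolchain rather than the 2010 one, and the sample text `"h\u00E9llo"` is only an illustration. Plain `foreach` yields code units, declaring the loop variable as `dchar` makes the compiler decode, and Phobos's range view reports `dchar` elements.

```d
import std.range.primitives : ElementType;

void main()
{
    string s = "h\u00E9llo"; // 'é' (U+00E9) takes two UTF-8 code units

    size_t units, points;
    foreach (c; s) // element type comes from the array: one char per step
    {
        static assert(is(typeof(c) == immutable(char)));
        ++units;
    }
    foreach (dchar d; s) // asking for dchar makes foreach decode UTF-8
        ++points;

    assert(units == 6);
    assert(points == 5);

    // std.range (and therefore std.algorithm) sees dchar elements instead.
    static assert(is(ElementType!string == dchar));
}
```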
While I agree that a bidirectional range is the only sane way to view
utf-8 strings, a char[] is not necessarily a utf-8 string. It's an array
of utf-8 code units, at least as far as the compiler is concerned.
You can interpret it as a utf-8 string or as an array, and the compiler
allows both. std.algorithm allows only the former. This half-assed attempt
to make strings safe just fosters confusion.
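A minimal sketch of the two views side by side, again assuming a current Phobos: the language indexes and measures the array in code units, while the range traits that std.algorithm checks report that a narrow string is neither random-access nor of known length.

```d
import std.range.primitives : hasLength, isRandomAccessRange, walkLength;

void main()
{
    string s = "h\u00E9llo";

    // The language view: an array of UTF-8 code units.
    assert(s.length == 6);
    assert(s[1] == 0xC3); // first byte of the two-byte encoding of 'é'

    // The Phobos range view: a bidirectional range of decoded dchars,
    // so std.algorithm cannot rely on O(1) indexing or on .length.
    static assert(!isRandomAccessRange!string);
    static assert(!hasLength!string);
    assert(s.walkLength == 5);
}
```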
My suggestion is to make a range type that enforces the correct restrictions
on strings. The compiler should treat string literals as a polysemous type:
by default they would have this new type, but they could optionally be typed
as an array of immutable characters.
So for example if you define:
struct string(T) if (is(T == char) || is(T == wchar))
{
    private immutable(T)[] data;
    // range functions to ensure data is only accessed via dchar
    ...
}
If the compiler then used this type to represent string literals, we would
have control over what a string literal allows without littering
std.algorithm (and any external algorithms that might encounter strings)
with special cases.
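One possible shape for such a type, as a hedged sketch only. `DString` is a hypothetical name (it avoids clashing with the existing `string` alias), and the range primitives are built on `std.utf.decode`, `std.utf.stride` and `std.utf.strideBack`. It is a bidirectional range whose elements are always `dchar`, so generic algorithms such as `retro` work on it without special-casing strings.

```d
import std.range.primitives : isBidirectionalRange, ElementType;
import std.utf : decode, stride, strideBack;

// Hypothetical stand-in for the proposed string(T) struct.
struct DString(T) if (is(T == char) || is(T == wchar))
{
    private immutable(T)[] data;

    @property bool empty() { return data.length == 0; }

    // Decode the first code point without consuming it.
    @property dchar front()
    {
        size_t i = 0;
        return decode(data, i);
    }

    // Drop exactly one whole code point from the front.
    void popFront() { data = data[stride(data, 0) .. $]; }

    // Decode the last code point.
    @property dchar back()
    {
        size_t i = data.length - strideBack(data, data.length);
        return decode(data, i);
    }

    // Drop exactly one whole code point from the back.
    void popBack() { data = data[0 .. $ - strideBack(data, data.length)]; }

    @property DString save() { return this; }
}

static assert(isBidirectionalRange!(DString!char));
static assert(is(ElementType!(DString!char) == dchar));

void main()
{
    import std.algorithm.comparison : equal;
    import std.range : retro;

    auto s = DString!char("h\u00E9llo");
    assert(s.equal("h\u00E9llo"d));
    assert(s.retro.equal("oll\u00E9h"d));
}
```

Because `data` is private, callers outside the module cannot reach the underlying code units. How literals would convert to such a type is left to the compiler side of the proposal and is not addressed here.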
So for example, I'd want something like this:
immutable(char)[] asciiarr = "abcdef";
auto str = "abcdef"; // typed as string
foreach(x; str)
{
    assert(is(typeof(x) == dchar));
}
foreach(ref x; str) // fails
foreach(ref x; asciiarr) // ok, x is of type immutable(char)
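Without compiler support, the `foreach` behavior above can be approximated with a library type through `opApply`. This is a sketch assuming a hypothetical, non-template `DString` wrapper around `string`. A single `opApply` that passes `dchar` by value makes `foreach (x; str)` infer `dchar` and makes `foreach (ref x; str)` fail to compile, while a plain array still allows `ref` iteration.

```d
// Hypothetical wrapper: callers can only iterate by decoded dchar value.
struct DString
{
    private string data;

    // One opApply taking dchar by value: foreach infers dchar for the
    // loop variable and rejects ref loop variables.
    int opApply(scope int delegate(dchar) dg)
    {
        foreach (dchar c; data) // decodes UTF-8 into code points
            if (auto result = dg(c))
                return result;
        return 0;
    }
}

void main()
{
    auto str = DString("abcdef");
    immutable(char)[] asciiarr = "abcdef";

    foreach (x; str)
        static assert(is(typeof(x) == dchar));

    // foreach (ref x; str) does not compile: opApply passes dchar by value.
    static assert(!__traits(compiles, { foreach (ref x; str) {} }));

    // The plain array still allows ref iteration over code units.
    foreach (ref x; asciiarr)
        static assert(is(typeof(x) == immutable(char)));
}
```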
The truth is, 100% of the time for me, I want to use string literals to
represent ASCII strings, not utf-8 strings (I speak English, so I care
very little about Unicode). And std.algorithm steadfastly refuses to
treat them as such. I think it's just too limited. Yes, it would be nice
if by default strings were bidirectional ranges of dchar, to be on the
safe side, but I also want the ability to have an array of chars that
works as an array, even in std.algorithm, *and* is initializable via
string literals.
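One existing escape hatch in Phobos, shown as a sketch that assumes ASCII-only data: `std.string.representation` reinterprets the same `char[]` memory as `ubyte[]`, which `sort` accepts even though it rejects the `char[]` itself. With non-ASCII utf-8 this would reorder individual bytes and corrupt multi-byte sequences.

```d
import std.algorithm.sorting : sort;
import std.string : representation;

void main()
{
    char[] arr = "fedcba".dup;

    // char[] is not a random-access range to Phobos, so sort rejects it.
    static assert(!__traits(compiles, sort(arr)));

    // representation() views the same memory as ubyte[], which
    // std.algorithm treats as an ordinary array; arr is sorted in place.
    sort(arr.representation);
    assert(arr == "abcdef");
}
```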
My requirements for the string struct would be:
1. Elements are only accessed as dchar.
2. Slicing cannot split a code point.
3. Indexing returns a dchar as well, which provides pseudo-random access
(if you access an index that's in the middle of a code point, you get an
exception).
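Requirements 2 and 3 can be sketched as `opSlice` and `opIndex` on the same kind of wrapper. This is a hypothetical design, not an existing Phobos type: `DString` and its private `isBoundary` helper are illustrative names, indexing relies on `std.utf.decode` throwing when an index lands inside a code point, and slicing checks both ends with `std.exception.enforce`. For brevity it uses the older two-argument `opSlice` form.

```d
import std.exception : assertThrown, enforce;
import std.utf : decode;

// Hypothetical type; only the members for requirements 2 and 3 are shown.
struct DString(T) if (is(T == char) || is(T == wchar))
{
    private immutable(T)[] data;

    // True if i is the start of a code point (or one past the end).
    private bool isBoundary(size_t i)
    {
        if (i == data.length)
            return true;
        static if (is(T == char))
            return (data[i] & 0xC0) != 0x80; // not a UTF-8 continuation byte
        else
            return data[i] < 0xDC00 || data[i] > 0xDFFF; // not a low surrogate
    }

    // Requirement 3: indexing decodes a whole code point; decode() throws
    // if i points into the middle of one.
    dchar opIndex(size_t i)
    {
        return decode(data, i);
    }

    // Requirement 2: refuse slices that would cut a code point in half.
    DString opSlice(size_t a, size_t b)
    {
        enforce(a <= b && b <= data.length, "slice out of bounds");
        enforce(isBoundary(a) && isBoundary(b), "slice splits a code point");
        return DString(data[a .. b]);
    }
}

void main()
{
    auto s = DString!char("h\u00E9llo"); // code units: h, C3, A9, l, l, o
    assert(s[0] == 'h');
    assert(s[1] == '\u00E9'); // index 1 starts the two-unit 'é'
    assertThrown(s[2]);       // index 2 is inside 'é'
    assert(s[3 .. 6][0] == 'l');
    assertThrown(s[0 .. 2]);  // would end between the two units of 'é'
}
```

Indexing here is only pseudo-random access, as the list says: positions are still code-unit offsets, so a caller must already know where code points start.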
-Steve
More information about the Digitalmars-d
mailing list