std.algorithm.remove and principle of least astonishment

Sun Nov 21 23:08:07 PST 2010

On 11/21/10 11:59 PM, Rainer Deyke wrote:
> On 11/21/2010 21:56, Andrei Alexandrescu wrote:
>> On 11/21/10 22:09 CST, Rainer Deyke wrote:
>>> On 11/21/2010 17:31, Andrei Alexandrescu wrote:
>>> char[] and wchar[] fail to provide some of the guarantees of all other
>>> instances of T[].
>>
>> What exactly are those guarantees?
>
> That the range view and the array view provide direct access to the same
> data.

Where do ranges state that assumption?

> One of the useful features of most arrays is that an array of T can be
> treated as a range of T.  However, this feature is missing for arrays of
> char and wchar.

This is not a guarantee by ranges, it's just a mistaken assumption.
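
A minimal sketch of the two views under discussion, assuming a current Phobos in which std.range still decodes narrow strings. The sample string is made up for illustration; nothing here is from the original exchange.

```d
import std.range : ElementType, front;

void main()
{
    // "héllo": the 'é' (U+00E9) occupies two UTF-8 code units.
    char[] s = "h\u00E9llo".dup;

    // Array view: indexing and .length work on code units of type char.
    static assert(is(typeof(s[0]) == char));
    assert(s.length == 6);

    // Range view: front decodes and yields a code point of type dchar.
    auto f = s.front;
    static assert(is(typeof(f) == dchar));
    static assert(is(ElementType!(char[]) == dchar));

    // For other element types, both views agree.
    static assert(is(ElementType!(int[]) == int));
}
```

The array interface and the range interface are both well defined for char[]; they simply do not promise the same element type.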

>>>     - When writing code that uses T[], it is often natural to mix
>>> range-based access and index-based access, with the assumption that both
>>> provide direct access to the same underlying data.  However, with char[]
>>> this assumption is incorrect, as the underlying data is transformed when
>>> viewing the array as a range.  This means that generic code that uses
>>> T[] must take special consideration of char[] or it may unexpectedly
>>> produce incorrect results when T = char.
>>
>> What you're saying is that you write generic code that requires T[], and
>> then the code itself uses front, popFront, and other range-specific
>> functions in conjunction with it.
>
> No, I'm saying that I write generic code that declares T[] and then
> passes it off to a function that operates on ranges, or to a foreach loop.

A function that operates on ranges would have an appropriate constraint 
so it would work properly or not at all. foreach works fine with all arrays.
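
As a sketch of "properly or not at all": the helper `middle` below is hypothetical, written only to show a constraint at work. It asks for random access, so it accepts int[] and is rejected at compile time for char[], which std.range models as a non-random-access range.

```d
import std.range : isForwardRange, isRandomAccessRange;

// Hypothetical helper: returns the middle element of a random-access range.
auto middle(R)(R r)
    if (isRandomAccessRange!R)
{
    return r[r.length / 2];
}

void main()
{
    // int[] satisfies the constraint.
    static assert(isRandomAccessRange!(int[]));
    assert(middle([1, 2, 3]) == 2);

    // char[] is a forward (in fact bidirectional) range of dchar,
    // but not a random-access one, so the call does not compile.
    static assert(isForwardRange!(char[]));
    static assert(!isRandomAccessRange!(char[]));
    static assert(!__traits(compiles, middle("abc".dup)));
}
```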

>> But this is exactly the problem. If you want to use range primitives,
>> you submit to the requirement of ranges. So you write the generic
>> function to ask for ranges (with e.g. isForwardRange etc). Otherwise
>> your code is incorrect.
>
> Again, my generic function declares the array as a local variable or a
> member variable.  It cannot declare a generic range.
>
>> If you want to work with arrays, use a[0] to access the front, a[$ - 1]
>> to access the back, and a = a[1 .. $] to chop off the first element of
>> the array. It is not AT ALL natural to mix those with a.front, a.back
>> etc. It is not - why? because std.range defines them with specific
>> meanings for arrays in general and for arrays of characters in
>> particular. If you submit to use std.range's abstraction, you submit to
>> using it the way it is defined.
>
> It absolutely is natural to mix these in code that is written without
> consideration for strings, especially when you consider that foreach
> also uses the range interface.
>
> Let's say I have an array and I want to iterate over the first ten
> items.  My first instinct would be to write something like this:
>
>    foreach (item; array[0 .. 10]) {
>      doSomethingWith(item);
>    }
>
> Simple, natural, readable code.  Broken for arrays of char or wchar, but
> in a way that is difficult to detect.

Why is it broken? Please try it to convince yourself of the contrary.
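
One way to try it, as a sketch: `countItems` is a hypothetical stand-in for the quoted loop, and the sample string is invented. foreach over a built-in array iterates its elements directly, so for char[] each item is a char code unit; decoding happens only when the loop variable is explicitly declared as dchar.

```d
// Hypothetical stand-in for the quoted loop: visits the first n items.
size_t countItems(T)(T[] array, size_t n)
{
    size_t count;
    foreach (item; array[0 .. n])
    {
        // The loop variable has the array's element type, even for char.
        static assert(is(typeof(item) == T));
        ++count;
    }
    return count;
}

void main()
{
    // 14 UTF-8 code units, 12 code points ('é' and 'ö' take two units each).
    char[] s = "h\u00E9llo w\u00F6rld!".dup;
    assert(s.length == 14);

    // The slice and the loop both work in code units: exactly 10 items.
    assert(countItems(s, 10) == 10);
    assert(countItems([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 10) == 10);

    // Decoding is opt-in: ask for dchar to iterate code points instead.
    size_t points;
    foreach (dchar c; s)
        ++points;
    assert(points == 12);
}
```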

>> So: if you want to use char[] as an array with the built-in array
>> interface, no problem. If you want to use char[] as a range with the
>> range interface as defined by std.range, again no problem. But asking
>> for one and then surreptitiously using the other is simply incorrect
>> code. You can't use std.range while at the same time complaining you
>> can't be bothered to read its docs.
>
> This would sound reasonable if I were using char[] directly.  I'm not.
> I'm using T[] in a generic context.  I may not have considered the case
> of T = char when I wrote the code.  The code may even have originally
> used Widget[] before I decided to make it generic.

Fine. Use T[] generically in conjunction with the array primitives. If 
you plan to use them with the range primitives, you do as ranges do.
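
A sketch of generic code that sticks to the array primitives. `trimEnds` is a hypothetical helper, not something from this thread. It behaves identically for every T, char included, because it only ever touches elements of type T; the flip side is that for char[] it works in code units and can split a code point.

```d
import std.range : front;

// Hypothetical helper: drops the first and last element using only
// built-in array operations (indexing and slicing).
T[] trimEnds(T)(T[] a)
{
    if (a.length < 2)
        return a[0 .. 0];
    a = a[1 .. $];        // chop off the first element
    return a[0 .. $ - 1]; // chop off the last element
}

void main()
{
    assert(trimEnds([1, 2, 3, 4]) == [2, 3]);

    // "éa" is three code units: C3 A9 61.
    char[] s = "\u00E9a".dup;
    assert(s.length == 3);
    assert(s[0] == 0xC3);        // array primitive: first code unit
    assert(s.front == '\u00E9'); // range primitive: first code point

    // trimEnds leaves the lone continuation byte A9.
    assert(trimEnds(s).length == 1);
    assert(trimEnds(s)[0] == 0xA9);
}
```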

>> I challenge you to define an alternative built-in string that fares
>> better than string & Comp. Before long you'll be overwhelmed by the
>> various necessities imposed by your constraints.
>
> Easy:
>    - string_t becomes a keyword.
>    - Syntactically speaking, string_t!T is the name of a type when T is a
> type.
>    - For every built-in character type T (including const and immutable
> versions), the type currently called T[] is now called string_t!T, but
> otherwise maintains all of its current behavior.
>    - For every other type T, string_t!T is an error.
>    - char[] and wchar[] (including const and immutable versions) are
> plain arrays of code units, even when viewed as a range.
>
> It's not my preferred solution, but it's easy to explain, it fixes the
> main problem with the current system, and it only costs one keyword.
>
> (I'd rather treat string_t as a library template with compiler support
> like and rename it to String, but then it wouldn't be a built-in string.)

I very much prefer the current state of affairs.

Andrei