std.algorithm.remove and principle of least astonishment

Mon Nov 22 02:01:38 PST 2010

On 11/22/2010 00:08, Andrei Alexandrescu wrote:
> On 11/21/10 11:59 PM, Rainer Deyke wrote:
>> That the range view and the array view provide direct access to the same
>> data.
> 
> Where do ranges state that assumption?

Are you saying that arrays of T do not function as ranges of T when T is
not a character type?

>> One of the useful features of most arrays is that an array of T can be
>> treated as a range of T.  However, this feature is missing for arrays of
>> char and wchar.
> 
> This is not a guarantee by ranges, it's just a mistaken assumption.

I'm not saying that this feature is guaranteed for all arrays, because
it clearly isn't.  I'm saying that this feature is present for T[] where
T is not a character type, and missing for T[] where T is a character
type.  When writing code that is not intended to operate on character
data, it is natural to use this feature.  The code then breaks when it
is used with character data.

>> No, I'm saying that I write generic code that declares T[] and then
>> passes it off to a function that operates on ranges, or to a foreach
>> loop.
> 
> A function that operates on ranges would have an appropriate constraint
> so it would work properly or not at all. foreach works fine with all
> arrays.

It "works", but produces different results when iterating over a
character array than when iterating over a non-character array.  Code
can compile, have well-defined behavior, run, produce correct results in
most cases, but still be wrong.
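
A minimal sketch of the mismatch, assuming Phobos' range primitives
(empty, popFront) decode narrow strings one code point at a time.  The
helper names countByForeach and countByRange are hypothetical and exist
only for this illustration:

```d
import std.range;
import std.stdio;

// Hypothetical helper: counts elements with a plain foreach over the array.
size_t countByForeach(T)(T[] arr) {
    size_t n = 0;
    foreach (item; arr) ++n;  // for char arrays, item is a single code unit
    return n;
}

// Hypothetical helper: counts elements through the range primitives.
size_t countByRange(T)(T[] arr) {
    size_t n = 0;
    // for char arrays, popFront advances past a whole code point
    for (; !arr.empty; arr.popFront()) ++n;
    return n;
}

void main() {
    int[] numbers = [1, 2, 3];
    string text = "日本語";  // 3 code points, 9 UTF-8 code units

    writeln(countByForeach(numbers), " ", countByRange(numbers)); // 3 3
    writeln(countByForeach(text), " ", countByRange(text));       // 9 3
}
```

The two views agree for int[] and disagree for string, even though the
generic code is identical in both cases.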

>> Let's say I have an array and I want to iterate over the first ten
>> items.  My first instinct would be to write something like this:
>>
>>    foreach (item; array[0 .. 10]) {
>>      doSomethingWith(item);
>>    }
>>
>> Simple, natural, readable code.  Broken for arrays of char or wchar, but
>> in a way that is difficult to detect.
> 
> Why is it broken? Please try it to convince yourself of the contrary.

I see, foreach still iterates over code units by default.  Of course,
this means that foreach over ranges doesn't work with strings, which in
turn means that algorithms that use foreach over ranges are broken.
Observe:

  import std.stdio;
  import std.algorithm;

  void main() {
    writeln(count!("true")("日本語")); // Three characters.
  }

Output (compiled with Digital Mars D Compiler v2.050):
  9
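
For comparison, a short sketch of the other ways to count the same
string, assuming std.range.walkLength goes through the decoding range
primitives and that foreach only decodes when the loop variable is
explicitly typed as dchar:

```d
import std.range;
import std.stdio;

void main() {
    string s = "日本語";

    size_t units = 0;
    foreach (c; s) ++units;         // element type inferred as char (code unit)

    size_t points = 0;
    foreach (dchar c; s) ++points;  // explicit dchar makes foreach decode UTF-8

    writeln(s.length);       // 9
    writeln(units);          // 9
    writeln(points);         // 3
    writeln(walkLength(s));  // 3
}
```

An algorithm whose inner loop is a plain foreach over its range argument
sees the first behavior, while one written against front and popFront
sees the last.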

> Fine. Use T[] generically in conjunction with the array primitives. If
> you plan to use them with the range primitives, you do as ranges do.

If arrays can't operate as ranges, what's the point of giving them a
range interface?

>> Easy:
>>    - string_t becomes a keyword.
>>    - Syntactically speaking, string_t!T is the name of a type when T is a
>> type.
>>    - For every built-in character type T (including const and immutable
>> versions), the type currently called T[] is now called string_t!T, but
>> otherwise maintains all of its current behavior.
>>    - For every other type T, string_t!T is an error.
>>    - char[] and wchar[] (including const and immutable versions) are
>> plain arrays of code units, even when viewed as a range.
>>
>> It's not my preferred solution, but it's easy to explain, it fixes the
>> main problem with the current system, and it only costs one keyword.
>>
>> (I'd rather treat string_t as a library template with compiler support
>> and rename it to String, but then it wouldn't be a built-in string.)
> 
> I very much prefer the current state of affairs.

Care to support that with some arguments, or is it just a purely
subjective preference?

-- 
Rainer Deyke - rainerd at eldwood.com