std.algorithm.remove and principle of least astonishment

Rainer Deyke rainerd at eldwood.com
Sun Nov 21 21:59:44 PST 2010


On 11/21/2010 21:56, Andrei Alexandrescu wrote:
> On 11/21/10 22:09 CST, Rainer Deyke wrote:
>> On 11/21/2010 17:31, Andrei Alexandrescu wrote:
>> char[] and wchar[] fail to provide some of the guarantees of all other
>> instances of T[].
> 
> What exactly are those guarantees?

That the range view and the array view provide direct access to the same
data.

One of the useful features of most arrays is that an array of T can be
treated as a range of T.  However, this feature is missing for arrays of
char and wchar.
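
To make the mismatch concrete, here is a small sketch written against current Phobos (module std.range.primitives), so it assumes a newer library than the one this thread was discussing. Indexing a char[] yields UTF-8 code units of type char, while the range view decodes them into dchar code points, so the array's length and the range's element count disagree as soon as a non-ASCII character appears. The sample text is arbitrary.

```d
import std.range.primitives : ElementType, front, walkLength;

void main()
{
    int[] nums = [1, 2, 3];
    char[] text = "héllo".dup; // 'é' is encoded as two UTF-8 code units

    // For int[], both views agree: same element type, same count.
    static assert(is(typeof(nums[0]) == ElementType!(int[])));
    assert(nums.length == nums.walkLength);

    // For char[], the array view sees code units (char)...
    static assert(is(typeof(text[0]) == char));
    assert(text.length == 6);

    // ...but the range view decodes them into code points (dchar).
    static assert(is(ElementType!(char[]) == dchar));
    static assert(is(typeof(text.front) == dchar));
    assert(text.walkLength == 5);
}
```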

>>    - When writing code that uses T[], it is often natural to mix
>> range-based access and index-based access, with the assumption that both
>> provide direct access to the same underlying data.  However, with char[]
>> this assumption is incorrect, as the underlying data is transformed when
>> viewing the array as a range.  This means that generic code that uses
>> T[] must give char[] special consideration or it may unexpectedly
>> produce incorrect results when T = char.
> 
> What you're saying is that you write generic code that requires T[], and
> then the code itself uses front, popFront, and other range-specific
> functions in conjunction with it.

No, I'm saying that I write generic code that declares T[] and then
passes it off to a function that operates on ranges, or to a foreach loop.
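
A sketch of how that pattern can go wrong, again assuming current Phobos. The helper indexOfThenRead is hypothetical: ordinary generic code over T[] that locates an element with a range algorithm (std.algorithm.searching.countUntil) and then reads it back by index. countUntil counts range elements, which for char[] means decoded code points, so its result is not a valid code-unit index once a multi-byte character precedes the match.

```d
import std.algorithm.searching : countUntil;

// Hypothetical generic helper: search with a range algorithm,
// then read the match back through the array interface.
T indexOfThenRead(T)(T[] arr, T needle)
{
    auto i = arr.countUntil(needle); // number of range elements skipped
    assert(i >= 0, "needle not found");
    return arr[i];                   // but this indexes array elements
}

void main()
{
    int[] nums = [10, 20, 30];
    assert(indexOfThenRead(nums, 30) == 30); // works as intended

    char[] text = "ébc".dup; // 'é' occupies two code units
    // countUntil reports 2 code points before 'c', and text[2] is 'b'.
    assert(indexOfThenRead(text, 'c') == 'b');
}
```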

> But this is exactly the problem. If you want to use range primitives,
> you submit to the requirement of ranges. So you write the generic
> function to ask for ranges (e.g. with isForwardRange). Otherwise
> your code is incorrect.

Again, my generic function declares the array as a local variable or a
member variable.  It cannot declare a generic range.
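
For reference, this is how the trait-based constraints mentioned above classify these types in current Phobos (std.range.primitives). A generic function written against T[] silently loses random access and length on the range side when T is char or wchar, while dchar[] keeps them.

```d
import std.range.primitives : hasLength, isBidirectionalRange,
    isForwardRange, isRandomAccessRange;

// Arrays of most types are full random-access ranges with a length.
static assert(isRandomAccessRange!(int[]) && hasLength!(int[]));
static assert(isRandomAccessRange!(dchar[]) && hasLength!(dchar[]));

// Arrays of char and wchar are only bidirectional ranges of dchar,
// even though as arrays they still support indexing and .length.
static assert(isForwardRange!(char[]) && isBidirectionalRange!(char[]));
static assert(!isRandomAccessRange!(char[]) && !hasLength!(char[]));
static assert(!isRandomAccessRange!(wchar[]) && !hasLength!(wchar[]));

void main() {}
```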

> If you want to work with arrays, use a[0] to access the front, a[$ - 1]
> to access the back, and a = a[1 .. $] to chop off the first element of
> the array. It is not AT ALL natural to mix those with a.front, a.back
> etc. It is not. Why? Because std.range defines them with specific
> meanings for arrays in general and for arrays of characters in
> particular. If you submit to use std.range's abstraction, you submit to
> using it the way it is defined.

It absolutely is natural to mix these in code that is written without
consideration for strings, especially when you consider that foreach
also uses the range interface.
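
The two vocabularies quoted above (a[0] and a = a[1 .. $] versus front and popFront) can be compared directly on a char[]. In this sketch, assuming current Phobos, the array idiom drops one code unit and can leave a dangling continuation byte, whereas popFront drops one whole code point. The sample string is arbitrary.

```d
import std.range.primitives : front, popFront;

void main()
{
    char[] viaArray = "éa".dup; // 'é' is the two code units 0xC3 0xA9
    char[] viaRange = viaArray.dup;

    // Array interface: a[0] is the first code unit; slicing drops one unit.
    assert(viaArray[0] == 0xC3);
    viaArray = viaArray[1 .. $];
    assert(viaArray.length == 2); // continuation byte 0xA9 plus 'a'

    // Range interface: front decodes, popFront skips the whole code point.
    assert(viaRange.front == '\u00E9');
    viaRange.popFront();
    assert(viaRange == "a");
}
```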

Let's say I have an array and I want to iterate over the first ten
items.  My first instinct would be to write something like this:

  foreach (item; array[0 .. 10]) {
    doSomethingWith(item);
  }

Simple, natural, readable code.  Broken for arrays of char or wchar, but
in a way that is difficult to detect.
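
One concrete way such a slice can go wrong, sketched with current Phobos: for char[] the bounds in array[0 .. n] count code units, so the slice can end in the middle of a multi-byte character, and anything that later decodes it (here std.utf.validate) rejects it. This illustrates only the slicing hazard; the sample text and cut points are arbitrary.

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

void main()
{
    char[] text = "naïve".dup; // 'ï' is encoded as two code units

    // A prefix that stops before 'ï' is still valid UTF-8.
    validate(text[0 .. 2]);

    // A prefix of 3 code units keeps only the first byte of 'ï',
    // which is an incomplete (invalid) UTF-8 sequence.
    assertThrown!UTFException(validate(text[0 .. 3]));
}
```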

> So: if you want to use char[] as an array with the built-in array
> interface, no problem. If you want to use char[] as a range with the
> range interface as defined by std.range, again no problem. But asking
> for one and then surreptitiously using the other is simply incorrect
> code. You can't use std.range while at the same time complaining you
> can't be bothered to read its docs.

This would sound reasonable if I were using char[] directly.  I'm not.
I'm using T[] in a generic context.  I may not have considered the case
of T = char when I wrote the code.  The code may even have originally
used Widget[] before I decided to make it generic.

> I challenge you to define an alternative built-in string that fares
> better than string & Comp. Before long you'll be overwhelmed by the
> various necessities imposed by your constraints.

Easy:
  - string_t becomes a keyword.
  - Syntactically speaking, string_t!T is the name of a type when T is a
type.
  - For every built-in character type T (including const and immutable
versions), the type currently called T[] is now called string_t!T, but
otherwise maintains all of its current behavior.
  - For every other type T, string_t!T is an error.
  - char[] and wchar[] (including const and immutable versions) are
plain arrays of code units, even when viewed as a range.

It's not my preferred solution, but it's easy to explain, it fixes the
main problem with the current system, and it only costs one keyword.

(I'd rather treat string_t as a library template with compiler support
and rename it to String, but then it wouldn't be a built-in string.)
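
A rough library-only approximation of that idea, for illustration only: the hypothetical wrapper String!C below holds an array of code units and presents the decoded, code-point range view, leaving plain char[] free to behave as an array of code units. Current Phobos also offers std.utf.byCodeUnit, which exposes a string's code units as a random-access range with a length, roughly the behaviour proposed above for plain char[]. This is not the keyword-based string_t proposal itself; the names and details are assumptions.

```d
import std.range.primitives : hasLength, isForwardRange,
    isRandomAccessRange, walkLength;
import std.utf : byCodeUnit, decode;

// Hypothetical library counterpart of string_t!C: code units inside,
// decoded code points (dchar) when used as a range.
struct String(C)
if (is(immutable C == immutable char) || is(immutable C == immutable wchar)
    || is(immutable C == immutable dchar))
{
    C[] data; // raw code units

    @property bool empty() { return data.length == 0; }

    @property dchar front()
    {
        size_t i = 0;
        return decode(data, i); // decode the first code point
    }

    void popFront()
    {
        size_t i = 0;
        decode(data, i); // i becomes the width of the first code point
        data = data[i .. $];
    }

    @property String save() { return this; }
}

void main()
{
    auto s = String!char("héllo".dup);
    static assert(isForwardRange!(String!char));
    assert(s.walkLength == 5);  // code points
    assert(s.data.length == 6); // code units

    // byCodeUnit: a code-unit range view with length and random access.
    auto units = "héllo".byCodeUnit;
    static assert(isRandomAccessRange!(typeof(units)));
    static assert(hasLength!(typeof(units)));
    assert(units.length == 6);
}
```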


-- 
Rainer Deyke - rainerd at eldwood.com


More information about the Digitalmars-d mailing list