std.algorithm.remove and principle of least astonishment

Wed Nov 24 06:54:02 PST 2010

On 24/11/2010 13:07, Bruno Medeiros wrote:
> On 22/11/2010 04:56, Andrei Alexandrescu wrote:
>> On 11/21/10 22:09 CST, Rainer Deyke wrote:
>>> On 11/21/2010 17:31, Andrei Alexandrescu wrote:
>>> char[] and wchar[] fail to provide some of the guarantees of all other
>>> instances of T[].
>>
>> What exactly are those guarantees?
>>
>
> More exactly, that the following is true for any T:
>
> foreach(character; (T[]).init) {
> static assert(is(typeof(character) == T));
> }
> static assert(std.range.isRandomAccessRange!(T[]));
>
> It is not true for char and wchar (the second assert fails).
> Another guarantee, similar in nature, and roughly described, is that
> functions in std.algorithm should never fail or throw when using an
> array as an argument (assuming the other arguments are valid). So for
> example:
>
> std.algorithm.filter!("true")(anArray)
>
> Should not throw, for any value of anArray. But it may if anArray is of
> type char[] or wchar[] and there is an encoding exception.
>
>
> I'll leave the arguing of whether we want those guarantees for other
> subthreads, but it should be well agreed by now that the above is not
> guaranteed.
>
>
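
The two guarantees quoted above can be checked mechanically. What follows is a minimal D sketch, assuming a present-day Phobos in which the range primitives still auto-decode char[] and wchar[] into dchar. The helper names `checkForeachElementType` and `filterThrows` are hypothetical, introduced only for illustration.

```d
import std.algorithm : filter;
import std.range : isRandomAccessRange, walkLength;
import std.utf : UTFException;

// Guarantee 1a: foreach over a T[] with no explicit element type yields T.
// This part holds for char and wchar too (hypothetical helper).
void checkForeachElementType(T)()
{
    T[] arr;
    foreach (character; arr)
        static assert(is(typeof(character) == T));
}

// Guarantee 1b: T[] is a random-access range. This fails for the narrow
// string types, which Phobos treats as ranges of decoded dchar.
static assert(isRandomAccessRange!(int[]));
static assert(isRandomAccessRange!(dchar[]));
static assert(!isRandomAccessRange!(char[]));
static assert(!isRandomAccessRange!(wchar[]));

// Guarantee 2 (hypothetical helper): returns true if consuming
// filter!"true" over arr throws a UTF decoding error.
bool filterThrows(T)(T[] arr)
{
    try
    {
        auto n = walkLength(filter!"true"(arr));
        return false;
    }
    catch (UTFException)
    {
        return true;
    }
}

void main()
{
    checkForeachElementType!int();
    checkForeachElementType!char();
    checkForeachElementType!wchar();

    // An int[] never throws here.
    assert(!filterThrows([1, 2, 3]));

    // 0xFF can never occur in well-formed UTF-8, so decoding it throws.
    assert(filterThrows([cast(char) 0xFF]));
}
```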

Actually, I'll reply here, on why I would like these guarantees:

I think these guarantees are desirable due to a general design principle 
of mine that goes something like this:
  * Avoid "bad" abstractions: the abstraction should reflect intent as 
closely and clearly as possible.

Yeah, that may not tell anyone much because it's very hard to 
objectively define whether an abstraction is "bad" or not, or better or 
worse than another. However, here are a few guidelines:
   - within the same level of functionality, things should be as simple 
and as orthogonal as possible.
   - don't confuse implementation with contract/interface/API. (note 
that I said "confuse", not "expose")

char[] is not as orthogonal as possible. char[] does not reflect its 
underlying intent as clearly as it could. If it were defined as a struct, 
you could directly document the expectation that the underlying string 
must be a valid UTF-8 encoding. In fact, you could even make that a 
contract.
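
A minimal sketch of that struct-based alternative, assuming modern D expression-style contract syntax. `Utf8String` and `isValidUtf8` are hypothetical names; the actual checking is done by `std.utf.validate`. Note that `in` contracts and invariants are compiled out with `-release`, so this expresses a contract on callers rather than run-time input validation.

```d
import std.utf : validate, UTFException;

// Hypothetical helper: true if s is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

// Hypothetical wrapper type whose documented expectation,
// "the payload is valid UTF-8", is also a checked contract.
struct Utf8String
{
    private string data;

    // Precondition: callers must supply valid UTF-8.
    this(string s)
    in (isValidUtf8(s), "Utf8String requires valid UTF-8")
    {
        data = s;
    }

    // Re-checked around public member calls (non-release builds only).
    invariant (isValidUtf8(data));

    string toString() const
    {
        return data;
    }
}

void main()
{
    auto word = Utf8String("naïve");
    assert(word.toString() == "naïve");

    // A lone 0xC3 is a truncated two-byte sequence.
    assert(!isValidUtf8([cast(char) 0xC3]));
}
```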

If, instead of an argument based on a design principle, you ask for 
concrete examples of why this is undesirable, well, I have no examples 
to give... I haven't used D enough to run into real-world examples, but 
I believe that whenever the above principle is violated, problems 
and/or annoyances are very likely to occur sooner or later.

I should point out, however, that, at least for me, the undesirability of 
the current behavior is actually very low. Compared to other language 
issues (whether current ones or past ones), it does not seem that 
significant. For example, static arrays not being proper value types 
(plus their .init thing) was much worse; man, that annoyed the shit out 
of me.

Then again, someone with more experience using D might encounter a more 
serious real-world case regarding the current behavior. Also, regarding 
this:

On 22/11/2010 17:40, Andrei Alexandrescu wrote:
 >
 > Of course you can. After you were to admit that it makes next to no
 > sense to sort an array of code units, I would have said "well if somehow
 > you do imagine such a situation, you achieve that by saying what you
 > mean: cast the char[] to ubyte[] and sort that".

Casting to ubyte[] does solve the use case, I agree. It does so at the 
cost of a minor inconvenience (having to cast), and I don't think that 
is significant.
Rather, I'm more concerned with the use cases that actually want to use 
a char[] as a UTF-8 encoded string. As I mentioned above, I'm afraid of 
situations where this inconsistency might cause more significant 
inconveniences, maybe even bugs!
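
The cast-to-ubyte[] workaround Andrei describes can be sketched as follows; the function name `sortCodeUnits` is hypothetical. The sketch also shows why sorting code units rarely makes sense: for non-ASCII text, the byte-level result is no longer valid UTF-8.

```d
import std.algorithm : sort;
import std.exception : assertThrown;
import std.utf : validate, UTFException;

// Hypothetical helper: sorts the raw UTF-8 code units of s in place
// by reinterpreting the same memory as bytes.
char[] sortCodeUnits(char[] s)
{
    sort(cast(ubyte[]) s);
    return s;
}

void main()
{
    // char[] is not a random-access range, so sort rejects it outright.
    static assert(!__traits(compiles, sort("dcba".dup)));

    assert(sortCodeUnits("dcba".dup) == "abcd");

    // "é" is encoded as C3 A9; sorting the bytes yields A9 C3,
    // which starts with a continuation byte and is invalid UTF-8.
    auto t = sortCodeUnits("é".dup);
    assertThrown!UTFException(validate(t));
}
```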

-- 
Bruno Medeiros - Software Engineer