std.algorithm.remove and principle of least astonishment
Bruno Medeiros
brunodomedeiros+spam at com.gmail
Wed Nov 24 06:54:02 PST 2010
On 24/11/2010 13:07, Bruno Medeiros wrote:
> On 22/11/2010 04:56, Andrei Alexandrescu wrote:
>> On 11/21/10 22:09 CST, Rainer Deyke wrote:
>>> On 11/21/2010 17:31, Andrei Alexandrescu wrote:
>>> char[] and wchar[] fail to provide some of the guarantees of all other
>>> instances of T[].
>>
>> What exactly are those guarantees?
>>
>
> More exactly, that the following is true for any T:
>
> foreach(character; (T[]).init) {
> static assert(is(typeof(character) == T));
> }
> static assert(std.range.isRandomAccessRange!(T[]));
>
> It is not true for char and wchar (the second assert fails).
> Another guarantee, similar in nature, and roughly described, is that
> functions in std.algorithm should never fail or throw when using an
> array as an argument (assuming the other arguments are valid). So for
> example:
>
> std.algorithm.filter!("true")(anArray)
>
> Should not throw, for any value of anArray. But it may if anArray is of
> type char[] or wchar[] and there is an encoding exception.
>
>
> I'll leave the arguing of whether we want those guarantees for other
> subthreads, but it should be well agreed by now, that the above is not
> guaranteed.
>
>
Actually, I'll reply here, on why I would like these guarantees:
I think these guarantees are desirable due to a general design principle
of mine that goes something like this:
* Avoid "bad" abstractions: the abstraction should reflect intent as
closely and clearly as possible.
Yeah, that may not tell anyone much because it's very hard to
objectively define whether an abstraction is "bad" or not, or better or
worse than another. However, here are a few guidelines:
- within the same level of functionality, things should be as simple
and as orthogonal as possible.
- don't confuse implementation with contract/interface/API. (note
that I said "confuse", not "expose")
char[] is not as orthogonal as possible. char[] does not reflect its
underlying intent as clearly as it could. If it were defined in a struct,
you could directly document the expectation that the underlying string
must be a valid UTF-8 encoding. In fact, you could even make that a
contract.
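
To make the struct idea concrete, here is a minimal sketch. The name
Utf8String and its members are hypothetical (nothing like this is in
Phobos as far as this post is concerned); the only library call assumed
is std.utf.validate, which throws UTFException on malformed input.

```d
import std.utf : validate, UTFException;

// Hypothetical wrapper type, for illustration only.
struct Utf8String
{
    private string data;

    // Reject malformed input at the boundary.
    this(string s)
    {
        validate(s); // throws UTFException if s is not valid UTF-8
        data = s;
    }

    // The documented expectation, stated as a contract:
    // the payload is always well-formed UTF-8.
    invariant()
    {
        validate(data);
    }

    // Raw code units are available only through an explicitly
    // named accessor, so the intent is visible at the call site.
    const(ubyte)[] codeUnits() const
    {
        return cast(const(ubyte)[]) data;
    }
}
```

Note that invariants are compiled out with -release, so in this sketch
the check in the constructor is what actually guards the boundary; the
invariant mostly documents the expectation.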
If instead of an argument based on a design principle, you ask for
concrete examples of why this is undesirable, well, I have no examples
to give... I haven't used D enough to run into real-world examples, but
I believe that whenever the above principle is violated, then it is very
likely that problems and/or annoyances will occur sooner or later.
I should point out however, that, at least for me, the undesirability of
the current behavior is actually very low. Compared to other language
issues (whether current ones, or past ones), it does not seem that
significant. For example, static arrays not being proper value types
(plus their .init thing) was much worse; man, that annoyed the shit out
of me.
Then again, someone with more experience using D might encounter a more
serious real-world case regarding the current behavior. Also, regarding
this:
On 22/11/2010 17:40, Andrei Alexandrescu wrote:
>
> Of course you can. After you were to admit that it makes next to no
> sense to sort an array of code units, I would have said "well if somehow
> you do imagine such a situation, you achieve that by saying what you
> mean: cast the char[] to ubyte[] and sort that".
Casting to ubyte[] does solve the use case, I agree. It does so with a
minor inconvenience (having to cast), but it's very minor and I don't
think it's that significant.
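
For reference, the cast approach looks roughly like this. It is only a
sketch; sortCodeUnits is a hypothetical helper name, and the only
library call assumed is std.algorithm.sort.

```d
import std.algorithm : sort;

// Hypothetical helper: sorts the UTF-8 code units of text in place.
void sortCodeUnits(char[] text)
{
    // Reinterpret the same memory as raw bytes; no copy is made,
    // so sorting units also reorders text.
    ubyte[] units = cast(ubyte[]) text;
    sort(units);
}
```

Usage would be something like:

```d
char[] s = "hello".dup;
sortCodeUnits(s);
assert(s == "ehllo");
```

For input that contains multi-byte sequences the result is generally no
longer valid UTF-8, which is presumably why the cast has to be spelled
out.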
Rather, I'm more concerned with the use cases that actually want to use
a char[] as a UTF-8 encoded string. As I mentioned above, I'm afraid of
situations where this inconsistency might cause more significant
inconveniences, maybe even bugs!
--
Bruno Medeiros - Software Engineer