std.algorithm.remove and principle of least astonishment

Wed Nov 24 04:39:19 PST 2010

Andrei Alexandrescu wrote:
> On 11/22/10 12:01 PM, Steven Schveighoffer wrote:
>> On Mon, 22 Nov 2010 12:40:16 -0500, Andrei Alexandrescu
>> <SeeWebsiteForEmail at erdani.org> wrote:
>>
>>> On 11/22/10 11:22 AM, Steven Schveighoffer wrote:
>>
>>>> You're dodging the question. You claim that if I want to use it as an
>>>> array, I use it as an array, if I want to use it as a range, use it as a
>>>> range. I'm simply pointing out why you can't use it as an array --
>>>> because Phobos treats it as a bidirectional range, and you can't force
>>>> it to do what you want.
>>>
>>> Of course you can. After you were to admit that it makes next to no
>>> sense to sort an array of code units, I would have said "well if
>>> somehow you do imagine such a situation, you achieve that by saying
>>> what you mean: cast the char[] to ubyte[] and sort that".
>>
>> That wasn't what you said -- you said I can use char[] as an array if I
>> want to use it as an array, not that I can use ubyte[] as an array
>> (nobody disputes that).
> 
> That still stays valid. The thing is, sort doesn't sort arrays, it sorts 
> random-access ranges.
> 
>>>> The thing is, *only* when one wants to create strings does one want to
>>>> view the data type as a bidirectional string. When one wants to deal
>>>> with chars as elements of a container, one doesn't want to be
>>>> restricted to UTF requirements.
>>>
>>> If you don't want to be restricted to UTF requirements, use ubyte and
>>> ushort. You're saying "I want to use UTF code points without any
>>> associated UTF meaning".
>>
>> And easy to understand means easier to avoid mistakes. The point is, the
>> domain of valid elements in my application is defined by me, not by the
>> library. The library is making assumptions that my poker hands may
>> contain UTF-8 characters, while I know in my case they cannot.
> 
> Then what's wrong with ubyte? Why do you encode as UTF something that
> you know isn't UTF? Would you put an integral in a real even though you
> know it's only integral?

I don't think that's a valid comparison, since we have integer types,
but we don't have ASCII types.

Here's the issue as I see it: there are very common use cases (and lots 
of existing C code) for a type which stores an ASCII character.
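
A minimal D sketch of what that use case currently requires, following the
cast-to-ubyte[] workaround suggested earlier in the thread. The helper name
sortHand and the one-character-per-card encoding are hypothetical, and the
sketch assumes every element is ASCII (high bit clear); sort(hand) on the
char[] itself is rejected because Phobos treats char[] as a bidirectional
range of dchar, not a random-access range.

```d
import std.algorithm : sort;
import std.stdio : writeln;

// Hypothetical helper: sorts a poker hand whose cards are encoded as
// ASCII characters. Assumes every element is < 0x80, so sorting the
// raw code units is the same as sorting the characters.
char[] sortHand(char[] hand)
{
    // sort(hand) would not compile: char[] is not a random-access
    // range in Phobos. View the same memory as ubyte[] and sort that.
    sort(cast(ubyte[]) hand);
    return hand; // same memory, now sorted in place
}

void main()
{
    char[] hand = "KQ2A7".dup;
    writeln(sortHand(hand)); // prints 27AKQ
}
```

The cast does not copy anything: the ubyte[] and the char[] share storage,
so the original hand is sorted in place.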

I think we're seeing the exact same issue that causes people to
mistakenly use 'uint' when they mean 'positive integer'.
It LOOKS as though a char is a subset of dchar (i.e., a dchar in the range
0..0x7F).
It LOOKS as though a uint is a subset of int (i.e., an int in the range
0..int.max).

But in both cases, the possibility that the high bit could be set
changes the semantics.
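
A small D sketch of both cases, assuming a UTF-8 string literal and the
default two's-complement int. The helper names widen and asInt are
hypothetical. char converts implicitly to dchar, but the result is the
intended character only when the code unit is ASCII; a uint reinterpreted
as int is only the same number below 2^31.

```d
import std.stdio : writefln;

// char -> dchar converts implicitly, but the result is the intended
// character only when the code unit is ASCII (high bit clear). A code
// unit with the high bit set is one byte of a multi-byte UTF-8 sequence.
dchar widen(char c)
{
    return c;
}

// uint -> int keeps the bits; values above int.max (high bit set)
// come out negative.
int asInt(uint u)
{
    return cast(int) u;
}

void main()
{
    string s = "\u00E9"; // U+00E9 encodes as two code units: 0xC3 0xA9
    writefln("%s code units", s.length);        // 2 code units
    writefln("U+%04X", cast(uint) widen('A'));  // U+0041, as expected
    writefln("U+%04X", cast(uint) widen(s[0])); // U+00C3, not U+00E9
    writefln("%s", asInt(42u));                 // 42
    writefln("%s", asInt(3_000_000_000u));      // -1294967296
}
```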