std.algorithm.remove and principle of least astonishment

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Mon Nov 22 10:45:46 PST 2010


On 11/22/10 12:01 PM, Steven Schveighoffer wrote:
> On Mon, 22 Nov 2010 12:40:16 -0500, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> wrote:
>
>> On 11/22/10 11:22 AM, Steven Schveighoffer wrote:
>
>>> You're dodging the question. You claim that if I want to use it as an
>>> array, I use it as an array, if I want to use it as a range, use it as a
>>> range. I'm simply pointing out why you can't use it as an array --
>>> because phobos treats it as a bidirectional range, and you can't force
>>> it to do what you want.
>>
>> Of course you can. After you were to admit that it makes next to no
>> sense to sort an array of code units, I would have said "well if
>> somehow you do imagine such a situation, you achieve that by saying
>> what you mean: cast the char[] to ubyte[] and sort that".
>
> That wasn't what you said -- you said I can use char[] as an array if I
> want to use it as an array, not that I can use ubyte[] as an array
> (nobody disputes that).

That still holds. The thing is, sort doesn't sort arrays; it sorts
random-access ranges, and Phobos does not treat char[] as one.
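
A minimal sketch of the cast mentioned above, assuming the data is pure
ASCII (sorting the code units of multi-byte UTF-8 text would scramble the
encoding). The helper name sortCodeUnits and the sample hand "KA2Q7" are
made up for illustration; they are not part of Phobos or of the code
discussed in this thread.

```d
import std.algorithm : sort;
import std.range : isRandomAccessRange;
import std.stdio : writeln;

// Phobos views char[] as a bidirectional range of code points,
// so it is not random-access; ubyte[] is.
static assert(!isRandomAccessRange!(char[]));
static assert( isRandomAccessRange!(ubyte[]));

// Hypothetical helper: sorts the code units of s in place.
void sortCodeUnits(char[] s)
{
    auto units = cast(ubyte[]) s; // same memory, viewed as plain bytes
    sort(units);                  // ubyte[] satisfies sort's requirements
}

void main()
{
    char[] hand = "KA2Q7".dup;    // mutable copy of the literal
    sortCodeUnits(hand);
    writeln(hand);                // prints 27AKQ (ASCII order)
}
```

The cast creates no copy: units and s alias the same memory, so the
caller still holds a char[] afterwards and can print it or run string
functions on it.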

>>> The thing is, *only* when one wants to create strings, does one want to
>>> view the data type as a bidirectional range. When one wants to deal
>>> with chars as an element of a container, I don't want to be restricted
>>> to utf requirements.
>>
>> If you don't want to be restricted to utf requirements, use ubyte and
>> ushort. You're saying "I want to use UTF code units without any
>> associated UTF meaning".
>
> A literal defining an array of ubytes or ushorts is considerably more
> painful than one of chars.

I've been thinking for a while about having to!(const(ubyte)[]) simply
insert a cast when passed const(char)[]. The cast is sound: you are
asking for a view of the individual code units in a string. That should
help with literals.

>>> FWIW, I deal in ASCII pretty much exclusively, so sorting an array of
>>> char is not out of the question.
>>
>> Example?
>
> In some poker-hand detection code I've written in C++ (and actually in D
> too) in the past, I can use characters to represent each card.

Why not ubytes?

> A
> straightforward way to do this is to add each 'card' to a string, then
> sort the string. This allows me to use string functions and regex to
> detect hand types.
>
> You can do the same with ubytes, but it's not as easy to understand.

Why?

> And
> easy to understand means easier to avoid mistakes. The point is, the
> domain of valid elements in my application is defined by me, not by the
> library. The library is making assumptions that my poker hands may
> contain utf8 characters, while I know in my case they cannot.

Then what's wrong with ubyte? Why do you encode as UTF something that
you know isn't UTF? Would you store an integer in a real even though you
know it will only ever be an integer?

> If I could
> convey this in a way that allows me to keep the nice properties of char
> arrays (i.e. printing as strings), then I would be fine with the library
> assuming UTF-8 unless I told it otherwise.

How would printing as strings be meaningful? I'd suspect you'd want to 
print a poker hand better than by using one character per card. Even if 
for some odd reason you want to print ubytes as characters in some 
exceptional situation, why don't you write a routine that does that and
be done with it?

> But there is no way currently, the library steadfastly refuses to look
> at it any other way than a utf-8 code sequence. It doesn't help matters
> that the compiler steadfastly looks at them as arrays.
>
> What I want is for the compiler *and* the library to look at strings as
> not arrays, and for both to look at char[] as an array. So I can clearly
> define my intent of how I want them to treat such variables.

I totally understand where you're coming from.

I believe you also understand where I'm coming from: within the 
constraints of making UTF built-in, integrated, efficient, and easy to 
understand, I think the current decisions taken by the language are 
good. To directly reply to your point: instead of ascribing your desired 
meaning to char[], you should use char[] for UTF-8 strings exclusively. 
For arrays of bytes, there's always ubyte[].

>>> I'm going to drop out of this discussion in order to develop a viable
>>> alternative to using arrays to represent strings. Then we can discuss
>>> the merits/drawbacks of such a type. I think it will be simple to build.
>
> Here I am continuing to argue. I swear I'll stop after this :) At least
> until I have my string type ready.

I suspect you'll notice before long that it's a considerably more 
difficult task than it might seem in the beginning, and that the result 
is bound to be less satisfactory than the current strings in at least 
some dimensions. But I welcome the initiative to bring a concrete
abstraction (heh, oxymoron) to the table.


Andrei

