std.algorithm.remove and principle of least astonishment

spir denis.spir at gmail.com
Mon Nov 22 02:25:37 PST 2010


On Sun, 21 Nov 2010 19:21:27 -0600
Andrei Alexandrescu <SeeWebsiteForEmail at erdani.org> wrote:

> On 11/21/10 7:00 PM, Jonathan M Davis wrote:
> > Actually, the better implementation would probably be to provide wrapper ranges
> > for ranges of char and wchar so that you could access them as ranges of dchar.
> > Doing otherwise would make it so that you couldn't access them directly as
> > ranges of char or wchar, which would be limiting, and since it's likely that
> > anyone actually wanting strings would just use strings, there's a good chance
> > that in the majority of cases, what you'd want would really be a range of char
> > or wchar anyway. Regardless, it's quite possible to access containers of char or
> > wchar as ranges of dchar if you need to.
> 
> I agree except for the majority of cases part. In fact the original 
> design of range interfaces for char[] and wchar[] was to require 
> byDchar() to get a bidirectional interface over the arrays of code units.
> 
> That design, with which I experimented for a while, had two drawbacks:
> 
> 1. It had the default reversed, i.e. most often you want to regard a 
> char[] or a wchar[] as a range of code points, not as an array of code 
> units.
> 
> 2. It had the unpleasant effect that most algorithms in std.algorithm 
> and beyond did the wrong thing by default, and the right thing only if 
> you wrapped everything with byDchar().

I find these points most relevant. The *char[] types are really the mutable variants of the *string types, so one needs to use them as textual types, meaning as sequences of code points. Thus, I do not think the most common case is to iterate them as sequences of code _units_.

> The second iteration of the design, which is currently in use, was to 
> define in std.range the primitives such that char[] and wchar[] offer by 
> default the bidirectional range interface. I have gone through all 
> algorithms in std.algorithm and std.string and noticed with amazed 
> satisfaction that they most always did the right thing, and that I could 
> tweak the few that didn't to complete a satisfactory implementation. 
> (indexOf has slipped through the cracks.) I think that experience with 
> the current design is speaking in its favor.

This makes the safe and common case the default.

> One thing could be done to drive the point home: a function byCodeUnit() 
> could be added that actually does iterate a char[] or a wchar[] one code 
> unit at a time (and consequently restores their behavior as T[]). That 
> function could be simply a cast to ubyte[]/ushort[], or it could 
> introduce a random-access range.

For sure, this would be useful in the cases where one really needs code units. It would also make it clear that default iteration is _not_ over code units (thus avoiding part of the criticism).
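
To make the difference concrete, here is a minimal sketch. It assumes the std.range primitives decode char[] as Andrei describes, and it uses the plain cast to ubyte[] he mentions as a stand-in for byCodeUnit(), which does not exist yet. The sample text is only illustrative.

```d
import std.range : front, walkLength;

void main()
{
    // 'é' needs two UTF-8 code units, so 5 code points occupy 6 chars.
    char[] s = "héllo".dup;

    // As an array, length counts code units.
    assert(s.length == 6);

    // As a range, front decodes to a dchar and walkLength counts code points.
    static assert(is(typeof(s.front) == dchar));
    assert(s.walkLength == 5);

    // Reinterpreting the same memory as ubyte[] restores plain T[] behaviour:
    // iteration is then one code unit at a time.
    auto units = cast(ubyte[]) s;
    static assert(is(typeof(units.front) == ubyte));
    assert(units.walkLength == 6);
}
```

A named byCodeUnit() would say the same thing as the cast, but would make the intent visible at the call site.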

Maybe an alternative would be (or would have been) a complete lexical distinction between (text) strings and true char arrays, one that holds whatever constness or mutability is wanted:
* char[] is always an array of plain unsigned ints
* mutable strings can be defined using mutable(string) for text processing, still being indexed and iterated as strings of code _points_.

> Andrei

Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com


