std.algorithm.remove and principle of least astonishment

Sun Nov 21 17:21:27 PST 2010

On 11/21/10 7:00 PM, Jonathan M Davis wrote:
> Actually, the better implementation would probably be to provide wrapper ranges
> for ranges of char and wchar so that you could access them as ranges of dchar.
> Doing otherwise would make it so that you couldn't access them directly as
> ranges of char or wchar, which would be limiting, and since it's likely that
> anyone actually wanting strings would just use strings, there's a good chance
> that in the majority of cases, what you'd want would really be a range of char
> or wchar anyway. Regardless, it's quite possible to access containers of char or
> wchar as ranges of dchar if you need to.

I agree, except for the "majority of cases" part. In fact the original 
design of range interfaces for char[] and wchar[] was to require 
byDchar() to get a bidirectional interface over the arrays of code units.

That design, with which I experimented for a while, had two drawbacks:

1. It had the default reversed, i.e. most often you want to regard a 
char[] or a wchar[] as a range of code points, not as an array of code 
units.

2. It had the unpleasant effect that most algorithms in std.algorithm 
and beyond did the wrong thing by default, and the right thing only if 
you wrapped everything with byDchar().
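
To illustrate the second drawback, here is a minimal sketch, assuming a current Phobos (std.algorithm.searching, std.string.representation). The two helper functions are hypothetical names made up for this example; representation is used only to simulate the code-unit view an algorithm would see without decoding.

```d
import std.algorithm.searching : canFind;
import std.string : representation;

// Hypothetical helper: search the raw UTF-8 code units for a code point value.
bool foundInCodeUnits(string s, dchar c)
{
    // Each element is a ubyte; a multi-byte code point never matches one byte.
    return s.representation.canFind!(u => u == c);
}

// Hypothetical helper: search the string as a range of code points.
bool foundInCodePoints(string s, dchar c)
{
    return s.canFind(c);
}

void main()
{
    string s = "h\u00E9llo"; // U+00E9 occupies two UTF-8 code units

    assert(!foundInCodeUnits(s, '\u00E9')); // the code-unit view misses it
    assert(foundInCodePoints(s, '\u00E9')); // the code-point view finds it
    assert(foundInCodeUnits(s, 'h'));       // ASCII happens to work either way
}
```

The ASCII case is what makes the code-unit default tempting: it looks right until the first non-ASCII character shows up.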

The second iteration of the design, which is currently in use, was to 
define in std.range the primitives such that char[] and wchar[] offer by 
default the bidirectional range interface. I have gone through all 
algorithms in std.algorithm and std.string and noticed with amazed 
satisfaction that they almost always did the right thing, and that I could 
tweak the few that didn't to complete a satisfactory implementation. 
(indexOf has slipped through the cracks.) I think that experience with 
the current design speaks in its favor.
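
A small sketch of what that default looks like from user code, assuming a Phobos where importing std.range brings in the range primitives for arrays. The literal is just an example; nothing here is specific to any one algorithm.

```d
import std.range; // front, back, popFront, walkLength, ElementType, range traits

void main()
{
    string s = "h\u00E9llo"; // U+00E9 occupies two UTF-8 code units

    // As an array, the string still reports its length in code units...
    assert(s.length == 6);
    // ...but as a range it is walked one code point at a time.
    assert(s.walkLength == 5);

    // The range element type is dchar, and the interface is bidirectional,
    // not random-access.
    static assert(is(ElementType!string == dchar));
    static assert(isBidirectionalRange!string);
    static assert(!isRandomAccessRange!string);

    assert(s.back == 'o');
    s.popFront();                 // drops 'h'
    assert(s.front == '\u00E9');  // decodes both code units as one element
}
```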

One thing could be done to drive the point home: a function byCodeUnit() 
could be added that actually does iterate a char[] or a wchar[] one code 
unit at a time (and consequently restores their behavior as T[]). That 
function could be simply a cast to ubyte[]/ushort[], or it could 
introduce a random-access range.
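
A rough sketch of the cast-based variant, under the assumption that a plain reinterpreting cast is acceptable. The name codeUnitsOf is hypothetical, chosen so as not to suggest this is the function that would actually be added.

```d
import std.range; // walkLength, ElementType, isRandomAccessRange

// Hypothetical helpers: view the code units as plain integers, so the
// range primitives treat the result as an ordinary T[].
auto codeUnitsOf(const(char)[] s)  { return cast(const(ubyte)[]) s; }
auto codeUnitsOf(const(wchar)[] s) { return cast(const(ushort)[]) s; }

void main()
{
    string s = "h\u00E9llo"; // U+00E9 is the UTF-8 bytes 0xC3 0xA9
    auto units = codeUnitsOf(s);

    // No decoding any more: random access, one element per code unit.
    static assert(isRandomAccessRange!(typeof(units)));
    static assert(is(ElementType!(typeof(units)) : ubyte));
    assert(units.length == 6);
    assert(units.walkLength == 6);
    assert(units[1] == 0xC3 && units[2] == 0xA9);
}
```

The cast loses the information that the elements are characters, which is the main argument for the other option, a dedicated random-access range that keeps char or wchar as its element type.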

Andrei