std.algorithm.remove and principle of least astonishment
Michel Fortin
michel.fortin at michelf.com
Sun Nov 21 18:26:53 PST 2010
On 2010-11-21 20:21:27 -0500, Andrei Alexandrescu
<SeeWebsiteForEmail at erdani.org> said:
> That design, with which I experimented for a while, had two drawbacks:
>
> 1. It had the default reversed, i.e. most often you want to regard a
> char[] or a wchar[] as a range of code points, not as an array of code
> units.
>
> 2. It had the unpleasant effect that most algorithms in std.algorithm
> and beyond did the wrong thing by default, and the right thing only if
> you wrapped everything with byDchar().

Well, basically these two arguments are the same: iterating by code
unit isn't a good default. And I agree. But I'm unconvinced that
iterating by dchar is the right default either. For one thing it has
more overhead, and for another a dchar still doesn't represent a
user-perceived character.

Now, add graphemes to the equation and you have a representation that
matches the user-perceived character concept, but for that you add
another layer of decoding overhead and a variable-size data type to
manipulate (a grapheme is a sequence of code points). And you have to
use Unicode normalization when comparing graphemes. So is that a good
default? Probably not. It might be "correct" in some sense, but it's
totally overkill for most cases.
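
To make the three layers concrete, here is a minimal sketch in Python (used purely for illustration; it is not D and not anything from Phobos). The helper `approx_graphemes` is a hypothetical name, and it only groups a base code point with the combining marks that follow it, which is a rough approximation of real grapheme segmentation.

```python
import unicodedata

def approx_graphemes(text):
    """Group each base code point with the combining marks that follow it.

    Assumption: this only approximates grapheme clusters; full Unicode
    text segmentation has more rules than this.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch      # attach the mark to the previous cluster
        else:
            clusters.append(ch)     # start a new cluster
    return clusters

decomposed = "e\u0301"    # 'e' followed by COMBINING ACUTE ACCENT
precomposed = "\u00e9"    # the single code point for e-acute

code_units = decomposed.encode("utf-8")    # UTF-8 code units
code_points = list(decomposed)             # Python str iterates by code point
graphemes = approx_graphemes(decomposed)   # user-perceived characters (approx.)

print(len(code_units), len(code_points), len(graphemes))  # 3 2 1

# Comparing graphemes needs normalization: the two spellings differ as raw
# code point sequences but are equal once both are normalized to NFC.
same_raw = decomposed == precomposed
same_nfc = (unicodedata.normalize("NFC", decomposed)
            == unicodedata.normalize("NFC", precomposed))
print(same_raw, same_nfc)  # False True
```

The same text has three different lengths depending on the layer, and equality at the grapheme level only works after normalization, which is the extra cost described above.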

My thinking is that there is no good default. If you write an XML
parser, you'll probably want to work at the code point level; if you
write a JSON parser, you can easily skip the overhead and work at the
UTF-8 code unit level until you start parsing a string; if you write
something to count the number of user-perceived characters or want to
map characters to a font then you'll want graphemes...
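
As an illustration of the JSON case, here is a minimal sketch in Python (chosen only for illustration, not taken from any real parser) that finds the end of a string literal by looking at UTF-8 code units alone. It relies on a property of UTF-8: every byte of a multi-byte sequence is 0x80 or above, so the ASCII quote and backslash can never appear inside one. The function name `find_string_end` is hypothetical.

```python
def find_string_end(buf, start):
    """Return the index of the closing quote of the JSON string whose
    opening quote is at buf[start]. Works on raw UTF-8 bytes, no decoding."""
    if buf[start] != 0x22:              # 0x22 is '"'
        raise ValueError("expected an opening quote")
    i = start + 1
    while i < len(buf):
        b = buf[i]
        if b == 0x5C:                   # backslash: skip the escaped byte
            i += 2
        elif b == 0x22:                 # unescaped quote: end of the string
            return i
        else:                           # any other byte, including non-ASCII
            i += 1
    raise ValueError("unterminated string")

data = '"caf\u00e9 \\" ok" tail'.encode("utf-8")
end = find_string_end(data, 0)

# Decoding is deferred until the string contents are actually needed.
contents = data[1:end].decode("utf-8")
print(end, contents)
```

Only the final `decode` call touches code points; the scan itself never needs to know where one character ends and the next begins.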

Perhaps there should simply be no default; perhaps you should be forced
to choose explicitly at which layer you want to operate each time you
apply an algorithm on a string... and to make this less painful we
could have functions in std.string acting as a thin layer over similar
ones in std.algorithm that would automatically choose the right
representation for the algorithm depending on the operation.
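
A rough sketch of what such a thin layer could look like, written in Python for illustration only. All three function names are hypothetical and this is not a proposal for the actual std.string API; the point is just that each operation picks the cheapest layer that gives a correct answer.

```python
import unicodedata

def index_of(haystack, needle):
    """Substring search at the code unit level.

    For valid UTF-8, a byte-wise match can only start on a code point
    boundary, so no decoding is needed. Returns a byte offset, or -1.
    """
    return haystack.find(needle)

def count_code_points(buf):
    """Count code points without fully decoding: every code point has
    exactly one byte that is not a continuation byte (10xxxxxx)."""
    return sum(1 for b in buf if (b & 0xC0) != 0x80)

def same_text(a, b):
    """Compare at the normalization level: decode, normalize to NFC,
    then compare, so different spellings of one character match."""
    na = unicodedata.normalize("NFC", a.decode("utf-8"))
    nb = unicodedata.normalize("NFC", b.decode("utf-8"))
    return na == nb

text = "na\u00efve caf\u00e9".encode("utf-8")

print(index_of(text, "caf\u00e9".encode("utf-8")))   # 7
print(count_code_points(text))                        # 10
print(same_text("e\u0301".encode("utf-8"), "\u00e9".encode("utf-8")))  # True
```

The caller never names a representation; the choice is made once, inside each function, by whoever knows what the operation requires.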
--
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/