std.algorithm.remove and principle of least astonishment
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Sun Nov 21 10:23:14 PST 2010
On 11/20/10 9:42 PM, Rainer Deyke wrote:
> On 11/20/2010 16:58, Andrei Alexandrescu wrote:
>> On 11/20/10 12:32 PM, Rainer Deyke wrote:
>>> std::vector<bool> in C++ is a specialization of std::vector that packs
>>> eight booleans into a byte instead of storing each element separately.
>>> It doesn't behave exactly like other std::vectors and technically
>>> doesn't meet the C++ requirements of a container, although it tries to
>>> come as close as possible. This means that any code that uses
>>> std::vector<bool> needs to be extra careful to take those differences into
>>> account. This is especially an issue when dealing with generic code
>>> that uses std::vector<T>, where T may or may not be bool.
>>>
>>> The issue with Vector!char is similar. Because char[] is not a true
>>> array, generic code that uses T[] can unexpectedly fail when T is char.
>>> Other containers of char behave like normal containers, iterating over
>>> individual chars. char[] iterates over dchars. Vector!char can,
>>> depending on its implementation, iterate over chars, iterate over
>>> dchars, or fail to compile at all when instantiated with T=char. It's
>>> not even clear which of these is the correct behavior.
>>
>> The parallel does not stand scrutiny. The problem with vector<bool> in
>> C++ is that it implements no formal abstraction, although it is a
>> specialization of one.
>
> The problem with std::vector<bool> is that it pretends to be a
> std::vector, but isn't. If it was called dynamic_bitset instead, nobody
> would have complained. char[] has exactly the same problem.

char[] does not exhibit the same issues that vector<bool> has. The
situation is very different, and again, trying to reduce one to another
misses a lot of the picture.

vector<bool> hides representation and in doing so becomes non-compliant
with vector<T> which does expose representation. Worse, vector<bool> is
not compliant with any concept, express or implied, which makes
vector<bool> virtually unusable with generic code.
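
The non-compliance of vector<bool> can be checked at compile time. The following is a minimal C++ sketch, not taken from the thread; the helper name element_is_addressable is hypothetical and only tests whether operator[] hands back a genuine T&.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical helper: true when std::vector<T>::operator[] returns a real T&,
// i.e. when the container exposes its element representation directly.
template <typename T>
constexpr bool element_is_addressable() {
    return std::is_same<decltype(std::declval<std::vector<T>&>()[0]), T&>::value;
}

// Ordinary element types expose their elements by reference.
static_assert(element_is_addressable<int>(),
              "vector<int>::operator[] returns int&");

// vector<bool> packs bits and returns a proxy object instead of bool&.
static_assert(!element_is_addressable<bool>(),
              "vector<bool>::operator[] does not return bool&");
```

Generic code that writes `T& x = v[0];` or takes `&v[0]` as a `T*` compiles for every T except bool, which is the kind of breakage described above.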

In contrast, char[] exposes a meaningful representation (array of code
units) that is often useful, and obeys a slightly weaker formal
abstraction (bidirectional range) which is also useful. It's simply a
very different setup from vector<bool>, and again attempting to use one
in predicting the fate of the other is a poor approach.

>>> Vector!char is just an example. Any generic code that uses T[] can
>>> unexpectedly fail to compile or behave incorrectly when used with T=char.
>>> If I were to use D2 in its present state, I would try to avoid both
>>> char/wchar and arrays as much as possible in order to avoid this
>>> trap. This would mean avoiding large parts of Phobos, and providing
>>> safe wrappers around the rest.
>>
>> It may be wise in fact to start using D2 and make criticism grounded in
>> reality that could help us improve the state of affairs.
>
> Sorry, but no. It would take a huge investment of time and effort on my
> part to switch from C++ to D. I'm not going to make that leap without
> looking first, and I'm not going to make it when I can see that I'm
> about to jump into a spike pit.

You may rest assured that if anything, strings are not a problem. The
way the abstractions are laid out makes D's strings the best approach to
Unicode strings I know about.

>> The above is
>> only fallacious presupposition. Algorithms in Phobos are abstracted on
>> the formal range interface, and as such you won't be exposed to risks
>> when using them with strings.
>
> I'm not concerned about algorithms, I'm concerned about code that uses
> arrays directly. Like my Vector!char example, which I see you still
> haven't addressed.

When you define your abstractions, you are free to decide how you want
to go about them. The D programming language makes it unequivocally
clear that char[] is an array of UTF-8 code units that offers a
bidirectional range of code points. The same holds for wchar[] (replace
UTF-8 with UTF-16). dchar[] is an array of UTF-32 code units, each of
which is a code point, and as such is a full random-access range.
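
The difference between the code-unit view and the code-point view can be made concrete. The sketch below is in C++ rather than D and is only an analogy for the two views of char[] described above; count_code_points is a hypothetical helper that assumes well-formed UTF-8 input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts UTF-8 code points by skipping continuation
// bytes (those of the form 10xxxxxx). Assumes the input is well-formed
// UTF-8; no validation is attempted.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++n;  // a lead byte or an ASCII byte starts a new code point
        }
    }
    return n;
}
```

For the string "h\xC3\xA9llo" (the word "héllo" in UTF-8), size() reports 6 code units while count_code_points reports 5 code points. The k-th code point cannot be found in constant time, which is consistent with a UTF-8 array offering bidirectional rather than random access over code points.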

If you define your own function that uses an array directly, such as
sort(), then attempting to sort a char[] will get you exactly what you
expect: you sort the code units in the array. The sort routine in the
standard library is modeled to work with random access ranges, and will
refuse to sort a char[].
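
What sorting the code units means in practice can be sketched as follows. This is a C++ analogy and not D or Phobos code; sort_code_units is a hypothetical helper that orders the raw bytes of a UTF-8 string as unsigned values.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: sorts the raw bytes (code units) of a string,
// comparing them as unsigned values so the result does not depend on
// whether plain char is signed on the platform.
std::string sort_code_units(std::string s) {
    std::sort(s.begin(), s.end(), [](char a, char b) {
        return static_cast<unsigned char>(a) < static_cast<unsigned char>(b);
    });
    return s;
}
```

Applied to the bytes of "béa" in UTF-8 (0x62 0xC3 0xA9 0x61), the result is 0x61 0x62 0xA9 0xC3. The two-byte sequence for "é" ends up reversed, so the output is no longer valid UTF-8. That is exactly what a code-unit sort does, and it suggests why an algorithm written against random-access ranges of code points would decline such an input.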

I have often reflected on whether I'd do things differently if I could go
back in time and join Walter when he invented D's strings. I might have
done one or two things differently, but the gain would be marginal at
best. In fact, it's not impossible the balance of things could have been
hurt. Between speed, simplicity, effectiveness, abstraction, access to
representation, and economy of means, D's strings are the best
compromise out there that I know of, by a wide margin.

Andrei