std.algorithm.remove and principle of least astonishment

Mon Nov 22 18:04:23 PST 2010

On Monday 22 November 2010 16:45:43 Andrei Alexandrescu wrote:
> On 11/22/10 5:59 PM, foobar wrote:
> > Canonical example: DNA.
> > I shouldn't need to write a special function to print it since it IS a
> > string. I shouldn't need to cast it in order to do operations on it like
> > sort, find, etc.
> 
> I think it's best to encode DNA strings as sequences of ubyte. UTF
> routines will run slower on them than functions written for ubyte.
> 
> > D's [w|d|]char types make no sense since they are NOT characters, and the
> > concept doesn't fit Unicode since, as someone else wrote, there are
> > different levels of abstraction in Unicode (code point, code unit,
> > grapheme). Naming matters, and having a cat called dog (char is actually a
> > code unit) is a source of bugs.
> 
> I think the names are fine. It doesn't take much learning to understand
> that char, wchar, and dchar are UTF-8, UTF-16, and UTF-32 code units
> respectively. I mean it would be odd if they were something else.
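To illustrate Andrei's suggestion, here is a minimal sketch in present-day D (not code from this thread) that stores a DNA sequence as ubyte, so Phobos algorithms such as sort, canFind, and count work byte by byte with no UTF decoding. It assumes the sequence is plain ASCII; sortedBases is a hypothetical helper name.

```d
import std.algorithm : canFind, count, sort;

// Hypothetical helper: sort nucleotides as raw bytes, with no UTF decoding.
// Assumes the input is plain ASCII (A, C, G, T).
ubyte[] sortedBases(string dna)
{
    auto bases = cast(ubyte[]) dna.dup; // mutable copy, viewed as bytes
    bases.sort();                       // byte-wise sort
    return bases;
}

void main()
{
    auto b = sortedBases("GATTACA");
    assert(cast(char[]) b == "AAACGTT");   // bytes viewed as text again
    assert(b.canFind(cast(ubyte) 'G'));    // search by byte value
    assert(b.count(cast(ubyte) 'A') == 3); // count by byte value
}
```

Because the element type is ubyte rather than char, nothing in Phobos tries to treat the data as UTF-8; the cast back to char[] is only needed when you want to print or compare it as text.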

The problem with char is that so many people are used to thinking of char as a 
character rather than a code unit. Once you get past that, though, it's fine. I 
think that it's very well thought out as it is. It just takes some getting used 
to. Unfortunately, though, thinking of a char as a UTF-8 code unit and 
_never_ dealing with it as a character seems to be hard for a lot of people to adjust to.

- Jonathan M Davis
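A minimal sketch in present-day D, not from the original thread, of the code-unit point above: .length and indexing count code units, while Phobos range functions such as walkLength and a foreach with a dchar loop variable decode to code points. The literals use the \u00E9 escape so the example doesn't depend on the source file's encoding. Note that code points are still not the same thing as user-perceived characters (graphemes).

```d
import std.range : walkLength;

void main()
{
    string  s = "h\u00E9llo";  // U+00E9 needs two UTF-8 code units
    wstring w = "h\u00E9llo"w; // ...but only one UTF-16 code unit
    dstring d = "h\u00E9llo"d; // UTF-32: one code unit per code point

    // .length counts code units, not characters.
    assert(s.length == 6);
    assert(w.length == 5);
    assert(d.length == 5);

    // Indexing yields a single code unit: s[1] is the first byte of U+00E9.
    assert(cast(ubyte) s[1] == 0xC3);

    // Phobos range functions decode the string to dchar code points.
    assert(s.walkLength == 5);

    // foreach with a dchar loop variable also decodes the UTF-8.
    size_t n;
    foreach (dchar c; s)
        ++n;
    assert(n == 5);
}
```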