D's confusing strings (was Re: D on hackernews)

Wed Sep 21 15:41:00 PDT 2011

On Wednesday, September 21, 2011 14:08 Christophe wrote:
> Jonathan M Davis, in message (digitalmars.D:144944), wrote:
> >> I never said there was a problem with drop.
> > 
> > Yes you did. You said:
> > 
> > "mini-quiz: what should std.range.drop(some_string, 1) do ?
> > hint: what it actually does is not what the documentation of phobos
> 
> ^^^^^^^^^^^^^^^^^^^^^^^
> 
> > suggests*..."
> 
> not documentation of drop.

You weren't specific enough to make it clear what you meant. It looked like
you were complaining about drop's documentation.

> > If you have a better solution, please share it, but the fact that we want
> > both efficiency and correctness binds us pretty thoroughly here.
> 
> - char[], etc. being real arrays.

Which is arguably a _bad_ thing, since it doesn't generally make 
sense to operate on individual chars. What you really want 99.99999999999% of 
the time is code points, not code units.
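
To make the distinction concrete, here is a small sketch of how the same
string looks through the array interface versus the range interface. The
sample text is my own, and the imports assume a current Phobos layout:

```d
import std.range : drop, walkLength, ElementType;

void main()
{
    // Sample text chosen for illustration: each 'é' takes two UTF-8
    // code units, 't' takes one.
    string s = "été";

    // Array view: length counts code units (char).
    assert(s.length == 5);

    // Range view: Phobos decodes, so the elements are code points (dchar).
    static assert(is(ElementType!string == dchar));
    assert(s.walkLength == 3);

    // drop(s, 1) removes one code point, which is two code units here.
    assert(s.drop(1) == "té");
    assert(s.drop(1).length == 3);
}
```

So drop operates on code points, like the rest of the range functions, even
though the slice it returns is still an ordinary array of code units.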

> - strings being lazy ranges of dchar, providing access to underlying
> char[].
> 
> Correctness of the language is better, since we don't have a T[] having a
> front method that returns something other than T, or a type that accepts
> opSlice but is not sliceable, etc.
> 
> Runtime correctness and efficiency are the same as the current ones,
> since the whole phobos already considers strings as lazy range of dchar.
> It is even better, since the user cannot change an arbitrary code point
> in a string without explicitly asking for the underlying char[].
> Optimizations can come the same way as they currently can, since the
> underlying char[] is accessible.
> 
> I can deal with strings the way they are, since they are a legacy.
> They are not perfect, and will never be unless computers become fat
> enough to treat dchar[] just as efficiently as char[]. I am also aware
> that phobos cannot be optimized for every case in the first place, and
> I can change my mind.

So, essentially you're arguing for a wrapper around arrays of code units. That 
does add some benefits (such as making foreach default to dchar), but 
ultimately doesn't add that much additional benefit (it also makes dealing 
with array literals much more interesting). If we were to start over again, 
that may very well be the way that we'd go, but the added benefits just don't 
outweigh the immense amount of code breakage which would result. Maybe the 
situation will change with D3, but at this point, I think that we've done a 
fairly good job of making it possible to treat strings as ranges.
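
For reference, a wrapper along those lines might look roughly like the
sketch below. CodePointString and its units field are hypothetical names
invented for illustration, not anything in Phobos, and a real design would
need far more than this (literals, slicing, mutation, and so on):

```d
import std.range.primitives : isInputRange, ElementType;
import std.utf : decode, stride;

// Hypothetical wrapper type, for illustration only; not part of Phobos.
struct CodePointString
{
    // The underlying code units stay reachable, but only by asking for them.
    string units;

    @property bool empty() { return units.length == 0; }

    // Decode the first code point without consuming it.
    @property dchar front()
    {
        size_t i = 0;
        return decode(units, i);
    }

    // Skip as many code units as the first code point occupies.
    void popFront()
    {
        units = units[stride(units, 0) .. $];
    }
}

static assert(isInputRange!CodePointString);
static assert(is(ElementType!CodePointString == dchar));

void main()
{
    auto s = CodePointString("héllo");

    size_t points;
    foreach (c; s) // iterates over a copy of s, via the range primitives
    {
        static assert(is(typeof(c) == dchar));
        ++points;
    }

    assert(points == 5);         // code points
    assert(s.units.length == 6); // code units; 'é' takes two
}
```

With a plain string, foreach (c; str) infers char for c unless you write
dchar yourself, which is the main thing such a wrapper would change.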

- Jonathan M Davis