standard ranges

Thu Jun 28 01:29:11 PDT 2012

On Thursday, June 28, 2012 08:05:19 Christophe Travert wrote:
> "Jonathan M Davis", in message (digitalmars.D:170852), wrote:
> > completely consistent with regards to how it treats strings. The _only_
> > inconsistencies are between the language and the library - namely how
> > foreach iterates on code units by default and the fact that while the
> > language defines length, slicing, and random-access operations for
> > strings, the library effectively does not consider strings to have them.

> char[] is not treated as an array by the library

Phobos _does_ treat char[] as an array. isDynamicArray!(char[]) is true, and 
char[] works with the functions in std.array. It's just that they're all 
special-cased appropriately to handle narrow strings properly. What it doesn't 
do is treat char[] as a range of char.
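
As a minimal sketch of that distinction (assuming a Phobos with the usual 
auto-decoding behavior for narrow strings), the traits below show char[] 
being an array of char to the language while being a range of dchar to the 
range primitives:

```d
import std.range : ElementType, isInputRange;
import std.traits : isDynamicArray, isNarrowString;

// The language and std.traits see an ordinary dynamic array of char.
static assert(isDynamicArray!(char[]));
static assert(is(typeof((char[]).init[0]) == char));

// Phobos flags it as a narrow string so functions can special-case it.
static assert(isNarrowString!(char[]));

// As a range, its element type is dchar (a code point), not char.
static assert(isInputRange!(char[]));
static assert(is(ElementType!(char[]) == dchar));

void main() {}
```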

> and is not treated as a RandomAccessRange.

Which is what I already said.

> That is a second inconsistency, and it would be avoided if string were a 
> struct.

No, it wouldn't. It is _impossible_ to implement length, slicing, and indexing 
for UTF-8 and UTF-16 strings in O(1). Whether you're using an array or a 
struct to represent them is irrelevant. And if you can't do those operations 
in O(1), then they can't be random access ranges.
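
To illustrate why, here is a rough sketch. nthCodePoint is a hypothetical 
helper written for this example, not a Phobos function; it has to walk the 
string from the front because code points in UTF-8 take a variable number of 
code units:

```d
import std.range : walkLength;
import std.utf : decode, stride;

// Hypothetical helper: returns the n-th code point of s (0-based).
// It is O(n), since each preceding code point must be stepped over.
dchar nthCodePoint(string s, size_t n)
{
    size_t i = 0;
    foreach (_; 0 .. n)
        i += stride(s, i);   // number of code units in the code point at i
    return decode(s, i);     // decodes the code point starting at index i
}

void main()
{
    string s = "h\u00E9llo";      // U+00E9 takes two UTF-8 code units

    assert(s.length == 6);        // O(1), but counts code units
    assert(s.walkLength == 5);    // counts code points, and is O(n)

    assert(s[1] == 0xC3);         // O(1) indexing yields a code unit
    assert(nthCodePoint(s, 1) == '\u00E9');
    assert(nthCodePoint(s, 2) == 'l');
}
```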

The _only_ things that using a struct for narrow strings would fix are the 
inconsistency with foreach (it would then use dchar just like all of the 
range stuff does) and the oddity of slicing, indexing, and length existing 
but not being considered to exist by range-based functions, since the struct 
simply wouldn't have them. It _would_ make things somewhat nicer for newbies, 
but it would not give you one iota more of functionality. Narrow strings 
would still be bidirectional ranges but not random-access ranges, and you 
would still have to operate on the underlying array to operate on strings 
efficiently.
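
For reference, this is roughly how the range traits classify strings today 
(a sketch assuming the auto-decoding behavior described above); a struct 
type would presumably end up with the same classification:

```d
import std.range : hasLength, hasSlicing, isBidirectionalRange,
                   isRandomAccessRange;

// Narrow strings: bidirectional, but no random access, length, or slicing
// as far as range-based code is concerned.
static assert(isBidirectionalRange!string);
static assert(!isRandomAccessRange!string);
static assert(!hasLength!string);
static assert(!hasSlicing!string);

// UTF-32 strings have one code unit per code point, so they qualify.
static assert(isRandomAccessRange!dstring);
static assert(hasLength!dstring);

void main() {}
```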

If we were to start from scratch, it probably would be better to go with a 
struct type for strings, but it would break far too much code for far too 
little benefit at this point. You need to understand the Unicode stuff 
regardless - like the difference between code units and code points. So, if 
anything, the fact that strings are treated inconsistently and are treated as 
ranges of dchar - which confuses so many newbies - is arguably a _good_ thing 
in that it forces newbies to realize and understand the Unicode issues 
involved rather than blindly using strings in a horribly inefficient manner as 
would inevitably occur with a struct string type.

So, no, the situation is not exactly ideal, and yes, a struct string type 
might have been a better solution, but I think that many of the folks who are 
pushing for a struct string type are seriously overestimating the problems 
that it would solve. Yes, it would make the language and library more 
consistent, but that's it. You'd still have to use strings in essentially the 
same way that you do now. It's just that you wouldn't have to explicitly use 
dchar with foreach, and you'd have to go through a property returning the 
underlying array in order to operate on the code units (as you need to do in 
many functions to make your code appropriately efficient), rather than simply 
using the string directly and bypassing its range-based functions. 
There is a difference, but it's a lot smaller than many people seem to think.
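
A small sketch of what that looks like with strings as they are now. The two 
counting functions are hypothetical helpers for illustration; 
std.string.representation is one existing way to get at the code units as 
plain integers, and a struct type would presumably expose something similar:

```d
import std.string : representation;

// Hypothetical helper: foreach with an inferred type walks code units.
size_t countCodeUnits(string s)
{
    size_t n;
    foreach (c; s)        // c is char
        ++n;
    return n;
}

// Hypothetical helper: asking for dchar makes foreach decode code points.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s)  // decodes as it goes
        ++n;
    return n;
}

void main()
{
    string s = "h\u00E9llo";
    assert(countCodeUnits(s) == 6);
    assert(countCodePoints(s) == 5);

    // Working on the code units directly, with no decoding involved.
    immutable(ubyte)[] raw = s.representation;
    assert(raw.length == 6);
    assert(raw[1] == 0xC3 && raw[2] == 0xA9);
}
```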

- Jonathan M Davis