standard ranges
Jonathan M Davis
jmdavisProg at gmx.com
Thu Jun 28 01:29:11 PDT 2012
On Thursday, June 28, 2012 08:05:19 Christophe Travert wrote:
> "Jonathan M Davis" wrote in message (digitalmars.D:170852):
> > completely consistent with regards to how it treats strings. The _only_
> > inconsistencies are between the language and the library - namely how
> > foreach iterates on code units by default and the fact that while the
> > language defines length, slicing, and random-access operations for
> > strings, the library effectively does not consider strings to have them.
> char[] is not treated as an array by the library
Phobos _does_ treat char[] as an array. isDynamicArray!(char[]) is true, and
char[] works with the functions in std.array. It's just that they're all
special-cased appropriately to handle narrow strings properly. What it doesn't
do is treat char[] as a range of char.
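
To make that concrete, here is a small compile-time sketch of how the traits
come out for char[]. It assumes a Phobos in which the range primitives decode
narrow strings to dchar (the behavior described in this post); the trait names
are the ones from std.traits and std.range.

```d
import std.range : ElementType, hasLength, hasSlicing,
                   isBidirectionalRange, isRandomAccessRange;
import std.traits : isDynamicArray;

// char[] is a dynamic array as far as the language and std.traits care.
static assert(isDynamicArray!(char[]));

// As a range, though, its element type is dchar, not char.
static assert(is(ElementType!(char[]) == dchar));

// It is a bidirectional range, but the range traits do not see
// length, slicing, or random access on it.
static assert(isBidirectionalRange!(char[]));
static assert(!isRandomAccessRange!(char[]));
static assert(!hasLength!(char[]));
static assert(!hasSlicing!(char[]));

void main() {}
```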
> and is not treated as a RandomAccessRange.
Which is what I already said.
> That is a second inconsistency, and it would be avoided if string were a
> struct.
No, it wouldn't. It is _impossible_ to implement length, slicing, and indexing
by code point for UTF-8 and UTF-16 strings in O(1), because a code point takes
a variable number of code units. Whether you're using an array or a struct to
represent them is irrelevant. And if you can't do those operations in O(1),
then they can't be random-access ranges.
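
A rough sketch of why, assuming std.utf.stride and std.range.walkLength from
Phobos. The helper codeUnitIndexOf is a made-up name for illustration, not a
library function: finding the n-th code point means walking every code point
before it.

```d
import std.range : walkLength;
import std.utf : stride;

// Hypothetical helper: returns the code unit index at which the n-th
// code point of s starts. The loop runs n times, so this is O(n).
size_t codeUnitIndexOf(string s, size_t n)
{
    size_t i = 0;
    foreach (_; 0 .. n)
        i += stride(s, i); // number of code units in the code point at i
    return i;
}

void main()
{
    string s = "h\u00E9llo"; // U+00E9 takes two UTF-8 code units

    assert(s.length == 6);     // code units: stored with the array, O(1)
    assert(s.walkLength == 5); // code points: has to decode, O(n)

    // The third code point ('l') starts at code unit 3, not 2.
    assert(codeUnitIndexOf(s, 2) == 3);
    assert(s[3] == 'l');
}
```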
The _only_ things that using a struct for narrow strings would fix are the
inconsistency with foreach (it would then iterate by dchar just like all of
the range stuff does) and the oddity of slicing, indexing, and length existing
on strings but not being considered to exist by range-based functions, since
the struct simply wouldn't have them. It _would_ make things somewhat nicer
for newbies, but it would not give you one iota more of functionality. Narrow
strings would still be bidirectional ranges but not random-access ranges, and
you would still have to operate on the underlying array to operate on strings
efficiently.
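
For anyone who hasn't run into the foreach inconsistency, a minimal sketch
(the function names are just for illustration, and it assumes the range
primitive front decodes narrow strings as described above):

```d
import std.range : front;

// Plain foreach over a string visits code units (char).
size_t countCodeUnits(string s)
{
    size_t n;
    foreach (c; s)
        ++n;
    return n;
}

// Asking for dchar explicitly makes foreach decode code points.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    string s = "\u00E9"; // one code point, two UTF-8 code units

    assert(countCodeUnits(s) == 2);
    assert(countCodePoints(s) == 1);

    // The range primitives agree with the dchar version, not the default.
    auto c = s.front;
    static assert(is(typeof(c) == dchar));
}
```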
If we were to start from scratch, it probably would be better to go with a
struct type for strings, but it would break far too much code for far too
little benefit at this point. You need to understand the Unicode stuff
regardless - like the difference between code units and code points. So, if
anything, the fact that strings are treated inconsistently and are treated as
ranges of dchar - which confuses so many newbies - is arguably a _good_ thing
in that it forces newbies to realize and understand the Unicode issues
involved rather than blindly using strings in a horribly inefficient manner as
would inevitably occur with a struct string type.
So, no, the situation is not exactly ideal, and yes, a struct string type
might have been a better solution, but I think that many of the folks who are
pushing for a struct string type are seriously overestimating the problems
that it would solve. Yes, it would make the language and library more
consistent, but that's it. You'd still have to use strings in essentially the
same way that you do now. It's just that you wouldn't have to explicitly use
dchar with foreach, and you'd have to go through a property which returned the
underlying array in order to operate on the code units (as you need to do in
many functions to make your code appropriately efficient) rather than simply
using the string as an array directly and bypassing the range-based functions.
There is a difference, but it's a lot smaller than many people seem to think.
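
As an illustration of operating on the code units directly, here is a sketch
that assumes std.string.representation from current Phobos (it may not be in
the release you have). countSpaces is a made-up example function. It is safe
to scan bytes here because an ASCII byte never occurs inside a multi-byte
UTF-8 sequence.

```d
import std.string : representation;

// Counts ASCII spaces by scanning the raw UTF-8 code units,
// without decoding any code points along the way.
size_t countSpaces(string s)
{
    size_t n;
    foreach (ubyte u; s.representation)
    {
        if (u == ' ')
            ++n;
    }
    return n;
}

void main()
{
    assert(countSpaces("a \u00E9 b") == 2);
}
```

With a struct string type, the s.representation step would turn into whatever
property exposed the underlying array, and the rest would look the same.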
- Jonathan M Davis