standard ranges

Thu Jun 28 01:05:19 PDT 2012

"Jonathan M Davis" , dans le message (digitalmars.D:170852), a écrit :
> completely consistent with regards to how it treats strings. The _only_ 
> inconsintencies are between the language and the library - namely how foreach 
> iterates on code units by default and the fact that while the language defines 
> length, slicing, and random-access operations for strings, the library 
> effectively does not consider strings to have them.

char[] is not treated as an array by the library, and is not treated as 
a RandomAccessRange. That is a second inconsistency, and it would be 
avoided is string were a struct.

I won't repeat arguments that were already said, but if it matters, to 
me, things should be such that:

 - string is a druntime defined struct, with an undelying 
immutable(char)[]. It is a BidirectionalRange of dchar. Slicing is 
provided for convenience, but not as opSlice, since it is not O(1), but 
as a method with a separate name. Direct access to the underlying 
char[]/ubyte[] is provided.

 - similar structs are provided to hold underlying const(char)[] and 
char[]

 - similar structs are provided for wstring

 - dstring is a druntime defined alias to dchar[] or a struct with the 
same functionalities for consistency with narrow string being struct.

 - All those structs may be provided as a template.
struct string(T = immutable(char)) {...}
alias string(immutable(wchar)) wstring;
alias string(immutable(dchar)) dstring;

string(const(char)) and string(char) ... are the other types of 
strings.

 - this string template could also be defined as a wrapper to convert 
any range of char/wchar into a range of dchar. That does not need to be 
in druntime. Only types necessary for string litterals should be in 
druntime.

 - string should not be convertible to char*. Use toStringz to interface 
with c code, or the underlying char[] if you know you it is 
zero-terminated, at you own risk. Only string litterals need to be 
convertible to char*, and I would say that they should be 
zero-terminated only when they are directly used as char*, to allow the 
compiler to optimize memory.

 - char /may/ disappear in favor of ubyte (or the contrary, or one could 
alias the other), if there is no other need to keep separate types that 
having strings that are different from ubyte[]. Only dchar is necessary, 
and it could just be called char.

That is ideal to me. Of course, I understand code compatibility is 
important, and compromises have to be made. The current situation is a 
compromise, but I don't like it because it is a WAT for every newcomer. 
But the last point, for example, would bring no more that code breakage. 
Such code breakage may make us find bugs however...

-- 
Christophe