Proposal for fixing dchar ranges

H. S. Teoh hsteoh at quickfur.ath.cx
Mon Mar 10 07:54:50 PDT 2014


On Mon, Mar 10, 2014 at 09:35:44AM -0400, Steven Schveighoffer wrote:
[...]
> An idea to fix all the problems I see with char[] being treated
> specially by Phobos: introduce an actual string type, with char[] as
> backing, that is a dchar range and that actually dictates the rules we
> want. Then, make the compiler use this type for literals.
> 
> e.g.:
> 
> struct string {
>    immutable(char)[] representation;
>    this(immutable(char)[] data) { representation = data; }
>    ... // dchar range primitives
> }
> 
> Then, a char[] array is simply an array of char.
> 
> points:
> 
> 1. No more issues with foreach(c; "cassé"), it iterates via dchar
> 2. No more issues with "cassé"[4], it is a static compiler error.
> 3. No more awkward ASCII manipulation using ubyte[].
> 4. No more Phobos schizophrenia saying char[] is not an array.
> 5. No more special casing char[] array templates to fool the compiler.
> 6. Any other special rules we come up with can be dictated by the
> library, and not ignored by the compiler.

I like this idea. Special-casing char[] in templates was a bad idea. It
makes Phobos code needlessly complex, and the inconsistent treatment of
char[] sometimes as an array of char and sometimes not causes silly
issues like foreach defaulting to char but range iteration defaulting to
dchar. Enclosing it in a struct means we can enforce string rules
separately from the fact that it's a char array.
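
To make the foreach-versus-range inconsistency concrete, here is a minimal
sketch. The helper names countViaForeach and countViaRange are made up for
illustration; the behaviour shown is that of current Phobos, where narrow
strings are decoded by the range primitives.

```d
import std.range.primitives : ElementType, walkLength;

// Counts elements the way a plain foreach sees them: one per code unit.
size_t countViaForeach(string s)
{
    size_t n;
    foreach (c; s)
    {
        static assert(c.sizeof == 1); // c is inferred as a char
        ++n;
    }
    return n;
}

// Counts elements the way range algorithms see them: one per code point.
size_t countViaRange(string s)
{
    static assert(is(ElementType!string == dchar));
    return s.walkLength;
}

void main()
{
    string s = "cass\u00E9"; // "cassé"; the é takes two UTF-8 code units
    assert(countViaForeach(s) == 6);
    assert(countViaRange(s) == 5);
}
```

Same variable, same data, two different element types depending on how you
iterate.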


> Note, std.algorithm.copy(string1, mutablestring) will still
> decode/encode, but it's more explicit. It's EXPLICITLY a dchar
> range. Using std.algorithm.copy(string1.representation,
> mutablestring.representation) will avoid the issues.
> 
> I imagine only code that is currently UTF ignorant will break, and
> that code is easily 'fixed' by adding the 'representation'
> qualifier.
[...]
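
For comparison, a rough sketch of the "copy the raw code units" idea using
what exists today. Note the assumption: in the proposal, .representation is
a field of the new struct, whereas the code below uses the existing
std.string.representation function, which reinterprets a char array as a
ubyte array. The helper name copyCodeUnits is made up for illustration.

```d
import std.algorithm.mutation : copy;
import std.string : representation;

// Copies the UTF-8 code units of src into a fresh buffer without decoding.
char[] copyCodeUnits(string src)
{
    auto dst = new char[](src.length);
    // Both sides are ubyte arrays here, so copy never decodes or re-encodes.
    copy(src.representation, dst.representation);
    return dst;
}

void main()
{
    assert(copyCodeUnits("cass\u00E9") == "cass\u00E9");
}
```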

The only concern I have is the current use of char[] as a mutable string
and const(char)[] as a read-only view, together with the current
implicit conversion from string to const(char)[]. We would need similar
wrappers for char[] and const(char)[], and both string and mutablestring
must be implicitly convertible to conststring; otherwise a LOT of
existing code will break in a major way. Plus, these wrappers should
expose the same dchar range API, with .representation giving access to
the raw code units.
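
Something along these lines is what I have in mind. This is only a sketch
under assumptions: the names StringImpl, String, MutableString and
ConstString are stand-ins (the real proposal would use string,
mutablestring and conststring), and using alias this for the implicit
conversion is one possible mechanism, not a settled design.

```d
import std.utf : decode, stride;

// Hypothetical wrapper; Char is char, const(char) or immutable(char).
struct StringImpl(Char)
{
    Char[] representation; // raw UTF-8 code units

    // dchar input-range primitives
    @property bool empty() const { return representation.length == 0; }

    @property dchar front() const
    {
        const(char)[] units = representation;
        size_t index = 0;
        return decode(units, index); // decode the first code point
    }

    void popFront()
    {
        // Skip all code units belonging to the first code point.
        representation = representation[stride(representation) .. $];
    }

    // The mutable and immutable flavours convert implicitly to the const one.
    static if (!is(Char == const(char)))
    {
        StringImpl!(const(char)) toConst() const
        {
            return StringImpl!(const(char))(representation);
        }
        alias toConst this;
    }
}

alias String        = StringImpl!(immutable(char));
alias MutableString = StringImpl!(char);
alias ConstString   = StringImpl!(const(char));

// Accepts any of the three flavours through the implicit conversion.
size_t countCodePoints(ConstString s)
{
    size_t n;
    for (; !s.empty; s.popFront())
        ++n;
    return n;
}

void main()
{
    auto s = String("cass\u00E9");
    auto m = MutableString("cass\u00E9".dup);
    assert(countCodePoints(s) == 5);
    assert(countCodePoints(m) == 5);
    assert(s.representation.length == 6); // code units stay reachable
}
```

A function written against ConstString then takes all three kinds, much as
a const(char)[] parameter accepts string and char[] today.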


T

-- 
It is the quality rather than the quantity that matters. -- Lucius Annaeus Seneca