Proposal for fixing dchar ranges

John Colvin john.loughran.colvin at gmail.com
Mon Mar 10 14:46:23 PDT 2014


On Monday, 10 March 2014 at 13:35:33 UTC, Steven Schveighoffer 
wrote:
> I proposed this inside the long "major performance problem with 
> std.array.front" thread, and I've also proposed it before, a long 
> time ago.
>
> But it seems to be getting no attention buried in that thread, not 
> even negative attention :)
>
> An idea to fix all the problems I see with char[] being 
> treated specially by phobos: introduce an actual string type, 
> with char[] as backing, that is a dchar range, that actually 
> dictates the rules we want. Then, make the compiler use this 
> type for literals.
>
> e.g.:
>
> struct string {
>    immutable(char)[] representation;
>    // take immutable data so the assignment below type-checks
>    this(immutable(char)[] data) { representation = data; }
>    ... // dchar range primitives
> }
>
> Then, a char[] array is simply an array of char.
>
> points:
>
> 1. No more issues with foreach(c; "cassé"), it iterates via 
> dchar
> 2. No more issues with "cassé"[4], it is a static compiler 
> error.
> 3. No more awkward ASCII manipulation using ubyte[].
> 4. No more phobos schizophrenia saying char[] is not an array.
> 5. No more special casing char[] array templates to fool the 
> compiler.
> 6. Any other special rules we come up with can be dictated by 
> the library, and not ignored by the compiler.
>
> Note, std.algorithm.copy(string1, mutablestring) will still 
> decode/encode, but it's more explicit. It's EXPLICITLY a dchar 
> range. Using std.algorithm.copy(string1.representation, 
> mutablestring.representation) will avoid the issues.
>
> I imagine only code that is currently UTF ignorant will break, 
> and that code is easily 'fixed' by adding the 'representation' 
> qualifier.
>
> -Steve

Just to check I understand this fully:

In this new scheme, what would this do?

auto s = "cassé".representation;
foreach(i, c; s) write(i, ':', c, ' ');
writeln(s);

Currently - without the .representation - I get

0:c 1:a 2:s 3:s 4:e 5:̠6:`
cassé

or, to spell it out a bit more:
0:c 1:a 2:s 3:s 4:e 5:xCC 6:x81
cassé
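
To make the question concrete, here is a rough sketch of how I picture the proposed type. Everything in it is my assumption rather than part of the proposal: the name Str is hypothetical (chosen so it doesn't clash with the built-in string alias), and the range primitives are just one plausible way to fill in the "dchar range primitives" placeholder, using std.utf.decode and std.utf.stride. If I read the proposal right, iterating the wrapper would give code points, while iterating .representation would presumably give the same code units as today.

```d
import std.stdio : writef, writeln;
import std.utf : decode, stride;

// Hypothetical stand-in for the proposed type; not an actual Phobos type.
struct Str
{
    immutable(char)[] representation;

    // dchar range primitives (assumed): decode UTF-8 on the fly
    @property bool empty() { return representation.length == 0; }

    @property dchar front()
    {
        size_t i = 0;
        return decode(representation, i); // decodes one code point at index 0
    }

    void popFront()
    {
        // skip all code units belonging to the first code point
        representation = representation[stride(representation, 0) .. $];
    }
}

void main()
{
    // "cassé" written with a combining acute accent (U+0301)
    auto s = Str("casse\u0301");

    // Iterating the wrapper yields code points
    size_t codePoints = 0;
    foreach (dchar c; s)
    {
        writef("U+%04X ", cast(uint) c);
        ++codePoints;
    }
    writeln();
    assert(codePoints == 6);

    // Iterating .representation yields UTF-8 code units
    foreach (i, char c; s.representation)
        writef("%s:%02X ", i, cast(ubyte) c);
    writeln();
    assert(s.representation.length == 7);
    assert(s.representation[5] == 0xCC && s.representation[6] == 0x81);
}
```

If that reading is correct, my snippet above would print the same seven index:unit pairs under the new scheme as it does now.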

