Proposal for fixing dchar ranges

Steven Schveighoffer schveiguy at yahoo.com
Mon Mar 10 08:02:05 PDT 2014


On Mon, 10 Mar 2014 10:48:26 -0400, Dicebot <public at dicebot.lv> wrote:

> On Monday, 10 March 2014 at 13:35:33 UTC, Steven Schveighoffer wrote:
>> I proposed this inside the long "major performance problem with  
>> std.array.front" thread, and I also proposed it a long time ago.
>>
>> But it seems to be getting no attention buried in that thread, not even  
>> negative attention :)
>>
>> An idea to fix the whole set of problems I see with char[] being treated  
>> specially by Phobos: introduce an actual string type, with char[] as  
>> backing, that is a dchar range and that actually dictates the rules we  
>> want. Then, make the compiler use this type for literals.
>>
>> e.g.:
>>
>> struct string {
>>    immutable(char)[] representation;
>>    this(immutable(char)[] data) { representation = data;}
>>    ... // dchar range primitives
>> }
>>
>> Then, a char[] array is simply an array of char.
>>
>> points:
>>
>> 1. No more issues with foreach(c; "cassé"), it iterates via dchar
>> 2. No more issues with "cassé"[4], it is a static compiler error.
>> 3. No more awkward ASCII manipulation using ubyte[].
>> 4. No more phobos schizophrenia saying char[] is not an array.
>> 5. No more special casing char[] array templates to fool the compiler.
>> 6. Any other special rules we come up with can be dictated by the  
>> library, and not ignored by the compiler.
>>
>> Note, std.algorithm.copy(string1, mutablestring) will still  
>> decode/encode, but it's more explicit. It's EXPLICITLY a dchar range.  
>> Using std.algorithm.copy(string1.representation,  
>> mutablestring.representation) will avoid the issues.
>>
>> I imagine only code that is currently UTF ignorant will break, and that  
>> code is easily 'fixed' by adding the 'representation' qualifier.
>>
>
> It will break any code that slices stored char[] strings directly, which  
> may or may not break UTF depending on how the indices are calculated.

That is already broken. What I'm looking to do is remove the cruft and  
"WTF" factor of the current state of affairs (an array that's not an  
array).
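
To illustrate the "array that's not an array" point, here is a small sketch of how current Phobos treats a string. It assumes the é is the precomposed U+00E9 (two UTF-8 code units), which is why the literal is written with an escape:

```d
import std.range : ElementType, hasLength, isRandomAccessRange, walkLength;

void main()
{
    string s = "cass\u00E9"; // "cassé"; the é takes two UTF-8 code units

    // The built-in array view counts code units.
    assert(s.length == 6);

    // The Phobos range view decodes to dchar and denies that the
    // array has a length or random access.
    static assert(is(ElementType!string == dchar));
    static assert(!hasLength!string);
    static assert(!isRandomAccessRange!string);
    assert(s.walkLength == 5);

    // foreach with an inferred element type still walks code units.
    size_t units;
    foreach (c; s)
    {
        static assert(typeof(c).sizeof == 1);
        ++units;
    }
    assert(units == 6);
}
```

So the same value reports six elements through the language and five through the library, which is the inconsistency the proposed type is meant to remove.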

Originally (in that long-ago proposal) I had proposed to check for and  
disallow invalid slicing at runtime. In fact, that check could still be  
added if desired, since the type is defined by the library.
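
As a rough sketch of what such a library type with checked slicing might look like (not a definitive design): the name String, the onBoundary helper, and the choice to throw via enforce are all assumptions made for illustration, and the real type would be named string and be picked up by the compiler for literals.

```d
import std.exception : enforce;
import std.range : ElementType, isInputRange;
import std.utf : decode;

// Hypothetical stand-in for the proposed library-defined string type.
struct String
{
    immutable(char)[] representation;

    this(immutable(char)[] data) { representation = data; }

    // dchar input range primitives
    @property bool empty() { return representation.length == 0; }

    @property dchar front()
    {
        size_t i = 0;
        return decode(representation, i);
    }

    void popFront()
    {
        size_t i = 0;
        decode(representation, i); // advances i past one code point
        representation = representation[i .. $];
    }

    // Slice by code-unit offsets; reject bounds that split a code point.
    String opSlice(size_t lo, size_t hi)
    {
        enforce(lo <= hi && hi <= representation.length, "slice out of bounds");
        enforce(onBoundary(lo) && onBoundary(hi), "slice splits a code point");
        return String(representation[lo .. hi]);
    }

    // Assumed helper: UTF-8 continuation bytes look like 0b10xx_xxxx.
    private bool onBoundary(size_t i)
    {
        return i == representation.length
            || (representation[i] & 0xC0) != 0x80;
    }
}

static assert(isInputRange!String && is(ElementType!String == dchar));

void main()
{
    auto s = String("cass\u00E9"); // "cassé"
    assert(s[0 .. 4].representation == "cass");
    assert(s[4 .. 6].front == '\u00E9');
    // s[4 .. 5] would throw: offset 5 falls inside the two-byte é.
}
```

There is deliberately no opIndex here, so indexing such as s[4] fails to compile, and code that wants raw code units goes through representation.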

> Also, this adds one more runtime dependency to the language, but there  
> are so many that it probably does not matter.

alias string = immutable(char)[];

That one alias is about all it takes to revert to the original behavior,  
so there isn't much extra dependency. In fact, one nice thing about this  
proposal is that the compiler changes can be done and tested before any  
real meddling with the string type is done.

-Steve

