standard ranges

Timon Gehr timon.gehr at gmx.ch
Wed Jun 27 12:28:17 PDT 2012


On 06/27/2012 08:54 PM, Gor Gyolchanyan wrote:
> On Wed, Jun 27, 2012 at 10:42 PM, Jonathan M Davis <jmdavisProg at gmx.com> wrote:
>> On Wednesday, June 27, 2012 22:29:25 Gor Gyolchanyan wrote:
>>> Agreed. Having struct strings (with slices and everything) will set
>>> the record straight.
>>
>> Except that they couldn't have slicing, because it would be very inefficient.
>> You'd have to get at the actual array of code units to slice anything. A
>> struct string type would have to be restricted to exactly the same set of
>> operations that range-based functions consider strings to have and then give
>> you a way to get at the underlying code unit representation to be able to use
>> it when special-casing for strings for efficiency, just like you do now.
>>
>> You _can't_ get away from the fact that you're dealing with an array (or list
>> or whatever) of code units even if you do want to operate on it as a range of
>> code points most of the time. Having a struct would fix the issues like foreach
>> iterating over char by default whereas range-based functions iterate over
>> dchar - it would make it consistent by making it dchar for everything - but
>> the issue of code unit vs code point still remains and you can't get rid of
>> it. Anyone wanting to write efficient string-processing code _needs_ to
>> understand Unicode. There's no way around it (which is part of the reason that
>> Walter isn't keen on the idea of changing how strings work in the language
>> itself).
>>
>> So, while having a string type which is a struct does help eliminate the
>> schizophrenia, the core problem of code unit vs code point is still there, and
>> you still need to understand it. There is no fix for it, because it's intrinsic
>> to how Unicode works.
>>
>> - Jonathan M Davis
>
> Yes you can get away. The struct string would have ubyte[] ushort[]
> and uint[] as the representation. Maybe even the char[], wchar[] and
> dchar[], but those won't be strings as we know them now. The string
> struct will take care of encoding 100% transparently

Encoding cannot be taken care of 100% transparently. It has performance 
implications.
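
As a rough illustration of that cost (a minimal sketch, not code from this thread; the helper name firstCodePoints is hypothetical), compare slicing by code unit with slicing by code point on a UTF-8 string. It relies on std.utf.toUTFindex and std.range.walkLength, both of which have to scan the string:

```d
import std.range : walkLength;
import std.utf : toUTFindex;

// Hypothetical helper: return the first n code points of a UTF-8 string.
string firstCodePoints(string s, size_t n)
{
    // Translating a code-point index into a code-unit index requires
    // decoding from the start of the string, so this is O(n), not O(1).
    immutable end = s.toUTFindex(n);
    return s[0 .. end];
}

void main()
{
    string s = "häßlich"; // 'ä' and 'ß' take two UTF-8 code units each

    // Code-unit view: .length and slicing are O(1).
    assert(s.length == 9);
    assert(s[0 .. 3] == "hä");

    // Code-point view: counting and slicing both need a linear scan.
    assert(s.walkLength == 7);
    assert(firstCodePoints(s, 3) == "häß");
}
```

A struct wrapper could hide the call to toUTFindex behind opSlice, but it could not hide the linear scan, which is the performance implication in question.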

> and will provide access to the representation, which is good for bit blitting and other
> encoding-agnostic operations, but the representation is then known NOT
> to be a valid string

It is NOT known not to be a valid string. Furthermore, this directly 
contradicts what you claimed above. If the representation is exposed,
it is certainly not transparent.
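
A small sketch of both points, assuming today's Phobos (std.string.representation, std.utf.validate); it is an illustration, not part of the proposal being discussed. The raw code units of a valid string are still valid UTF-8, and it is only code that manipulates the representation without regard to encoding that can break it:

```d
import std.exception : assertThrown;
import std.string : representation;
import std.utf : UTFException, validate;

void main()
{
    string s = "häßlich";

    // The representation is the same memory, typed as raw bytes.
    immutable(ubyte)[] raw = s.representation;
    assert(raw.length == s.length);

    // Reinterpreting the untouched bytes gives back a valid string.
    auto back = cast(string) raw;
    validate(back); // does not throw
    assert(back is s);

    // Slicing the bytes in the middle of 'ä' yields invalid UTF-8, so
    // whoever works on the representation has to know about the encoding.
    assertThrown!UTFException(validate(cast(string) (raw[0 .. 2])));
}
```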

> and will need to be placed into the string struct in order to use string operations.
>

aliasing..?

