standard ranges

Gor Gyolchanyan gor.f.gyolchanyan at gmail.com
Wed Jun 27 11:54:28 PDT 2012


On Wed, Jun 27, 2012 at 10:42 PM, Jonathan M Davis <jmdavisProg at gmx.com> wrote:
> On Wednesday, June 27, 2012 22:29:25 Gor Gyolchanyan wrote:
>> Agreed. Having struct strings (with slices and everything) will set
>> the record straight.
>
> Except that they couldn't have slicing, because it would be very inefficient.
> You'd have to get at the actual array of code units to slice anything. A
> struct string type would have to be restricted to exactly the same set of
> operations that range-based functions consider strings to have and then give
> you a way to get at the underlying code unit representation to be able to use
> it when special-casing for strings for efficiency, just like you do now.
>
> You _can't_ get away from the fact that you're dealing with an array (or list
> or whatever) of code units even if you do want to operate on it as a range of
> code points most of the time. Having a struct would fix the issues like foreach
> iterating over char by default whereas range-based functions iterate over
> dchar - it would make it consistent by making it dchar for everything - but
> the issue of code unit vs code point still remains and you can't get rid of
> it. Anyone wanting to write efficient string-processing code _needs_ to
> understand Unicode. There's no way around it (which is part of the reason that
> Walter isn't keen on the idea of changing how strings work in the language
> itself).
>
> So, while having a string type which is a struct does help eliminate the
> schizophrenia, the core problem of code unit vs code point is still there, and
> you still need to understand it. There is no fix for it, because it's intrinsic
> to how Unicode works.
>
> - Jonathan M Davis

Yes, you can get away from it. The string struct would use ubyte[],
ushort[] and uint[] as its representation (maybe even char[], wchar[]
and dchar[], but those wouldn't be strings as we know them now). The
string struct would take care of the encoding 100% transparently and
would provide access to the representation, which is useful for bit
blitting and other encoding-agnostic operations. The representation,
however, is then NOT treated as a valid string: it has to be placed
back into the string struct before string operations can be used on it.
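To make the proposal concrete, here is a minimal sketch in D, restricted
to the UTF-8 case (the ushort[]/uint[] variants for UTF-16/UTF-32 would
presumably follow the same pattern). The struct name `UString` and its
`rep` property are hypothetical, not Phobos API; the sketch relies only
on `std.utf.validate`, `std.utf.decode` and `std.string.representation`.
It shows the two points discussed in this thread: iterating the struct
yields dchar code points, while the raw code units stay accessible
separately.

```d
import std.array : array;
import std.range.primitives : isInputRange, ElementType;
import std.stdio : writeln;
import std.string : representation;
import std.utf : decode, validate;

/// Hypothetical UTF-8 string struct: raw code units inside, code points outside.
struct UString
{
    private immutable(ubyte)[] units; // UTF-8 code units, validated on construction

    /// Wraps a string; std.utf.validate throws a UTFException on invalid UTF-8.
    this(string s)
    {
        validate(s);
        units = s.representation;
    }

    /// Raw code units for encoding-agnostic work (copying, hashing, I/O).
    /// Once taken out, they must be wrapped again to get string operations.
    @property immutable(ubyte)[] rep() const { return units; }

    // Input-range primitives; the element type is dchar (one code point).
    @property bool empty() const { return units.length == 0; }

    @property dchar front() const
    {
        size_t i = 0;
        return decode(cast(string) units, i);
    }

    void popFront()
    {
        size_t i = 0;
        decode(cast(string) units, i); // advances i past one code point
        units = units[i .. $];
    }

    // No length or opSlice: locating the n-th code point needs a decode
    // from the start, i.e. O(n) rather than O(1).
}

static assert(isInputRange!UString);
static assert(is(ElementType!UString == dchar));

void main()
{
    auto s = UString("häy");
    assert(s.rep.length == 4);   // 'ä' is two UTF-8 code units...
    assert(s.array == "häy"d);   // ...but the range yields three code points
    writeln(s.array);
}
```

The deliberate omission of length and slicing reflects Jonathan's
objection: anything indexed by code point has to walk the code units, so
efficient code would still reach for rep when it needs O(1) access.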

-- 
Bye,
Gor Gyolchanyan.


More information about the Digitalmars-d mailing list