Jonathan M Davis
jmdavisProg at gmx.com
Wed Jun 27 11:42:26 PDT 2012
On Wednesday, June 27, 2012 22:29:25 Gor Gyolchanyan wrote:
> Agreed. Having struct strings (with slices and everything) will set
> the record straight.
Except that they couldn't have slicing, because it would be very inefficient.
You'd have to get at the actual array of code units to slice anything. A
struct string type would have to be restricted to exactly the same set of
operations that range-based functions consider strings to have and then give
you a way to get at the underlying code unit representation to be able to use
it when special-casing for strings for efficiency, just like you do now.
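
To make the slicing cost concrete, here is a minimal sketch. It assumes a hypothetical helper, codeUnitIndex, which is not part of Phobos and which walks the string with std.utf.stride to translate a code point index into a code unit index. A struct string type that sliced by code point would need something like this walk on every slice, which is O(n), whereas slicing the underlying array by code unit is O(1).

```d
import std.utf : stride;

// Hypothetical helper (not in Phobos): map a code point index to the
// index of the code unit where that code point starts.
// This is O(n): every preceding code point has to be measured.
size_t codeUnitIndex(string s, size_t codePointIndex)
{
    size_t i = 0;
    foreach (_; 0 .. codePointIndex)
        i += stride(s, i); // number of code units in the sequence at i
    return i;
}

void main()
{
    // U+00E9 (e with acute accent) takes two UTF-8 code units.
    string s = "h\u00E9llo";

    // Slicing the array directly is O(1), but the indices are code units.
    assert(s[0 .. 1] == "h");

    // Slicing "code points 1 to 5" first requires walking the string.
    immutable a = codeUnitIndex(s, 1);
    immutable b = codeUnitIndex(s, 5);
    assert(a == 1 && b == 6);
    assert(s[a .. b] == "\u00E9llo"); // 4 code points, 5 code units
}
```

Efficient code would do the walk once, keep the code unit indices around, and slice the array directly after that, which is the special-casing for strings that already happens now.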
You _can't_ get away from the fact that you're dealing with an array (or list
or whatever) of code units even if you do want to operate on it as a range of
code points most of the time. Having a struct would fix the issues like foreach
iterating over char by default whereas range-based functions iterate over
dchar - it would make it consistent by making it dchar for everything - but
the issue of code unit vs code point still remains and you can't get rid of
it. Anyone wanting to write efficient string-processing code _needs_ to
understand Unicode. There's no way around it (which is part of the reason that
Walter isn't keen on the idea of changing how strings work in the language).
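
As a small illustration of the foreach vs. range inconsistency, the following sketch counts the elements of the same string both ways. The literal and the variable names are just made up for the example. It relies on plain foreach inferring the array's element type (char) and on the range primitives treating a string as a range of dchar.

```d
import std.range : ElementType, walkLength;
import std.traits : Unqual;

void main()
{
    // U+00E9 takes two UTF-8 code units, so this is 6 code units
    // but only 5 code points.
    string s = "h\u00E9llo";

    // Plain foreach infers the array's element type: a UTF-8 code unit.
    size_t units = 0;
    foreach (c; s)
    {
        static assert(is(Unqual!(typeof(c)) == char));
        ++units;
    }

    // Range-based functions see a string as a range of dchar.
    static assert(is(ElementType!string == dchar));

    // Asking foreach for dchar explicitly makes it decode code points.
    size_t points = 0;
    foreach (dchar c; s)
        ++points;

    assert(units == 6 && s.length == 6);      // code units
    assert(points == 5 && s.walkLength == 5); // code points
}
```

A struct string type could make both loops yield dchar, but s.length and any slicing would still be in terms of code units underneath.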
So, while having a string type which is a struct does help eliminate the
schizophrenia, the core problem of code unit vs code point is still there, and
you still need to understand it. There is no fix for it, because it's intrinsic
to how Unicode works.
- Jonathan M Davis