standard ranges

Wed Jun 27 11:42:26 PDT 2012

On Wednesday, June 27, 2012 22:29:25 Gor Gyolchanyan wrote:
> Agreed. Having struct strings (with slices and everything) will set
> the record straight.

Except that they couldn't have slicing by code point, because it would be very
inefficient: finding the nth code point means decoding everything before it.
You'd have to get at the actual array of code units to slice anything. A
struct string type would have to be restricted to exactly the same set of
operations that range-based functions consider strings to have, and then give
you a way to get at the underlying code unit representation so that you can
special-case strings for efficiency, just like you do now.
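
To make that concrete, here is a rough sketch of what such a type might look
like. The names String and codeUnits are made up for illustration and are not
part of Phobos; the only library calls assumed are std.utf.decode and
std.range.walkLength. It exposes a forward range of dchar with no indexing or
slicing, plus one accessor for the raw code units.

```d
import std.range : walkLength;
import std.utf : decode;

// Hypothetical struct string: a forward range of code points (dchar)
// wrapped around an array of UTF-8 code units. Illustration only.
struct String
{
    private string data; // the underlying UTF-8 code units

    this(string s) { data = s; }

    @property bool empty() { return data.length == 0; }

    // Decodes the first code point without consuming it.
    @property dchar front()
    {
        size_t i = 0;
        return decode(data, i);
    }

    // Drops however many code units the first code point occupies.
    void popFront()
    {
        size_t i = 0;
        decode(data, i); // advances i past one code point
        data = data[i .. $];
    }

    @property String save() { return this; }

    // Escape hatch: the raw code units, which can be sliced in O(1).
    @property string codeUnits() { return data; }
}

// Counts code points by walking the range, which is O(n).
size_t codePointCount(String s)
{
    return s.walkLength;
}
```

With String("h\u00E9llo"), codeUnits.length is 6 but codePointCount gives 5,
and the only cheap slice is one taken on codeUnits, which is exactly the
special-casing you already do with plain strings.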

You _can't_ get away from the fact that you're dealing with an array (or list 
or whatever) of code units even if you do want to operate on it as a range of 
code points most of the time. Having a struct would fix the issues like foreach 
iterating over char by default whereas range-based functions iterate over 
dchar - it would make it consistent by making it dchar for everything - but 
the issue of code unit vs code point still remains and you can't get rid of 
it. Anyone wanting to write efficient string-processing code _needs_ to 
understand Unicode. There's no way around it (which is part of the reason that 
Walter isn't keen on the idea of changing how strings work in the language 
itself).
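
For anyone who hasn't run into the foreach inconsistency, a small sketch of it
follows. The function names are hypothetical; ElementType and walkLength come
from std.range. It shows foreach defaulting to code units while the range
primitives treat the same string as a range of dchar.

```d
import std.range : ElementType, walkLength;

// Range-based code sees a string as a range of code points.
static assert(is(ElementType!string == dchar));

// foreach with an inferred element type walks code units (char).
size_t countDefault(string s)
{
    size_t n = 0;
    foreach (c; s)
        ++n;
    return n;
}

// Asking foreach for dchar makes it decode code points instead.
size_t countDecoded(string s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

// The range-based view: walkLength pops one code point at a time.
size_t countAsRange(string s)
{
    return s.walkLength;
}
```

For "h\u00E9llo", countDefault returns 6 while countDecoded and countAsRange
both return 5, because the accented letter takes two UTF-8 code units.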

So, while having a string type which is a struct does help eliminate the 
schizophrenia, the core problem of code unit vs code point is still there, and 
you still need to understand it. There is no fix for it, because it's intrinsic 
to how Unicode works.

- Jonathan M Davis