Unicode handling comparison

Dmitry Olshansky dmitry.olsh at gmail.com
Thu Nov 28 11:32:00 PST 2013

28-Nov-2013 17:24, monarch_dodra wrote:
> On Thursday, 28 November 2013 at 09:02:12 UTC, Walter Bright
> wrote:
>> Sadly,
> I think it's great. It means by default, your strings will always
> be handled correctly. I think there's quite a few algorithms that
> were written without ever taking strings into account, but still
> happen to work with them.

The greatest problem, surprisingly, is that you can't apply range functions 
to the implicit code unit range even if you REALLY wanted to.
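
To make the point concrete, here is a minimal sketch of how the range traits see a built-in string. The sample word "хлеб" is only an illustration (not from the original discussion), and `representation` is shown as one possible workaround, assuming a reasonably recent Phobos:

```d
import std.range : ElementType, ElementEncodingType, hasLength,
    isRandomAccessRange, walkLength;
import std.string : representation;

void main()
{
    string s = "хлеб"; // illustrative: 4 code points, 8 UTF-8 code units

    // The range primitives present a string as a range of decoded dchar...
    static assert(is(ElementType!string == dchar));
    // ...although the storage is an array of char code units.
    static assert(is(ElementEncodingType!string == immutable(char)));

    // As a range, the string therefore has no length and no random access.
    static assert(!hasLength!string);
    static assert(!isRandomAccessRange!string);

    assert(s.length == 8);     // array property: counts code units
    assert(s.walkLength == 4); // range view: counts code points

    // Possible workaround: reinterpret the string as immutable(ubyte)[],
    // which is an ordinary random-access range of code units.
    auto units = s.representation;
    static assert(isRandomAccessRange!(typeof(units)));
    assert(units.walkLength == 8);
}
```

The workaround yields bytes rather than chars, so the "this is UTF-8 text" convention is lost along the way, which is part of the complaint.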

To take an example close to home: the only reason std.regex can't take, 
e.g., the retro of a string, as in

match(retro("hleb"), ".el.");

is the automatic dumbing down at the moment you apply a range 
adapter. What I'd need in std.regex is a code unit range that, by 
convention, also "happens to be" a range of code points.
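
A small sketch of what happens to the type once the adapter is applied. It assumes std.regex's `match` keeps its string-only template constraint, which is my reading of the current signature:

```d
import std.algorithm : equal;
import std.range : ElementType, retro;
import std.regex : match;
import std.traits : isSomeString;

void main()
{
    auto r = retro("hleb");

    // The adapter's result is no longer a string type...
    static assert(!isSomeString!(typeof(r)));
    // ...it is a bidirectional range of decoded code points,
    static assert(is(ElementType!(typeof(r)) == dchar));
    // and the code units underneath can no longer be indexed.
    static assert(!__traits(compiles, r[0]));

    assert(r.equal("belh"));

    // Assumption: match() requires isSomeString for its input,
    // so the reversed range is rejected at compile time.
    static assert(!__traits(compiles, match(retro("hleb"), ".el.")));
    // The plain string works as usual.
    assert(!match("hleb", ".le.").empty);
}
```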

The second problem is that string code is carefully special-cased, but 
the effort is completely wasted the moment you have a slice of chars 
that comes from anywhere other than built-in strings (a circular 
buffer, for instance).
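
As an illustration, the following sketch wraps the same bytes in a user-defined range. `Ring` is a hypothetical stand-in for a circular buffer, not anything in Phobos. Because its element type is char, generic algorithms treat it as a plain range of code units and none of the string special-casing applies:

```d
import std.range : ElementType, isInputRange, walkLength;

// Hypothetical stand-in for a circular buffer of UTF-8 code units.
struct Ring
{
    const(char)[] store; // backing storage
    size_t head;         // position of the first unread unit
    size_t count;        // number of units left to read

    @property bool empty() const { return count == 0; }
    // Indexing wraps around the end of the storage.
    @property char front() const { return store[head % store.length]; }
    void popFront() { ++head; --count; }
}

void main()
{
    string s = "хлеб"; // illustrative: 4 code points, 8 code units
    auto ring = Ring(s, 0, s.length);

    static assert(isInputRange!Ring);
    // No decoding for the user-defined range: elements are raw chars.
    static assert(is(ElementType!Ring == char));

    assert(s.walkLength == 4);    // built-in string: decoded
    assert(ring.walkLength == 8); // same bytes, counted as code units
}
```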

I had a (somewhat cloudy) vision of settling the encoded ranges problem 
once and for all. That includes defining the notion of an encoded range 
that is two in one: a stronger (in terms of capabilities) range of code 
elements, with the default decoded view imposed on top of it (which can 
be weaker).
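
Purely as a rough sketch of the shape this could take, and not a worked-out design: `Encoded` and its `units` property are hypothetical names, and only `std.utf.decode` and `std.utf.stride` are existing Phobos functions.

```d
import std.range : ElementType, isRandomAccessRange, walkLength;
import std.utf : decode, stride;

// Hypothetical "two in one" range: decoded by default,
// with the stronger code unit range still reachable.
struct Encoded
{
    private const(char)[] data;

    // Stronger view: random-access slice of code units.
    @property const(char)[] units() const { return data; }

    // Default (weaker) view: input range of decoded code points.
    @property bool empty() const { return data.length == 0; }

    @property dchar front() const
    {
        const(char)[] u = data;
        size_t i = 0;
        return decode(u, i); // decodes the first code point
    }

    void popFront()
    {
        const(char)[] u = data;
        data = data[stride(u, 0) .. $]; // skip one encoded code point
    }
}

void main()
{
    auto e = Encoded("хлеб"); // illustrative: 4 code points, 8 code units

    static assert(is(ElementType!Encoded == dchar));
    static assert(!isRandomAccessRange!Encoded);

    assert(e.units.length == 8); // code unit view keeps length and indexing
    assert(e.units[0] == 0xD1);  // first UTF-8 byte of 'х'
    assert(e.walkLength == 4);   // decoded view counts code points
}
```

An algorithm such as a regex engine could then ask for `units` when it wants to work on code units directly, while everything else keeps seeing code points.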

Dmitry Olshansky
