Unicode handling comparison

Dmitry Olshansky dmitry.olsh at gmail.com
Thu Nov 28 11:32:00 PST 2013


28-Nov-2013 17:24, monarch_dodra wrote:
> On Thursday, 28 November 2013 at 09:02:12 UTC, Walter Bright
> wrote:
>> Sadly,
>
> I think it's great. It means by default, your strings will always
> be handled correctly. I think there's quite a few algorithms that
> were written without ever taking strings into account, but still
> happen to work with them.
>

The greatest problem, surprisingly, is that you can't apply range functions 
to the implicit code-unit range even if you REALLY want to.

To take an example close to home: the only reason std.regex can't accept, 
e.g., the retro of a string:

match(retro("hleb"), ".el.");

is the automatic dumbing down that happens the moment you apply a range 
adapter. What I'd need in std.regex is a code-unit range that, by 
convention, also "happens to be" a range of code points.
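To make the "dumbing down" concrete, here is a small D sketch. It assumes a current Phobos (`std.utf.byCodeUnit` was added in later releases, after this post). It checks that the range element type of a string is `dchar`, and that `retro` over a string keeps that decoded element type, so the code units are no longer reachable through the adapter; `representation` and `byCodeUnit` are the ways to get a code-unit view back. `reverseCodePoints` is a hypothetical helper, not part of Phobos.

```d
import std.range : retro;
import std.range.primitives : ElementType, walkLength;
import std.string : representation;
import std.traits : Unqual;
import std.utf : byCodeUnit;

// Built-in strings are auto-decoded: their range element is dchar,
// and so is the element of an adapter such as retro wrapped around them.
static assert(is(ElementType!string == dchar));
static assert(is(ElementType!(typeof("hleb".retro)) == dchar));

// Explicit code-unit views of the same data.
static assert(is(Unqual!(ElementType!(typeof("hleb".byCodeUnit))) == char));
static assert(is(ElementType!(typeof("hleb".representation)) == immutable(ubyte)));

/// Hypothetical helper: reverse a string by code points,
/// relying on auto-decoding in retro.
dstring reverseCodePoints(string s)
{
    import std.array : array;
    return s.retro.array.idup; // dchar[] -> dstring
}
```

For example, `reverseCodePoints("хлеб")` gives `"белх"d`: every two-byte Cyrillic letter stays intact, which a plain reversal of the UTF-8 code units would not preserve.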

The second problem is that string code is carefully special-cased, but 
the effort is completely wasted the moment you have a slice of chars 
that comes from anywhere other than a built-in string (a circular 
buffer, for instance).
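A rough illustration of this second problem, again assuming a current Phobos: `CharRing` below is a hypothetical stand-in for a circular buffer, a plain input range of `char`. Since the special treatment in Phobos is keyed on built-in strings (`isNarrowString`), this range is seen as raw code units of type `char`, and decoding has to be requested explicitly, e.g. with `std.utf.byDchar`.

```d
import std.algorithm.comparison : equal;
import std.range.primitives : ElementType, isInputRange;
import std.traits : isNarrowString;
import std.utf : byDchar;

/// Hypothetical stand-in for a circular buffer of UTF-8 code units.
/// It is an input range of `char`, but not a built-in string.
struct CharRing
{
    const(char)[] buf; // backing storage; data may wrap around its end
    size_t head;       // index of the next code unit to read
    size_t count;      // number of code units left to read

    @property bool empty() const { return count == 0; }
    @property char front() const { return buf[head]; }
    void popFront() { head = (head + 1) % buf.length; --count; }
}

static assert(isInputRange!CharRing);

// Built-in strings get the special treatment: they are decoded to dchar.
static assert(isNarrowString!string);
static assert(is(ElementType!string == dchar));

// The ring buffer gets none of it: it is a range of raw code units.
static assert(!isNarrowString!CharRing);
static assert(is(ElementType!CharRing == char));
```

For instance, `CharRing("лебх", 6, 8)` starts reading at the "х" and wraps around to the start of the buffer; `byDchar` over it yields "хлеб", while generic algorithms applied to the ring directly see individual bytes.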

I had a (somewhat cloudy) vision of settling the encoded-range problem 
once and for all. That includes defining the notion of an encoded range 
that is two in one: a stronger (in terms of capabilities) range of code 
elements, plus the default decoded view imposed on top of it (which can 
be weaker).

-- 
Dmitry Olshansky

