Unicode handling comparison
Dmitry Olshansky
dmitry.olsh at gmail.com
Thu Nov 28 11:32:00 PST 2013
28-Nov-2013 17:24, monarch_dodra wrote:
> On Thursday, 28 November 2013 at 09:02:12 UTC, Walter Bright
> wrote:
>> Sadly,
>
> I think it's great. It means by default, your strings will always
> be handled correctly. I think there's quite a few algorithms that
> were written without ever taking strings into account, but still
> happen to work with them.
>
Surprisingly, the greatest problem is that you can't apply range functions
to the implicit code unit range even if you REALLY want to.
To take an example close to home, the only reason std.regex can't take,
e.g., retro of a string:
match(retro("hleb"), ".el.");
is the automatic dumbing down that happens the moment you apply a range
adapter. What I'd need in std.regex is a code unit range that, by
convention, also "happens to be" a range of code points.
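
To make the "dumbing down" concrete, here is a minimal sketch (my own
illustration, not code from std.regex). It only checks element types at
compile time: retro over a string yields decoded dchar, while retro over
the raw representation yields ubyte code units that no longer carry the
"this is UTF-8" convention.

```d
import std.range : retro, ElementType;
import std.string : representation;
import std.traits : Unqual;

void main()
{
    string s = "hleb";

    // A range adapter sees a string as a range of decoded code points.
    static assert(is(ElementType!(typeof(retro(s))) == dchar));

    // representation() exposes the code units as immutable(ubyte)[],
    // but the result is just bytes: nothing says they encode text.
    auto units = s.representation;
    static assert(is(Unqual!(ElementType!(typeof(retro(units)))) == ubyte));
}
```

Neither result is the "code unit range that is also a range of code points"
described above; that is the gap.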
The second problem is that string code is carefully special-cased, but
the effort is completely wasted the moment you have a slice of chars
that comes from anywhere other than built-in strings (a circular buffer,
for instance).
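
A small sketch of that second problem, assuming a hypothetical CharSource
struct standing in for something like a circular buffer (the name and
layout are made up for illustration). The compile-time checks show that
the string special-casing keys on the built-in array type, so a
user-defined range of char gets none of it.

```d
import std.range : ElementType, isInputRange;
import std.traits : isNarrowString;

// Hypothetical stand-in for chars that do not live in a built-in string.
struct CharSource
{
    const(char)[] data;
    size_t pos;

    @property bool empty() const { return pos >= data.length; }
    @property char front() const { return data[pos]; }
    void popFront() { ++pos; }
}

static assert(isInputRange!CharSource);

// Built-in strings are treated as ranges of decoded code points...
static assert(is(ElementType!string == dchar));
static assert(isNarrowString!string);

// ...while the same code units behind a custom range stay plain char,
// and the narrow-string special cases do not apply.
static assert(is(ElementType!CharSource == char));
static assert(!isNarrowString!CharSource);

void main() {}
```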
I had a (somewhat cloudy) vision of settling the encoded ranges problem
once and for all. That includes defining the notion of an encoded range
that is two in one: a stronger (as in capabilities) range of code elements
and the default decoded view imposed on top of it (which can be weaker).
--
Dmitry Olshansky