The Case Against Autodecode
Andrei Alexandrescu via Digitalmars-d
digitalmars-d at puremagic.com
Thu Jun 2 12:05:44 PDT 2016
On 06/02/2016 01:54 PM, Marc Schütz wrote:
> On Thursday, 2 June 2016 at 14:28:44 UTC, Andrei Alexandrescu wrote:
>> That's not going to work. A false impression created in this thread
>> has been that code points are useless
>
> They _are_ useless for almost anything you can do with strings. The only
> places where they should be used are std.uni and std.regex.
>
> Again: What is the justification for using code points, in your opinion?
> Which practical tasks are made possible (and work _correctly_) if you
> decode to code points, that don't already work with code units?
Pretty much everything. Consider s and s1 string variables with possibly
different encodings (UTF8/UTF16).
* s.all!(c => c == 'ö') works only with autodecoding. It always returns
false without.
* s.any!(c => c == 'ö') works only with autodecoding. It always returns
false without.
* s.balancedParens('〈', '〉') works only with autodecoding.
* s.canFind('ö') works only with autodecoding. It always returns false
without.
* s.commonPrefix(s1) works only if they both use the same encoding;
otherwise it still compiles but silently produces an incorrect result.
* s.count('ö') works only with autodecoding. It always returns zero without.
* s.countUntil(s1) is really odd - without autodecoding, whether it
works at all, and the result it returns, depends on both encodings. With
autodecoding it always works and returns a number independent of the
encodings.
* s.endsWith('ö') works only with autodecoding. It always returns false
without.
* s.endsWith(s1) works only with autodecoding. Otherwise it compiles and
runs but produces incorrect results if s and s1 have different encodings.
* s.find('ö') works only with autodecoding. It never finds it without.
* s.findAdjacent is a very interesting one. It works with autodecoding,
but without it it just does odd things.
* s.findAmong(s1) is also interesting. It works only with autodecoding.
* s.findSkip(s1) works only if s and s1 have the same encoding.
Otherwise it compiles and runs but produces incorrect results.
* s.findSplit(s1), s.findSplitAfter(s1), s.findSplitBefore(s1) work only
if s and s1 have the same encoding. Otherwise they compile and run but
produce incorrect results.
* s.minCount, s.maxCount are unlikely to be terribly useful, but with
autodecoding they consistently return the extremum numeric code point
regardless of representation. Without, they just return
encoding-dependent and meaningless numbers.
* s.minPos, s.maxPos follow similar semantics.
* s.skipOver(s1) only works with autodecoding. Otherwise it compiles and
runs but produces incorrect results if s and s1 have different encodings.
* s.startsWith('ö') works only with autodecoding. It always returns false
without.
* s.startsWith(s1) works only with autodecoding. Otherwise it compiles
and runs but produces incorrect results if s and s1 have different
encodings.
* s.until!(c => c == 'ö') works only with autodecoding. Otherwise, it
will span the entire range.
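
A minimal sketch of a few of the cases listed above, assuming a current
Phobos in which std.utf.byCodeUnit can stand in for "without
autodecoding". The sample values of s and s1 are made up for
illustration and are not from the original message:

```d
import std.algorithm.searching : canFind, count, startsWith;
import std.utf : byCodeUnit;

void main()
{
    string s = "höhö";   // UTF-8: each 'ö' occupies two code units
    wstring s1 = "hö"w;  // UTF-16: 'ö' is a single code unit

    // Default behavior: narrow strings are iterated by code point,
    // so single-character searches and mixed encodings both work.
    assert(s.canFind('ö'));
    assert(s.count('ö') == 2);
    assert(s.startsWith(s1));

    // byCodeUnit iterates raw code units instead. No single UTF-8
    // code unit equals 'ö', and UTF-8 units do not match UTF-16 units.
    assert(!s.byCodeUnit.canFind('ö'));
    assert(s.byCodeUnit.count('ö') == 0);
    assert(!s.byCodeUnit.startsWith(s1.byCodeUnit));
}
```

The code-unit calls still compile and run; they just yield different
answers, which is the silent failure mode the list describes.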
===
The intent of autodecoding was to make std.algorithm work meaningfully
with strings. As is easy to see, I just went through
std.algorithm.searching alphabetically and found issues with literally
every primitive in there. It's an easy exercise to continue with the others.
Andrei