Fix Phobos dependencies on autodecoding

Jonathan M Davis newsgroup.d at jmdavisprog.com
Tue Aug 13 09:15:30 UTC 2019


On Tuesday, August 13, 2019 2:52:58 AM MDT a11e99z via Digitalmars-d wrote:
> On Tuesday, 13 August 2019 at 07:51:23 UTC, Alexandru Ermicioi wrote:
> > On Tuesday, 13 August 2019 at 07:31:28 UTC, a11e99z wrote:
> >> On Tuesday, 13 August 2019 at 07:08:03 UTC, Walter Bright
> >
> > One of the reasons is that it adds unnecessary complexity for
> > templated code that is working with ranges. Check function
> > prototypes for some algorithms found in std.algorithm package,
> > you're bound to find special treatment for autodecoding
> > strings. It also messes up user expectation when suddenly
> > applying a range function on a string instead of front char
> > you're getting dchar.
>
> imo this is a contrived problem.
> string contains chars, not in meaning "char" as type but runes or
> codepoints.
> and world is not perfect so chars/runes are stored as utf8
> codepoints.
>
> in world where "char" is alias for "byte"/"ubyte" such vision was
> a problem:
>    is this buffer string(seq of chars) or just raw bytes? how it
> should be enumerated?
> but we have better world with different bytes and chars.
>
> probably better was naming for "char" as "utf8cp"/orSomething
> (don't mix with C/C++ type)
> and when u/anybody see string from that point everything falls
> into place.
>
> I don't see problem that str.front returns codepoint from
> 0..0x10ffff and when str.length returns 21 and str.count=12. but
> somebody see problem here, so again this is a contrived problem.
> and for now this vision problem will recreate/recheck tons of
> code.
> I thought that WB don't want change code peremptorily. Should be
> BIG problem when he does.

Code points are almost always the wrong level to be operating at. Many
algorithms can operate at the code unit level with no problem, whereas those
that require decoding usually need to operate at the grapheme level so that
the actual, conceptual characters are being compared. Just like code units
aren't necessarily full characters, code points aren't necessarily full
characters.
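
A minimal sketch of the three levels, assuming a current Phobos that
provides std.utf.byCodeUnit and std.uni.byGrapheme. The sample string is
made up for illustration: it is "e" followed by a combining acute accent,
which a reader sees as one character.

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

void main()
{
    // Illustrative input: 'e' + U+0301 COMBINING ACUTE ACCENT.
    // It renders as a single user-perceived character.
    string s = "e\u0301";

    // Code units: 1 byte for 'e' plus 2 bytes for U+0301 in UTF-8.
    assert(s.length == 3);
    assert(s.byCodeUnit.walkLength == 3);

    // Code points: walking the string as a range autodecodes to dchar.
    assert(s.walkLength == 2);

    // Graphemes: the level that matches what the reader sees.
    assert(s.byGrapheme.walkLength == 1);
}
```

None of the three counts is wrong in itself. Which one an algorithm should
use depends on what it is doing, and the code point count is the one that
is least often what was actually wanted.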

Autodecoding was introduced because, at the time, Andrei did not have a
solid enough understanding of Unicode. He thought that code points were
always entire characters and didn't know about graphemes. Having
autodecoding has caused us tons of problems. It's inefficient, it gives a
false sense of code correctness, and it requires special-casing all over
the place. The whole "narrow string" concept also causes all kinds of grief:
algorithms don't work properly with strings, because they don't consider
them to be random access, they see a range element type that differs from
the actual element type, and so on. Pretty much all of the big D
contributors have thought for years now that autodecoding was a mistake,
and we've wanted to get rid of it. Many of us actually thought that
autodecoding was a good idea at first, but we've all come to understand how
terrible it is. Walter is one of the few that understood from the get-go,
but he wasn't paying much attention to Phobos (since he usually focuses on
the compiler) and didn't catch Andrei's mistake. If he had, autodecoding
would never have been a thing in Phobos.
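
The "narrow string" mismatch can be seen directly with the range traits. A
small sketch, assuming a current Phobos with autodecoding still in place;
the sample string is made up for illustration:

```d
import std.range.primitives : ElementType, front, hasLength,
    isRandomAccessRange;
import std.utf : byCodeUnit;

void main()
{
    // Illustrative input: "hello" with U+00E9 in place of the 'e'.
    string s = "h\u00E9llo";

    // Indexing yields code units, but the range API yields code points.
    static assert(is(typeof(s[0]) == immutable(char)));
    static assert(is(ElementType!string == dchar));

    // As a range, a string is neither random access nor length-aware,
    // even though the underlying array supports both.
    static assert(!isRandomAccessRange!string);
    static assert(!hasLength!string);
    assert(s.length == 6); // array length in code units (U+00E9 is 2 bytes)

    // Wrapping the string in byCodeUnit turns autodecoding off again.
    auto r = s.byCodeUnit;
    static assert(isRandomAccessRange!(typeof(r)));
    static assert(is(ElementType!(typeof(r)) == immutable(char)));
    assert(r.length == 6);
}
```

Generic code that wants to treat strings like any other array has to
special-case them or wrap them like this, which is the special-casing
described above.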

The only reason that autodecoding still exists in Phobos is how hard it is
to remove without breaking code. Making Phobos not rely on autodecoding,
so that it works regardless of whether the character type for a range is
char, wchar, dchar, or a grapheme, is exactly what we need to be doing.
Some work has been done in that direction already, but nowhere near enough.
Once that's done, we can look at how to fully remove autodecoding, be it
through Phobos v2 (which Andrei has already proposed) or some other clever
solution. But regardless of how we go about removing autodecoding - or even
if we ultimately end up leaving it in place - we need to make Phobos
autodecoding-agnostic so that it's not forced on everything.

- Jonathan M Davis




