The Case Against Autodecode
Jack Stouffer via Digitalmars-d
digitalmars-d at puremagic.com
Thu May 26 09:31:03 PDT 2016
On Thursday, 26 May 2016 at 16:00:54 UTC, Andrei Alexandrescu wrote:
> instead, it should use standard library algorithms for searching,
> matching etc. When needed, iterating every code unit is trivially
> done through indexing.
For an example where the std.algorithm/range functions don't cut
it, my date string parser, which accepts arbitrary formats, first
breaks up the given character range into tokens. Once it has the
tokens, it checks them against several known formats. One piece of
that is checking whether some of the tokens are in AAs of month and
day names, which gives a fast test of presence. The AAs are
int[string], and the encoding of the incoming string can't be known
ahead of time (it's complicated), so during tokenization the
character range must be forced to UTF-8 with byChar for every input
where isSomeString!R is true. That avoids the auto-decoding and the
subsequent AA key mismatch.
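To make the byChar workaround concrete, here is a minimal sketch. It
is not the actual parser code: monthNumber and its three-entry months
table are hypothetical stand-ins, and the token handling is reduced to
keeping ASCII letters. The point it illustrates is that a string used
as a range has dchar elements, while byChar yields char for UTF-8,
UTF-16 and UTF-32 input alike, so the collected token can be looked up
in an int[string] AA.

```d
import std.ascii : isAlpha;
import std.range.primitives : ElementType;
import std.traits : Unqual;
import std.utf : byChar;

// Used as a range, a string is auto-decoded: its elements are dchar.
static assert(is(ElementType!string == dchar));
// byChar yields UTF-8 code units instead, even for UTF-16 input.
static assert(is(Unqual!(ElementType!(typeof(wstring.init.byChar))) == char));

/// Hypothetical helper: month number for a token, or 0 if it is unknown.
int monthNumber(R)(R token)
{
    // Stand-in for the parser's int[string] table of month names.
    int[string] months = ["jan": 1, "feb": 2, "mar": 3];

    // Collect the token as UTF-8 so it matches the AA's key type.
    char[] key;
    foreach (c; token.byChar)
        if (isAlpha(c))
            key ~= c;

    // A char[] can be used to look up (but not insert) string keys.
    if (auto p = key in months)
        return *p;
    return 0;
}

unittest
{
    assert(monthNumber("feb") == 2);  // UTF-8 input
    assert(monthNumber("feb"w) == 2); // UTF-16 input
    assert(monthNumber("feb"d) == 2); // UTF-32 input
    assert(monthNumber("xyz") == 0);  // not a month name
}
```

The unittest block can be run with something like
dmd -main -unittest -run on the file.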
> Agreed. This is probably the most glaring mistake. I think we
> should open a discussion on fixing this everywhere in the
> stdlib, even at the cost of breaking code.
See the discussion here:
https://issues.dlang.org/show_bug.cgi?id=14519
I think some of the proposals there are interesting.
> Overall, I think the one way to make real steps forward in
> improving string processing in the D language is to give a
> clear answer of what char, wchar, and dchar mean.
If you agree that iterating over code units and code points isn't
what people want/need most of the time, then I will quote
something from my article on the subject:
"I really don't see the benefit of the automatic behavior
fulfilling this one specific corner case when you're going to
make everyone else call a range generating function when they
want to iterate over code units or graphemes. Just make everyone
call a range generating function to specify the type of iteration
and save a lot of people the trouble!"
I think the only clear way forward is to not make strings ranges
and force people to make a decision when passing them to range
functions. The HUGE problem is the code this will break, which is
just about all of it.
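As a rough sketch of what "making a decision" looks like with range
functions that already exist in Phobos (byCodeUnit and byDchar in
std.utf, byGrapheme in std.uni), the three levels of iteration give
three different lengths for the same string. The wrapper functions
here are hypothetical and only exist to name the choice.

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byDchar;

// Hypothetical wrappers: each one names the kind of iteration explicitly.
size_t codeUnits(string s)  { return s.byCodeUnit.walkLength; }
size_t codePoints(string s) { return s.byDchar.walkLength; }
size_t graphemes(string s)  { return s.byGrapheme.walkLength; }

unittest
{
    // "e" followed by U+0301 COMBINING ACUTE ACCENT: one visible character.
    string s = "e\u0301";

    assert(codeUnits(s) == 3);  // UTF-8 code units
    assert(codePoints(s) == 2); // code points
    assert(graphemes(s) == 1);  // graphemes

    // Passing the string directly picks code points implicitly,
    // which is the auto-decoding behavior under discussion.
    assert(s.walkLength == 2);
}
```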