The Case Against Autodecode

Jack Stouffer via Digitalmars-d digitalmars-d at puremagic.com
Thu May 26 09:31:03 PDT 2016


On Thursday, 26 May 2016 at 16:00:54 UTC, Andrei Alexandrescu 
wrote:
> instead, it should use standard library algorithms for 
> searching,
> matching etc. When needed, iterating every code unit is 
> trivially
> done through indexing.

For an example where the std.algorithm/range functions don't cut 
it: my parser for date strings in arbitrary formats first breaks 
the given character range into tokens. Once it has the tokens, it 
checks them against several known formats. One part of that is 
checking whether some of the tokens are keys in AAs of month and 
day names, which gives a fast test of presence. Because the AAs 
are int[string], and the encoding of the incoming character range 
can't be known ahead of time (it's complicated), the range must be 
forced to UTF-8 with byChar during tokenization for every input 
where isSomeString!R == true. That avoids the auto-decoding and 
the subsequent AA key mismatch.
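
A minimal sketch of that kind of lookup, not the parser's actual 
code: monthNumber and the small table are hypothetical names made 
up for illustration, and the key is copied with idup only to keep 
the example simple. It assumes std.utf.byChar, which yields UTF-8 
code units for any character range.

```d
import std.array : array;
import std.range.primitives : ElementType, isInputRange;
import std.traits : isSomeChar;
import std.utf : byChar;

// Hypothetical helper: returns the table value for a token given in any
// encoding, or 0 when the token is not a key in the table.
int monthNumber(R)(R token, int[string] table)
if (isInputRange!R && isSomeChar!(ElementType!R))
{
    // byChar forces UTF-8 code units, so string, wstring and dstring
    // inputs all produce the same key bytes for the int[string] lookup.
    string key = token.byChar.array.idup;
    if (auto p = key in table)
        return *p;
    return 0;
}

void main()
{
    // Hypothetical table; the real month and day tables are not shown here.
    int[string] months = ["may": 5, "june": 6];

    assert(monthNumber("may", months) == 5);   // UTF-8 input
    assert(monthNumber("may"w, months) == 5);  // UTF-16 input
    assert(monthNumber("june"d, months) == 6); // UTF-32 input
    assert(monthNumber("foo", months) == 0);   // not a month name
}
```

Without the byChar step, a generic tokenizer sees dchar elements 
for every string type and has to re-encode them before they can 
be used as int[string] keys.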

> Agreed. This is probably the most glaring mistake. I think we 
> should open a discussion on fixing this everywhere in the 
> stdlib, even at the cost of breaking code.

See the discussion here: 
https://issues.dlang.org/show_bug.cgi?id=14519

I think some of the proposals there are interesting.

> Overall, I think the one way to make real steps forward in 
> improving string processing in the D language is to give a 
> clear answer of what char, wchar, and dchar mean.

If you agree that iterating over code units and code points isn't 
what people want/need most of the time, then I will quote 
something from my article on the subject:

"I really don't see the benefit of the automatic behavior 
fulfilling this one specific corner case when you're going to 
make everyone else call a range generating function when they 
want to iterate over code units or graphemes. Just make everyone 
call a range generating function to specify the type of iteration 
and save a lot of people the trouble!"

I think the only clear way forward is to not make strings ranges 
and force people to make a decision when passing them to range 
functions. The HUGE problem is the code this will break, which is 
just about all of it.
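
To illustrate what "making a decision" looks like with the range 
generating functions that already exist in Phobos, here is a small 
sketch. The three wrapper functions are hypothetical names used 
only for this example; byCodeUnit, byDchar and byGrapheme are the 
Phobos functions being shown.

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byDchar;

// Each wrapper states the unit of iteration explicitly instead of
// relying on the implicit decoding of string ranges.
size_t codeUnits(string s)  { return s.byCodeUnit.walkLength; }
size_t codePoints(string s) { return s.byDchar.walkLength; }
size_t graphemes(string s)  { return s.byGrapheme.walkLength; }

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible character.
    string s = "e\u0301";

    assert(codeUnits(s) == 3);  // 1 byte for 'e' plus 2 bytes for U+0301
    assert(codePoints(s) == 2); // two code points
    assert(graphemes(s) == 1);  // one grapheme
}
```

The same string gives three different lengths depending on the 
unit chosen, which is why a single implicit default is hard to 
justify.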

