The Case Against Autodecode

Andrei Alexandrescu via Digitalmars-d digitalmars-d at puremagic.com
Thu Jun 2 15:54:21 PDT 2016


On 06/02/2016 06:10 PM, Marco Leise wrote:
> Am Thu, 2 Jun 2016 15:05:44 -0400
> schrieb Andrei Alexandrescu <SeeWebsiteForEmail at erdani.org>:
>
>> On 06/02/2016 01:54 PM, Marc Schütz wrote:
>>> Which practical tasks are made possible (and work _correctly_) if you
>>> decode to code points, that don't already work with code units?
>>
>> Pretty much everything.
>>
>> s.all!(c => c == 'ö')
>
> Andrei, your ignorance is really starting to grind on
> everyone's nerves.

Indeed there seem to be serious questions about my competence, basic 
comprehension, and now knowledge.

I understand it is tempting to assume that a disagreement is caused by 
the other simply not understanding the matter. Even if that were true 
it's not worth sacrificing civility over it.

> If after 350 posts you still don't see
> why this is incorrect: s.any!(c => c == 'o'), you must be
> actively skipping the informational content of this thread.

Is it 'o' with an umlaut or without?

At any rate, consider s of type string and x of type dchar. The dchar
type is defined as "a Unicode code point", or at least my understanding
is that this has been a reasonable definition to operate with in the D
language ever since its first release. Also in the D language, the
various string types char[], wchar[] etc. with their respective
qualified versions are meant to hold Unicode strings with one of the
UTF-8, UTF-16, and UTF-32 encodings.

Following these definitions, it stands to reason to infer that the call
s.find!(c => c == x) means "find the code point x in string s and return
the balance of s positioned there". It's a prima facie application of the
definitions of the entities involved.
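
For readers following along, here is a minimal sketch of that reading. It assumes Phobos's std.algorithm.searching.find in its predicate form, find!(pred), and the auto-decoding behavior for narrow strings under discussion. The helper names fromCodePoint and anyCodeUnitEquals are made up for illustration, and the byCodeUnit variant is included only to show what the same comparison does at code unit level.

```d
import std.algorithm.searching : find;
import std.utf : byCodeUnit;

// Hypothetical helper: the balance of s starting at the first code point
// equal to x, or an empty string if there is none.
string fromCodePoint(string s, dchar x)
{
    // Auto-decoding hands the lambda dchar values, not UTF-8 code units.
    return s.find!(c => c == x);
}

// Hypothetical helper: the same comparison against raw UTF-8 code units.
bool anyCodeUnitEquals(string s, dchar x)
{
    return !s.byCodeUnit.find!(c => c == x).empty;
}

void main()
{
    string s = "sch\u00F6n"; // 'ö' as the single code point U+00F6
    dchar x = '\u00F6';

    assert(s.length == 6);                    // six UTF-8 code units
    assert(fromCodePoint(s, x) == "\u00F6n"); // found at code point level
    assert(!anyCodeUnitEquals(s, x));         // no single code unit equals U+00F6
    assert(anyCodeUnitEquals(s, 'n'));        // ASCII matches either way
}
```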

Is this the only possible or recommended meaning? Most likely not, viz. 
the subtle cases in which a given grapheme is represented via either one 
or multiple code points by means of combining characters. Is it the best 
possible meaning? It's even difficult to define what "best" means 
(fastest, covering most languages, etc).
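
A small sketch of the combining-character case, assuming std.uni.byGrapheme and std.range.walkLength from Phobos. The helper names are hypothetical. It shows the same visible character spelled either as one code point or as two, and how a code point comparison treats each spelling.

```d
import std.algorithm.searching : canFind;
import std.range : walkLength;
import std.uni : byGrapheme;

// Hypothetical helper: does s contain the code point x?
bool hasCodePoint(string s, dchar x)
{
    return s.canFind!(c => c == x);
}

// Hypothetical helpers: count code points (via auto-decoding) and graphemes.
size_t codePointCount(string s) { return s.walkLength; }
size_t graphemeCount(string s) { return s.byGrapheme.walkLength; }

void main()
{
    string precomposed = "\u00F6"; // 'ö' as one code point
    string combined = "o\u0308";   // 'o' followed by COMBINING DIAERESIS
    dchar x = '\u00F6';

    assert(hasCodePoint(precomposed, x));
    assert(!hasCodePoint(combined, x)); // same grapheme, different code points

    assert(codePointCount(precomposed) == 1);
    assert(codePointCount(combined) == 2);
    assert(graphemeCount(precomposed) == 1);
    assert(graphemeCount(combined) == 1);
}
```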

I'm not claiming that meaning is the only possible, the only 
recommended, or the best possible. All I'm arguing is that it's not 
retarded, and within a certain universe confined to operating at code 
point level (which is reasonable per the definitions of the types 
involved) it can be considered correct.

If at any point in the reasoning above some rampant ignorance comes 
about, please point it out.

> You are in error, no one agrees with you, and you refuse to see
> it and in the end we have to assume you will make a decisive
> vote against any PR with the intent to remove auto-decoding
> from Phobos.

This seems to assume I have some vested interest in the position that
makes it independent of facts. That is not the case. I do what I think
is right to do, and you do what you think is right to do.

> Your so called vocal minority is actually D's panel of Unicode
> experts who understand that auto-decoding is a false ally and
> should be on the deprecation track.

They have failed to convince me. But I am more convinced than before 
that RCStr should not offer a default mode of iteration. I think its 
impact is lost in this discussion, because once it's understood RCStr 
will become D's recommended string type, the entire matter becomes moot.

> Remember final-by-default? You promised, that your objection
> about breaking code means that D2 will only continue to be
> fixed in a backwards compatible way, be it the implementation
> of shared or whatever else. Yet months later you opened a
> thread with the title "inout must go". So that must have been
> an appeasement back then. People don't forget these things
> easily and RCStr seems to be a similar distraction,
> considering we haven't looked into borrowing/scoped enough and
> you promise wonders from it.

What the hell is this, digging dirt on me? Paying back debts? Please 
stop that crap.


Andrei



