The Case Against Autodecode
Andrei Alexandrescu via Digitalmars-d
digitalmars-d at puremagic.com
Thu Jun 2 15:54:21 PDT 2016
On 06/02/2016 06:10 PM, Marco Leise wrote:
> Am Thu, 2 Jun 2016 15:05:44 -0400
> schrieb Andrei Alexandrescu <SeeWebsiteForEmail at erdani.org>:
>
>> On 06/02/2016 01:54 PM, Marc Schütz wrote:
>>> Which practical tasks are made possible (and work _correctly_) if you
>>> decode to code points, that don't already work with code units?
>>
>> Pretty much everything.
>>
>> s.all!(c => c == 'ö')
>
> Andrei, your ignorance is really starting to grind on
> everyone's nerves.
Indeed there seem to be serious questions about my competence, basic
comprehension, and now knowledge.
I understand it is tempting to assume that a disagreement is caused by
the other simply not understanding the matter. Even if that were true,
it's not worth sacrificing civility over it.
> If after 350 posts you still don't see
> why this is incorrect: s.any!(c => c == 'o'), you must be
> actively skipping the informational content of this thread.
Is it 'o' with an umlaut or without?
At any rate, consider s of type string and x of type dchar. The dchar
type is defined as "a Unicode code point", or at least my understanding
is that this has been a reasonable definition to operate with in the D
language ever since its first release. Also in the D language, the various string
types char[], wchar[] etc. with their respective qualified versions are
meant to hold Unicode strings with one of the UTF8, UTF16, and UTF32
encodings.
Following these definitions, it stands to reason to infer that the call
s.find!(c => c == x) means "find the code point x in string s and return
the balance of s positioned there". It's a prima facie application of the
definitions of the entities involved.
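A minimal sketch of that reading, assuming the auto-decoding behavior of
Phobos as it stood at the time of this thread. The helper name
findCodePoint and the sample string are hypothetical and only serve the
illustration:

```d
import std.algorithm.searching : find;
import std.stdio : writeln;

// Hypothetical helper: returns the part of s that starts at the first
// occurrence of the code point x, or an empty slice if x is absent.
string findCodePoint(string s, dchar x)
{
    // With auto-decoding, the lambda receives dchar values (code points),
    // not individual UTF-8 code units.
    return s.find!(c => c == x);
}

void main()
{
    // 'ö' stored as the single precomposed code point U+00F6.
    string s = "Bj\u00F6rk";
    auto rest = findCodePoint(s, '\u00F6');

    assert(rest == "\u00F6rk");
    // The result is still a slice of UTF-8 code units: U+00F6 takes two
    // of them, followed by 'r' and 'k'.
    assert(rest.length == 4);
    writeln(rest);
}
```

The predicate never sees the two code units that encode U+00F6
separately, which is the sense in which the search operates at code
point level.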
Is this the only possible or recommended meaning? Most likely not, viz.
the subtle cases in which a given grapheme is represented via either one
or multiple code points by means of combining characters. Is it the best
possible meaning? It's even difficult to define what "best" means
(fastest, covering most languages, etc).
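The subtle case mentioned above can be made concrete with a short
sketch. It assumes std.uni's byGrapheme and normalize; the helper names
containsCodePoint and graphemeCount are hypothetical and not part of
Phobos:

```d
import std.algorithm.searching : any;
import std.range : walkLength;
import std.uni : byGrapheme, normalize, NFC;

// Hypothetical helper: true if the code point x occurs anywhere in s.
bool containsCodePoint(string s, dchar x)
{
    return s.any!(c => c == x);
}

// Hypothetical helper: number of grapheme clusters in s.
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    string precomposed = "\u00F6";  // U+00F6, one code point
    string decomposed  = "o\u0308"; // U+006F + U+0308 combining diaeresis

    // A code point search finds U+00F6 only in the precomposed form...
    assert(containsCodePoint(precomposed, '\u00F6'));
    assert(!containsCodePoint(decomposed, '\u00F6'));
    // ...and reports a plain 'o' inside the decomposed form.
    assert(containsCodePoint(decomposed, 'o'));

    // Both strings are a single grapheme, but the decomposed one
    // consists of two code points.
    assert(graphemeCount(precomposed) == 1);
    assert(graphemeCount(decomposed) == 1);
    assert(decomposed.walkLength == 2);

    // Normalizing to NFC composes the pair into U+00F6.
    assert(normalize!NFC(decomposed) == precomposed);
}
```

Whether the second and third assertions count as correct depends on
which level (code point or grapheme) the caller meant to operate at.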
I'm not claiming that meaning is the only possible, the only
recommended, or the best possible. All I'm arguing is that it's not
retarded, and within a certain universe confined to operating at code
point level (which is reasonable per the definitions of the types
involved) it can be considered correct.
If at any point in the reasoning above some rampant ignorance comes
about, please point it out.
> You are in error, no one agrees with you, and you refuse to see
> it and in the end we have to assume you will make a decisive
> vote against any PR with the intent to remove auto-decoding
> from Phobos.
This seems to assume I have some vested interest in the position that
makes it independent of facts. That is not the case. I do what I think
is right to do, and you do what you think is right to do.
> Your so called vocal minority is actually D's panel of Unicode
> experts who understand that auto-decoding is a false ally and
> should be on the deprecation track.
They have failed to convince me. But I am more convinced than before
that RCStr should not offer a default mode of iteration. I think its
impact is lost in this discussion, because once it's understood RCStr
will become D's recommended string type, the entire matter becomes moot.
> Remember final-by-default? You promised, that your objection
> about breaking code means that D2 will only continue to be
> fixed in a backwards compatible way, be it the implementation
> of shared or whatever else. Yet months later you opened a
> thread with the title "inout must go". So that must have been
> an appeasement back then. People don't forget these things
> easily and RCStr seems to be a similar distraction,
> considering we haven't looked into borrowing/scoped enough and
> you promise wonders from it.
What the hell is this, digging dirt on me? Paying back debts? Please
stop that crap.
Andrei