Major performance problem with std.array.front()

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Sat Mar 8 12:50:51 PST 2014


On 3/8/14, 12:38 PM, Vladimir Panteleev wrote:
> On Saturday, 8 March 2014 at 20:05:36 UTC, Andrei Alexandrescu wrote:
>> 1. All algorithms would by default operate on strings at char/wchar
>> level (i.e. code unit). That would cause the usual issues and
>> confusions I was aware of from C++. Certain algorithms would require
>> specialization and/or the user using byDchar for correctness.
>
> As previously discussed, "correctness" here is conditional. I would not
> use that word; it is another extreme.

Agreed.
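
To make the code-unit behavior in point 1 concrete, here is a minimal C++ sketch. It is only an illustration under stated assumptions: the input is taken to be valid UTF-8, and count_code_points is a hypothetical helper, not a standard library function.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts UTF-8 code points by skipping continuation
// bytes (those of the form 10xxxxxx). Assumes valid UTF-8; no validation.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;  // lead byte or ASCII byte starts a new code point
        }
    }
    return n;
}

// "noe" + U+0308 (combining diaeresis, bytes CC 88) + "l":
//   std::string::size() counts code units (bytes),
//   count_code_points counts code points.
```

Neither number matches the four characters a reader sees in that string, since the "e" and the combining mark form one grapheme. That is the sense in which "correctness" at the code point level is conditional.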

>> From experience with C++ I knew (1) had a bad track record, and (2)
>> "generically conservative, specialize for speed" was a successful
>> pattern.
>>
>> What would you have chosen given that context?
>
> Ideally, we would have the Unicode algorithms in the standard library
> from day 1, and advocated their use throughout the documentation.

It's not too late to do a lot of that.

>>> I'm inclined to say that the correct approach is to
>>> state that algorithms operate explicitly on a T.sizeof basis and that if
>>> the data contained in a particular range has some multi-element encoding
>>> then separate, specialized routines should be used when the T.sizeof
>>> behavior will not produce the desired result.
>>
>> That sounds quite like C++ plus ICU. It doesn't strike me as the
>> gold standard for Unicode integration.
>
> Why not? Because it sounds like D needs exactly that. Plus its amazing
> slicing and range capabilities, of course.

Pretty much everyone using ICU hates it.

>>> So the problem to me is that we're stuck not fixing something that's
>>> horribly broken just because it's broken in a way that people presumably
>>> now expect.
>>
>> Clearly I'm being subjective here but again I'd find it difficult to
>> get convinced we have something horribly broken from the evidence I
>> gathered inside and outside Facebook.
>
> Have you or anyone you personally know tried to process text in D
> containing a writing system such as Sanskrit's?

No. Point being?

>>> I'd personally like to see this fixed and I think the new behavior is
>>> preferable overall, but I do share Andrei's concern that such a big
>>> change might hurt the language anyway.
>>
>> I've said this once and I'm saying it again: the best way to convert
>> this discussion into something useful is to devise ideas for useful
>> non-breaking additions.
>
> I disagree. As I've argued, I believe that currently most uses of dchars
> in an application are incorrect, and ultimately a time bomb for proper
> internationalization support. We need to apply the same procedure that
> we do with any language construct that was deemed to have been a poor
> decision: put it through a deprecation cycle and fix it.

I think the risks are too large for that, and it's quite unclear that
this solves a problem. "Slightly better Unicode support" is hardly a good
justification.


Andrei


