Major performance problem with std.array.front()

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Sun Mar 9 11:19:14 PDT 2014


On 3/9/14, 8:18 AM, Vladimir Panteleev wrote:
> On Sunday, 9 March 2014 at 05:10:26 UTC, Andrei Alexandrescu wrote:
>> On 3/8/14, 8:24 PM, Vladimir Panteleev wrote:
>>> On Sunday, 9 March 2014 at 04:18:15 UTC, Andrei Alexandrescu wrote:
>>>> What exactly is the consensus? From your wiki page I see "One of the
>>>> proposals in the thread is to switch the iteration type of string
>>>> ranges from dchar to the string's character type."
>>>>
>>>> I can tell you straight out: That will not happen for as long as I'm
>>>> working on D.
>>>
>>> Why?
>>
>> From the cycle "going in circles": because I think the breakage is way
>> too large compared to the alleged improvement.
>
> All right. I was wondering if there was something more fundamental
> behind such an ultimatum.

It's just factual information with no drama attached (i.e. I'm not
threatening to leave the language, just plainly explaining that I'll
never approve that particular change).

That said, a larger explanation is in order. There have been cases in the
past when our community has worked itself into a froth over a non-issue
and ultimately caused a language change imposed by "the faction that 
shouted the loudest". The "lazy" keyword and recently the "virtual" 
keyword come to mind as cases in which the language leadership has been 
essentially annoyed into making a change it didn't believe in.

I am all about listening to the community's needs and desires. But at 
some point there is a need to stick to one's guns in matters that are a
judgment call. See e.g. https://d.puremagic.com/issues/show_bug.cgi?id=11837 for
a very recent example in which reasonable people may disagree but at 
some point you can't choose both options.

What we now have works as intended. As I mentioned, there is quite a bit
more evidence that the design is useful to people than that it is
detrimental. Unicode
is all about code points. Code units are incidental to each encoding. 
The fact that we recognize code points at language and library level is, 
in my opinion, a Good Thing(tm).
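
To make the code unit / code point distinction concrete, here is a minimal D sketch. It assumes the range primitives in std.range decode narrow strings to dchar (the behavior under discussion in this thread); the helper names codeUnits and codePoints are hypothetical and used only for illustration.

```d
import std.range : ElementType, front, walkLength;

// Hypothetical helper: number of UTF-8 code units stored in the string.
size_t codeUnits(string s) { return s.length; }

// Hypothetical helper: number of code points, counted by walking the
// string as a range, which decodes one code point per step.
size_t codePoints(string s) { return s.walkLength; }

void main()
{
    string s = "Привет";

    // Each of the six Cyrillic letters takes two UTF-8 code units.
    assert(codeUnits(s) == 12);
    assert(codePoints(s) == 6);

    // Viewed as a range, a string has dchar elements (code points) ...
    static assert(is(ElementType!string == dchar));
    assert(s.front == 'П');

    // ... while indexing yields a single code unit.
    assert(s[0] == 0xD0);
}
```

The same string therefore has two lengths depending on the view: 12 by indexing, 6 by range iteration.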

I understand that doesn't reach the ninth level of Nirvana and there are 
still issues to work on, and issues where good-looking code is actually 
incorrect. But I think we're overall in good shape. A regression from 
that to code unit level would be very destructive. Even a clear slight 
improvement that breaks backward compatibility would be destructive.

So I wanted to limit the potential damage of this discussion. It is made
all the more dangerous by the fact that Walter himself started it,
something that others didn't fail to pick up on. The sheer fact that we got to
contemplate an unbelievably massive breakage on no other evidence than 
one misuse case and for the sake of possibly an illusory improvement - 
that's a sign we need to grow up. We can't go like this about changing 
the language and aim to play in the big leagues.

>> In fact I believe that that design is inferior to the current one
>> regardless.
>
> I was hoping we could come to an agreement at least on this point.

Sorry to disappoint.

> ---
>
> BTW, a thought struck me while thinking about the problem yesterday.
>
> char and dchar should not be implicitly convertible between one another,
> or comparable to the other.

I think only the char -> dchar conversion works, and I can see arguments
against it. Also, comparison of char with dchar is dicey. But there are
also cases in which it's legitimate to do that (e.g. assigning ASCII
chars), and this would be a breaking change.
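
A short D sketch of both sides of that argument, written under the current implicit-conversion rules as I understand them. The helper sameValue is hypothetical and exists only to make the comparison explicit.

```d
// char converts implicitly to dchar, but not the other way around.
static assert( is(char : dchar));
static assert(!is(dchar : char));

// Hypothetical helper: compares a UTF-8 code unit with a code point by
// numeric value, which is what a mixed char/dchar comparison does.
bool sameValue(char unit, dchar point) { return unit == point; }

void main()
{
    // Legitimate case: an ASCII char widens to the same code point.
    char a = 'A';
    dchar d = a;
    assert(d == 'A');

    // Dicey case: Cyrillic "р" (U+0440) is encoded as the two code
    // units D1 80. The lead unit 0xD1 is not a character on its own,
    // yet it compares equal to U+00D1.
    string s = "р";
    assert(s.length == 2);
    assert(sameValue(s[0], '\u00D1'));
}
```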

One good way to think about breaking changes is "if this change were 
executed to perfection, how much would that improve the overall quality 
of D?" Because breakages _are_ "overall" - users don't care whether they 
come from this or the other part of the type system. Really puts things 
into perspective.

> void main()
> {
>      string s = "Привет";
>      foreach (c; s)
>          assert(c != 'Ñ');
> }
>
> Instead, std.conv.to should allow converting between character types,
> iff they represent one whole code point and fit into the destination
> type, and throw an exception otherwise (similar to how it deals with
> integer overflow). Char literals should be special-cased by the compiler
> to implicitly convert to any sufficiently large type.
>
> This would break more[1] code, but it would avoid the silent failures of
> the earlier proposal.
>
> [1] I went through my own larger programs. I actually couldn't find any
> uses of dchar which would be impacted by such a hypothetical change.

Generally I think we should steer away from slight improvements of the 
language at the cost of breaking existing code. Instead, we must think 
of ways to improve the language without the breakage. You may want to 
pursue (bugzilla + pull request) adding the std.conv routines with the 
semantics you mentioned.
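
For illustration only, such routines might look roughly like the sketch below. The names checkedWiden and checkedNarrow are hypothetical, not existing std.conv functions, and the rule "below 0x80" is my reading of "represents one whole code point and fits into the destination type" for UTF-8 code units.

```d
import std.conv : ConvException;
import std.exception : assertThrown;

// Hypothetical: widen a UTF-8 code unit only if it is a whole code
// point by itself, i.e. plain ASCII.
dchar checkedWiden(char c)
{
    if (c >= 0x80)
        throw new ConvException("code unit is not a whole code point");
    return c;
}

// Hypothetical: narrow a code point only if it fits in one UTF-8
// code unit.
char checkedNarrow(dchar d)
{
    if (d >= 0x80)
        throw new ConvException("code point does not fit in a char");
    return cast(char) d;
}

void main()
{
    assert(checkedWiden('A') == 'A');
    assert(checkedNarrow('z') == 'z');

    // The first code unit of "Привет" is a UTF-8 lead byte, so the
    // checked conversion throws instead of silently yielding U+00D0.
    string s = "Привет";
    assertThrown!ConvException(checkedWiden(s[0]));
    assertThrown!ConvException(checkedNarrow('П'));
}
```

Being purely additive, helpers of this kind would not break existing code that relies on the implicit conversion.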


Andrei


