Operator overloading leads to bad code optimization

max haughton maxhaton at gmail.com
Mon Dec 6 00:41:06 UTC 2021


On Monday, 6 December 2021 at 00:38:18 UTC, ClapTrap wrote:
> On Sunday, 5 December 2021 at 23:36:21 UTC, kinke wrote:
>> On Sunday, 5 December 2021 at 21:38:55 UTC, max haughton wrote:
>>> On Friday, 3 December 2021 at 21:24:07 UTC, claptrap wrote:
>>>> Just a simple function to split a bezier in two.
>>>>
>>>> Using "-O3"
>>>>
>>>> LDC the operator version is 84 instructions
>>>> LDC the hand expanded math is 49 instructions.
>>>>
>>>> It seems something as simple as this should be better 
>>>> optimised? Or am I missing something?
>>>>
>>>> https://godbolt.org/z/4h9vob3Yo
>>>> [...]
>>>
>>> [...]
>>> Seems like GCC does not have this issue.
>>
>> With gdc v11.1, I count 69 instructions for split and 51 for 
>> split2 (59 with -O3). So I guess there's a semantic difference 
>> here with the slightly changed evaluation order (2D addition 
>> before scaling).
>
> gdc v11.1 doesn't inline the operator calls when I try it, if 
> you try an earlier version 10.2 it does which reduces it to 48 
> instructions
>
>> With `alias Point = __vector(float[2])`, split is reduced to 
>> 28 instructions: https://godbolt.org/z/7ffebjaz8
>
> Wow, that's awesome!

To make GCC inline properly without LTO you can use 
`-fwhole-program`.
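
A sketch of such an invocation, assuming the source is saved as 
`split.d` (a placeholder name, not taken from the Godbolt link):

```
# -S emits assembly so the instruction count can be inspected
gdc -O3 -fwhole-program -S split.d -o split.s
```

Since `-fwhole-program` tells GCC the file is the entire program, 
functions other than `main` are treated as local to it, so the file 
probably needs an entry point that actually uses `split` for it to 
show up in the output.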

Maybe Iain also has a flag that restores the old template 
behaviour.

These kinds of wacky phase-ordering issues (I assume that's what 
this is) are why I am slightly distrustful of GDC's post-inlining 
decisions.


More information about the digitalmars-d-ldc mailing list