Operator overloading leads to bad code optimization
max haughton
maxhaton at gmail.com
Mon Dec 6 00:41:06 UTC 2021
On Monday, 6 December 2021 at 00:38:18 UTC, ClapTrap wrote:
> On Sunday, 5 December 2021 at 23:36:21 UTC, kinke wrote:
>> On Sunday, 5 December 2021 at 21:38:55 UTC, max haughton wrote:
>>> On Friday, 3 December 2021 at 21:24:07 UTC, claptrap wrote:
>>>> Just a simple function to split a bezier in two.
>>>>
>>>> Using "-O3"
>>>>
>>>> LDC the operator version is 84 instructions
>>>> LDC the hand expanded math is 49 instructions.
>>>>
>>>> It seems something as simple as this should be better
>>>> optimised? Or am I missing something?
>>>>
>>>> https://godbolt.org/z/4h9vob3Yo
>>>> [...]
>>>
>>> [...]
>>> Seems like GCC does not have this issue.
>>
>> With gdc v11.1, I count 69 instructions for split and 51 for
>> split2 (59 with -O3). So I guess there's a semantic difference
>> here with the slightly changed evaluation order (2D addition
>> before scaling).
>
> gdc v11.1 doesn't inline the operator calls when I try it, if
> you try an earlier version 10.2 it does which reduces it to 48
> instructions
>
>> With `alias Point = __vector(float[2])`, split is reduced to
>> 28 instructions: https://godbolt.org/z/7ffebjaz8
>
> Wow, that's awesome!
To make GCC inline properly without LTO you can use
`-fwhole-program`.
Maybe Iain also has a flag that restores the old template
behaviour.
These kinds of wacky phase-ordering (I assume) issues are why I am
slightly distrustful of GDC's post-inlining decisions.
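
For anyone reading without the godbolt link open, here is a minimal
sketch of the kind of comparison being discussed. It is a
reconstruction, not claptrap's actual source: the names `Point`,
`Bezier`, `mid`, `split` and `split2`, the cubic layout and the fixed
t = 0.5 are all assumptions. Compile with `-O3` and compare the
assembly generated for the two functions.

```d
// Hypothetical reconstruction for illustration; not the code from the thread.
struct Point
{
    float x, y;

    // Point + Point
    Point opBinary(string op : "+")(Point rhs) const
    {
        return Point(x + rhs.x, y + rhs.y);
    }

    // Point * scalar
    Point opBinary(string op : "*")(float s) const
    {
        return Point(x * s, y * s);
    }
}

struct Bezier
{
    Point p0, p1, p2, p3; // cubic control points (assumed layout)
}

// Midpoint written with the overloaded operators.
Point mid(Point a, Point b)
{
    return (a + b) * 0.5f;
}

// Operator version: de Casteljau subdivision at t = 0.5.
void split(const ref Bezier b, out Bezier left, out Bezier right)
{
    const m01  = mid(b.p0, b.p1);
    const m12  = mid(b.p1, b.p2);
    const m23  = mid(b.p2, b.p3);
    const m012 = mid(m01, m12);
    const m123 = mid(m12, m23);
    const m    = mid(m012, m123);
    left  = Bezier(b.p0, m01, m012, m);
    right = Bezier(m, m123, m23, b.p3);
}

// Hand-expanded version: the same arithmetic on plain floats.
void split2(const ref Bezier b, out Bezier left, out Bezier right)
{
    const m01x  = (b.p0.x + b.p1.x) * 0.5f;
    const m01y  = (b.p0.y + b.p1.y) * 0.5f;
    const m12x  = (b.p1.x + b.p2.x) * 0.5f;
    const m12y  = (b.p1.y + b.p2.y) * 0.5f;
    const m23x  = (b.p2.x + b.p3.x) * 0.5f;
    const m23y  = (b.p2.y + b.p3.y) * 0.5f;
    const m012x = (m01x + m12x) * 0.5f;
    const m012y = (m01y + m12y) * 0.5f;
    const m123x = (m12x + m23x) * 0.5f;
    const m123y = (m12y + m23y) * 0.5f;
    const mx    = (m012x + m123x) * 0.5f;
    const my    = (m012y + m123y) * 0.5f;
    left  = Bezier(b.p0, Point(m01x, m01y), Point(m012x, m012y), Point(mx, my));
    right = Bezier(Point(mx, my), Point(m123x, m123y), Point(m23x, m23y), b.p3);
}
```

Both functions compute the same values in this sketch; the only
difference is whether the arithmetic goes through `opBinary` calls
that the optimizer has to inline first.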
More information about the digitalmars-d-ldc
mailing list