Cross module inlining in runtime

Iain Buclaw ibuclaw at ubuntu.com
Tue Jan 10 16:01:04 PST 2012


On 10 January 2012 23:38, Artur Skawina <art.08.09 at gmail.com> wrote:
> On 01/11/12 00:30, Iain Buclaw wrote:
>> On 10 January 2012 19:49, Artur Skawina <art.08.09 at gmail.com> wrote:
>>>> I have porting the runtime/phobos asms to gcc asm on my to-do list, will
>>>> try to get to that within two weeks. What would be the preferred way -
>>>> version() guards? if yes - what version? Or would you prefer replacing
>>>> the asms, if the changes are not going to be merged upstream anyway?
>>>
>>> So I decided to start with this today. As I have a case where turning on
>>> logging increases a program's run time from seconds to hours, while it
>>> spends most of the time in the GC, I thought gcbits would be a good place
>>> to start.
>>>
>>> But after adding gdc asm support to GCBits.testClear() the only thing that
>>> changed was this:
>>>
>>> XXXXXXXX <uint gc.gcbits.GCBits.testClear(uint)>:
>>>                push   %eRX
>>>                mov    %eRX,%eRX
>>>                mov    XX(%eRX),%eRX
>>> -               push   %eRX
>>> -               mov    %eRX,%eRX
>>> -               shr    $0x5,%eRX
>>> -               lea    XX(,%eRX,4),%eRX
>>> -               mov    XX(%eRX),%eRX
>>> -               add    (%eRX),%eRX
>>> -               mov    $0x1,%eRX
>>> -               shl    %Rl,%eRX
>>> -               mov    %eRX,%eRX
>>>                mov    (%eRX),%eRX
>>> -               not    %eRX
>>> -               and    %eRX,%eRX
>>> -               and    %eRX,%eRX
>>> -               mov    %eRX,(%eRX)
>>> -               pop    %eRX
>>> +               mov    XX(%eRX),%eRX
>>> +               btr    %eRX,XX(%eRX)
>>> +               sbb    %eRX,%eRX
>>>                pop    %eRX
>>>                ret
>>>
>>> OK, the function turned into ~ three instructions, good, but why didn't it
>>> then get inlined into any of the callers? Trying to force things with an
>>> attribute turned up this:
>>>
>>
>> four instructions. :~)
>
> I'm already imagining the inlined case, where the "mov" could be free. :)
>
>
>>> ../../../libphobos/gc/gcx.d: In member function 'gc.gcx.Gcx.fullcollect':
>>> BUILD32/gdc/dev/gcc-4.6.1/libphobos/gc/gcbits.d:119:0: sorry, unimplemented: inlining failed in call to 'testClear': function body not available
>>> ../../../libphobos/gc/gcx.d:2647:0: sorry, unimplemented: called from here
>>> BUILD32/gdc/dev/gcc-4.6.1/libphobos/gc/gcbits.d:119:0: sorry, unimplemented: inlining failed in call to 'testClear': function body not available
>>> ../../../libphobos/gc/gcx.d:2729:0: sorry, unimplemented: called from here
>>> make[3]: *** [gc/gcx.o] Error 1
>>>
>>> Any way to make this work? Much of the asm gains will be lost when the code
>>> isn't inlined.
>
>> How is the function written?
>
> --- druntime/gc/gcbits.d.org    2012-01-10 19:56:11.580039157 +0100
> +++ druntime/gc/gcbits.d        2012-01-10 23:19:50.046264596 +0100

Ahh, I've just noticed that the functions are in a different module
than the one you are calling from.

GCC asm doesn't prevent inlining of functions, but as of yet you
cannot inline functions across modules unless you compile both files
in the same invocation, in which case they get combined into one
object file (you must specify -o or this doesn't work).
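
To make the point above concrete, here is a minimal sketch. The module
name, struct, and file names are hypothetical stand-ins, not the druntime
source, and the function uses plain D rather than asm so that it runs
anywhere. The gdc command in the comment only illustrates the "compile
both files at the same time, with -o" workaround.

```d
// bits.d: hypothetical stand-in for gc/gcbits.d, not the druntime source.
module bits;

struct Bits
{
    uint[] data;

    // Return nonzero if bit i was set, and clear it.
    uint testClear(size_t i)
    {
        uint* p = &data[i >> 5];       // 32 bits per word
        uint mask = 1u << (i & 31);
        uint result = *p & mask;
        *p &= ~mask;
        return result;
    }
}

// In the real layout the caller sits in a second module (gc/gcx.d).
// Compiled on its own, that module gives the inliner no body for
// testClear. Passing both sources in one invocation is the workaround
// (file names are placeholders):
//     gdc -O2 -c bits.d caller.d -o combined.o
void main()
{
    auto b = Bits(new uint[2]);
    b.data[1] = 0b100;                 // sets bit 34 (word 1, bit 2)
    assert(b.testClear(34) != 0);      // was set, is now cleared
    assert(b.testClear(34) == 0);      // second call sees it clear
    assert(b.data[1] == 0);
}
```

Here the caller and the function share one file, so the body is always
visible to the inliner; splitting main() into its own module and
compiling it separately reproduces the situation in the error output.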

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';

