Inlining asm functions

Frits van Bommel fvbommel at REMwOVExCAPSs.nl
Thu Jun 11 03:22:38 PDT 2009


bearophile wrote:
> While implementing some code I noticed that LDC (which uses Tango) isn't able to inline expi, which is little more than an fsincos asm instruction. LDC can now inline sqrt and similar functions (which makes them much more efficient), but expi contains asm, and functions containing asm aren't allowed to be inlined.
> 
> LDC has a way to allow such inlining of asm-containing functions anyway:
> pragma(allow_inline)
> 
> My code got faster (with LDC) when I used the following expi instead of the Tango one (I removed the part that handles the case where asm isn't allowed, because this was a quick test):
> 
> creal expi(real y) {
>     version (LDC) pragma(allow_inline);
>     asm {
>         fld y;
>         fsincos;
>         fxch ST(1), ST(0);
>     }
>     // add code here if asm isn't allowed
> }
> 
> Instead of just adding "version (LDC) pragma(allow_inline);" at the top of some Tango functions (array operations can benefit from such inlining too, because for a[]+b[] with length 4 the overhead of calling the Tango/Phobos functions is large), wouldn't it be better to add something standard to the D language (that is, something that works on DMD too) to state that an asm function may be inlined?
> (In general, the idea of porting small things back from LDC to DMD sounds nice, especially when the DMD back-end is able to support them.)

Note that for LDC, an even better arrangement is something like[1]:
-----
version(LDC)
     import ldc.llvmasm;

creal expi(real y) {
     return __asm!(creal)("fsincos", "={st(0)},={st(1)},0", y);
}
-----
(That works for both x86 and x86-64)

This allows LLVM to load the real into the register any way it wants (not just 
with an fld right before the fsincos), which may be useful when inlining. On x86 
LLVM also inserts the fxch automatically (the LLVM IR generated by LDC includes 
an explicit swap due to ABI issues), but may omit it after inlining.

The pragma(allow_inline) is easier to add to code that needs to support other 
compilers too, though ;).


[1]: The asm may need some extra clobbers (probably either st(7) or st(2)-st(7)) 
to be correct, I'm not entirely sure.
(Clobbers are specified by appending something like ",~{st(7)}" to the 
comma-separated second string argument)
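
For illustration, the clobber syntax described in [1] applied to the expi above 
would look roughly like the sketch below. It only shows where the clobber goes: 
whether ~{st(7)} alone is the right set (or whether st(2)-st(7) are needed) is 
still the open question from [1], and the code assumes LDC targeting x86 or 
x86-64.

```d
version(LDC) {
    import ldc.llvmasm;

    // Same instruction and operand constraints as the version above.
    // The only change is the ",~{st(7)}" appended to the constraint string,
    // which marks st(7) as clobbered. This clobber set is an assumption and
    // has not been checked for correctness.
    creal expi(real y) {
        return __asm!(creal)("fsincos", "={st(0)},={st(1)},0,~{st(7)}", y);
    }
}
```

More clobbers would be added the same way, each as a further ",~{...}" entry 
at the end of that string.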


