Is there a way to get a list of functions that get inlined by

Scorn scorn at trash-mail.com
Tue Feb 9 14:55:01 PST 2010



> Scorn:
> 
>> double min(double a, double b, double c)
>> {
>>     return a < b && a < c ? a : b < c ? b : c;
>> }
> 
> Don't write code like that, add some parenthesys like this:
> 
> return (a < b && a < c) ? a : (b < c ? b : c);
> 
> because the compiler is able to sort out those operator precedences, but the programmer that comes after you and reads that code will have problems.


Ok. The next time i post an example i will take care that it is more
readable :-)


> A compiler compiles that code with 3 FP tests, while I think two suffice, so there are better ways to write that.
>

:-)

Sure. Yes you are right. Since i do not want to sort the values a, b and
c (have a total order of things) i could, of course, write something
longer and a bit more efficient code like this:


double max(double a, double b, double c)
{
    if (a >= b)
    {
	if (a >= c) return a;
	else return c;
    }
    else
    {
	if (b >= c) return b;
	else return c;
    }
}


which just uses two comparisons instead of three. But trust me. That bad
code from above is not the explanation for the lack of speed in my
program and would be a bit longer to write as a mixin.

;-)



But here comes the interesting part:

> 
>> This is (and a little bit more) is running in a tight loop which runs
>> about 10000000 times.
>> With these "optimizations" i get a speed increase about 20% percent. 
> 
> ---------------------
> 
> I have created a module named "mo" and a main module named "temp":
> 
> module mo;
> int foo(int x) {
>     return x * x;
> }
> 
> double min3(double a, double b, double c) {
>     return (a <= b) ? (a <= c ? a : c) : (b <= c ? b : c);
> }
> 
> ---------------------
> 
> module temp; // main module
> version (Tango) {
>     import tango.stdc.stdio: printf;
>     import tango.stdc.stdlib: atoi, atof;
> } else {
>     import std.c.stdio: printf;
>     import std.c.stdlib: atoi, atof;
> }
> import mo: foo, min3;
> 
> void main() {
>     int x = atoi("12");
>     printf("%d\n", foo(x));
> 
>     double x1 = atof("10");
>     double x2 = atof("20");
>     double x3 = atof("30");
>     printf("%f\n", min3(x1, x2, x3));
> }
> 
> ---------------------
> 
> From my tests it seems LDC isn't able to inline those functions, while DMD is able to inline them :-)



And gdc does not seem to inline those functions neither :-(



> 
> ldc -O5 -release -output-s -inline temp.d mo.d
> 
> 08049600 <_Dmain>:
>  8049600:	83 ec 34             	sub    $0x34,%esp
>  8049603:	c7 04 24 e8 8c 05 08 	movl   $0x8058ce8,(%esp)
>  804960a:	e8 99 fd ff ff       	call   80493a8 <atoi at plt>
>  804960f:	e8 9c 00 00 00       	call   80496b0 <_D2mo3fooFiZi>
>  8049614:	89 44 24 04          	mov    %eax,0x4(%esp)
>  8049618:	c7 04 24 eb 8c 05 08 	movl   $0x8058ceb,(%esp)
>  804961f:	e8 64 fd ff ff       	call   8049388 <printf at plt>
>  8049624:	c7 04 24 ef 8c 05 08 	movl   $0x8058cef,(%esp)
>  804962b:	e8 98 fd ff ff       	call   80493c8 <atof at plt>
>  8049630:	db 7c 24 28          	fstpt  0x28(%esp)
>  8049634:	c7 04 24 f2 8c 05 08 	movl   $0x8058cf2,(%esp)
>  804963b:	e8 88 fd ff ff       	call   80493c8 <atof at plt>
>  8049640:	db 7c 24 1c          	fstpt  0x1c(%esp)
>  8049644:	c7 04 24 f5 8c 05 08 	movl   $0x8058cf5,(%esp)
>  804964b:	e8 78 fd ff ff       	call   80493c8 <atof at plt>
>  8049650:	db 6c 24 28          	fldt   0x28(%esp)
>  8049654:	dd 5c 24 10          	fstpl  0x10(%esp)
>  8049658:	db 6c 24 1c          	fldt   0x1c(%esp)
>  804965c:	dd 5c 24 08          	fstpl  0x8(%esp)
>  8049660:	dd 1c 24             	fstpl  (%esp)
>  8049663:	e8 58 00 00 00       	call   80496c0 <_D2mo4min3FdddZd>
>  8049668:	83 ec 18             	sub    $0x18,%esp
>  804966b:	dd 5c 24 04          	fstpl  0x4(%esp)
>  804966f:	c7 04 24 f8 8c 05 08 	movl   $0x8058cf8,(%esp)
>  8049676:	e8 0d fd ff ff       	call   8049388 <printf at plt>
>  804967b:	31 c0                	xor    %eax,%eax
>  804967d:	83 c4 34             	add    $0x34,%esp
>  8049680:	c2 08 00             	ret    $0x8
>  8049683:	8d b6 00 00 00 00    	lea    0x0(%esi),%esi
>  8049689:	8d bc 27 00 00 00 00 	lea    0x0(%edi,%eiz,1),%edi
> 
> -----------------
> 
> dmd -O -release -inline temp.d mo.d
> 
> __Dmain comdat
> L0:     sub ESP,038h
>         mov EAX,offset FLAT:_DATA
>         push    EBX
>         push    ESI
>         push    EDI
>         push    EAX
>         call    near ptr _atoi
>         add ESP,4
>         mov EBX,EAX
>         mov ECX,EAX
>         imul    ECX,ECX
>         mov EDX,offset FLAT:_DATA[4]
>         push    ECX
>         push    EDX
>         call    near ptr _printf
>         mov ESI,offset FLAT:_DATA[8]
>         push    ESI
>         call    near ptr _atof
>         mov EDI,offset FLAT:_DATA[0Ch]
>         fstp    qword ptr 018h[ESP]
>         push    EDI
>         call    near ptr _atof
>         mov EAX,offset FLAT:_DATA[010h]
>         fstp    qword ptr 024h[ESP]
>         push    EAX
>         call    near ptr _atof
>         add ESP,4
>         fld qword ptr 01Ch[ESP]
>         fxch    ST1
>         fstp    qword ptr 02Ch[ESP]
>         fcomp   qword ptr 024h[ESP]
>         fstsw   AX
>         sahf
>         ja  L83
>         jp  L83
>         fld qword ptr 01Ch[ESP]
>         fcomp   qword ptr 02Ch[ESP]
>         fstsw   AX
>         sahf
>         ja  L7D
>         jp  L7D
>         fld qword ptr 01Ch[ESP]
>         jmp short   L9C
> L7D:        fld qword ptr 02Ch[ESP]
>         jmp short   L9C
> L83:        fld qword ptr 024h[ESP]
>         fcomp   qword ptr 02Ch[ESP]
>         fstsw   AX
>         sahf
>         ja  L98
>         jp  L98
>         fld qword ptr 024h[ESP]
>         jmp short   L9C
> L98:        fld qword ptr 02Ch[ESP]
> L9C:        sub ESP,8
>         mov ECX,offset FLAT:_DATA[014h]
>         fstp    qword ptr [ESP]
>         push    ECX
>         call    near ptr _printf
>         add ESP,01Ch
>         xor EAX,EAX
>         pop EDI
>         pop ESI
>         pop EBX
>         add ESP,038h
>         ret
> 
> -----------------
> 
> Using Link-Time optimization LDC is able to inline those functions.
> So here it seems LDC is worse :-(


I have to try it with gdc too.


> 
> Bye,
> bearophile


Thank you very much for your research bearophile. It's very appreciated.

But now the interesting question is why the different compilers inline
functions so differently (other versions of the frontend ? has Walther
changed something) or is because they use different backends (which
should not matter so much since inlining normally is best done in the
frontend).

And of course Trass3rs original question under which conditions are
functions inlined still remains.

Are setters/getters inlined ? Overloaded operators ? Short helper
functions ? Functions with ref or out parameters ? In which cases does
it simply not work when it should ?


More information about the Digitalmars-d-learn mailing list