Force inline

Moritz Maxeiner via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Mon Feb 20 05:40:41 PST 2017


On Monday, 20 February 2017 at 12:47:43 UTC, berni wrote:
> pragma(inline, true) doesn't work out well:
>
>>int bar;
>>
>>void main(string[] args)
>>{
>>    if (foo()) {}
>>}
>> 
>>bool foo()
>>{
>>    pragma(inline, true)
>>
>>    if (bar==1) return false;
>>    if (bar==2) return false;
>>
>>    return true;
>>}
>
> with
>
>> dmd -inline test.d
>
> I get
>
>> test.d(8): Error: function test.foo cannot inline function

dmd's semantic analysis determined that it doesn't know how to 
inline the function, and since pragma(inline, true) insists that 
it must be inlined, you get an error instead of a silently 
skipped inlining. This is a limitation of dmd; ldc2 happily 
inlines your function:

---
$ ldc2 --version
LDC - the LLVM D compiler (1.1.0):
   based on DMD v2.071.2 and LLVM 3.9.1
   built with DMD64 D Compiler v2.072.2
   Default target: x86_64-pc-linux-gnu
$ ldc2 -c test.d
$ objdump -dr test.o
test.o:     file format elf64-x86-64


Disassembly of section .text._Dmain:

0000000000000000 <_Dmain>:
    0:	53                   	push   %rbx
    1:	48 83 ec 20          	sub    $0x20,%rsp
    5:	48 89 7c 24 10       	mov    %rdi,0x10(%rsp)
    a:	48 89 74 24 18       	mov    %rsi,0x18(%rsp)
    f:	66 48 8d 3d 00 00 00 	data16 lea 0x0(%rip),%rdi        # 17 <_Dmain+0x17>
   16:	00
			13: R_X86_64_TLSGD	_D4test3bari-0x4
   17:	66 66 48 e8 00 00 00 	data16 data16 callq 1f <_Dmain+0x1f>
   1e:	00
			1b: R_X86_64_PLT32	__tls_get_addr-0x4
   1f:	8b 18                	mov    (%rax),%ebx
   21:	83 fb 01             	cmp    $0x1,%ebx
   24:	75 0a                	jne    30 <_Dmain+0x30>
   26:	31 c0                	xor    %eax,%eax
   28:	88 c1                	mov    %al,%cl
   2a:	88 4c 24 0f          	mov    %cl,0xf(%rsp)
   2e:	eb 29                	jmp    59 <_Dmain+0x59>
   30:	66 48 8d 3d 00 00 00 	data16 lea 0x0(%rip),%rdi        # 38 <_Dmain+0x38>
   37:	00
			34: R_X86_64_TLSGD	_D4test3bari-0x4
   38:	66 66 48 e8 00 00 00 	data16 data16 callq 40 <_Dmain+0x40>
   3f:	00
			3c: R_X86_64_PLT32	__tls_get_addr-0x4
   40:	8b 18                	mov    (%rax),%ebx
   42:	83 fb 02             	cmp    $0x2,%ebx
   45:	75 0a                	jne    51 <_Dmain+0x51>
   47:	31 c0                	xor    %eax,%eax
   49:	88 c1                	mov    %al,%cl
   4b:	88 4c 24 0f          	mov    %cl,0xf(%rsp)
   4f:	eb 08                	jmp    59 <_Dmain+0x59>
   51:	b0 01                	mov    $0x1,%al
   53:	88 44 24 0f          	mov    %al,0xf(%rsp)
   57:	eb 00                	jmp    59 <_Dmain+0x59>
   59:	8a 44 24 0f          	mov    0xf(%rsp),%al
   5d:	a8 01                	test   $0x1,%al
   5f:	75 02                	jne    63 <_Dmain+0x63>
   61:	eb 02                	jmp    65 <_Dmain+0x65>
   63:	eb 00                	jmp    65 <_Dmain+0x65>
   65:	31 c0                	xor    %eax,%eax
   67:	48 83 c4 20          	add    $0x20,%rsp
   6b:	5b                   	pop    %rbx
   6c:	c3                   	retq
---

>
> When I remove -inline, it compiles, but seems not to inline. I 
> cannot tell from this small example, but with the large 
> program, there is no speed gain.

I'd suggest inspecting the generated assembly to determine 
whether your function was inlined (see the objdump output above 
for how to do that on Linux).
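
If you don't want to read the disassembly by eye every time, 
the check can be roughly automated. The following is only a 
sketch under some assumptions: the helper name inline_status is 
made up for illustration, _D4test3fooFZb is what I'd expect the 
mangled name of test.foo to be, and every function is assumed to 
sit in its own section (as in the ldc2 output above), so that 
objdump's -j option can restrict the output to _Dmain. A missing 
reference only suggests the call was inlined; it does not prove 
it.

```shell
#!/bin/sh
# Reads `objdump -dr` output on stdin and reports whether the given
# symbol is still mentioned, either as a call target or as a relocation.
# "not referenced" hints that the call was inlined (or removed entirely).
inline_status() {
    if grep -qF -- "$1"; then
        echo "referenced"
    else
        echo "not referenced"
    fi
}

# Usage, assuming test.o was produced by `ldc2 -c test.d` and that
# _D4test3fooFZb (an assumption) is the mangled name of test.foo:
#   objdump -dr -j .text._Dmain test.o | inline_status _D4test3fooFZb
```

Restricting the disassembly to the caller matters, because the 
body of foo itself is usually still emitted into the object file 
and would otherwise always match.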

>
> It also compiles with -inline when I remove the "if 
> (bar==2)...". I guess, it's now really inlining, but the 
> function is ridiculously short...

I don't know for sure, but I'd guess that the length of a 
function matters less for whether it gets inlined than its 
semantics.

