Creeping Bloat in Phobos

Peter Alexander via Digitalmars-d digitalmars-d at puremagic.com
Sat Sep 27 14:59:17 PDT 2014


On Saturday, 27 September 2014 at 20:57:53 UTC, Walter Bright 
wrote:
> From time to time, I take a break from bugs and enhancements 
> and just look at what some piece of code is actually doing. 
> Sometimes, I'm appalled.

Me too, and yes it can be appalling. It's pretty bad for even 
simple range chains, e.g.

import std.algorithm, std.stdio;
int main(string[] args) {
   return cast(int)args.map!("a.length").reduce!"a+b"();
}

Here's what LDC produces (with -O -inline -release -noboundscheck):

__Dmain:
0000000100001480	pushq	%r15
0000000100001482	pushq	%r14
0000000100001484	pushq	%rbx
0000000100001485	movq	%rsi, %rbx
0000000100001488	movq	%rdi, %r14
000000010000148b	callq	0x10006df10 ## symbol stub for: __D3std5array14__T5emptyTAyaZ5emptyFNaNbNdNfxAAyaZb
0000000100001490	xorb	$0x1, %al
0000000100001492	movzbl	%al, %r9d
0000000100001496	leaq	_.str12(%rip), %rdx ## literal pool for: "/Users/pja/ldc2-0.14.0-osx-x86_64/bin/../import/std/algorithm.d"
000000010000149d	movq	0xcbd2c(%rip), %r8 ## literal pool symbol address: __D3std9algorithm24__T6reduceVAyaa3_612b62Z124__T6reduceTS3std9algorithm85__T9MapResultS633std10functional36__T8unaryFunVAyaa8_612e6c656e677468Z8unaryFunTAAyaZ9MapResultZ6reduceFNaNfS3std9algorithm85__T
00000001000014a4	movl	$0x2dd, %edi
00000001000014a9	movl	$0x3f, %esi
00000001000014ae	xorl	%ecx, %ecx
00000001000014b0	callq	0x10006e0a2 ## symbol stub for: __D3std9exception14__T7enforceTbZ7enforceFNaNfbLAxaAyamZb
00000001000014b5	movq	(%rbx), %r15
00000001000014b8	leaq	0x10(%rbx), %rsi
00000001000014bc	leaq	-0x1(%r14), %rdi
00000001000014c0	callq	0x10006df10 ## symbol stub for: __D3std5array14__T5emptyTAyaZ5emptyFNaNbNdNfxAAyaZb
00000001000014c5	testb	$0x1, %al
00000001000014c7	jne	0x1000014fa
00000001000014c9	addq	$-0x2, %r14
00000001000014cd	addq	$0x20, %rbx
00000001000014d1	nopw	%cs:(%rax,%rax)
00000001000014e0	addq	-0x10(%rbx), %r15
00000001000014e4	movq	%r14, %rdi
00000001000014e7	movq	%rbx, %rsi
00000001000014ea	callq	0x10006df10 ## symbol stub for: __D3std5array14__T5emptyTAyaZ5emptyFNaNbNdNfxAAyaZb
00000001000014ef	decq	%r14
00000001000014f2	addq	$0x10, %rbx
00000001000014f6	testb	$0x1, %al
00000001000014f8	je	0x1000014e0
00000001000014fa	movl	%r15d, %eax
00000001000014fd	popq	%rbx
00000001000014fe	popq	%r14
0000000100001500	popq	%r15
0000000100001502	ret

and for:

import std.algorithm, std.stdio;
int main(string[] args) {
   int r = 0;
   foreach (i; 0..args.length)
     r += args[i].length;
   return r;
}

__Dmain:
00000001000015c0	xorl	%eax, %eax
00000001000015c2	testq	%rdi, %rdi
00000001000015c5	je	0x1000015de
00000001000015c7	nopw	(%rax,%rax)
00000001000015d0	movl	%eax, %eax
00000001000015d2	addq	(%rsi), %rax
00000001000015d5	addq	$0x10, %rsi
00000001000015d9	decq	%rdi
00000001000015dc	jne	0x1000015d0
00000001000015de	ret

(and sorry, don't even bother looking at what dmd does...)

I'm not complaining about LDC here (although I'm surprised 
array.empty isn't inlined). The way ranges are formulated makes 
them difficult to optimize. I think there are things we can do 
here in the library. Maybe I'll write up something about that at 
some point.
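
To make the shape of the problem concrete, here is a rough hand 
expansion of what the range chain asks the compiler to see through. 
This is only a sketch, not the actual Phobos source: the function 
names sumLengthsRange and sumLengthsExpanded are made up for 
illustration, and the enforce message is invented. It assumes a 
seedless reduce checks for an empty range, takes the first element 
as the seed, and then loops over empty/front/popFront, which 
matches the one enforce call and the three empty calls visible in 
the LDC listing above.

```d
import std.algorithm : map, reduce;
import std.exception : enforce;
import std.range : empty, front, popFront;

// The range chain from the post, returning size_t instead of int.
size_t sumLengthsRange(string[] args)
{
    return args.map!"a.length".reduce!"a+b"();
}

// Hypothetical hand expansion of the same chain (an approximation,
// not the library's real implementation).
size_t sumLengthsExpanded(string[] args)
{
    auto r = args;                 // the map result only wraps the array
    // A seedless reduce has to reject an empty range first.
    enforce(!r.empty, "cannot reduce an empty range without a seed");
    size_t acc = r.front.length;   // first mapped element is the seed
    r.popFront();
    while (!r.empty)               // one empty check per iteration
    {
        acc += r.front.length;     // map applies a.length lazily here
        r.popFront();
    }
    return acc;
}
```

Both functions should give the same sum for any non-empty array. 
The foreach version gets away with a single counter, while this 
form carries the emptiness check and the advancing slice through 
every iteration unless the optimizer manages to fold them.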

I think the takeaway here is that people should be aware of (a) 
what kind of instructions their code is generating, (b) what kind 
of instructions their code SHOULD be generating, and (c) what is 
practically possible for present-day compilers. Like you say, it 
helps to look at the assembled code once in a while to get a feel 
for this kind of thing. Modern compilers are good, but they 
aren't magic.


