Creeping Bloat in Phobos
Peter Alexander via Digitalmars-d
digitalmars-d at puremagic.com
Sat Sep 27 14:59:17 PDT 2014
On Saturday, 27 September 2014 at 20:57:53 UTC, Walter Bright wrote:
> From time to time, I take a break from bugs and enhancements
> and just look at what some piece of code is actually doing.
> Sometimes, I'm appalled.
Me too, and yes, it can be appalling. It's pretty bad even for
simple range chains, e.g.:
import std.algorithm, std.stdio;
int main(string[] args) {
    return cast(int)args.map!("a.length").reduce!"a+b"();
}
Here's what LDC produces (with -O -inline -release -noboundscheck):
__Dmain:
0000000100001480 pushq %r15
0000000100001482 pushq %r14
0000000100001484 pushq %rbx
0000000100001485 movq %rsi, %rbx
0000000100001488 movq %rdi, %r14
000000010000148b callq 0x10006df10 ## symbol stub for: __D3std5array14__T5emptyTAyaZ5emptyFNaNbNdNfxAAyaZb
0000000100001490 xorb $0x1, %al
0000000100001492 movzbl %al, %r9d
0000000100001496 leaq _.str12(%rip), %rdx ## literal pool for: "/Users/pja/ldc2-0.14.0-osx-x86_64/bin/../import/std/algorithm.d"
000000010000149d movq 0xcbd2c(%rip), %r8 ## literal pool symbol address: __D3std9algorithm24__T6reduceVAyaa3_612b62Z124__T6reduceTS3std9algorithm85__T9MapResultS633std10functional36__T8unaryFunVAyaa8_612e6c656e677468Z8unaryFunTAAyaZ9MapResultZ6reduceFNaNfS3std9algorithm85__T
00000001000014a4 movl $0x2dd, %edi
00000001000014a9 movl $0x3f, %esi
00000001000014ae xorl %ecx, %ecx
00000001000014b0 callq 0x10006e0a2 ## symbol stub for: __D3std9exception14__T7enforceTbZ7enforceFNaNfbLAxaAyamZb
00000001000014b5 movq (%rbx), %r15
00000001000014b8 leaq 0x10(%rbx), %rsi
00000001000014bc leaq -0x1(%r14), %rdi
00000001000014c0 callq 0x10006df10 ## symbol stub for: __D3std5array14__T5emptyTAyaZ5emptyFNaNbNdNfxAAyaZb
00000001000014c5 testb $0x1, %al
00000001000014c7 jne 0x1000014fa
00000001000014c9 addq $-0x2, %r14
00000001000014cd addq $0x20, %rbx
00000001000014d1 nopw %cs:(%rax,%rax)
00000001000014e0 addq -0x10(%rbx), %r15
00000001000014e4 movq %r14, %rdi
00000001000014e7 movq %rbx, %rsi
00000001000014ea callq 0x10006df10 ## symbol stub for: __D3std5array14__T5emptyTAyaZ5emptyFNaNbNdNfxAAyaZb
00000001000014ef decq %r14
00000001000014f2 addq $0x10, %rbx
00000001000014f6 testb $0x1, %al
00000001000014f8 je 0x1000014e0
00000001000014fa movl %r15d, %eax
00000001000014fd popq %rbx
00000001000014fe popq %r14
0000000100001500 popq %r15
0000000100001502 ret
And for:
import std.algorithm, std.stdio;
int main(string[] args) {
    int r = 0;
    foreach (i; 0..args.length)
        r += args[i].length;
    return r;
}
__Dmain:
00000001000015c0 xorl %eax, %eax
00000001000015c2 testq %rdi, %rdi
00000001000015c5 je 0x1000015de
00000001000015c7 nopw (%rax,%rax)
00000001000015d0 movl %eax, %eax
00000001000015d2 addq (%rsi), %rax
00000001000015d5 addq $0x10, %rsi
00000001000015d9 decq %rdi
00000001000015dc jne 0x1000015d0
00000001000015de ret
(and sorry, don't even bother looking at what dmd does...)
I'm not complaining about LDC here (although I'm surprised
array.empty isn't inlined). The way ranges are formulated makes
them difficult to optimize. I think there are things we can do here
in the library. Maybe I'll write something up about that at some
point.
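The enforce call in the first listing appears to come from the unseeded form of reduce, which uses the first element as the seed and therefore has to reject an empty range at run time (Phobos documents that it throws in that case). A minimal sketch, assuming a current Phobos, that compares the unseeded and seeded forms against the hand-written loop; the function names are hypothetical. Passing an explicit seed removes the up-front emptiness check from the source, but whether LDC then emits code as tight as the plain loop is not something this sketch verifies.

```d
import std.algorithm : map, reduce;
import std.stdio : writeln;

// Hand-written loop, equivalent to the second example in the post.
size_t totalLoop(string[] args)
{
    size_t r = 0;
    foreach (arg; args)
        r += arg.length;
    return r;
}

// Unseeded reduce, as in the first example: the first element acts as
// the seed, so an empty range throws an Exception at run time.
size_t totalUnseeded(string[] args)
{
    return args.map!(a => a.length).reduce!((a, b) => a + b)();
}

// Seeded reduce: an empty range simply yields the seed (0), so no
// emptiness check is needed before the fold starts.
size_t totalSeeded(string[] args)
{
    return reduce!((a, b) => a + b)(cast(size_t) 0, args.map!(a => a.length));
}

void main(string[] args)
{
    // args always contains at least the program name, so all three agree here.
    writeln(totalLoop(args), " ", totalUnseeded(args), " ", totalSeeded(args));
}
```

All three compute the same total for a non-empty array; only the unseeded version has to throw when given an empty one, which is the behaviour the enforce call in the listing guards.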
I think the takeaway here is that people should be aware of (a)
what kind of instructions their code is generating, (b) what kind
of instructions their code SHOULD be generating, and (c) what is
practically possible for present-day compilers. Like you say, it
helps to look at the assembled code once in a while to get a feel
for this kind of thing. Modern compilers are good, but they
aren't magic.