Creeping Bloat in Phobos
Brad Roberts via Digitalmars-d
digitalmars-d at puremagic.com
Sat Sep 27 15:26:08 PDT 2014
What we're seeing here is pretty much the same problem that early C++
suffered from: abstraction penalty. It took years of work to help
overcome it, in both the compiler and the library. Not having trivial
functions inlined and optimized down through standard techniques like
dead store elimination, value range propagation, various loop
restructurings, etc., means that code will look like what Walter and you
have shown. Given DMD's relatively weak inliner, I'm not shocked by
Walter's example. I am curious why LDC failed to inline those functions.
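
For anyone who wants to compare the generated code themselves, here is a
minimal sketch of the two forms from the quoted message below. The function
names totalLengthRange and totalLengthLoop and the sample array are
hypothetical, made up for illustration, and the return type is size_t rather
than the cast to int used in the quoted code. Both functions should compute the
same sum. The interesting part is how differently they may compile when the
small range primitives are not inlined.

```d
import std.algorithm : map, reduce;

// Range-chain form, as in the quoted message. Note that reduce called
// without a seed enforces that the range is non-empty.
size_t totalLengthRange(string[] args)
{
    return args.map!(a => a.length).reduce!((a, b) => a + b);
}

// Hand-written loop form: sums the length of each string directly.
size_t totalLengthLoop(string[] args)
{
    size_t r = 0;
    foreach (s; args)
        r += s.length;
    return r;
}

void main()
{
    // Hypothetical argument list, standing in for the program's args.
    auto sample = ["prog", "ab", "cde"];
    assert(totalLengthRange(sample) == 9);
    assert(totalLengthLoop(sample) == 9);
}
```

Putting each form in its own function makes the two bodies easy to find in a
disassembly.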
On 9/27/2014 2:59 PM, Peter Alexander via Digitalmars-d wrote:
> On Saturday, 27 September 2014 at 20:57:53 UTC, Walter Bright wrote:
>> From time to time, I take a break from bugs and enhancements and just
>> look at what some piece of code is actually doing. Sometimes, I'm
>> appalled.
>
> Me too, and yes it can be appalling. It's pretty bad for even simple
> range chains, e.g.
>
> import std.algorithm, std.stdio;
> int main(string[] args) {
>     return cast(int)args.map!("a.length").reduce!"a+b"();
> }
>
> Here's what LDC produces (with -O -inline -release -noboundscheck)
>
> __Dmain:
> 0000000100001480 pushq %r15
> 0000000100001482 pushq %r14
> 0000000100001484 pushq %rbx
> 0000000100001485 movq %rsi, %rbx
> 0000000100001488 movq %rdi, %r14
> 000000010000148b callq 0x10006df10 ## symbol stub for:
> __D3std5array14__T5emptyTAyaZ5emptyFNaNbNdNfxAAyaZb
> 0000000100001490 xorb $0x1, %al
> 0000000100001492 movzbl %al, %r9d
> 0000000100001496 leaq _.str12(%rip), %rdx ## literal pool for:
> "/Users/pja/ldc2-0.14.0-osx-x86_64/bin/../import/std/algorithm.d"
> 000000010000149d movq 0xcbd2c(%rip), %r8 ## literal pool symbol address:
> __D3std9algorithm24__T6reduceVAyaa3_612b62Z124__T6reduceTS3std9algorithm85__T9MapResultS633std10functional36__T8unaryFunVAyaa8_612e6c656e677468Z8unaryFunTAAyaZ9MapResultZ6reduceFNaNfS3std9algorithm85__T
>
> 00000001000014a4 movl $0x2dd, %edi
> 00000001000014a9 movl $0x3f, %esi
> 00000001000014ae xorl %ecx, %ecx
> 00000001000014b0 callq 0x10006e0a2 ## symbol stub for:
> __D3std9exception14__T7enforceTbZ7enforceFNaNfbLAxaAyamZb
> 00000001000014b5 movq (%rbx), %r15
> 00000001000014b8 leaq 0x10(%rbx), %rsi
> 00000001000014bc leaq -0x1(%r14), %rdi
> 00000001000014c0 callq 0x10006df10 ## symbol stub for:
> __D3std5array14__T5emptyTAyaZ5emptyFNaNbNdNfxAAyaZb
> 00000001000014c5 testb $0x1, %al
> 00000001000014c7 jne 0x1000014fa
> 00000001000014c9 addq $-0x2, %r14
> 00000001000014cd addq $0x20, %rbx
> 00000001000014d1 nopw %cs:(%rax,%rax)
> 00000001000014e0 addq -0x10(%rbx), %r15
> 00000001000014e4 movq %r14, %rdi
> 00000001000014e7 movq %rbx, %rsi
> 00000001000014ea callq 0x10006df10 ## symbol stub for:
> __D3std5array14__T5emptyTAyaZ5emptyFNaNbNdNfxAAyaZb
> 00000001000014ef decq %r14
> 00000001000014f2 addq $0x10, %rbx
> 00000001000014f6 testb $0x1, %al
> 00000001000014f8 je 0x1000014e0
> 00000001000014fa movl %r15d, %eax
> 00000001000014fd popq %rbx
> 00000001000014fe popq %r14
> 0000000100001500 popq %r15
> 0000000100001502 ret
>
> and for:
>
> import std.algorithm, std.stdio;
> int main(string[] args) {
>     int r = 0;
>     foreach (i; 0..args.length)
>         r += args[i].length;
>     return r;
> }
>
> __Dmain:
> 00000001000015c0 xorl %eax, %eax
> 00000001000015c2 testq %rdi, %rdi
> 00000001000015c5 je 0x1000015de
> 00000001000015c7 nopw (%rax,%rax)
> 00000001000015d0 movl %eax, %eax
> 00000001000015d2 addq (%rsi), %rax
> 00000001000015d5 addq $0x10, %rsi
> 00000001000015d9 decq %rdi
> 00000001000015dc jne 0x1000015d0
> 00000001000015de ret
>
> (and sorry, don't even bother looking at what dmd does...)
>
> I'm not complaining about LDC here (although I'm surprised array.empty
> isn't inlined). The way ranges are formulated makes them difficult to
> optimize. I think there are things we can do here in the library. Maybe
> I'll write up something about that at some point.
>
> I think the takeaway here is that people should be aware of (a) what
> kind of instructions their code is generating, (b) what kind of
> instructions their code SHOULD be generating, and (c) what is
> practically possible for present-day compilers. Like you say, it helps
> to look at the assembled code once in a while to get a feel for this
> kind of thing. Modern compilers are good, but they aren't magic.