rvalues -> ref (yup... again!)

H. S. Teoh hsteoh at quickfur.ath.cx
Tue Mar 27 22:11:55 UTC 2018


On Tue, Mar 27, 2018 at 09:52:25PM +0000, Rubn via Digitalmars-d wrote:
> On Tuesday, 27 March 2018 at 20:38:35 UTC, H. S. Teoh wrote:
> > On Tue, Mar 27, 2018 at 08:25:36PM +0000, Rubn via Digitalmars-d wrote:
> > [...]
> > > _D7example__T3fooTSQr3FooZQnFNbNiNfQrZv:
> > >   push rbp
> > >   mov rbp, rsp
> > >   sub rsp, 3104
> > >   lea rax, [rbp + 16]
> > >   lea rdi, [rbp - 2048]
> > >   lea rcx, [rbp - 1024]
> > >   mov edx, 1024
> > >   mov rsi, rcx
> > >   mov qword ptr [rbp - 2056], rdi
> > >   mov rdi, rsi
> > >   mov rsi, rax
> > >   mov qword ptr [rbp - 2064], rcx
> > >   call memcpy@PLT    <--------------------- hidden copy
> > [...]
> > 
> > Is this generated by dmd, or gdc/ldc?
> > 
> > Generally, when it comes to performance issues, I don't even bother
> > looking at dmd-generated code anymore.  If the extra copying is
> > still happening with gdc -O2 / ldc -O, then you have a point.
> > Otherwise, it doesn't really say very much.
> > 
> > 
> > T
> 
> It happens with LDC too.  I'm not sure how it would know to do any
> kind of optimization like that unless it were able to inline every
> single function called into one function and optimize it from there.
> I don't imagine that's likely, though.

You'd be surprised.  Don't underestimate the power of modern
optimizers.  I've seen LDC inline so aggressively that it essentially
evaluated an entire series of function calls at compile-time (likely on
the IR) and generated a single instruction to load the answer into the
return register at runtime. :-D  Of course, it still generated the
individual functions, but those were never actually called at runtime.

(On one occasion, this produced odd-looking "benchmark" results where the
ldc executable computed the answer in exactly 0ms, whereas everyone else
took a lot longer than that. :-D  (Well, it was probably a few nanosecs
while the CPU decoded and ran the instruction, but I don't think any
benchmark could measure that!))
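
A minimal sketch of the kind of code where this can happen.  The
function names are made up for illustration; this is not the code from
that benchmark.  With optimization on (e.g. ldc2 -O), one would expect
the calls inside answer() to be inlined and folded down to a single
constant load, but whether that actually happens depends on the compiler
version and flags, so check the generated assembly:

```d
// Hypothetical example: a chain of small functions with constant inputs.
int square(int x)
{
    return x * x;
}

int sumOfSquares(int n)
{
    int total = 0;
    foreach (i; 1 .. n + 1)   // i runs from 1 to n inclusive
        total += square(i);
    return total;
}

int answer()
{
    // Every input is known at compile time, so an optimizer is free to
    // inline square() and sumOfSquares() and fold the loop to a constant.
    return sumOfSquares(10);
}

void main()
{
    import std.stdio : writeln;
    writeln(answer());   // prints 385
}
```

Timing a call to answer() in such a build would mostly measure the
overhead of the timer itself.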

For your code example, you might want to look at the code generated for
callers of the function.  When compiling an individual function in
isolation, LDC is obligated to follow the ABI, which could include
redundant copying.  But if the call can be inlined into its caller, it
could generate very different code.
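
A minimal sketch of what to compare, assuming a hypothetical 1024-byte
struct (Big, first and caller are illustrative names, not the code from
the earlier post).  Compiled on its own, first() has to accept its
argument the way the ABI dictates; inside caller(), the optimizer is
free to inline the call and may drop the copy entirely.  Something like
ldc2 -O -output-s should emit assembly for both, so the two can be
compared side by side:

```d
// Hypothetical large struct, sized to be passed in memory rather than
// in registers.
struct Big
{
    ubyte[1024] data;
}

// Standalone, this function must take its by-value parameter as the ABI
// specifies, whatever any particular caller looks like.
int first(Big b)
{
    return b.data[0];
}

// At this call site the optimizer can see both sides of the call, so it
// may inline first() and never materialize a second copy of b.
int caller()
{
    Big b;             // data is zero-initialized by default
    b.data[0] = 42;
    return first(b);
}

void main()
{
    assert(caller() == 42);
}
```

If the memcpy still shows up inside caller() at -O, that would be a
stronger data point than the standalone body of first().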


T

-- 
Dogs have owners ... cats have staff. -- Krista Casada

