dmd simple loop disassembly - redundant instruction?
Ivan Kazmenko
gassa at mail.ru
Wed Dec 25 04:03:07 PST 2013
Hello,
I am studying the difference between the x86 code generated by DMD
and by C/C++ compilers on Windows (simply put: why exactly, and by
what margin, DMD-compiled D code is often slower than its
GCC-compiled C/C++ equivalent).
Now, I have this simple D program:
-----
immutable int MAX_N = 1_000_000;
void main () {
    int [MAX_N] a;
    foreach (i; 0..MAX_N)
        a[i] = i;
}
-----
(I know about iota in std.range, and it turns out to be even
slower, but that is a high-level function, and right now I'm
trying to understand the lower-level details.)
The assembly (compiled with dmd -O -release -inline -noboundscheck,
then disassembled with obj2asm) contains the following piece
corresponding to the loop:
-----
L2C:    mov     -03D0900h[EDX*4][EBP],EDX
        mov     ECX,EDX
        inc     EDX
        cmp     EDX,0F4240h
        jb      L2C
-----
I am not exactly fluent in assembly, but the "mov ECX, EDX"
instruction seems unnecessary. ECX is explicitly referenced only
three times in the whole program, so it looks like this
instruction could at least be moved out of the loop, if not
removed entirely.
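A small C++ model can make this reasoning concrete. It is only an illustration, not anything DMD emits: one variable stands in for each register, a std::vector stands in for the stack array, and the function names dmd_style and four_instruction_style are made up for this sketch. It assumes nothing after the loop reads ECX, or that such code could use EDX - 1 instead. Under that assumption the copy into ECX never feeds back into the stores, and its final value can be produced once after the loop. The static_asserts also decode the two immediates: 0F4240h is 1,000,000 (MAX_N), and 03D0900h is 4,000,000, so the array starts MAX_N * 4 bytes below EBP.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// The immediates in the DMD listing, decoded.
static_assert(0xF4240 == 1000000, "0F4240h is the loop bound MAX_N");
static_assert(0x3D0900 == 4 * 1000000, "03D0900h is MAX_N * sizeof(int) bytes");

// Mirrors the DMD loop at L2C, one statement per instruction.
// Returns the value left in "ecx" when the loop exits.
std::uint32_t dmd_style(std::vector<std::int32_t>& a) {
    assert(!a.empty());  // the real bound is a nonzero constant, so the body runs at least once
    const std::uint32_t n = static_cast<std::uint32_t>(a.size());
    std::uint32_t edx = 0;
    std::uint32_t ecx = 0;
    do {
        a[edx] = static_cast<std::int32_t>(edx);  // mov -03D0900h[EDX*4][EBP],EDX
        ecx = edx;                                // mov ECX,EDX: ecx is never read in the loop
        ++edx;                                    // inc EDX
    } while (edx < n);                            // cmp EDX,0F4240h / jb L2C (unsigned compare)
    return ecx;
}

// The same loop with the copy dropped: identical stores, one instruction fewer.
// ECX's final value is edx - 1, so if later code needed it, it could be
// computed once after the loop instead of on every iteration.
std::uint32_t four_instruction_style(std::vector<std::int32_t>& a) {
    assert(!a.empty());
    const std::uint32_t n = static_cast<std::uint32_t>(a.size());
    std::uint32_t edx = 0;
    do {
        a[edx] = static_cast<std::int32_t>(edx);
        ++edx;
    } while (edx < n);
    return edx - 1;  // recovers what ECX would have held
}
```

Filling two vectors of 1,000,000 elements, one with each function, should give identical contents, and both functions should return 999999.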
Is this indeed a bug, or is there some reason for it? If it is a
bug, where do I report it: at http://d.puremagic.com/issues/, as
with the front-end?
I didn't try GDC or LDC, since I couldn't find clear instructions
for using them under Win32. If there are any, please point me to
them. I found a few guides for GDC, but had a hard time figuring
out which one is the most current.
Note that the C++ version does the same work with four
instructions instead of five, which is exactly what the D version
would have if the instruction in question were removed. Here is
the code inside its loop:
-----
L3:
        movl    %eax, _a(,%eax,4)
        addl    $1, %eax
        cmpl    $1000000, %eax
        jne     L3
-----
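For a side-by-side view, the C++ program probably has roughly the shape sketched below. This is a guess, not the actual file (that is in the directory linked in this post): the global array is inferred from the _a symbol in the listing, the bound from the 1000000 immediate, and the names fill and check_fill are hypothetical. With a global array, the base address _a is a link-time constant folded into the store, and the counter serves as both index and stored value, so no extra register copy is needed.

```cpp
#include <cassert>

// Hypothetical reconstruction; the _a symbol suggests a global int array.
const int MAX_N = 1000000;
int a[MAX_N];

// A loop of this shape corresponds to the movl / addl / cmpl / jne body
// above. The exact output depends on the GCC version and flags (newer
// versions may unroll or vectorize it).
void fill() {
    for (int i = 0; i < MAX_N; i++)
        a[i] = i;
}

// Spot-checks the invariant the loop establishes: a[i] == i.
void check_fill() {
    for (int i = 0; i < MAX_N; i += 99991)
        assert(a[i] == i);
    assert(a[MAX_N - 1] == MAX_N - 1);
}
```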
The full assembly listings and the source code (D and C++) are
here:
http://acm.math.spbu.ru/~gassa/dlang/simple_loop/
I've tried a few other versions as well. Changing the loop to an
explicit "for (int i = 0; i < MAX_N; i++)" (a2.d) does not affect
the generated assembly. Making the array dynamic (a3.d) produces
five instructions, all of which seem necessary. A __gshared static
array (a4.d) gives the same seemingly unneeded instruction, but
with EAX instead of ECX:
-----
L2:     mov     _D2a41aG1000000i[EDX*4],EDX
        mov     EAX,EDX
        inc     EDX
        cmp     EDX,0F4240h
        jb      L2
-----
Ivan Kazmenko.
More information about the Digitalmars-d-learn
mailing list