dmd simple loop disassembly - redundant instruction?

Ivan Kazmenko gassa at
Wed Dec 25 04:03:07 PST 2013


I am studying the differences between the x86 code generated by DMD 
and by C/C++ compilers on Windows (simply put: why exactly, and by 
what margin, DMD-compiled D code is often slower than the 
GCC-compiled C/C++ equivalent).

Now, I have this simple D program:

immutable int MAX_N = 1_000_000;
void main () {
     int [MAX_N] a;
     foreach (i; 0..MAX_N)
         a[i] = i;
}

(I know there's iota in std.range, and it turns out to be even 
slower - but that's a high level function, and I'm trying to 
understand the lower-level details now.)

The assembly (dmd -O -release -inline -noboundscheck, then 
obj2asm) has the following piece corresponding to the loop:

L2C:		mov	-03D0900h[EDX*4][EBP],EDX
		mov	ECX,EDX
		inc	EDX
		cmp	EDX,0F4240h
		jb	L2C

Now, I am not exactly fluent in assembler, but the "mov ECX, EDX" 
seems unnecessary.  The ECX register is explicitly used three 
times in the whole program, and it looks like this instruction 
can at least be moved out of the loop, if not removed completely. 
  Is it indeed a bug, or is there some reason for it?  And if the 
former, where do I report it - at, 
as with the front-end?

I didn't try GDC or LDC since I didn't find clear instructions 
for using them under Win32.  If there are any, please kindly point 
me to it.  I found a few explanations for GDC, but had a hard 
time trying to figure out which is the most current one.

Note that the C++ version does the same with four instructions 
instead of five, which is what the D version would have if we removed 
the instruction in question.  Indeed, it goes like this (code inside the 
loop):

	movl	%eax, _a(,%eax,4)
	addl	$1, %eax
	cmpl	$1000000, %eax
	jne	L3
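
For reference, the C++ source is not reproduced in this message. A minimal sketch that could plausibly compile to a loop of this shape is given below. It assumes a global array (suggested by the `_a` symbol in the listing); the function name `fill_array` is hypothetical and used only for illustration, and the exact instructions will depend on the compiler version and flags.

```cpp
// Sketch only: an assumed C++ counterpart of the D program above.
const int MAX_N = 1000000;

// Global array, so the loop can address it directly by symbol
// (an assumption based on the _a operand in the listing).
int a[MAX_N];

// Store each index into its own slot, then return the last element
// so that the result is observable to a caller.
int fill_array() {
    for (int i = 0; i < MAX_N; i++)
        a[i] = i;
    return a[MAX_N - 1];
}
```

Calling `fill_array()` from `main` and printing or returning its result makes the stores observable from outside the function.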

The full assembly listings, and the source codes (D and C++), are 

I've tried a few other versions as well.  Changing the loop to an 
explicit "for (int i = 0; i < MAX_N; i++)" (a2.d) does not affect 
the generated assembly.  Making the array dynamic (a3.d) leads to 
five instructions, all seemingly important.  A __gshared static 
array (a4.d) gives the same seemingly unneeded instruction but 
with EAX instead of ECX:

L2:		mov	_D2a41aG1000000i[EDX*4],EDX
		mov	EAX,EDX
		inc	EDX
		cmp	EDX,0F4240h
		jb	L2

Ivan Kazmenko.

More information about the Digitalmars-d-learn mailing list