Errors in TDPL

bearophile bearophileHUGS at lycos.com
Tue Jun 22 05:18:08 PDT 2010


Andrei:

>I optimized things such that the commonly used path (many calls to empty, front, and popFront) is as fast as possible. The initial work will be amortized for most loops.<

New timings:

Running time, best of 3, seconds:
  test1:     0.31
  test1 opt: 0.07
  test2:     0.31
  test2 opt: 0.12
  test3:     6.38
  test3 opt: 0.52
  test3:     3.78  (new iota)
  test3 opt: 0.29  (new iota)
  test4:     4.70
  test4 opt: 1.25
  test4:     2.03 (new iota)
  test4 opt: 0.27 (new iota)

Compile times opt version, seconds:
  test1: 0.05
  test2: 0.05
  test3: 0.28
  test4: 0.23 (new iota)
  test4: 0.29
  test4: 0.24 (new iota)

(The difference in compile time between 0.29 and 0.23 is probably not significant. The difference between 0.05 and 0.23 is significant.)

In my opinion the non-opt runtime timings too are important, because you can't compile code every time in opt mode. The normal for/foreach gives some performance even with no optmized compilation.

It's good to optimize retro() so it can recognize the iota() argument and return just another iota().


>On my machine, test4 is still 2x slower than foreach_reverse in an optimized build.<

As you see on mine in an optimized build it's a bit slower than 2x than foreach_reverse and more than 4x slower than a normal for loop.


>The disassembly reveals that the compiler refuses to inline some calls, which is disappointing as their bodies are very simple.<

The code inside the loop here is very simple, it's just a variable (register) inc. In this case, in an optimized build GCC is able to optimize away the whole for loop. In my opinion a good D2 compiler has to to do the same with a foreach(retro(iota)) :-)

If I am not wrong this is the inner loop of test4 with the new iota in opt build:

L3F:		mov	ECX,018h[ESP]
		inc	EBX
		add	010h[ESP],ECX
		mov	EDX,010h[ESP]
		cmp	EDX,014h[ESP]
		jne	L3F


It's not very good, but it's better than before. All those memory loads and stores are not necessary in this loop, they waste time.

There is a call to this function, but it's before the call:

_D3std5range13__T4IotaTiTiZ4Iota6__ctorMFNciiiZS3std5range13__T4IotaTiTiZ4Iota	comdat
L0:		sub	ESP,0Ch
		mov	ECX,offset FLAT:_D3std5range13__T4IotaTiTiZ4Iota6__ctorMFNciiiZS3std5range13__T4IotaTiTiZ4Iota14__dgliteral607MFZAxa
		push	EBX
		push	EBP
		mov	EBP,020h[ESP]
		push	ESI
		mov	ESI,EAX
		push	EDI
		mov	EDI,020h[ESP]
		test	EDI,EDI
		mov	014h[ESP],EAX
		mov	018h[ESP],ECX
		je	L28
		cmp	EBP,024h[ESP]
		jne	L2C
L28:		xor	EDX,EDX
		jmp short	L31
L2C:		mov	EDX,1
L31:		xor	DL,1
		je	L5A
		push	dword ptr FLAT:_DATA[054h]
		mov	EDX,ECX
		push	dword ptr FLAT:_DATA[050h]
		push	084Dh
		mov	EAX,020h[ESP]
		mov	EBX,020h[ESP]
		call	EDX
		push	EDX
		push	EAX
		call	near ptr _D3std9contracts7bailOutFAyaixAaZv
L5A:		test	EDI,EDI
		mov	[ESI],EBP
		mov	8[ESI],EDI
		jle	L77
		mov	EDX,024h[ESP]
		dec	EDX
		mov	EAX,EDX
		mov	4[ESI],EDX
		sub	EAX,EBP
		cdq
		idiv	EDI
		sub	4[ESI],EDX
		jmp short	L89
L77:		mov	ECX,024h[ESP]
		inc	ECX
		mov	EAX,ECX
		mov	4[ESI],ECX
		sub	EAX,EBP
		cdq
		idiv	EDI
		add	4[ESI],EDX
L89:		add	4[ESI],EDI
		mov	EAX,ESI
		pop	EDI
		pop	ESI
		pop	EBP
		pop	EBX
		add	ESP,0Ch
		ret	0Ch


It's a lot of code, that's the iota constructor. The _D3std9contracts7bailOutFAyaixAaZv is probably the enforce or part of it.

Doubling the loop size (N), the running time becomes about double, so such call to the constructor is not taking a lot of time (because there are no loops in the constructor).

-------------------

I have seen that iota() contains:

this(N current, N pastLast, S step) {
    enforce(step != 0 && current != pastLast);


Indeed this throws:

import std.range;
void main() {
    int count;
    foreach (x; iota(10, 10, 1))
        count++;
}


While his Python code gives no output:

>>> for x in xrange(10, 10, 1): print x
...
>>>

>>> for x in xrange(20, 10, 1): print x
...

In my opinion an empty loop is a valid loop, just as this is valid both syntactically and logically:

for (int i = 10; i < 10; i++) {}
for (int i = 20; i < 10; i++) {}

So I suggest to replace that line in the iota() costructor with:

enforce(step != 0);

And then create an empty generator if pastLast <= current (and the step is 1).

Bye,
bearophile


More information about the Digitalmars-d mailing list