Errors in TDPL
bearophile
bearophileHUGS at lycos.com
Tue Jun 22 05:18:08 PDT 2010
Andrei:
>I optimized things such that the commonly used path (many calls to empty, front, and popFront) is as fast as possible. The initial work will be amortized for most loops.<
New timings:
Running time, best of 3, seconds:
test1: 0.31
test1 opt: 0.07
test2: 0.31
test2 opt: 0.12
test3: 6.38
test3 opt: 0.52
test3: 3.78 (new iota)
test3 opt: 0.29 (new iota)
test4: 4.70
test4 opt: 1.25
test4: 2.03 (new iota)
test4 opt: 0.27 (new iota)
Compile times opt version, seconds:
test1: 0.05
test2: 0.05
test3: 0.28
test4: 0.23 (new iota)
test4: 0.29
test4: 0.24 (new iota)
(The difference in compile time between 0.29 and 0.23 is probably not significant. The difference between 0.05 and 0.23 is significant.)
In my opinion the non-opt runtime timings too are important, because you can't compile code every time in opt mode. The normal for/foreach gives some performance even with no optmized compilation.
It's good to optimize retro() so it can recognize the iota() argument and return just another iota().
>On my machine, test4 is still 2x slower than foreach_reverse in an optimized build.<
As you see on mine in an optimized build it's a bit slower than 2x than foreach_reverse and more than 4x slower than a normal for loop.
>The disassembly reveals that the compiler refuses to inline some calls, which is disappointing as their bodies are very simple.<
The code inside the loop here is very simple, it's just a variable (register) inc. In this case, in an optimized build GCC is able to optimize away the whole for loop. In my opinion a good D2 compiler has to to do the same with a foreach(retro(iota)) :-)
If I am not wrong this is the inner loop of test4 with the new iota in opt build:
L3F: mov ECX,018h[ESP]
inc EBX
add 010h[ESP],ECX
mov EDX,010h[ESP]
cmp EDX,014h[ESP]
jne L3F
It's not very good, but it's better than before. All those memory loads and stores are not necessary in this loop, they waste time.
There is a call to this function, but it's before the call:
_D3std5range13__T4IotaTiTiZ4Iota6__ctorMFNciiiZS3std5range13__T4IotaTiTiZ4Iota comdat
L0: sub ESP,0Ch
mov ECX,offset FLAT:_D3std5range13__T4IotaTiTiZ4Iota6__ctorMFNciiiZS3std5range13__T4IotaTiTiZ4Iota14__dgliteral607MFZAxa
push EBX
push EBP
mov EBP,020h[ESP]
push ESI
mov ESI,EAX
push EDI
mov EDI,020h[ESP]
test EDI,EDI
mov 014h[ESP],EAX
mov 018h[ESP],ECX
je L28
cmp EBP,024h[ESP]
jne L2C
L28: xor EDX,EDX
jmp short L31
L2C: mov EDX,1
L31: xor DL,1
je L5A
push dword ptr FLAT:_DATA[054h]
mov EDX,ECX
push dword ptr FLAT:_DATA[050h]
push 084Dh
mov EAX,020h[ESP]
mov EBX,020h[ESP]
call EDX
push EDX
push EAX
call near ptr _D3std9contracts7bailOutFAyaixAaZv
L5A: test EDI,EDI
mov [ESI],EBP
mov 8[ESI],EDI
jle L77
mov EDX,024h[ESP]
dec EDX
mov EAX,EDX
mov 4[ESI],EDX
sub EAX,EBP
cdq
idiv EDI
sub 4[ESI],EDX
jmp short L89
L77: mov ECX,024h[ESP]
inc ECX
mov EAX,ECX
mov 4[ESI],ECX
sub EAX,EBP
cdq
idiv EDI
add 4[ESI],EDX
L89: add 4[ESI],EDI
mov EAX,ESI
pop EDI
pop ESI
pop EBP
pop EBX
add ESP,0Ch
ret 0Ch
It's a lot of code, that's the iota constructor. The _D3std9contracts7bailOutFAyaixAaZv is probably the enforce or part of it.
Doubling the loop size (N), the running time becomes about double, so such call to the constructor is not taking a lot of time (because there are no loops in the constructor).
-------------------
I have seen that iota() contains:
this(N current, N pastLast, S step) {
enforce(step != 0 && current != pastLast);
Indeed this throws:
import std.range;
void main() {
int count;
foreach (x; iota(10, 10, 1))
count++;
}
While his Python code gives no output:
>>> for x in xrange(10, 10, 1): print x
...
>>>
>>> for x in xrange(20, 10, 1): print x
...
In my opinion an empty loop is a valid loop, just as this is valid both syntactically and logically:
for (int i = 10; i < 10; i++) {}
for (int i = 20; i < 10; i++) {}
So I suggest to replace that line in the iota() costructor with:
enforce(step != 0);
And then create an empty generator if pastLast <= current (and the step is 1).
Bye,
bearophile
More information about the Digitalmars-d
mailing list