Errors in TDPL
bearophile
bearophileHUGS at lycos.com
Mon Jun 21 14:06:52 PDT 2010
Adam Ruppe:
> foreach(i; retro(iota(0, 10))) { }
Oh, right! Or even just:
foreach (i; retro(iota(10))) {}
But abstraction has a cost, see below. I have written three test programs.
----------------
// test1
import std.c.stdio: printf;
void main() {
enum int N = 100_000_000;
int count;
for (int i; i < N; i++)
count++;
printf("%d\n", count);
}
Asm of the inner code of the test1, opt version:
5: inc ECX
inc EAX
cmp EAX,05F5E100h
jb L5
----------------
// test2
import std.c.stdio: printf;
void main() {
enum int N = 100_000_000;
int count;
foreach_reverse (i; 0 .. N)
count++;
printf("%d\n", count);
}
Asm of the inner code of the test2, opt version:
LF: dec ECX
inc EDX
mov EAX,ECX
inc EAX
jg LF
----------------
// test3
import std.c.stdio: printf;
import std.range: iota, retro;
void main() {
enum int N = 100_000_000;
int count;
foreach (i; retro(iota(N)))
count++;
printf("%d\n", count);
}
Asm of the inner code of the test3, opt version:
L6A: inc EBX
lea EAX,010h[ESP]
call near ptr _D3std5range145__T4TakeTS3std5range112__T8SequenceVAyaa27_612e6669656c645b305d202b206e202a20612eD0677AACB4B6BC826B92E7DBB9E1E359
cmp dword ptr 020h[ESP],0
jne L6A
_D3std5range145__T4TakeTS3std5range112__T8SequenceVAyaa27_612e6669656c645b305d202b206e202a20612eD0677AACB4B6BC826B92E7DBB9E1E359 comdat
L0: sub ESP,01Ch
mov ECX,offset FLAT:_D3std5range145__T4TakeTS3std5range112__T8SequenceVAyaa27_612e6669656c645b305d202b206e202a20612e0B5C6D6E0C89B8D48DF56A414048DB6F
push EBX
push ESI
mov ESI,EAX
mov EDX,010h[ESI]
mov 0Ch[ESP],EAX
neg EDX
sbb EDX,EDX
neg EDX
xor DL,1
mov 010h[ESP],ECX
je L46
mov EDX,ECX
push dword ptr FLAT:_DATA[064h]
push dword ptr FLAT:_DATA[060h]
push 051Fh
mov EAX,018h[ESP]
mov EBX,018h[ESP]
call EDX
push EDX
push EAX
call near ptr _D3std9contracts7bailOutFAyaixAaZv
L46: dec dword ptr 010h[ESI]
pop ESI
pop EBX
add ESP,01Ch
ret
_D3std9contracts7bailOutFAyaixAaZv is extern.
------------------
Running time, best of 3, seconds:
test1: 0.31
test1 opt: 0.07
test2: 0.31
test2 opt: 0.12
test3: 6.38
test3 opt: 0.52
not opt version = dmd (no other option)
opt version = dmd -O -release -inline
Compile times opt version, seconds:
test1: 0.05
test2: 0.05
test3: 0.28
So with the current dmd compiler in the inner loop in a performance-critical routine you can't use foreach(retro(iota(N))).
In future a D compiler can recognize and optimize such usage of foreach(iota()) and foreach(retro(iota())) as ShedSkin does with usage of xrange()/range() in a loop.
Bye,
bearophile
More information about the Digitalmars-d
mailing list