Modern C++ Lamentations
Dukc
ajieskola at gmail.com
Tue Mar 5 21:13:38 UTC 2019
On Friday, 4 January 2019 at 16:21:40 UTC, Steven Schveighoffer
wrote:
> On 1/4/19 9:00 AM, Dukc wrote:
>>>
>>
>> Isn't the main problem with performance of the Timon's range
>> loop that it uses arbitrary-sized integers (BigInts)?
>
> Atila's version of that code doesn't use bigints:
> https://github.com/atilaneves/pythagoras/blob/master/range.d#L24
>
> The major problem with the D range implementation is that the
> compiler isn't able to find the optimization of hoisting the
> multiplication of the outer indexes out of the inner loop.
>
> See my responses to Atila in this thread.
>
> -Steve
Now, I'm replying to an old theard, I hope you're still
interested enough to warrant necrobumping. The thing is, I did do
some additional testing of the range version, and I think I found
out a way to make the compiler to find the quoted optimization
without doing it manually.
You just have to move the sqr calculations to where the data is
still nested:
import std.experimental.all;
import std.datetime.stopwatch : AutoStart, StopWatch;
alias then(alias a)=(r)=>map!a(r).joiner;
void main(){
auto sw = StopWatch(AutoStart.no);
int total;
if (true)
{ sw.start;
scope (success) sw.stop;
auto triples=recurrence!"a[n-1]+1"(1L)
.then!(z=>iota(1,z+1).then!(x=>iota(x,z+1).map!(y=>tuple(x,y,z))))
.filter!((t)=>t[0]^^2+t[1]^^2==t[2]^^2)
.until!(t=>t[2] >= 500);
triples.each!((x,y,z){ total += x+y+z; });
}
writefln("Old loop time is %s microseconds",
sw.peek.total!"usecs"); // 118_614
sw.reset;
if (true)
{ sw.start;
scope (success) sw.stop;
auto triples=recurrence!"a[n-1]+1"(1L)
.then!(z=>iota(1,z+1).then!
( x=>iota(x,z+1)
.map!(y=> tuple(x,y,z))
.filter!((t)=>t[0]^^2+t[1]^^2==t[2]^^2))
)
.until!(t=>t[2] >= 500);
triples.each!((x,y,z){ total += x+y+z; });
}
writefln("New loop time is %s microseconds",
sw.peek.total!"usecs"); // 21_936
writeln(total); // to force the compiler to do the
calculations
return;
}
See, no manual caching of the squares. And the improvement is
over 5X (dub --build=release --compiler=ldc2), which should bring
it close to the other versions in the blog entry.
More information about the Digitalmars-d
mailing list