A Friendly Challenge for D

Mon Oct 15 19:51:04 UTC 2018

On Sunday, 14 October 2018 at 10:51:11 UTC, Vijay Nayar wrote:
> Once I get the bugs out, I'm curious to see if any performance 
> differences crop up.  There's the theory that says they should 
> be the same, and then there's the practice.

I don't actually understand the underlying algorithm, but I at 
least understand the flow of the program and the structure.  The 
algorithm utilized depends heavily on using shared memory access, 
which can be done in D, but I definitely wouldn't call it 
idiomatic.  In D, message passing is preferred, but it really 
can't be well demonstrated on your algorithm without a deeper 
understanding of the algorithm itself.

A complete working version can be found at: 
https://gist.github.com/vnayar/79e2d0a9850833b8859dd9f08997b4d7

I modified both versions of the program to utilize the 
pgParameters13 for more of an apples-to-apples comparison.

The final results are as follows:
$ nim c --cc:gcc --d:release --threads:on twinprimes_ssoz.nim && 
echo "3000000000" | ./twinprimes_ssoz
Enter integer number: threads = 8
each thread segment is [1 x 65536] bytes array
twinprime candidates = 175324676; resgroups = 1298702
each 135 threads has nextp[2 x 5566] array
setup time = 0.000 secs
perform twinprimes ssoz sieve
sieve time = 0.222 secs
last segment = 53518 resgroups; segment slices = 20
total twins = 9210144; last twin = 2999999712+/-1
total time = 0.223 secs

$ dub build --compiler=ldc2 -b=release && echo "3000000000" | 
./twinprimes
Enter integer number:
threads = 8
each thread segment is [1 x 65536] bytes array
twinprime candidates = 175324676; resgroups = 1298702
each 135 threads has nextp[2 x 5566] array
setup time = 1 ms, 864 μs, and 7 hnsecs
perform twinprimes ssoz sieve
sieve time = 196 ms, 566 μs, and 5 hnsecs
last segment = 53518 resgroups; segment slices = 20
total twins = 9210144; last twin = 2999999712+/- 1
total time = 198 ms, 431 μs, and 2 hnsecs

My understanding is that the difference in performance is largely 
due to slightly better optimization from the LLVM based ldc2 
compiler, where I believe Nim is using gcc.