A Friendly Challenge for D

Jabari Zakiya jzakiya at gmail.com
Mon Oct 15 20:49:26 UTC 2018


> I don't actually understand the underlying algorithm, but I at 
> least understand the flow of the program and the structure.  
> The algorithm utilized depends heavily on using shared memory 
> access, which can be done in D, but I definitely wouldn't call 
> it idiomatic.  In D, message passing is preferred, but it 
> really can't be well demonstrated on your algorithm without a 
> deeper understanding of the algorithm itself.
>
> A complete working version can be found at: 
> https://gist.github.com/vnayar/79e2d0a9850833b8859dd9f08997b4d7
>
> I modified both versions of the program to utilize the 
> pgParameters13 for more of an apples-to-apples comparison.
>
> The final results are as follows:
> $ nim c --cc:gcc --d:release --threads:on twinprimes_ssoz.nim 
> && echo "3000000000" | ./twinprimes_ssoz
> Enter integer number: threads = 8
> each thread segment is [1 x 65536] bytes array
> twinprime candidates = 175324676; resgroups = 1298702
> each 135 threads has nextp[2 x 5566] array
> setup time = 0.000 secs
> perform twinprimes ssoz sieve
> sieve time = 0.222 secs
> last segment = 53518 resgroups; segment slices = 20
> total twins = 9210144; last twin = 2999999712+/-1
> total time = 0.223 secs
>
> $ dub build --compiler=ldc2 -b=release && echo "3000000000" | 
> ./twinprimes
> Enter integer number:
> threads = 8
> each thread segment is [1 x 65536] bytes array
> twinprime candidates = 175324676; resgroups = 1298702
> each 135 threads has nextp[2 x 5566] array
> setup time = 1 ms, 864 μs, and 7 hnsecs
> perform twinprimes ssoz sieve
> sieve time = 196 ms, 566 μs, and 5 hnsecs
> last segment = 53518 resgroups; segment slices = 20
> total twins = 9210144; last twin = 2999999712+/- 1
> total time = 198 ms, 431 μs, and 2 hnsecs
>
> My understanding is that the difference in performance is 
> largely due to slightly better optimization from the LLVM based 
> ldc2 compiler, where I believe Nim is using gcc.

Hey Great!  I'll take some time and study your code.

Nim currently allows using gcc or clang: --cc:gcc or --cc:clang, 
and
compiles to C: nim c --cc:gcc... or C++: nim c++ --cc:gcc...

You can compile in Nim with/out using garbage collection (GC) and 
choose GC.
This version needs GC to recover mem created in each thread, or 
it will eat
it up. See operation of program using htop/top to see threads|mem 
at work.

If D has a fast bitarray structure, try it to create the data 
arrays
"prms" in proc "sozpg" and "seg" arrays is "twin_sieves". This 
could allow
for larger segment sizes|arrays that fit into cpu cache, reducing 
the number
of segment slices to process. This would all have to be profiled 
and
encapsulated in "selectPG", which adaptively selects to best PG 
to use.

Here is a table of output data and performance times (in seconds).

The results were produced on a System76 laptop with an Intel I7 
6700HQ cpu,
2.6-3.5 GHz clock, with 8 threads, and 16GB of memory. I compiled 
the file
"twinprimes_ssoz.nim" with Nim versions 0.18.0, and the just 
released 0.19.0,
using gcc 8.2.1 with Manjaro (Linux) in Virtual Box. I then ran 
the executables
on my host Linux system (which only has gcc-4.9.8) to use all 8 
threads.

The output was produced on a "quiet" system, where I turned off 
the laptop and
rebooted, and ran tests with only a terminal and spreedsheet open 
(to record
results). The times are the "best" times from multiple runs.

I also compare times from the program "primesieve" - 
https://primesieve.org/
which states on its site:

------------------------------------------------------------------------------------
primesieve is a free software program and C/C++ library that 
generates primes using a highly optimized sieve of Eratosthenes 
implementation. It counts the primes below 10^10 in just 0.45 
seconds on an Intel Core i7-6700 CPU (4 x 3.4 GHz). primesieve 
can generate primes and prime k‑tuplets up to 2^64.
------------------------------------------------------------------------------------

(An optimized C/C++ versions of twinprimes_ssoz would be nice to 
compare against too.)


   Number  | Twimprimes | Last Twinprime |twinprime_ssoz.nim, 
gcc-8.2.1| primesieve |
           |            |                |     0.18.0   |    
0.19.0    |   9.7.1    |
----------|------------|----------------|--------------|--------------|------------|
1 x 10^09 |    3424506 |     1999999872 |      0.043   |      
0.041   |     0.049  |
----------|------------|----------------|--------------|--------------|------------|
5 x 10^09 |   14618166 |     4999999860 |      0.219   |      
0.226   |     0.248  |
----------|------------|----------------|--------------|--------------|------------|
1 x 10^10 |   27412679 |     9999999702 |      0.398   |      
0.401   |     0.533  |
----------|------------|----------------|--------------|--------------|------------|
5 x 10^10 |  118903682 |    49999999590 |      2.364   |      
2.331   |     2.978  |
----------|------------|----------------|--------------|--------------|------------|
1 x 10^11 |  224376048 |    99999999762 |      4.586   |      
4.476   |     6.189  |
----------|------------|----------------|--------------|--------------|------------|
5 x 10^11 |  986222314 |   499999999062 |     26.625   |     
26.578   |    35.120  |
----------|------------|----------------|--------------|--------------|------------|
1 x 10^12 | 1870585220 |   999999999960 |     57.895   |     
58.195   |    73.966  |
----------|------------|----------------|--------------|--------------|------------|
5 x 10^12 | 8312493003 |  4999999999878 |    385.562   |    
389.347   |   433.320  |
----------|------------|----------------|--------------|--------------|------------|
1 x 10^13 |15834664872 |  9999999998490 |    855.705   |    
857.771   |   934.253  |
----------|------------|----------------|--------------|--------------|------------|
2 x 10^13 |30198862775 | 19999999999830 |   1806.282   |   
1818.766   |  1998.341  |
----------|------------|----------------|--------------|--------------|------------|

A typical output of the "twinprimes_ssoz.nim" files looks as 
below"

$ echo 123_345_789_000 | ./twinprimes_test7yc.0180.gcc821
Enter integer number: threads = 8
each thread segment is [1 x 98304] bytes array          // 98,304 
segment size (Kn)
twinprime candidates = 6099517038; resgroups = 4107419  // max 
twins; max cols (KB)
each 1485 threads has nextp[2 x 30062] array            // 
30,062, primes <= sqrt(N)
setup time = 0.006 secs                                 // time 
for primes|ssoz setup
perform twinprimes ssoz sieve                           // ssoz 
starts now
sieve time = 5.771 secs                                 // time 
to do just the ssoz
last segment = 76955 resgroups; segment slices = 42     // last 
seg cols; seg loops
total twins = 272018910; last twin = 123345788640+/-1   // 
twinprimes, val for last
total time = 5.778 secs                                 // setup 
+ ssoz time

You can determine the selected PG by the number of threads being 
used.

I'm writing up the paper, and will probably release the program 
operation description
to provide a detailed understanding of how|why it works, which 
will allow for modification.


More information about the Digitalmars-d mailing list