A Friendly Challenge for D

Wed Oct 10 22:05:23 UTC 2018

On Wednesday, 10 October 2018 at 20:43:01 UTC, Kagamin wrote:
> On Wednesday, 10 October 2018 at 16:15:56 UTC, Jabari Zakiya 
> wrote:
>> https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e
>
> As i understand, main thread preallocates global memory and 
> tracks it, and other threads don't track it?

Here's an abreviated elementary explanation of the algorithm and 
implementation.
You will need to read (download) my paper "The Segmented Sieve of 
Zakiya (SSoZ)"
because I will make reference to things in it, to keep this as 
short as possible.

https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ

To really understand the math and theory of the algorithm 
requires primarily
just understanding Table 3 on page 3 of the paper, as it 
encapsulates everything.
You can read the paper to understand the language of the 
algorithm used in the code.

Twinprimes are 2 (odd) primes that differ by only 2 eg. 3-5, 5-7, 
11-13, 29-31.

Table 3 shows all the residues values (prime candidates|pcs) and 
residues groups
(resgroups|columns) to find all the primes upto 541 using P5 
(modpg = 30, rescnt = 8).

For a given value N, it will be represented with a PG table of 
some number of
resgroups, with max size I call Kmax (the regroups value N 
residues in).

Using P5, I only need to sieve primes along the residue tracks 
(restracks) that can
produce twinprimes, here 11-13, 17-19, 29-31. Thus I create 3 
byte arrays, one for each twinpair, and use the lower 2 bits to 
represent the upper and lower twinprime restracks.

Then for each twinpair (here 3) I run 3 thread which perform the 
SSoZ on the entire Kmax length of resgroups in parallel. At the 
end I accumulate the results and print them out. This, in a 
nutshell, is what the algorithm does. The paper gives you enough 
to understand the fundamental nature of the algorithm, though 
I've learned so much more than in 2014. :)

The larger the PG the more twinpair restracks (see page 14) there 
are to use. For larger numbers you want to use the largest PG 
possible that 'fits' into the hardware cpu. All my 
development|testing was done on laptops using Intel I5|I7 cpus 
with 4 or 8 threads. I'm really interested how it performs on 
other platforms (ARM, AMD, PowerPC, etc).

The main proc "twinprimes_ssoz" manages program flow and set as 
follows:

1) accepts the input number (an integer) in "val" from the cli

2) sets "num" to be first odd number < "val" if "val" even

3) calls "selectPG" with "num" to select optimum Prime Generator 
(PG) parameters

4) compute various working parameters per selected PG and number 
value (see refs)

5) compute global values for number of primes, and their values, 
<= sqrt(num) for
selected PG with proc "sozpg"

6) Set initial twinprime count for selected PG

7) Then with proc "segsieve" allocate one thread to perform SSoZ 
(segmented sieve of zakiya)
on each twinprime pair residues (determined by selected PG), and 
count number of twinprmes
computed in each thread.

8) Then determine true total twinprime count and last twinprime 
<= "num"

It also is timing different intervals and prints out diagnostics 
and final output.

The proc "twins_sieve" is the primary function that manages and 
performs the SSoZ for
a given twinprim pair parameters in each thread. The more threads 
the faster the process goes.

I'll provide more info later. I have to run now. I wanted to get 
this out now while I
was at my laptop, and online.