First experience with Threads

Sat Oct 6 06:17:04 PDT 2012

  Just a little experience, and perhaps some help on the subject. 
This is partly a repost from another forum. I've always found 
threading annoying just trying to follow along (the API alone), 
but actually programming it is more annoying still. I've never 
done multi-threaded programming before, so this is a first for me.

  First, the problem. Loading a fairly big data structure can take 
a fair amount of time, but if the records and structures never 
need to touch each other, there's no reason they can't be handled 
on separate cores/threads (or that's my logic on it, anyway).

  To make use of more cores, I've split loading and unpacking into 
separate stages. First, within half a second, all 80MB of data is 
read into memory and the records are separated. Once they're 
separated, they can all be unpacked on different cores.

  Part of the problem is timing: starting a thread doesn't mean it 
runs right away (it runs whenever it gets scheduled), and any data 
the delegate still refers to is effectively volatile (at least in 
VisualD) and may change before the thread reads it. So...

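[code]
   import core.thread;

   class Record {
     //and stuff
     void loadSubRecords() { /* unpack this record's subrecords */ }
   }
   Record[] recordList; //and stuff

   foreach(rec; recordList) {
     // The delegate refers to `rec` itself, not a copy of its value,
     // so the thread may end up seeing a later iteration's record.
     Thread th = new Thread( () { rec.loadSubRecords(); } );
     th.start();
   }
[/code]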

  rec (and even ref rec) may change at any time, worst of all 
mid-update or before the thread has even started. Copying an index 
instead improves things a bit: as long as the index is copied 
before the next iteration of the foreach loop it's fine, otherwise 
i may already have changed and the thread does something unwanted.

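[code]
   foreach(i, rec; recordList) {
     Thread th = new Thread( ()
        {
          // This copy happens when the thread runs, not when it is
          // created, so i may already belong to a later iteration.
          size_t index = i;
          recordList[index].loadSubRecords();
        });
     th.start();
   }
[/code]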
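For what it's worth, here is a minimal sketch of one way to force the copy to happen before the loop moves on: pass the record to a small helper function and build the delegate there. Each call gets its own frame, so each delegate closes over its own `rec`. The helper name `makeJob`, the `loaded` flag and the body of `loadSubRecords` are hypothetical stand-ins, not part of the original code.

[code]
import core.thread;

class Record {
    size_t id;
    bool loaded; // hypothetical flag, set once the record is unpacked

    this(size_t id) { this.id = id; }

    // Hypothetical stand-in for the real unpacking work.
    void loadSubRecords() { loaded = true; }
}

// Hypothetical helper: `rec` is a parameter, so every call has its own copy
// and the returned delegate closes over that copy, not the foreach variable.
void delegate() makeJob(Record rec) {
    return () { rec.loadSubRecords(); };
}

void loadAll(Record[] recordList) {
    foreach (rec; recordList) {
        auto th = new Thread(makeJob(rec));
        th.start();
    }
    thread_joinAll(); // block until every thread started above has finished
}

void main() {
    auto recordList = new Record[](8);
    foreach (i, ref r; recordList)
        r = new Record(i);
    loadAll(recordList);
}
[/code]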

  Several other combinations came up. I think I found an easy way 
to handle it without adding unneeded mutexes and whatnot. What 
seems to work is packing all the data the job needs into a 
structure and having that structure start the thread itself; then 
the problem goes away (hopefully completely).

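[code]
   import core.thread;

   //or something similar
   struct Packed {
     Thread thread;
     Record record;
     void run() {
       assert(record);
       // The delegate reaches `record` through `this`, i.e. this
       // struct's own slot in obj[], which no other loop pass touches.
       thread = new Thread( (){record.loadSubRecords();} );
       thread.start();
     }
   }

   //bad way of thread handling, but makes sense.
   Packed[] obj;
   obj.length = recordList.length; //size once; don't resize while threads run

   foreach(i, rec; recordList) {
     obj[i].record = rec; //class is reference type remember
     obj[i].run(); //returns right away, but thread is running too
   }
   thread_joinAll(); //wait for every started thread to finish
[/code]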
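A closely related approach, and a pattern the `core.thread` documentation itself shows, is to derive from `Thread` so the thread object carries the data it works on. A minimal sketch, again with a hypothetical `Record`; `RecordLoader` and `work` are illustrative names:

[code]
import core.thread;

class Record {
    bool loaded; // hypothetical flag
    void loadSubRecords() { loaded = true; } // stand-in for real work
}

// The thread object owns its record, so nothing it uses can be changed
// by the loop that created it.
class RecordLoader : Thread {
    private Record record;

    this(Record record) {
        super(&work); // work() is what runs on the new thread
        assert(record !is null);
        this.record = record;
    }

    private void work() {
        record.loadSubRecords();
    }
}

void main() {
    auto recordList = new Record[](4);
    foreach (ref r; recordList)
        r = new Record;

    Thread[] workers;
    foreach (rec; recordList)
        workers ~= new RecordLoader(rec).start(); // start() returns the Thread
    foreach (w; workers)
        w.join();
}
[/code]

Compared with the Packed struct, the data and the thread live in one heap object, so there is no array slot that has to stay put while the thread runs.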

  As long as the records (and subrecords) never touch each other, 
mutexes and semaphores aren't needed 90% of the time.
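For the other 10%, where threads do share something, a single counter is often enough and doesn't need a mutex either. A minimal sketch using `core.atomic`; the `recordsDone` counter and the function names are hypothetical:

[code]
import core.atomic : atomicLoad, atomicOp, atomicStore;
import core.thread;
import std.stdio : writeln;

// Hypothetical shared progress counter that every loader thread updates.
// A plain ++ on it from several threads would be a data race.
shared size_t recordsDone;

void countRecords(size_t n) {
    foreach (_; 0 .. n)
        atomicOp!"+="(recordsDone, 1); // indivisible read-modify-write
}

size_t countWithThreads(size_t threads, size_t perThread) {
    atomicStore(recordsDone, cast(size_t) 0);
    foreach (_; 0 .. threads)
        new Thread({ countRecords(perThread); }).start();
    thread_joinAll(); // every thread has finished before we read the total
    return atomicLoad(recordsDone);
}

void main() {
    writeln(countWithThreads(4, 10_000)); // prints 40000
}
[/code]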

  Now, since the original file has 40k records, 40k threads would 
not only be dumb but also expensive to set up. So instead I set up 
job groups.

[code]
   struct PackedList {
     Thread thread;
     Record[] recordList;

     void runWork() {
       foreach(rec; recordList)
         rec.loadSubRecords();
     }

     void run() {
       assert(recordList.length);
       // &runWork is a delegate bound to this struct instance
       thread = new Thread(&runWork);
       thread.start();
     }
   }
[/code]

  With this basic idea, drop a thousand records into one PackedList 
and start it, then grab another thousand and drop them into another 
PackedList. Each one runs until its workload is done.
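A sketch of driving PackedList this way, using `std.range.chunks` to cut the list into groups of 1000. The `startGroups` helper and the `loaded` flag are hypothetical; `runWork` and `run` follow the struct above. Since `&runWork` points at the struct itself, the PackedList array is allocated once up front and never grown while threads are running.

[code]
import core.thread;
import std.range : chunks;

class Record {
    bool loaded; // hypothetical flag
    void loadSubRecords() { loaded = true; } // stand-in for real work
}

struct PackedList {
    Thread thread;
    Record[] recordList;

    void runWork() {
        foreach (rec; recordList)
            rec.loadSubRecords();
    }

    void run() {
        assert(recordList.length);
        thread = new Thread(&runWork); // delegate bound to this struct
        thread.start();
    }
}

// Hypothetical helper: one PackedList (and one thread) per group of records.
PackedList[] startGroups(Record[] records, size_t groupSize) {
    // Sized once up front; growing it later could move the structs that
    // the running threads' delegates still point at.
    auto groups = new PackedList[]((records.length + groupSize - 1) / groupSize);
    size_t g;
    foreach (group; records.chunks(groupSize)) {
        groups[g].recordList = group;
        groups[g].run();
        ++g;
    }
    return groups;
}

void main() {
    auto records = new Record[](40_000);
    foreach (ref r; records)
        r = new Record;

    auto groups = startGroups(records, 1000); // 40 groups, 40 threads
    foreach (ref grp; groups)
        grp.thread.join();
}
[/code]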

  Is there a suggested magic number of threads per core? With, say, 
a quad core you can obviously have 4 threads going, but if they go 
to sleep waiting on system resources (loading a file, saving, 
something else), a core may sit unused. It makes sense to have 2 
per core, since then if one goes idle the core has another it can 
pick up. I'm guessing 2-4 would be the right number for this type 
of work.
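One way to avoid picking the number by hand is Phobos' `std.parallelism`, which keeps a pool of worker threads (by default `totalCPUs - 1`, with the calling thread also doing work) and hands them fixed-size work units. That swaps a library pool in for the hand-made PackedList groups; a minimal sketch, with the same hypothetical `Record`:

[code]
import std.parallelism : parallel, taskPool, totalCPUs;
import std.stdio : writeln;

class Record {
    bool loaded; // hypothetical flag
    void loadSubRecords() { loaded = true; } // stand-in for real work
}

void loadAll(Record[] recordList) {
    // Work units of 1000 records, like one PackedList each. The loop
    // returns only after every unit has been processed.
    foreach (rec; parallel(recordList, 1000))
        rec.loadSubRecords();
}

void main() {
    writeln("cores: ", totalCPUs, ", pool worker threads: ", taskPool.size);

    auto recordList = new Record[](40_000);
    foreach (ref r; recordList)
        r = new Record;
    loadAll(recordList);
}
[/code]

Whether more threads than cores helps depends on how much the work blocks on I/O; for purely CPU-bound unpacking, extra threads mostly add switching overhead.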