std.parallelism equivalents for posix fork and multi-machine processing

Laeeth Isharc via Digitalmars-d digitalmars-d at puremagic.com
Thu May 14 13:03:53 PDT 2015


On Thursday, 14 May 2015 at 16:33:46 UTC, John Colvin wrote:
> On Wednesday, 13 May 2015 at 20:34:24 UTC, weaselcat wrote:
>> On Wednesday, 13 May 2015 at 20:28:02 UTC, Laeeth Isharc wrote:
>>> Is there value to having equivalents to the std.parallelism 
>>> approach that works with processes rather than threads, and 
>>> makes it easy to manage tasks over multiple machines?
>>
>> I'm not sure if you're asking because of this thread, but see
>>
>> http://forum.dlang.org/thread/tczkndtepnvppggzmews@forum.dlang.org#post-tczkndtepnvppggzmews:40forum.dlang.org
>>
>> python outperforming D because it doesn't have to deal with 
>> synchronization headaches. I found D to be way faster when 
>> reimplemented with fork, but having to use the stdc API is 
>> ugly(IMO)
>
> It was also easy to get D very fast by just being a little more 
> eager with IO and reducing the enormous number of little 
> allocations being made.

Yes - thank you for your highly educational rewrite; I very much 
appreciate your taking the trouble to do it.  Perhaps it should 
be turned (by you or someone else) into a mini case study on the 
wiki showing how to write idiomatic and efficient D code.  Or 
maybe just put up the slides from your forthcoming talk (which I 
look forward to watching later when it is up).

It's good to know D can in fact deliver on the implicit promise 
in a real use case without too much work.  (Yes, naively written 
code was a bit slow when dealing with millions of lines, but in 
which language of comparable flexibility would that not be 
true?)  It's also interesting that your code was idiomatic.  (I 
was reading up on Scala, which seems beautiful in many ways, but 
it is terribly disturbing to see that the idiomatic way often 
seems to be the most inefficient, at least as things stood a 
couple of years ago.)

But, even so, I think having a wrapper for fork and an API for 
multiprocessing (which you could then hook up to, e.g., the 
Digital Ocean and AWS APIs) would be rather helpful.

I spoke with a friend of mine at one of the most admired/hated 
Wall Street firms - one of the smartest quants I know, who has 
now moved to portfolio management.  He was doing a study on tick 
data going back to 2000.  I asked him how long it took to run on 
his firm's infrastructure.  An hour!  And the operations were 
pretty simple.  I think it should only take a couple of minutes.  
It would be nice to show an example of spinning up 100 Digital 
Ocean instances from a spreadsheet, running the numbers not just 
on one security but on every relevant security, and having a 
nice summary appear back in the sheet within a couple of minutes.

The reason speed matters is that long waits interfere with rapid 
iteration and the creative thought process.  In a market 
environment you may well have forgotten what you wanted after an 
hour...


Laeeth.
