std.parallelism equivalents for posix fork and multi-machine processing
Laeeth Isharc via Digitalmars-d
digitalmars-d at puremagic.com
Thu May 14 13:03:53 PDT 2015
On Thursday, 14 May 2015 at 16:33:46 UTC, John Colvin wrote:
> On Wednesday, 13 May 2015 at 20:34:24 UTC, weaselcat wrote:
>> On Wednesday, 13 May 2015 at 20:28:02 UTC, Laeeth Isharc wrote:
>>> Is there value to having equivalents to the std.parallelism
>>> approach that works with processes rather than threads, and
>>> makes it easy to manage tasks over multiple machines?
>>
>> I'm not sure if you're asking because of this thread, but see
>>
>> http://forum.dlang.org/thread/tczkndtepnvppggzmews@forum.dlang.org#post-tczkndtepnvppggzmews:40forum.dlang.org
>>
>> python outperforming D because it doesn't have to deal with
>> synchronization headaches. I found D to be way faster when
>> reimplemented with fork, but having to use the stdc API is
>> ugly(IMO)
>
> It was also easy to get D very fast by just being a little more
> eager with IO and reducing the enormous number of little
> allocations being made.
Yes - thank you for your highly educational rewrite; I very
much appreciate your taking the trouble to do it.
Perhaps this should be turned (by you or someone else) into a
mini case-study on the wiki of how to write idiomatic and
efficient D code. Or maybe just put up the slides from your
forthcoming talk (which I look forward to watching later when it
is up).
It's good to know D can in fact deliver on the implicit promise
in a real use case without too much work. (Yes, naively written
code was a bit slow when dealing with millions of lines, but in
which language of comparable flexibility would that not be true?)
It's also interesting that your code was idiomatic. (I was
reading up on Scala, which seems beautiful in many ways, but
it is terribly disturbing to see that the idiomatic way often
seems to be the most inefficient, at least as things stood a
couple of years ago.)
But, even so, I think having a wrapper for fork and an API for
multiprocessing (which you could then hook up to, e.g., the
DigitalOcean and AWS APIs) would be rather helpful.
I spoke with a friend of mine at one of the most admired/hated
Wall Street firms - one of the smartest quants I know, who has
now moved to portfolio management. He was doing a study on tick
data going back to 2000. I asked him how long it took to run on
his firm's infrastructure. An hour! And the operations were pretty
simple. I think it should only take a couple of minutes. And it
would be nice to show an example of - from a spreadsheet -
spinning up 100 DigitalOcean instances - and running the numbers
not just on one security, but every relevant security, and having
a nice summary appear back in the sheet within a couple of
minutes.
The reason speed matters is that long waits interfere with rapid
iteration and the creative thought process. In a market
environment you may well have forgotten what you wanted after an
hour...
Laeeth.