problem with parallel foreach
John Colvin via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Wed May 13 06:40:32 PDT 2015
On Wednesday, 13 May 2015 at 11:33:55 UTC, John Colvin wrote:
> On Tuesday, 12 May 2015 at 18:14:56 UTC, Gerald Jansen wrote:
>> On Tuesday, 12 May 2015 at 16:35:23 UTC, Rikki Cattermole
>> wrote:
>>> On 13/05/2015 4:20 a.m., Gerald Jansen wrote:
>>>> At the risk of great embarassment ... here's my program:
>>>> http://dekoppel.eu/tmp/pedupg.d
>>>
>>> Would it be possible to give us some example data?
>>> I might give it a go to try rewriting it tomorrow.
>>
>> http://dekoppel.eu/tmp/pedupgLarge.tar.gz (89 Mb)
>>
>> Contains two largish datasets in a directory structure
>> expected by the program.
>
> I only see 2 traits in that example, so it's hard for anyone to
> explore your scaling problem, seeing as there are a maximum of
> 2 tasks.
Either way, a few small changes were enough to cut the runtime by
a factor of ~6 in the single-threaded case and to improve the
scaling a bit, although printing to the output files still looks
like a bottleneck.
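One common way to ease that kind of output bottleneck is to format each
trait's results into an in-memory buffer and write the file in one call,
instead of issuing one write per line. A minimal sketch (formatResults /
writeResults and the double[] input are just illustrative names, not from
the actual program):

```d
import std.array : appender;
import std.conv : to;
import std.stdio : File;

// Format all results into one string; appender amortises the
// reallocations that repeated ~= would cause.
string formatResults(const double[] results)
{
    auto buf = appender!string();
    foreach (r; results)
    {
        buf.put(r.to!string);
        buf.put('\n');
    }
    return buf.data;
}

// Flush the whole buffer with a single write call.
void writeResults(string path, const double[] results)
{
    File(path, "w").write(formatResults(results));
}
```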
http://dpaste.dzfl.pl/80cd36fd6796
The key change was reducing the number of allocations: more
std.algorithm.splitter copying into static arrays, less
std.array.split, and avoiding File.byLine. Other people in this
thread have mentioned alternatives to File.byLine that may be
faster or use less memory; I just read each whole file into memory
and then lazily split it with std.algorithm.splitter. I ended up
with some blank lines coming through, so I added
if(line.empty) continue; in a few places. You might want to look
more carefully at that, as it could be my mistake.
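To illustrate the read-whole-file-then-lazily-split approach, here is a
minimal sketch; Record and parseRecords are made-up names for
illustration, not the actual program's types, and the field layout is
assumed:

```d
import std.algorithm : splitter;
import std.conv : to;
import std.range : empty;

// Stand-in for one line of the real pedigree data.
struct Record { int id; double value; }

// Split the whole file contents lazily, one line at a time,
// instead of allocating per line with File.byLine.
Record[] parseRecords(string contents)
{
    Record[] recs;
    foreach (line; contents.splitter('\n'))
    {
        if (line.empty)
            continue;       // a trailing newline yields an empty slice
        auto fields = line.splitter('\t');
        Record r;
        r.id = fields.front.to!int;
        fields.popFront();
        r.value = fields.front.to!double;
        recs ~= r;
    }
    return recs;
}
```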
The use of std.array.appender for `info` is just good practice,
but it doesn't make much difference here.