problem with parallel foreach

Wed May 13 07:28:51 PDT 2015

On Wednesday, 13 May 2015 at 13:40:33 UTC, John Colvin wrote:
> On Wednesday, 13 May 2015 at 11:33:55 UTC, John Colvin wrote:
>> On Tuesday, 12 May 2015 at 18:14:56 UTC, Gerald Jansen wrote:
>>> On Tuesday, 12 May 2015 at 16:35:23 UTC, Rikki Cattermole 
>>> wrote:
>>>> On 13/05/2015 4:20 a.m., Gerald Jansen wrote:
>>>>> At the risk of great embarrassment ... here's my program:
>>>>> http://dekoppel.eu/tmp/pedupg.d
>>>>
>>>> Would it be possible to give us some example data?
>>>> I might give it a go to try rewriting it tomorrow.
>>>
>>> http://dekoppel.eu/tmp/pedupgLarge.tar.gz (89 MB)
>>>
>>> Contains two largish datasets in a directory structure 
>>> expected by the program.
>>
>> I only see 2 traits in that example, so it's hard for anyone 
>> to explore your scaling problem, seeing as there are a maximum 
>> of 2 tasks.
>
> Either way, a few small changes were enough to cut the runtime 
> by a factor of ~6 in the single-threaded case and improve the 
> scaling a bit, although the printing to output files still 
> looks like a bit of a bottleneck.
>
> http://dpaste.dzfl.pl/80cd36fd6796
>
> The key thing was reducing the number of allocations (more 
> std.algorithm.splitter copying to static arrays, less 
> std.array.split) and avoiding File.byLine. Other people in this 
> thread have mentioned alternatives to it that may be faster or 
> use less memory; I just read the whole files into memory and 
> then lazily split them with std.algorithm.splitter. I ended up 
> with some blank lines coming through, so I added 
> if(line.empty) continue; in a few places. You might want to 
> look more carefully at that; it could be my mistake.
>
> The use of std.array.appender for `info` is just good practice, 
> but it doesn't make much difference here.
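
For anyone following along, here is roughly how I understand the 
"splitter copying to static arrays" idea. This is only a sketch, not 
John's actual code from the paste: the three-integer record layout and 
the names parseRecord and buildInfo are made up for illustration.

```d
import std.algorithm : splitter;
import std.array : appender;
import std.conv : to;

// Hypothetical record layout: three space-separated integers per line.
int[3] parseRecord(const(char)[] line)
{
    int[3] fields;   // fixed-size buffer, no GC allocation per line
    size_t i = 0;
    // splitter is lazy: each token is a slice of `line`, not a copy
    foreach (tok; line.splitter(' '))
    {
        if (tok.length == 0) continue;   // skip runs of spaces
        if (i == fields.length) break;   // ignore any extra columns
        fields[i++] = tok.to!int;
    }
    return fields;
}

// Builds a summary string with appender instead of repeated ~= on a string.
string buildInfo(const int[3] rec)
{
    auto info = appender!string();
    foreach (idx, value; rec)
    {
        if (idx > 0) info.put(", ");
        info.put(value.to!string);
    }
    return info.data;
}
```

With std.array.split, each line would instead produce a freshly 
allocated array of tokens, which is presumably where much of the 
allocation cost came from.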

Wow, I'm impressed with the effort you guys (John, Rikki, others) 
are making to teach me some efficiency tricks. I guess this is 
one of the strengths of D: its community. I'm studying your 
various contributions closely!

The empty line comes from the very last line of each file, which 
also ends with a newline (as per "normal" practice?), so splitting 
on newlines leaves one empty string after it.
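
A small sketch of what I think is happening, assuming the file 
contents are already in memory as a string (for example via 
std.file.readText). The function names rawLineCount and 
nonEmptyLines are mine, not from the paste.

```d
import std.algorithm : filter, splitter;
import std.range : walkLength;

// Counts every element splitter yields, including the empty string
// that follows a trailing newline.
size_t rawLineCount(string text)
{
    return text.splitter('\n').walkLength;
}

// Lazily yields only the non-empty lines; each one is a slice of `text`.
auto nonEmptyLines(string text)
{
    return text.splitter('\n').filter!(line => line.length != 0);
}
```

For a two-line text ending in a newline, rawLineCount reports three 
elements, the last being empty. Filtering once up front like this 
would replace the scattered if(line.empty) continue; checks, though 
it also drops blank lines in the middle of a file, which may or may 
not be what is wanted.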