problem with parallel foreach

Wed May 13 08:46:32 PDT 2015

On Wednesday, 13 May 2015 at 14:43:50 UTC, John Colvin wrote:
> On Wednesday, 13 May 2015 at 14:28:52 UTC, Gerald Jansen wrote:
>> On Wednesday, 13 May 2015 at 13:40:33 UTC, John Colvin wrote:
>>> On Wednesday, 13 May 2015 at 11:33:55 UTC, John Colvin wrote:
>>>> On Tuesday, 12 May 2015 at 18:14:56 UTC, Gerald Jansen wrote:
>>>>> On Tuesday, 12 May 2015 at 16:35:23 UTC, Rikki Cattermole 
>>>>> wrote:
>>>>>> On 13/05/2015 4:20 a.m., Gerald Jansen wrote:
>>>>>>> At the risk of great embarrassment ... here's my program:
>>>>>>> http://dekoppel.eu/tmp/pedupg.d
>>>>>>
>>>>>> Would it be possible to give us some example data?
>>>>>> I might give it a go to try rewriting it tomorrow.
>>>>>
>>>>> http://dekoppel.eu/tmp/pedupgLarge.tar.gz (89 MB)
>>>>>
>>>>> Contains two largish datasets in a directory structure 
>>>>> expected by the program.
>>>>
>>>> I only see 2 traits in that example, so it's hard for anyone 
>>>> to explore your scaling problem, seeing as there are a 
>>>> maximum of 2 tasks.
>>>
>>> Either way, a few small changes were enough to cut the 
>>> runtime by a factor of ~6 in the single-threaded case and 
>>> improve the scaling a bit, although the printing to output 
>>> files still looks like a bit of a bottleneck.
>>>
>>> http://dpaste.dzfl.pl/80cd36fd6796
>>>
>>> The key thing was reducing the number of allocations (more 
>>> std.algorithm.splitter with copying to static arrays, less 
>>> std.array.split) and avoiding File.byLine. Other people in 
>>> this thread have mentioned alternatives to it that may be 
>>> faster or use less memory; I just read each file into memory 
>>> whole and then lazily split it with std.algorithm.splitter. 
>>> I ended up with some blank lines coming through, so I added 
>>> if(line.empty) continue; in a few places. You might want to 
>>> look more carefully at that; it could be my mistake.
>>>
>>> The use of std.array.appender for `info` is just good 
>>> practice, but it doesn't make much difference here.
>>
>> Wow, I'm impressed with the effort you guys (John, Rikki, 
>> others) are making to teach me some efficiency tricks. I guess 
>> this is one of the strengths of D: its community. I'm studying 
>> your various contributions closely!
>>
>> The empty line comes from the very last line of each file, 
>> which also ends with a newline (as per "normal" practice?).
>
> Yup, that would be it.
>
> I added a bit of buffered writing and it actually seems to 
> scale quite well for me now.
>
> http://dpaste.dzfl.pl/710afe8b6df5

Fixed the spare '\n' problem in the file reading and added some comments.

http://dpaste.dzfl.pl/114d5a6086b7
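
For readers who cannot reach the pastes, here is a minimal sketch of the techniques discussed in this thread: reading a whole file into memory, dropping the final '\n' before lazily splitting with std.algorithm.splitter, copying fields into a reused static array instead of allocating with std.array.split, and collecting output in a std.array.appender so it is written in one go. It is not the code from the pastes. The three-field space-separated record layout, the `;` output format, the temporary file name, and the names `process` and `nFields` are all assumptions made for illustration.

```d
import std.algorithm : splitter;
import std.array : appender;
import std.format : formattedWrite;
import std.string : chomp;

// Hypothetical record layout: three fields per line, separated by
// single spaces. The real pedigree files may look different.
enum nFields = 3;

/// Parses `text` line by line and returns the formatted output as one string.
string process(const(char)[] text)
{
    auto output = appender!string();   // grows in place, few reallocations
    const(char)[][nFields] fields;     // static array, reused for every line

    // chomp drops the final '\n', so splitter does not yield a spare
    // empty line at the end of the file.
    foreach (line; text.chomp.splitter('\n'))
    {
        if (line.length == 0)
            continue;                  // tolerate blank lines inside the file

        // Copy the lazily split fields (slices of `text`, no allocation)
        // into the static array.
        size_t n = 0;
        foreach (field; line.splitter(' '))
        {
            if (n == nFields)
                break;                 // ignore any extra fields
            fields[n++] = field;
        }
        if (n < nFields)
            continue;                  // skip lines with too few fields

        output.formattedWrite("%s;%s;%s\n", fields[0], fields[1], fields[2]);
    }
    return output.data;
}

void main()
{
    import std.file : readText, remove, tempDir, write;
    import std.path : buildPath;
    import std.stdio : stdout;

    // Create a small sample input (assumed format) in the temp directory.
    auto path = buildPath(tempDir, "pedupg_sketch_input.txt");
    write(path, "1 0 0\n2 1 0\n\n3 1 2\n");
    scope (exit) remove(path);

    // Read the whole file at once instead of iterating with File.byLine,
    // then emit the buffered result with a single write.
    auto result = process(readText(path));
    stdout.write(result);
}
```

Because `process` only touches its own buffer and its input slice, one call per trait could run inside a std.parallelism parallel foreach without sharing state, with each task writing its own output file at the end. Whether that matches the scaling seen above depends on the data and the disk.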