problem with parallel foreach

Wed May 13 07:43:49 PDT 2015

On Wednesday, 13 May 2015 at 14:28:52 UTC, Gerald Jansen wrote:
> On Wednesday, 13 May 2015 at 13:40:33 UTC, John Colvin wrote:
>> On Wednesday, 13 May 2015 at 11:33:55 UTC, John Colvin wrote:
>>> On Tuesday, 12 May 2015 at 18:14:56 UTC, Gerald Jansen wrote:
>>>> On Tuesday, 12 May 2015 at 16:35:23 UTC, Rikki Cattermole 
>>>> wrote:
>>>>> On 13/05/2015 4:20 a.m., Gerald Jansen wrote:
>>>>>> At the risk of great embarrassment ... here's my program:
>>>>>> http://dekoppel.eu/tmp/pedupg.d
>>>>>
>>>>> Would it be possible to give us some example data?
>>>>> I might give it a go to try rewriting it tomorrow.
>>>>
>>>> http://dekoppel.eu/tmp/pedupgLarge.tar.gz (89 Mb)
>>>>
>>>> Contains two largish datasets in a directory structure 
>>>> expected by the program.
>>>
>>> I only see 2 traits in that example, so it's hard for anyone 
>>> to explore your scaling problem, seeing as there are a 
>>> maximum of 2 tasks.
>>
>> Either way, a few small changes were enough to cut the runtime 
>> by a factor of ~6 in the single-threaded case and improve the 
>> scaling a bit, although the printing to output files still 
>> looks like a bit of a bottleneck.
>>
>> http://dpaste.dzfl.pl/80cd36fd6796
>>
>> The key thing was reducing the number of allocations (more 
>> std.algorithm.splitter copying to static arrays, less 
>> std.array.split) and avoiding File.byLine. Other people in 
>> this thread have mentioned alternatives to it that may be 
>> faster or use less memory; I just read the whole files into 
>> memory and then lazily split them with 
>> std.algorithm.splitter. I ended up with some blank lines 
>> coming through, so I added `if (line.empty) continue;` in a 
>> few places. You might want to look more carefully at that; it 
>> could be my mistake.
>>
>> The use of std.array.appender for `info` is just good 
>> practice, but it doesn't make much difference here.
>
> Wow, I'm impressed with the effort you guys (John, Rikki, 
> others) are making to teach me some efficiency tricks. I guess 
> this is one of the strengths of D: its community. I'm studying 
> your various contributions closely!
>
> The empty line comes from the very last line of the files, 
> which also ends with a newline (as per "normal" practice?).

Yup, that would be it.
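
For anyone following along, here is a minimal sketch of why the blank line shows up and how the `line.empty` check handles it. It is not the code from the paste: `parseRows`, the three-column layout and the sample text are made up for illustration. It assumes the whole file is already in memory as a string (e.g. via std.file.readText) and that fields are separated by single spaces.

```d
import std.algorithm : splitter;
import std.conv : to;
import std.range : empty;
import std.stdio : writeln;

// Hypothetical helper: parse up to three space-separated integers per line.
int[3][] parseRows(string text)
{
    int[3][] rows;
    foreach (line; text.splitter('\n'))    // lazy: yields slices, no copy per line
    {
        if (line.empty) continue;          // the element after the final '\n' is empty
        int[3] fields;                     // static array, lives on the stack
        size_t i;
        foreach (tok; line.splitter(' '))
        {
            if (i == fields.length) break; // ignore any extra columns
            fields[i++] = tok.to!int;
        }
        rows ~= fields;
    }
    return rows;
}

void main()
{
    // Stand-in for std.file.readText(path); note the trailing newline.
    auto rows = parseRows("1 2 3\n4 5 6\n");
    writeln(rows);
}
```

splitter yields an empty element after a trailing separator, so a newline-terminated file always produces one blank "line" at the end. Skipping it is harmless as long as blank lines carry no meaning in the data.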

I added a bit of buffered writing and it actually seems to scale 
quite well for me now.

http://dpaste.dzfl.pl/710afe8b6df5
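
One way to do buffered writing, as a rough sketch only: format everything into an in-memory buffer with std.array.appender and hand it to the file in a single write. This is not necessarily what the paste above does; `formatRows`, the row layout and the `out.txt` path are assumptions for illustration.

```d
import std.array : appender;
import std.format : formattedWrite;
import std.stdio : File;

// Hypothetical helper: format all rows into one in-memory string.
string formatRows(const int[3][] rows)
{
    auto buf = appender!string();
    foreach (row; rows)
        buf.formattedWrite("%d %d %d\n", row[0], row[1], row[2]);
    return buf.data;
}

void main()
{
    int[3][] rows = [[1, 2, 3], [4, 5, 6]];
    auto f = File("out.txt", "w");  // hypothetical output path
    f.write(formatRows(rows));      // one large write instead of many small ones
}
```

The trade-off is memory: the whole output sits in RAM until it is written, which is fine for files of this size but may not be for much larger ones.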