Error running concurrent process and storing results in array

drug drug2004 at bk.ru
Wed May 6 05:50:23 UTC 2020


On 06.05.2020 07:52, data pulverizer wrote:
> On Wednesday, 6 May 2020 at 04:04:14 UTC, Mathias LANG wrote:
>> On Wednesday, 6 May 2020 at 03:41:11 UTC, data pulverizer wrote:
>>> Yes, that's exactly what I want; the actual computation I'm running is 
>>> much more expensive and much larger. It shouldn't matter if I have, 
>>> say, 100_000_000 threads, should it? The threads should just be queued 
>>> until the CPU works on them?
>>
>> It does matter quite a bit. Each thread has its own resources 
>> allocated to it, and some part of the language will need to interact 
>> with *all* threads, e.g. the GC.
>> In general, if you want to parallelize something, you should aim to 
>> have as many threads as you have cores. Having 100M threads will mean 
>> you have to do a lot of context switches. You might want to look up 
>> the difference between tasks and threads.
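[A minimal sketch of the tasks-vs-threads point above, using 
std.parallelism; `expensiveKernel` and the 10_000 item count are only 
illustrative, not from the thread. Each work item becomes a cheap task 
object queued on a fixed pool of roughly one worker thread per core, 
rather than its own OS thread:

```
import std.parallelism : task, taskPool, totalCPUs;
import std.stdio : writeln;

// Placeholder for an expensive per-item computation.
double expensiveKernel(size_t i) { return i * 0.5; }

void main()
{
    writeln("CPU cores: ", totalCPUs);

    // 10_000 tasks, but only a handful of OS threads: the pool
    // (totalCPUs - 1 workers by default, plus the thread calling
    // yieldForce) pulls queued tasks as cores become free.
    auto tasks = new typeof(task!expensiveKernel(size_t.init))[](10_000);
    foreach (i; 0 .. tasks.length)
    {
        tasks[i] = task!expensiveKernel(i);
        taskPool.put(tasks[i]);
    }

    double total = 0;
    foreach (t; tasks)
        total += t.yieldForce;   // blocks until that particular task ran
    writeln(total);
}
```]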
> 
> Sorry, I meant 10_000, not 100_000_000; I squared the number by mistake 
> because I'm calculating a 10_000 x 10_000 matrix. It's only 10_000 tasks, 
> so each task does 10_000 calculations. The actual bit of code I'm 
> parallelising is here:
> 
> ```
> auto calculateKernelMatrix(T)(AbstractKernel!(T) K, Matrix!(T) data)
> {
>    long n = data.ncol;
>    auto mat = new Matrix!(T)(n, n);
> 
>    foreach(j; taskPool.parallel(iota(n)))
>    {
>      auto arrj = data.refColumnSelect(j).array;
>      for(long i = j; i < n; ++i)
>      {
>        mat[i, j] = K.kernel(data.refColumnSelect(i).array, arrj);
>        mat[j, i] = mat[i, j];
>      }
>    }
>    return mat;
> }
> ```
> 
> At the moment this code runs a little faster than threaded, 
> SIMD-optimised Julia code, but as I said in an earlier reply to Ali, 
> when I look at my system monitor I can see that all the D threads are 
> active but only at ~40% usage, meaning they are mostly idle. The Julia 
> code runs all threads at 100% and is still a tiny bit slower, so my 
> (maybe incorrect?) assumption is that I could get more performance out 
> of D. The method `refColumnSelect(j).array` is (trying to) reference a 
> column of the matrix (a 1D array indexed by computed offsets), which I 
> select from the matrix using:
> 
> ```
> return new Matrix!(T)(data[startIndex..(startIndex + nrow)], [nrow, 1]);
> ```
> 
> If I use the above code, am I wrong in assuming that the sliced data 
> (T[]) is referenced rather than copied? That is, if I do:
> 
> ```
> auto myData = data[5 .. 10];
> ```
> 
> myData references elements [5..10] of data rather than creating a new 
> array with elements data[5..10] copied?
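[For a plain `T[]` that assumption holds: slicing produces a 
(pointer, length) view over the same memory and copies nothing; whether a 
user-defined `Matrix.opSlice` copies depends entirely on how it is 
written. A small standalone check, with illustrative variable names:

```
import std.stdio : writeln;

void main()
{
    auto data = [0.0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

    // A slice is a view, not a copy.
    auto myData = data[5 .. 10];
    myData[0] = 99;
    assert(data[5] == 99);          // the original array sees the write

    // .dup (or std.array.array over a range) is what forces a copy.
    auto copied = data[5 .. 10].dup;
    copied[0] = -1;
    assert(data[5] == 99);          // unchanged by writing to the copy

    writeln(data);
}
```]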

General advice - try to avoid using `array` and `new` in hot code. 
Memory allocation is slow in general, unless you use carefully crafted 
custom memory allocators, and it can easily be the reason for the ~40% 
CPU usage: the cores are stalled waiting for the memory subsystem.
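
For example, assuming the matrix stores its elements column-major in one 
flat array (which is what the quoted `refColumnSelect` suggests), a 
column can be handed out as a plain slice, so the inner loop needs no 
`new Matrix` and no `.array` copy. `FlatMatrix` and `columnSlice` below 
are illustrative names, not the actual API from the thread:

```
// Illustrative only - assumes column-major flat storage.
struct FlatMatrix(T)
{
    T[] data;      // element (i, j) lives at j*nrow + i
    size_t nrow;

    // Zero-allocation view of column j.
    inout(T)[] columnSlice(size_t j) inout
    {
        immutable start = j * nrow;
        return data[start .. start + nrow];
    }
}

void main()
{
    auto m = FlatMatrix!double([1.0, 2, 3, 4, 5, 6], 3); // 3 x 2

    auto col1 = m.columnSlice(1);   // view: no copy, no GC allocation
    assert(col1 == [4.0, 5.0, 6.0]);

    col1[0] = 40;                   // writes through to the matrix
    assert(m.data[3] == 40);
}
```

If the kernel really needs a separate contiguous buffer, preallocating 
one reusable buffer per worker outside the parallel loop is still cheaper 
than allocating a fresh array for every (i, j) pair.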

