I need some help benchmarking SoA vs AoS

Marco Leise via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sat Mar 26 18:04:40 PDT 2016


On Sat, 26 Mar 2016 17:43:48 +0000,
maik klein <maikklein at googlemail.com> wrote:

> On Saturday, 26 March 2016 at 17:06:39 UTC, ag0aep6g wrote:
> > On 26.03.2016 18:04, ag0aep6g wrote:
> >> https://gist.github.com/aG0aep6G/a1b87df1ac5930870ffe/revisions
> >
> > PS: Those enforces are for a size of 100_000 not 1_000_000, 
> > because I'm impatient.
> 
> Thanks, okay that gives me more reliable results.
> for 1_000_000
> 
> benchmarking complete access
> AoS: 1 sec, 87 ms, 266 μs, and 4 hnsecs
> SoA: 1 sec, 491 ms, 186 μs, and 6 hnsecs
> benchmarking partial access
> AoS: 7 secs, 167 ms, 635 μs, and 8 hnsecs
> SoA: 1 sec, 20 ms, 573 μs, and 1 hnsec
> 
> This is sort of what I expected. I will do a few more benchmarks 
> now. I probably also randomize the inputs.

That looks more like it. :) There are a few things to keep in
mind. When you use constant data and don't use the result,
compilers can:

- Const-fold computations away.
- Specialize functions on compile-time known arguments. That
  works mostly as if the argument were a template argument: a
  new instance of the function is created for each invocation
  with a compile-time known value. (Disabling inlining won't
  prevent this.)
- Call pure functions with the same argument only once in a
  loop of 1_000_000.
- Replace 1_000_000 additions of the number X in a loop with
  the expression 1_000_000*X.
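
To illustrate the points above, here is a minimal sketch of a
naive benchmark loop next to the form an optimizer is entitled
to reduce it to. The names square, naive and folded are made up
for this example, and whether a given compiler actually performs
the rewrite depends on the compiler and its flags.

```d
import std.stdio : writeln;

// Hypothetical pure function under test.
int square(int x) pure nothrow @nogc @safe
{
    return x * x;
}

// Naive loop: the argument is a compile-time constant and the
// same pure call is repeated in every iteration.
long naive()
{
    long total = 0;
    foreach (i; 0 .. 1_000_000)
        total += square(7);
    return total;
}

// An equivalent form the optimizer may produce: one folded
// multiplication instead of 1_000_000 calls and additions.
long folded()
{
    return 1_000_000L * 49;
}

void main()
{
    // Both compute the same value, so timing naive() may in
    // effect be timing folded().
    writeln(naive() == folded());
}
```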

In addition to these real-world optimizations, when you don't
accumulate the result of the function call and print it or
store it in some global variable, the whole computation may be
removed as "no side-effect", as others have pointed out. When
inlining is used the compiler may also see through attempts to
only use a part of the result and remove instructions that
lead to the rest of it. For example when you return a struct
with two fields - a and b - and store the sum of a, but ignore
b, then the compiler may remove computations that are only
needed for b!
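
A small sketch of that last case, assuming a hypothetical
Result struct and compute function invented for illustration.
In sumOnlyA the loop that feeds b has no observable effect once
compute is inlined, so an optimizing compiler may drop it;
sumBoth consumes both fields and keeps all the work alive.

```d
// Hypothetical result type: `a` is cheap, `b` is costly.
struct Result
{
    long a;
    long b;
}

Result compute(long x)
{
    Result r;
    r.a = 3 * x + 1;
    foreach (i; 0 .. 64)      // this loop is only needed for `b`
        r.b += (x ^ i) * 7;
    return r;
}

// Uses only `a`: after inlining, the loop above is dead code
// and the measured time may not include it at all.
long sumOnlyA(long n)
{
    long total = 0;
    foreach (x; 0 .. n)
        total += compute(x).a;
    return total;
}

// Uses both fields, so neither computation can be removed.
long sumBoth(long n)
{
    long total = 0;
    foreach (x; 0 .. n)
    {
        auto r = compute(x);
        total += r.a + r.b;
    }
    return total;
}
```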

Try to generate input from random number generators or
external files. Disable inlining for the benchmarked function
via attribute or pragma(inline, false) or otherwise make sure
that the compiler cannot guess what any of the arguments are
and perform const-folding after inlining. When the result is
returned, make sure you use enough of it that the compiler
cannot elide instructions after inlining. It is often enough
to just store it in a global variable.
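
Put together, a benchmark along these lines could look like
the sketch below. Vec, sumX, randomInput and sink are
hypothetical names, and the code assumes a Phobos recent
enough to ship std.datetime.stopwatch. It is one possible
arrangement, not a definitive harness.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.random : Random, uniform, unpredictableSeed;
import std.stdio : writeln;

// Hypothetical element type for the AoS case.
struct Vec
{
    float x, y, z;
}

// Global sink: storing the result here is an observable side
// effect, so the computation cannot be removed as unused.
__gshared float sink;

// Not inlined, so the function is not folded into a call site
// where the compiler might know more about its arguments.
pragma(inline, false)
float sumX(const(Vec)[] data)
{
    float total = 0;
    foreach (ref v; data)
        total += v.x;
    return total;
}

// Input comes from a random number generator, so its values
// are not compile-time constants.
Vec[] randomInput(size_t n, uint seed)
{
    auto rng = Random(seed);
    auto data = new Vec[](n);
    foreach (ref v; data)
        v = Vec(uniform(0.0f, 1.0f, rng),
                uniform(0.0f, 1.0f, rng),
                uniform(0.0f, 1.0f, rng));
    return data;
}

void main()
{
    auto data = randomInput(1_000_000, unpredictableSeed);

    auto sw = StopWatch(AutoStart.yes);
    sink = sumX(data);        // the whole result is stored
    sw.stop();

    writeln("sumX: ", sw.peek, " (result ", sink, ")");
}
```

Printing sink at the end is a second safeguard on top of the
global store; either one alone is usually sufficient.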

-- 
Marco


