[Blog post] Why and when you should use SoA

Sat Mar 26 17:42:07 PDT 2016

On Saturday, 26 March 2016 at 23:31:23 UTC, Alex Parrill wrote:
> On Friday, 25 March 2016 at 01:07:16 UTC, maik klein wrote:
>> Link to the blog post: https://maikklein.github.io/post/soa-d/
>> Link to the reddit discussion: 
>> https://www.reddit.com/r/programming/comments/4buivf/why_and_when_you_should_use_soa/
>
> I think structs-of-arrays are a lot more situational than you 
> make them out to be.
>
> You say, at the end of your article, that "SoA scales much 
> better because you can partially access your data without 
> needlessly loading unrelevant data into your cache". But most 
> of the time, programs access struct fields close together in 
> time (i.e. accessing one field of a struct usually means that 
> you will access another field shortly). In that case, you've 
> now split your data across multiple cache lines; not good.
>
> Your ENetPeer example works against you here; the 
> packetThrottle* variables would be split up into different 
> arrays, but they will likely be checked together when 
> throttling packets. Though admittedly, it's easy to fix; put 
> fields likely to be accessed together in their own struct.
>
> The SoA approach also makes random access more inefficient and 
> makes it harder for objects to have identity. Again, your 
> ENetPeer example works against you; it's common for servers to 
> need to send packets to individual clients rather than 
> broadcasting them. With the SoA approach, you end up accessing 
> a tiny part of multiple arrays, and load several cache lines 
> containing data for ENetPeers that you don't care about (i.e. 
> loading irrelevant data).
>
> I think SoA can be faster if you are commonly iterating over a 
> section of a dataset, but I don't think that's a common 
> occurrence. I definitely think it's unwarranted to conclude 
> that SoAs "scale much better" without noting when they scale 
> better, especially without benchmarks.
>
> I will admit, though, that the template for making the 
> struct-of-arrays is a nice demonstration of D's templates.

The next blog post that I am writing will contain a few 
benchmarks for SoA vs AoS.

> But most of the time, programs access struct fields close 
> together in time (i.e. accessing one field of a struct usually 
> means that you will access another field shortly). In that 
> case, you've now split your data across multiple cache lines; 
> not good.

You can still group the data together if you always access it 
together.  What you wrote is actually not true for arrays, at 
least the way you wrote it.

Array!Foo arr;

When iterating over 'arr', you always load the complete Foo 
struct into the cache, unless you hide fields behind pointers.
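
To make the point above concrete, here is a minimal sketch. The 
struct layout and the names 'hot', 'cold' and 'sumHot' are made up 
for illustration, and a plain slice stands in for Array!Foo. The 
loop only reads one int per element, but the elements are 64 bytes 
apart, so the unused fields are pulled through the cache as well.

```d
import std.stdio : writeln;

// Hypothetical struct: only `hot` is read in the loop below.
struct Foo
{
    int hot;
    int[15] cold; // never read by sumHot, but stored next to `hot`
}

// Sums one field; consecutive `hot` values are Foo.sizeof bytes apart.
long sumHot(const(Foo)[] arr)
{
    long total = 0;
    foreach (ref f; arr)
        total += f.hot;
    return total;
}

void main()
{
    // 16 ints of 4 bytes each.
    static assert(Foo.sizeof == 64);

    auto arr = new Foo[](4);
    foreach (i, ref f; arr)
        f.hot = cast(int) i;

    writeln(sumHot(arr)); // prints 6
}
```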

> The SoA approach also makes random access more inefficient and 
> makes it harder for objects to have identity.

No, it actually makes it much better because you only have to 
load the relevant fields into memory.

But you usually don't look at your objects in isolation.

AoS makes sense if you always care about all fields, for example 
with Array!Vector3. You usually access all components of a 
vector.

What you lose is the general feel of OOP.

Vector add(Vector a, Vector b);

Array!Vector vectors;

add(vectors[index1], vectors[index2]);

This really just won't work with SoA, especially if you want to 
mutate the underlying data through a reference. For this you 
would just use AoS.
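
A rough sketch of why this is awkward with SoA. Everything here 
('VectorsSoA', the gather/scatter via opIndex and opIndexAssign) 
is a hand-written assumption for illustration, not the template 
from the blog post. Indexing has to build a temporary Vector from 
three arrays, so there is no single object to hand out by 
reference, and mutation has to be written back explicitly.

```d
import std.stdio : writeln;

struct Vector
{
    float x = 0;
    float y = 0;
    float z = 0;
}

Vector add(Vector a, Vector b)
{
    return Vector(a.x + b.x, a.y + b.y, a.z + b.z);
}

// Hypothetical hand-written SoA container with one array per component.
struct VectorsSoA
{
    float[] x, y, z;

    // Gather: returns a copy assembled from three arrays.
    // No Vector exists in memory that a reference could point to.
    Vector opIndex(size_t idx) const
    {
        return Vector(x[idx], y[idx], z[idx]);
    }

    // Scatter: writing a Vector back touches all three arrays.
    void opIndexAssign(Vector v, size_t idx)
    {
        x[idx] = v.x;
        y[idx] = v.y;
        z[idx] = v.z;
    }
}

void main()
{
    VectorsSoA vs;
    vs.x = [1.0f, 2.0f];
    vs.y = [3.0f, 4.0f];
    vs.z = [5.0f, 6.0f];

    // Works by value, but the result must be stored back by index.
    vs[0] = add(vs[0], vs[1]);
    writeln(vs[0]);
}
```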

Btw, I have done a lot of benchmarks, and in the worst case SoA 
was always as fast as AoS.

But once you actually only access partial data, SoA can 
potentially be much faster.

This is what I mean by scaling.

You start with

struct Test{
   int i;
   int j;
}
Array!Test tests;

and you have absolutely no performance problem for 'tests' 
because it is just so small.

But after a few years Test will have grown much bigger.

struct Test{
   int i;
   int j;
   int[100] junk;
}

If you use SoA, you can always add fields without any performance 
penalty for code that doesn't touch them; that is why I said that 
it "scales" better.
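
As a sketch of what that looks like, here is a hand-written SoA 
version of Test. The name 'TestSoA', its constructor and 'sumI' 
are assumptions for illustration, not the generated type from the 
blog post. The 'i' values stay 4 bytes apart no matter how large 
'junk' becomes, while in the AoS layout they are Test.sizeof bytes 
apart.

```d
import std.stdio : writeln;

// AoS element: consecutive `i` values are 408 bytes apart in an array.
struct Test
{
    int i;
    int j;
    int[100] junk;
}

// Hypothetical hand-written SoA layout: one array per field.
struct TestSoA
{
    int[] i;
    int[] j;
    int[100][] junk;

    this(size_t n)
    {
        i.length = n;
        j.length = n;
        junk.length = n;
    }
}

// Only ever sees the densely packed `i` values.
long sumI(const(int)[] values)
{
    long total = 0;
    foreach (v; values)
        total += v;
    return total;
}

void main()
{
    // 4 + 4 + 100 * 4 bytes.
    static assert(Test.sizeof == 408);

    auto soa = TestSoA(1000);
    soa.i[] = 1;

    // Walks 1000 contiguous ints; `j` and `junk` are never loaded.
    writeln(sumI(soa.i)); // prints 1000
}
```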

But as I have said in the blog post, you will not always replace 
AoS with SoA, but you should replace AoS with SoA where it makes 
sense.

> I think SoA can be faster if you are commonly iterating over a 
> section of a dataset, but I don't think that's a common 
> occurrence.

This happens very often in games when you use inheritance: your 
objects just grow really big the more functionality you add.

For example, if you just want to move all objects based on 
velocity, you only care about Position and Velocity. You don't 
have to load anything else into memory.
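
A minimal sketch of that movement update, assuming a made-up 
'Entities' layout with 2D positions and velocities and a 
hypothetical 'integrate' function. The 'name' array stands in for 
all the other per-object data that this loop never touches.

```d
import std.stdio : writeln;

struct Vec2
{
    float x = 0;
    float y = 0;
}

// Hypothetical SoA storage for game objects.
struct Entities
{
    Vec2[] position;
    Vec2[] velocity;
    string[] name; // unrelated data, not read by integrate
}

// Moves every object by velocity * dt; reads and writes only two arrays.
void integrate(Vec2[] position, const(Vec2)[] velocity, float dt)
{
    assert(position.length == velocity.length);
    foreach (idx, ref p; position)
    {
        p.x += velocity[idx].x * dt;
        p.y += velocity[idx].y * dt;
    }
}

void main()
{
    Entities e;
    e.position = [Vec2(0, 0), Vec2(1, 1)];
    e.velocity = [Vec2(1, 2), Vec2(-2, 0)];
    e.name = ["player", "enemy"];

    integrate(e.position, e.velocity, 0.5f);
    writeln(e.position);
}
```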

An entity component system really is just SoA at its core.