Structure of Arrays vs Array of Structures

Mon May 15 08:12:40 PDT 2017

On Monday, 15 May 2017 at 06:50:04 UTC, Nicholas Wilson wrote:
> On Monday, 15 May 2017 at 06:44:53 UTC, Nordlöw wrote:

>> Have anybody done this already?
>
> Yes, https://maikklein.github.io/post/soa-d/

The code in that article is overly simplified. Real use cases 
require more than storing a single flat POD type. Looking back 
at Jonathan Blow's example, what if you wanted to store this type 
in a SoA?

>struct Entity {
>    Vector3 position;
>    Quaternion orientation;
>}

This, obviously, gets more involved, as now the SoA code needs to 
flatten that whole type recursively; otherwise it'd just store 
vectors packed together and then quaternions packed together, i.e.:

[vx0, vy0, vz0, vx1, vy1, vz1...]
[qx0, qy0, qz0, qw0, qx1, qy1, qz1, qw1...]

whereas it should store

[vx0, vx1...][vy0, vy1...][vz0, vz1...]
[qx0, qx1...][qy0, qy1...][qz0, qz1...][qw0, qw1...]
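
To make the recursive flattening concrete, here is a minimal sketch of how it could look in D. FlattenAll, ArrayOf and SoA are names I made up for illustration, and the Vector3/Quaternion definitions are assumed to be plain float structs; this is not the article's code, just one way to get a column per scalar.

```d
import std.meta : AliasSeq, staticMap;
import std.traits : Fields;

// Assumed layouts for the types in the example above.
struct Vector3    { float x, y, z; }
struct Quaternion { float x, y, z, w; }
struct Entity     { Vector3 position; Quaternion orientation; }

// Expands a list of types into scalar leaf types, recursing into structs.
template FlattenAll(Ts...)
{
    static if (Ts.length == 0)
        alias FlattenAll = AliasSeq!();
    else static if (is(Ts[0] == struct))
        alias FlattenAll = AliasSeq!(FlattenAll!(Fields!(Ts[0])),
                                     FlattenAll!(Ts[1 .. $]));
    else
        alias FlattenAll = AliasSeq!(Ts[0], FlattenAll!(Ts[1 .. $]));
}

alias ArrayOf(T) = T[];

// Hypothetical container: one dynamic array ("column") per scalar leaf of T.
struct SoA(T)
{
    alias Columns = staticMap!(ArrayOf, FlattenAll!T);
    Columns columns;

    void append(const T value) { store!0(value); }

    // Walks the fields depth-first; `start` is the column of v's first leaf.
    private void store(size_t start, U)(ref const U v)
    {
        static if (is(U == struct))
        {
            alias F = Fields!U;
            foreach (i, ref field; v.tupleof)
            {
                // Number of scalar leaves in the fields preceding field i.
                alias Before = FlattenAll!(F[0 .. i]);
                store!(start + Before.length)(field);
            }
        }
        else
            columns[start] ~= v;
    }
}

unittest
{
    SoA!Entity soa;
    soa.append(Entity(Vector3(1, 2, 3), Quaternion(4, 5, 6, 7)));
    soa.append(Entity(Vector3(10, 20, 30), Quaternion(40, 50, 60, 70)));
    static assert(FlattenAll!Entity.length == 7); // 3 + 4 scalar columns
    assert(soa.columns[0] == [1f, 10f]);          // all position.x values
    assert(soa.columns[6] == [7f, 70f]);          // all orientation.w values
}
```

Appending to seven separate dynamic arrays is only there to keep the sketch short; a real container would presumably allocate all columns in one block.
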

Then, if you need to occasionally reassemble PODs from SoA, you 
need to deal with member alignment. And then there's the matter 
of providing the actual functions that operate on types stored in 
this scattered fashion.
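
A small sketch of the reassembly side, under the assumption of a made-up Particle type with members of different sizes (neither Particle nor ParticleSoA comes from the article). The columns are tightly packed, while the reassembled struct has padding, so a gather has to go member by member instead of copying bytes.

```d
struct Particle { ubyte kind; double mass; } // hypothetical mixed-size POD

struct ParticleSoA
{
    ubyte[]  kinds;   // 1 byte per element, no padding
    double[] masses;  // 8 bytes per element

    void append(Particle p) { kinds ~= p.kind; masses ~= p.mass; }

    // Reassembles one POD. Assigning field by field lets the compiler
    // place each member at its aligned offset inside Particle.
    Particle gather(size_t i) const
    {
        Particle p;
        p.kind = kinds[i];
        p.mass = masses[i];
        return p;
    }
}

unittest
{
    ParticleSoA soa;
    soa.append(Particle(1, 2.5));
    soa.append(Particle(2, 4.0));
    auto p = soa.gather(1);
    assert(p.kind == 2 && p.mass == 4.0);
    // The struct carries alignment padding that the columns do not have.
    static assert(Particle.sizeof > ubyte.sizeof + double.sizeof);
}
```

Each gather reads from as many separate arrays as there are members, which is where the cost of reconstructing PODs comes from.
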

Also, there is a... debatable assessment of viability there:

>But sometimes you still want to access all components of your 
>data. An example would be a vector. [...]
>Most operations will use all components anyways like add, 
>subtract, dot, length and many more. And even if you sometimes 
>end up with
>
>struct Vec3{
>    float x;
>    float y;
>    float z;
>}
>
>Array!Vec3 positions;
>
>positions[].filter!(v => v.x < 10.0f);
>
>and you want to filter all vectors where the x component is less 
>than 10.0f, you will still only load two additional floats.

"You will still *only* load two additional floats": that is 
incorrect. The CPU (at least x86/x86-64) won't load "only" two 
extra floats. It has to fetch the whole cache line just to touch 
every third float in it, so roughly two thirds of the memory 
traffic is wasted. (Imagine how severe that is if it actually has 
to fetch from RAM. Scratch that, don't imagine, time it.) And if 
you not only read but also write data, it gets that much more 
wasteful. That is precisely the problem SoA is intended to solve: 
everything it fetches is data it actually needs. But of course, if 
you do need to reconstruct tightly packed PODs from SoA, you'll 
pay a pretty severe penalty. So, once you commit to storing data 
this way, you have to organize the code accordingly; otherwise 
you're more likely to degrade performance than to gain any.
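
For anyone who wants to time it, a rough sketch of the comparison follows. countAoS, countSoA, Timing and timeBoth are hypothetical names, the data is synthetic, and the result depends heavily on array size, compiler flags and hardware, so treat it as a starting point rather than a proper benchmark.

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;

struct Vec3 { float x, y, z; }
static assert(Vec3.sizeof == 12); // only 4 of every 12 fetched bytes are x

// AoS: every x drags its y and z through the cache with it.
size_t countAoS(const Vec3[] positions)
{
    size_t n;
    foreach (ref v; positions)
        if (v.x < 10.0f) ++n;
    return n;
}

// SoA: the array holds nothing but x components.
size_t countSoA(const float[] xs)
{
    size_t n;
    foreach (x; xs)
        if (x < 10.0f) ++n;
    return n;
}

struct Timing { size_t hitsAoS, hitsSoA; Duration aos, soa; }

// Fills both layouts with the same synthetic data and times each filter.
Timing timeBoth(size_t n)
{
    auto positions = new Vec3[n];
    auto xs = new float[n];
    foreach (i; 0 .. n)
    {
        immutable float x = i % 20;
        positions[i] = Vec3(x, 0, 0);
        xs[i] = x;
    }

    Timing t;
    auto sw = StopWatch(AutoStart.yes);
    t.hitsAoS = countAoS(positions);
    t.aos = sw.peek;
    sw.reset();
    t.hitsSoA = countSoA(xs);
    t.soa = sw.peek;
    return t;
}
```

Call timeBoth from main with an n large enough that the arrays don't fit in cache (several million elements), print the two durations, and build with optimizations on. Both counts should come out identical; only the timings should differ.
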

I do have an ongoing code experiment to see how far D can take 
us with this, but at this point it's rather immature. Perhaps in a 
couple of weeks I'll get it to a state that I can publish, if 
you guys would like to pitch in.