About structs and performant handling

Marco Leise Marco.Leise at gmx.de
Sat Mar 9 17:42:07 PST 2013


On Sat, 09 Mar 2013 15:07:49 -0800,
Ali Çehreli <acehreli at yahoo.com> wrote:

> Apparently I have been ignorant in modern CPU designs 
> because I was surprised to see that pointer dereferencing seemingly had 
> no cost at all. My guess would be that the object is completely inside 
> the processor's cache.

Be aware of several things playing together here: the L1 and L2
caches, as well as prefetching and the order of the data in
memory. If you create a few KiB of data and run it through a
test, it's all in the L1 cache and blazing fast. If you have a
game and load a matrix struct from somewhere scattered in
memory, you'll see the massive access penalty.
The modern prefetchers in CPUs keep track of a number of
streams of forward or backward serial memory accesses, so they
work perfectly when iterating over an array, for example. They
work in the "background" and use free memory bandwidth to load
data from RAM into the CPU caches before you actually need it.
This hides the memory latency that has grown ever larger over
the past years. It is so important that many don't optimize for
CPU cycles anymore, but instead for memory accesses and cache
locality:

* http://en.wikipedia.org/wiki/Judy_array

* http://research.scee.net/files/presentations/gcapaustralia09/Pitfalls_of_Object_Oriented_Programming_GCAP_09.pdf

It's easy to underestimate the effects until you benchmark with
random access patterns over several MiB of memory and see how
you get close to a 100-times slowdown.

-- 
Marco



More information about the Digitalmars-d mailing list