tooling quality and some random rant

Tue Feb 15 07:25:05 PST 2011

bearophile wrote:
> Walter:
> 
>> Huh, I simply could never find a document about how to use those which gave me any comfortable sense that the author knew what he was talking about.<
> 
> http://www.agner.org/optimize/
> 
> ------------------
> 
> Don:
> 
>> A problem with that, is that the prefetching instructions are vendor-specific.<
> 
> Right. Then I suggest some higher-level annotations (pragmas?) that the programmer uses to better state the temporal semantics of memory accesses in a performance-critical part of D code.
> 
> 
>> Also, it's quite difficult to use them correctly. If you put them in the wrong place, or use them too much, they slow your code down.<
> 
> CPU caches have a simple purpose. The speed of light is finite (how far does light travel in vacuum or doped silicon during one clock cycle of a 5 GHz POWER6 CPU? http://en.wikipedia.org/wiki/POWER6 ), and finding one item among many is slower than finding it among a few. So memory accesses get faster if you read from a smaller pool of data located closer to you. Most CPUs don't have a small, fast memory that you manage yourself (http://en.wikipedia.org/wiki/Scratchpad_RAM ); they copy data to and from the cache levels on their own, so on such CPUs the illusion of a flat memory exists at the hardware level, not just at the C language level. Caches manage their memory in a few different ways, and bigger CPUs often offer special instructions to adjust that behavior a little.
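
For the rhetorical question in the quote, a back-of-the-envelope figure may help. The sketch below only covers the vacuum case; the function name is made up for illustration. On-chip signals travel slower than light in vacuum, so the real distance budget per cycle is tighter than this.

```c
#include <assert.h>

/* Speed of light in vacuum, in metres per second (exact by definition). */
#define SPEED_OF_LIGHT_M_PER_S 299792458.0

/* Hypothetical helper: distance light covers in vacuum during one clock
   cycle at the given frequency. Says nothing about signals in silicon. */
double light_distance_per_cycle_m(double clock_hz)
{
    assert(clock_hz > 0.0);
    return SPEED_OF_LIGHT_M_PER_S / clock_hz;
}
```

Calling light_distance_per_cycle_m(5e9) gives roughly 0.06 m, i.e. about 6 cm per cycle at 5 GHz.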

The main difference is how they keep coherence across different core 
caches and in what situations they store back data from the cache to RAM.

I think you may be confusing prefetch instructions with non-temporal stores.

The problem with prefetch instructions is that they interfere with the 
hardware prefetch mechanism. The hardware prefetcher is actually very 
good, and only under specific circumstances can a manual prefetch 
beat it. I think it's unlikely that you can use prefetching 
beneficially unless you've looked at the generated asm code.
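
To make "specific circumstances" concrete, here is a minimal sketch in C of one pattern where a manual prefetch at least has a chance: indirect accesses through an index array, which a stride-based hardware prefetcher cannot predict. It assumes GCC or Clang (`__builtin_prefetch`); the function name and the `distance` parameter are invented for this example, and the right distance can only be found by measuring. For a plain linear scan the hint would most likely be useless or harmful.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sum data[idx[i]] for i in [0, n).
   `distance` is how many iterations ahead to prefetch; 0 disables it.
   The prefetch is only a hint and never changes the result. */
long gather_sum(const int *data, const size_t *idx, size_t n, size_t distance)
{
    assert((data != NULL && idx != NULL) || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__)
        /* distance < n - i is the same as i + distance < n, without overflow. */
        if (distance != 0 && distance < n - i)
            __builtin_prefetch(&data[idx[i + distance]], 0 /* read */, 1 /* low reuse */);
#endif
        total += data[idx[i]];
    }
    return total;
}
```

Whether this beats the version without the hint depends on the CPU, the data size and the chosen distance, so benchmark both and check the generated asm.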

> In some cases your program wants to read from an array and store data back into it and into a second array, but never wants to store far-away data in the first one. There are a few other common patterns of memory usage. In theory a normal language like Fortran is enough to specify what memory you want to read or write and when. In practice today's compilers are not very good at inferring such semantics, so some high-level annotations would probably help. In the future compilers may get better and ignore those annotations, just as they often ignore "register" annotations. System-level programming languages are practical things, so adding annotations is not bad, even if those annotations become less useful 5-10 years later.

Here you're definitely talking about non-temporal stores.
Yes, there is some chance that an annotation for non-temporal stores 
could be beneficial.
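
For reference, this is roughly what a non-temporal store looks like today when written by hand in C. It is a sketch that assumes an x86 target with SSE2 and falls back to ordinary stores elsewhere; `fill_streaming` is a made-up name, while `_mm_stream_si32` and `_mm_sfence` are the standard SSE2/SSE intrinsics. An annotation would let the compiler emit something like this without the vendor-specific code.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_stream_si32; also pulls in _mm_sfence */
#endif

/* Hypothetical example: fill a buffer that will not be read again soon,
   asking the CPU not to keep the written lines in cache. */
void fill_streaming(int *dst, size_t n, int value)
{
    assert(dst != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
#if defined(__SSE2__)
        _mm_stream_si32(&dst[i], value);  /* non-temporal store hint */
#else
        dst[i] = value;                   /* ordinary store as fallback */
#endif
    }
#if defined(__SSE2__)
    _mm_sfence();  /* order the streaming stores before later stores */
#endif
}
```

This only pays off when the destination really is write-only for a while; if the data is read back immediately, bypassing the cache tends to make things slower.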