Wish: Variable Not Used Warning

Markus Koskimies markus at reaaliaika.net
Fri Jul 11 13:36:20 PDT 2008


On Fri, 11 Jul 2008 17:16:54 +0000, BCS wrote:

> Reply to Markus,
> 
>> For decades, PC processor manufacturers have optimized their processors
>> for software, not the other way around. That is why processors execute
>> functions so quickly, and it is the sole reason for having caches (the
>> regular locality of software, e.g. the IBM study from the '60s).
>> 
>> 
> I hope I'm reading you wrong but if I'm not: The whole point of the talk
> is that CPU's can't get better performance by optimizing them more. If
> the code isn't written well (the code isn't optimized for the CPU)
> performance will not improve... ever.

They will get better, and that is going to affect your software. IMO you 
should not write your software for the CPU; instead you should follow 
certain paradigms.

Let me explain this at length. Current processors are fundamentally based 
on the RASP model, which is an example of the so-called von Neumann 
architecture. When physically realized, this architecture offers a very 
flexible but dense computing platform, since it is constructed from two 
specialized parts - memory and CPU. The drawback of this architecture is 
the so-called von Neumann bottleneck, which has irritated both processor 
and software designers for decades.

---
Processor fabrication technology sets limits on how fast a processor can 
execute instructions. Early processors always fetched instructions from 
main memory (causing, of course, lots of external bus activity), and they 
processed one instruction at a time.

Since processor fabrication technology improves quite slowly, there has 
always been interest in "alternative" solutions that could give 
performance benefits on the current technology. These improvements 
include, for example:

- Pipelining
- Super-scalar architectures
- OoO execution
- Threaded processors
- Multi-core processors
- etc.

The more switches you can put on silicon, the more you can try to extract 
performance from concurrency. Pipelining and OoO have had a major impact 
on compiler technology; in the early days, code generation was relatively 
easy, but today, to get the best possible performance, you really need to 
know the internals of the processor. When writing code in C or D, you 
have very few ways to make your software utilize pipelines and OoO 
yourself - if the compiler does not do it, your program will not do it.
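To make this concrete, here is a small sketch of my own (not from the original mail; both function names are hypothetical): the same array sum written as one long dependency chain versus four independent chains. The arithmetic is identical, but the second form gives a superscalar/OoO core independent work to overlap.

```c
#include <stddef.h>

/* One accumulator: every add depends on the previous one, so the
 * pipeline can rarely overlap the additions. */
double sum_serial(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four independent accumulators: the adds form four separate
 * dependency chains that a superscalar/OoO core can keep in
 * flight at the same time. */
double sum_unrolled(const double *a, size_t n) {
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)          /* leftover elements */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```

A good optimizing compiler may perform this unrolling on the plain loop itself - which is exactly the point above: if the compiler does not do it, your program will not do it.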

At the same time, processors have been made compiler-friendly; since high-
level languages use certain instructions and patterns heavily, processors 
try to be good at them. If you look at the evolution of processors and 
compare it to the evolution of software design, you will see the impact 
of the change from BASIC/assembler programming to compiled HLLs, from 
procedural languages to OO languages, and from there to threaded 
architectures.

In the BASIC/assembler era, the processor's machine language was intended 
for humans; that was the era of CISC-style processors. Compilers do not 
need human-readable machine code, and when compiled languages came into 
use, RISC processors rose. Procedural languages used lots of calls, so 
processors were optimized to call functions quickly. OO introduced 
intensive use of data referenced via pointers (compared to the data 
segments of procedural languages), so processors were optimized to access 
memory efficiently through pointers.

How does caching relate to this? A complex memory hierarchy (and, in 
fact, the pipelines and OoO, too) is not a desirable or intentional 
thing; it is a symptom of the RASP model. It has been introduced only 
because it can give performance benefits to software, and the key word 
here is locality.
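As an illustration of locality (my own sketch, not the poster's): summing a matrix along rows versus along columns performs exactly the same arithmetic, but the row order walks memory sequentially and uses every fetched cache line fully, while the column order jumps a whole row ahead on every access and can hit a new cache line each time.

```c
#include <stddef.h>

#define N 512

/* Row-major traversal: consecutive accesses touch adjacent bytes,
 * so each cache line fetched from memory is fully used. */
long sum_rows(int m[N][N]) {
    long s = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += m[i][j];
    return s;
}

/* Column-major traversal: each access jumps N * sizeof(int) bytes,
 * touching a different cache line on nearly every iteration. */
long sum_cols(int m[N][N]) {
    long s = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += m[i][j];
    return s;
}
```

Both functions return the same value; only the memory access pattern - and therefore the cache behavior - differs.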

Locality - and its natural consequence, distribution - is in fact one of 
the keywords of forthcoming processor models. The next major step in 
processor architectures is very likely reconfigurable platforms, and they 
will introduce a whole new set of challenges for compilers and software 
before they can be fully utilized. Look at the compilers for the 
PlayStation Cell processor to get the idea.

At the code level, you really can't design your software to be 
"reconfigurable-friendly". The best thing is just to keep the code clear 
and hope that the compiler gets the idea and produces good results.

At the software architecture level, if you are using threads, try to keep 
everything local. The importance of that is only going to grow.
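A sketch of what "keep everything local" can mean in threaded code (using POSIX threads; the struct, names, and the 64-byte line size are my own assumptions): each worker accumulates into a stack-local variable and publishes its result exactly once, into a slot padded out to a full cache line so the threads never write to the same line.

```c
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4
#define CHUNK 100000L

/* Pad each thread's result slot to its own cache line so the threads
 * never write to a shared line (64 bytes is a common, though not
 * universal, line size). */
struct padded { long sum; char pad[64 - sizeof(long)]; };
static struct padded partial[NTHREADS];

static void *worker(void *arg) {
    size_t id = (size_t)arg;
    long local = 0;                 /* work on a stack-local value */
    for (long i = 0; i < CHUNK; i++)
        local += (long)id;          /* stand-in for real work */
    partial[id].sum = local;        /* publish once, at the end */
    return NULL;
}

long run(void) {
    pthread_t t[NTHREADS];
    for (size_t i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    long total = 0;
    for (size_t i = 0; i < NTHREADS; i++) {
        pthread_join(t[i], NULL);
        total += partial[i].sum;
    }
    return total;
}
```

The alternative - every thread incrementing one shared counter - would be correct with locking but would serialize the threads on that one cache line.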

>>> Are you suggesting that it's not
>>> something programmers should be aware of?
>> Yes, I am.
>> 
>> 
> How can you say that? Expecting the tool chain to deal with cache
> effects would be like expecting it to convert a bubble sort into qsort.

Does the description above answer this question? In case it does not, 
I'll explain: in general software, don't mess with the cache. Instead, 
strive for locality and distribution. Use the threading libraries, and 
when possible, do the interactions between threads in some standard way.
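One "standard way" for threads to interact, sketched with POSIX primitives (the one-slot mailbox and its function names are my own toy example, not a library API): all shared state sits behind a single mutex and condition variable, so neither thread reasons about cache effects directly.

```c
#include <pthread.h>

/* A one-slot mailbox: the producer deposits a value, the consumer
 * waits for it. All sharing goes through one mutex + condvar. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready = PTHREAD_COND_INITIALIZER;
static int slot;
static int full = 0;

static void put(int v) {
    pthread_mutex_lock(&lock);
    slot = v;
    full = 1;
    pthread_cond_signal(&ready);    /* wake the waiting consumer */
    pthread_mutex_unlock(&lock);
}

static int get(void) {
    pthread_mutex_lock(&lock);
    while (!full)                   /* guard against spurious wakeups */
        pthread_cond_wait(&ready, &lock);
    full = 0;
    int v = slot;
    pthread_mutex_unlock(&lock);
    return v;
}

static void *producer(void *arg) {
    put(*(int *)arg);
    return NULL;
}

int demo(void) {
    int msg = 42;
    pthread_t t;
    pthread_create(&t, NULL, producer, &msg);
    int v = get();                  /* blocks until the value arrives */
    pthread_join(t, NULL);
    return v;
}
```

The point is not this particular mailbox but the discipline: interactions go through a well-understood primitive, and the cache stays the library's and hardware's problem.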

If you're writing lower-level code, like a threading library or a 
hardware driver, you will probably need to know about caching. That is a 
totally different story, since writing a hardware driver in particular 
introduces many more things to take into account besides caches.



More information about the Digitalmars-d mailing list