Array append performance

Don nospam at nospam.com.au
Thu Aug 28 03:29:25 PDT 2008


Walter Bright wrote:
> Lionello Lunesu wrote:
>> The problem is that the timing for the small arrays cannot be trusted 
>> as there's more overhead than actual code being tested. I've changed 
>> your code by doing each test 100_000_000/i times. I've also added 
>> 'cpuid' for synchronization, as Don suggested:
> 
> I suspect that copying the same data over and over again is not 
> representative of actual usage, and so is not giving useful results.
> 
> I also do not understand the usage of cpuid here. I've used rdtsc for 
> years for profiling purposes, and it has given me results that match the 
> clock-on-the-wall results.

Using cpuid only once doesn't work. The rationale ultimately comes from 
here, I think:

http://cs.smu.ca/~jamuir/rdtscpm1.pdf
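A minimal sketch of the serialised read, assuming GCC or Clang on x86-64. It relies on their `<cpuid.h>` `__cpuid` macro and the `__rdtsc()` intrinsic, and the function names are hypothetical. It puts cpuid in front of rdtsc, roughly the pattern the linked note deals with. Rather than trusting a single cpuid call, it also estimates the cost of the timing itself by repeating an empty measurement and keeping the minimum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <cpuid.h>      /* GCC/Clang header providing the __cpuid macro */
#include <x86intrin.h>  /* GCC/Clang header providing __rdtsc() */

/* Hypothetical helper: read the time-stamp counter after a cpuid.
   cpuid is serialising: all earlier instructions must complete before
   it does, so the rdtsc that follows cannot run ahead of them. */
static inline uint64_t tsc_serialized(void)
{
    unsigned int eax, ebx, ecx, edx;
    __cpuid(0, eax, ebx, ecx, edx);
    (void)eax; (void)ebx; (void)ecx; (void)edx;
    return __rdtsc();
}

/* Hypothetical helper: estimate the cost of the measurement itself by
   timing an empty region many times and keeping the smallest result.
   Repeating the pair and taking the minimum filters out interrupts and
   cold-start effects on the first few calls. */
static uint64_t tsc_overhead(size_t trials)
{
    assert(trials > 0);
    uint64_t best = UINT64_MAX;
    for (size_t i = 0; i < trials; i++) {
        uint64_t start = tsc_serialized();
        uint64_t end = tsc_serialized();
        uint64_t d = end - start;
        if (d < best)
            best = d;
    }
    return best;
}
```

To time a region, read tsc_serialized() before and after it and subtract tsc_overhead() from the difference. The result is still only an estimate.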

But you've got me thinking. Serialisation only happens with cpuid, but
WHEN is the lack of serialisation actually a problem?

On my Pentium M, rdtsc takes 13 uops, all using execution port p0. This
means that anything on the other ports could execute after it.
The only instructions with a latency longer than 13 cycles are div,
fdiv, the transcendental floating-point instructions, and the bizarro
instructions aam, fbld, fbstp, and cpuid. The picture is pretty similar
for Core2 and AMD64.

So you may be right -- as long as you don't have one of those super-long
latency instructions just before your rdtsc call, cpuid is probably
doing more harm than good.
The conventional wisdom could well be wrong.
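One rough way to see the cost side of that trade-off on a given machine is to compare the smallest gap between two back-to-back counter reads with and without cpuid. This is a sketch assuming GCC or Clang on x86-64, and the function names are hypothetical. It only measures what cpuid adds to every reading. It does not tell you whether a particular piece of code needs serialising.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <cpuid.h>      /* GCC/Clang: __cpuid macro */
#include <x86intrin.h>  /* GCC/Clang: __rdtsc() intrinsic */

/* Hypothetical helper: smallest gap between two plain rdtsc reads.
   No serialisation, so this is close to the raw cost of reading the counter. */
static uint64_t min_gap_plain(size_t trials)
{
    assert(trials > 0);
    uint64_t best = UINT64_MAX;
    for (size_t i = 0; i < trials; i++) {
        uint64_t start = __rdtsc();
        uint64_t end = __rdtsc();
        if (end - start < best)
            best = end - start;
    }
    return best;
}

/* Hypothetical helper: smallest gap between two cpuid+rdtsc reads.
   Each read is serialised, and the gap includes the cost of one cpuid. */
static uint64_t min_gap_serialized(size_t trials)
{
    assert(trials > 0);
    uint64_t best = UINT64_MAX;
    unsigned int eax, ebx, ecx, edx;
    for (size_t i = 0; i < trials; i++) {
        __cpuid(0, eax, ebx, ecx, edx);
        uint64_t start = __rdtsc();
        __cpuid(0, eax, ebx, ecx, edx);
        uint64_t end = __rdtsc();
        if (end - start < best)
            best = end - start;
    }
    (void)eax; (void)ebx; (void)ecx; (void)edx;
    return best;
}
```

Expect the serialised gap to be larger. How much larger depends on the CPU. Under a hypervisor, cpuid usually causes a VM exit, which makes it far more expensive.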


