array traversal

Fri Mar 30 09:16:34 PDT 2007

Dan wrote:
> Stewart Gordon Wrote:
> 
>> "BCS" <ao at pathlink.com> wrote in message 
>> news:ce0a3343870c8c939107e7e1a1e at news.digitalmars.com...
>> <snip>
>>> I think the options he is talking about are these:
>>> for(T* ptr = &start; ptr !is &stop; ptr++)
>>> {
>>>  T value = *ptr;
>>> }
>>>
>>> vs.
>>>
>>> for(int i = 0; i< length; i++)
>>> {
>>>   T value = ptr[i];
>>> }
>>>
>>> only the second ever uses a multiplication
>> And even then not necessarily - some compilers may optimise one form to the 
>> other.
>>
>> Stewart. 
>>
> 
> Hmm... let me think of how that looks in ASM.  The first one is wrong, because you're only doing ptr++; it should be ptr += OFFSET.  Apart from that, you're right: the first code looks mildly better in ASM.
> 
> It loops through by adding to the pointer instead of adding to the array index and performing a 'lea'.

But if T.sizeof is 2, 4, or 8, the multiplication can be done in 
hardware for free, via the x86 scaled-index addressing mode. E.g., if 
i is stored in esi,

mov eax, [ptr + esi*8];

I'd be surprised if there are any modern compilers that actually perform 
a multiply.
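
For comparison, here is a minimal sketch of the two loop forms. It is 
written in C rather than D, on the assumption that the pointer 
arithmetic behaves the same way in both. The function names 
sum_by_pointer and sum_by_index and the 8-byte element type are made up 
for illustration; they are not from the code quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Pointer form: walk from the first element to one past the last.
   ptr++ advances by sizeof(long long) bytes, not by one byte. */
long long sum_by_pointer(const long long *start, size_t length)
{
    assert(start != NULL || length == 0);
    long long total = 0;
    const long long *stop = start + length;
    for (const long long *ptr = start; ptr != stop; ptr++)
        total += *ptr;
    return total;
}

/* Index form: start[i] means *(start + i). With 8-byte elements an x86
   compiler can fold the i * 8 scaling into the load's addressing mode
   (e.g. [base + index*8]) rather than emit a multiply instruction. */
long long sum_by_index(const long long *start, size_t length)
{
    assert(start != NULL || length == 0);
    long long total = 0;
    for (size_t i = 0; i < length; i++)
        total += start[i];
    return total;
}
```

Both functions compute the same result, so the compiler is free to 
turn one form into the other. What actually gets emitted depends on 
the compiler and flags, so check the generated assembly rather than 
take this sketch as proof.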

> 
> The good news is that D *should be* optimizing this out, and it ultimately only costs a half cycle which may either align or throw off the u/v pipes - something more important than 1/2 a cycle.
> 
> I wonder if Walter has it aligning the pipes as appropriate?

BTW, the u-v pipe thing is not relevant for recent Pentiums any more. 
Core2 CPUs are more likely to be limited by the decoding stage than by 
the execution units.