What would be the consequence of implementing interfaces as fat pointers?

Tue Apr 1 16:05:52 PDT 2014

"Manu" <turkeyman at gmail.com> wrote in message 
news:mailman.9.1396345088.19942.digitalmars-d at puremagic.com...
> On 1 April 2014 18:33, dajones <dajones at hotmail.com> wrote:
>
>>
>> "Manu" <turkeyman at gmail.com> wrote in message
>> news:mailman.122.1396231817.25518.digitalmars-d at puremagic.com...
>> > On 30 March 2014 13:39, Walter Bright <newshound2 at digitalmars.com> wrote:
>> >>>
>> >>> Two-pointer structs are passed in registers, which is fast. If they
>> >>> spill, they spill to the stack, which is hot and prefetcher friendly.
>> >>>
>> >>
>> >> That underestimates how precious register real estate is on the x86.
>> >
>> >
>> > This is only a concern when passing args. x86 has huge internal register
>> > files and uses aggressive register renaming,
>>
>> If we could use them that would be great, but we can't. We have to store/load
>> to memory, and that means approx. 3 cycles of latency each way. The CPU can't
>> guess that we're only saving it for later; it has to do the memory write, and
>> even with the store-to-load forwarding mechanism, spilling and reloading is
>> expensive.
>>
> Can you detail this more?

x86 uses something called (IIRC) a "store forwarding buffer". Essentially it
keeps track of stores until they have been completed. Any time you read
from an address, the store forwarding buffer is checked first, then the caches
and main memory. If there is a pending store to that address but it can't be
forwarded, you have to wait for the store to finalize, and that can be a lot
slower again. If there's no pending store, the value comes from the cache.

Either way, memory stores/loads generally have at best a 3 cycle latency.
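
As a rough illustration of that lookup order, here is a toy model in C. It is
only a sketch, not how any real CPU is built: ToyCore, toy_store, toy_load and
toy_retire_oldest are made-up names and the sizes are arbitrary. It models
where a load gets its value from (pending store first, otherwise memory), not
timing, and it ignores the case where a pending store overlaps the load but
can't be forwarded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_WORDS 16 /* toy "cache/memory", arbitrary size */
#define SB_SLOTS  4  /* toy store buffer capacity, arbitrary */

typedef struct { size_t addr; uint32_t value; } PendingStore;

typedef struct {
    uint32_t     mem[MEM_WORDS]; /* stands in for caches + main memory */
    PendingStore sb[SB_SLOTS];   /* pending stores, oldest first */
    size_t       sb_count;
} ToyCore;

/* Complete the oldest pending store: write it to memory and drop it. */
static void toy_retire_oldest(ToyCore *c)
{
    assert(c->sb_count > 0);
    c->mem[c->sb[0].addr] = c->sb[0].value;
    for (size_t i = 1; i < c->sb_count; i++)
        c->sb[i - 1] = c->sb[i];
    c->sb_count--;
}

/* A store only enters the buffer; memory is updated when it retires. */
static void toy_store(ToyCore *c, size_t addr, uint32_t value)
{
    assert(addr < MEM_WORDS);
    if (c->sb_count == SB_SLOTS)
        toy_retire_oldest(c); /* buffer full: make room first */
    c->sb[c->sb_count].addr = addr;
    c->sb[c->sb_count].value = value;
    c->sb_count++;
}

/* A load checks pending stores newest-first, then falls back to memory.
 * *forwarded is set to 1 if the value came from the store buffer. */
static uint32_t toy_load(const ToyCore *c, size_t addr, int *forwarded)
{
    assert(addr < MEM_WORDS);
    for (size_t i = c->sb_count; i-- > 0; ) {
        if (c->sb[i].addr == addr) {
            *forwarded = 1;
            return c->sb[i].value;
        }
    }
    *forwarded = 0;
    return c->mem[addr];
}
```

In this model a spill followed by a reload is a toy_store followed by a
toy_load of the same address: the reload is satisfied from the buffer while
the store is still pending, and from memory once it has retired.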

> Obviously it must perform the store to maintain memory coherency, but I was
> under the impression that the typical implementation would also keep the
> value around in a renamed register, and when it pops up again at a later
> time, it would use the register directly, rather than load from memory.

I've never read of any x86 doing what you describe. But I'm not too well up 
on the latest CPUs.

> The store shouldn't take any significant time since there's no dependency
> on the stored value, it should only take as long as issuing the store
> instruction; latency is irrelevant, since it's never read back, there's
> nothing waiting on it.

True.

> Not sure what you mean by 'each way', since stored values shouldn't be read
> back if it gets the value from a stashed register.
>
> I'm not an expert on the topic, but I read about it some years back, and
> haven't given it much thought since.

Check out the Agner Fog microarchitecture and instruction timings PDFs.
That's pretty much the holy scripture when it comes to this stuff.

It may even be that reducing contention on the memory unit helps. Modern x86
CPUs tend to have multiple ALUs but only 1 memory unit, so instructions with
memory operands can't be done in parallel as often.