Multi-architecture binaries

Wed May 2 08:53:18 PDT 2007

Don Clugston wrote:
> Jascha Wetzel wrote:
>> here is a much simpler version that works with templates. what it boils
>> down to is choosing one template instance at startup that will replace a
>> function pointer.
>>
>> now the only compiler support required would be a pragma or similar to
>> select the target architecture.
> 
> A pragma would only be required as a size optimisation. Probably not 
> worth worrying about (We have enough version information already).
> 
>> this could also be used to manage multiple versions of BLADE code.
> 
> It's a nice idea, but I don't know how it could generate the class to 
> put the 'this()' function into (we don't want a memory alloc every time 
> we enter that function!)
> 
> Interestingly DDL could be fantastic for this. At startup, walk through 
> the symbol fixup table, and look for any import symbols marked 
> __cpu_fixup_XXX.
> When you find them, look for an export symbol called __cpu_SSE2_XXX, and 
> patch them into everything in the fixup list. That way, you even get 
> a direct function call, instead of an indirect one.
> 
> I wonder if it's possible to pop ESP off the stack, and write back into 
> the code that called you, without the operating system triggering a 
> security alert -- in that case, the function you call could be a little 
> thunk, something like:
> 
> asm {
>   naked;
>   mov eax, CPU_TYPE;
>   mov eax, FUNCPOINTERS[eax];
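>   mov ecx, [esp];   // get the return address (top of stack on entry)
>   mov edx, eax;
>   sub edx, ecx;     // a near call stores its target relative to the return address
>   mov [ecx-4], edx; // patch the call address, so this thunk never gets called again.
>   jmp eax;          // continue in the selected function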
> }
> 
> But I think a modern OS would go nuts if you try this?
> (It's been a long time since I wrote self modifying code).

That may be the case.  Also, if the code is only called once, patching it 
would cause a huge cache miss lasting many nanoseconds, with no later 
call to pay that back.

If this happens a lot, the app would keep stalling all over the place 
(for the first few seconds of the run, and then again whenever you hit 
code that hasn't been used before).
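
To make the first-call cost concrete, here is a rough C sketch of the same 
lazy idea done with a plain function pointer instead of patching code. 
Everything in it (detect_cpu, sum_generic, sum_sse2, resolve_calls) is a 
made-up name for illustration, and detect_cpu is a placeholder, not real 
CPUID detection.  Every function set up this way pays for its resolve step 
on its own first call:

```c
#include <assert.h>

/* Hypothetical CPU levels; a real detect_cpu() would query CPUID. */
enum cpu_type { CPU_GENERIC = 0, CPU_SSE2 = 1, CPU_COUNT };

static enum cpu_type detect_cpu(void)
{
    return CPU_GENERIC; /* placeholder so the sketch runs on any machine */
}

static int sum_generic(const int *v, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Stand-in for an SSE2 build of the same routine (same result here). */
static int sum_sse2(const int *v, int n)
{
    return sum_generic(v, n);
}

typedef int (*sum_fn)(const int *, int);

static int sum_resolve(const int *v, int n);

static int resolve_calls = 0;     /* counts how often the slow path runs */
static sum_fn sum = sum_resolve;  /* every caller goes through this slot */

static int sum_resolve(const int *v, int n)
{
    static const sum_fn table[CPU_COUNT] = { sum_generic, sum_sse2 };
    enum cpu_type cpu = detect_cpu();

    assert((int)cpu >= 0 && cpu < CPU_COUNT);
    resolve_calls++;
    sum = table[cpu];  /* overwrite the slot; the resolver is not reached again */
    return sum(v, n);  /* finish the call that triggered the resolve */
}
```

Calling sum(v, n) twice runs sum_resolve only once.  The call stays 
indirect, unlike the patched thunk, but nothing writes into code pages.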

A better approach would be to resolve them in large batches, perhaps at 
the per-module level.  That way you get fewer cache misses.
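
Something like the following is what I have in mind for the batch version, 
again only as a sketch in C.  The fixup table layout, module_resolve, and 
the vec_sum/vec_max slots are all assumptions for illustration, not 
anything DDL or the compiler provides.  Each module lists its slots in one 
table, and a single pass at load time fills them all:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU levels for the sketch. */
enum cpu_type { CPU_GENERIC = 0, CPU_SSE2 = 1, CPU_COUNT };

typedef int (*reduce_fn)(const int *, int);

static int sum_generic(const int *v, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Assumes n >= 1. */
static int max_generic(const int *v, int n)
{
    int m = v[0];
    for (int i = 1; i < n; i++)
        if (v[i] > m)
            m = v[i];
    return m;
}

/* Stand-ins for SSE2 builds; they only forward to the generic code. */
static int sum_sse2(const int *v, int n) { return sum_generic(v, n); }
static int max_sse2(const int *v, int n) { return max_generic(v, n); }

/* The module's call slots; NULL until module_resolve() has run. */
static reduce_fn vec_sum;
static reduce_fn vec_max;

/* One entry per slot: where to write, and one candidate per CPU level. */
struct fixup {
    reduce_fn *slot;
    reduce_fn impl[CPU_COUNT];
};

static const struct fixup module_fixups[] = {
    { &vec_sum, { sum_generic, sum_sse2 } },
    { &vec_max, { max_generic, max_sse2 } },
};

/* Fill every slot of the module in one pass; returns the slot count. */
static size_t module_resolve(enum cpu_type cpu)
{
    size_t n = sizeof module_fixups / sizeof module_fixups[0];

    assert((int)cpu >= 0 && cpu < CPU_COUNT);
    for (size_t i = 0; i < n; i++)
        *module_fixups[i].slot = module_fixups[i].impl[cpu];
    return n;
}
```

Call module_resolve(cpu) once when the module is loaded, and after that 
vec_sum and vec_max are ordinary indirect calls with no first-use stall.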

Nice idea though.

-Joel