Multi-architecture binaries

Tue May 1 11:12:44 PDT 2007

I've thought about this myself, and really like the idea.  In the VM 
discussion Don mentioned benchmarking different codepaths to find which 
one works best on the current CPU, then linking the best one in.  This 
makes a lot of sense to me, since CPUs seem to have different 
performance characteristics even apart from instruction set 
differences.
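Don's "benchmark, then link the winner" idea could look roughly like the following minimal sketch in D2. It assumes a single hot function with two interchangeable implementations (`sumPositiveBranchy` and `sumPositiveBranchless` are hypothetical names). It also uses a crude one-pass timing on synthetic data. A real selector would repeat runs and use representative input.

```d
// Hypothetical sketch of "benchmark each codepath at startup, then link the
// fastest". The two candidates, the sample data and the single timing pass
// are illustrative assumptions, not part of the original discussion.
import core.time : Duration, MonoTime;
import std.stdio : writeln;

alias SumFn = long function(const(int)[]);

// Candidate 1: sums the positive elements using an if-statement.
long sumPositiveBranchy(const(int)[] data)
{
    long total = 0;
    foreach (x; data)
        if (x > 0)
            total += x;
    return total;
}

// Candidate 2: same result without an if; (x >> 31) is all ones for
// negative x and zero otherwise, so the mask zeroes out negative values.
long sumPositiveBranchless(const(int)[] data)
{
    long total = 0;
    foreach (x; data)
        total += x & ~(x >> 31);
    return total;
}

// Times each candidate once on the sample and returns the fastest one.
SumFn pickFastest(SumFn[] candidates, const(int)[] sample)
{
    SumFn best = candidates[0];
    Duration bestTime = Duration.max;
    long expected = candidates[0](sample);
    foreach (fn; candidates)
    {
        auto start = MonoTime.currTime;
        auto result = fn(sample);
        auto elapsed = MonoTime.currTime - start;
        assert(result == expected, "candidates disagree");
        if (elapsed < bestTime)
        {
            bestTime = elapsed;
            best = fn;
        }
    }
    return best;
}

// Bound once at startup; callers never see which candidate won.
__gshared SumFn sumPositive;

shared static this()
{
    auto sample = new int[](100_000);
    foreach (i, ref v; sample)
        v = (i % 3 == 0) ? -cast(int) i : cast(int) i;
    sumPositive = pickFastest([&sumPositiveBranchy, &sumPositiveBranchless], sample);
}

void main()
{
    int[] data = [3, -1, 4, -1, 5];
    writeln(sumPositive(data)); // prints 12
}
```

Which candidate wins depends on the machine. That is the point: the binary is the same, and only the bound pointer differs.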

I was once benchmarking an algorithm on my notebook computer with a more 
modern processor, and my desktop computer with an older processor.  The 
algo ran faster on the notebook of course, but branching had an 
especially reduced cost.  That is, branching on the more modern 
processor was less expensive relative to other instructions than it was 
on the previous processor.  This was with the same D binary on both of 
them.

That is the sort of thing I think JIT compilers want to leverage, but I 
have to wonder whether using a strategy like this and covering enough 
permutations of costly algorithms would give exactly the same benefit, 
with massively reduced startup time for applications.  Of course, it 
would also be nice to be able to turn it off, because it will cost SOME 
startup time as well as executable size.  Those costs are not worthwhile 
for some apps, like simple command-line tools that need to be snappy and 
small.  It would rock for games though ;)

I really can't wait to see D's performance some day when/if it gets cool 
tricks like this, low-dimensional vector primitives, array operations, etc.

Jascha Wetzel wrote:
> A thought that came up in the VM discussion...
> 
> Suppose someday we have language support for vector operations. We want
> to ship binaries that support but do not require extensions like SSE. We
> do not want to ship multiple binaries and wrappers that switch between
> them or installers that decide which one to use, because it's more work
> and we'd be shipping a lot of redundant code.
> 
> Ideally we wouldn't have to write additional code either. The compiler
> could emit code for multiple targets on a per-function basis (e.g. with
> the target architecture mangled into the function name). The runtime
> would check at startup which version to use and "link" the
> appropriate function.
> Here is a small proof-of-concept implementation of this detection and
> linking mechanism.
> 
> Comments?
> 
> 
> ------------------------------------------------------------------------
> 
> import std.cpuid;
> import std.stdio;
> 
> //-----------------------------------------------------------------------------
> //  This code goes into the runtime library
> 
> const uint  CPU_NO_EXTENSION    = 0,
>             CPU_MMX             = 1,
>             CPU_SSE             = 2,
>             CPU_SSE2            = 4,
>             CPU_SSE3            = 8;
> 
> /******************************************************************************
>     A function pointer with a bitmask of its required extensions
> ******************************************************************************/
> struct MultiTargetVariant
> {
>     static MultiTargetVariant opCall(uint ext, void* func)
>     {
>         MultiTargetVariant mtv;
>         mtv.ext = ext;
>         mtv.func = func;
>         return mtv;
>     }
> 
>     uint    ext;
>     void*   func;
> }
> 
> /******************************************************************************
>     Chooses the first matching MTV (so variants must be ordered from most
>     to least demanding) and saves its FP to the dummy entry in the VTBL
> ******************************************************************************/
> void LinkMultiTarget(ClassInfo ci, void* dummy_ptr, MultiTargetVariant[] multi_target_variants)
> {
>     uint extensions;
>     if ( mmx )  extensions |= CPU_MMX;
>     if ( sse )  extensions |= CPU_SSE;
>     if ( sse2 ) extensions |= CPU_SSE2;
>     if ( sse3 ) extensions |= CPU_SSE3;
> 
>     foreach ( i, inout vp; ci.vtbl )
>     {
>         if ( vp is dummy_ptr )
>         {
>             foreach ( variant; multi_target_variants )
>             {
>                 if ( (variant.ext & extensions) == variant.ext )
>                 {
>                     vp = variant.func;
>                     break;
>                 }
>             }
>             assert(vp !is dummy_ptr);
>             break;
>         }
>     }
> }
> 
> 
> //-----------------------------------------------------------------------------
> //  This is application code
> 
> /******************************************************************************
>     A class with a multi-target function
> ******************************************************************************/
> class MyMultiTargetClass
> {
>     // The following 3 functions could be generated automatically by the compiler
>     // with different targets enabled. For example, when we have language support for
>     // vector operations, the compiler could generate multiple versions for different
>     // SIMD extensions. Then there would be only one extension independent implementation.
> 
>     char[] multi_target_sse2()
>     {
>         return "using SSE2";
>     }
> 
>     char[] multi_target_sse_mmx()
>     {
>         return "using SSE and MMX";
>     }
> 
>     char[] multi_target_noext()
>     {
>         return "using no extension";
>     }
> 
>     // The following code could be generated by the compiler if there are
>     // multi-target functions.
> 
>     // Dummy body: its vtbl slot is overwritten with the selected variant at startup.
>     char[] multi_target() { return null; }
>     static this()
>     {
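>         // Listed from most to least demanding, since the first match wins.
>         MultiTargetVariant[] variants = [
>             MultiTargetVariant(CPU_SSE2, &multi_target_sse2),
>             MultiTargetVariant(CPU_SSE|CPU_MMX, &multi_target_sse_mmx),
>             MultiTargetVariant(CPU_NO_EXTENSION, &multi_target_noext)
>         ];
>         // 'this' is not available in a static constructor, so name the class.
>         LinkMultiTarget(MyMultiTargetClass.classinfo, &multi_target, variants);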
>     }
> }
> 
> /******************************************************************************
>     Finally, the mechanism is invisible to the caller, and there is no
>     runtime overhead beyond the detection at startup.
> ******************************************************************************/
> void main()
> {
>     MyMultiTargetClass t = new MyMultiTargetClass;
>     writefln("%s", t.multi_target);
> }
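The vtable patching above depends on the compiler's vtable layout and on the vtable being writable at startup. Both may vary between compilers. A minimal sketch of the same "first matching variant wins" selection can instead bind a module-level function pointer, assuming only free functions are involved. It is written for D2 and uses druntime's `core.cpuid` in place of the D1-era `std.cpuid`. `TargetVariant`, `selectVariant`, `detectExtensions` and the `describe*` functions are hypothetical names for illustration.

```d
// Hypothetical sketch: same selection rule as LinkMultiTarget, but binding a
// module-level function pointer instead of patching a class vtable.
import core.cpuid : mmx, sse, sse2, sse3;
import std.stdio : writefln;

enum : uint
{
    CPU_NO_EXTENSION = 0,
    CPU_MMX          = 1,
    CPU_SSE          = 2,
    CPU_SSE2         = 4,
    CPU_SSE3         = 8,
}

alias DescribeFn = string function();

struct TargetVariant
{
    uint ext;        // extensions this variant requires
    DescribeFn func; // implementation built for those extensions
}

// Placeholder bodies: real variants would contain target-specific code.
string describeSse2()   { return "using SSE2"; }
string describeSseMmx() { return "using SSE and MMX"; }
string describeNoExt()  { return "using no extension"; }

// Builds the bitmask of extensions the current CPU reports.
uint detectExtensions()
{
    uint extensions;
    if (mmx)  extensions |= CPU_MMX;
    if (sse)  extensions |= CPU_SSE;
    if (sse2) extensions |= CPU_SSE2;
    if (sse3) extensions |= CPU_SSE3;
    return extensions;
}

// Returns the first variant whose required extensions are all available,
// so the list must be ordered from most to least demanding.
DescribeFn selectVariant(TargetVariant[] variants, uint available)
{
    foreach (v; variants)
        if ((v.ext & available) == v.ext)
            return v.func;
    assert(0, "no variant matches; include a CPU_NO_EXTENSION fallback");
}

// Resolved once at startup; afterwards a call is a plain indirect call.
__gshared DescribeFn describe;

shared static this()
{
    TargetVariant[] variants = [
        TargetVariant(CPU_SSE2, &describeSse2),
        TargetVariant(CPU_SSE | CPU_MMX, &describeSseMmx),
        TargetVariant(CPU_NO_EXTENSION, &describeNoExt),
    ];
    describe = selectVariant(variants, detectExtensions());
}

void main()
{
    writefln("%s", describe());
}
```

The trade-off is one extra indirection per call for functions that would not otherwise have been virtual. In exchange, the scheme does not depend on vtable internals.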