Taming the optimizer
Mike Franklin
slavo5150 at yahoo.com
Thu Jun 14 03:39:39 UTC 2018
I'm trying to run benchmarks on my memcpy implementation
(https://forum.dlang.org/post/trenuawrekkbewjudmsy@forum.dlang.org) using LDC with optimizations enabled (e.g. `ldc2 -O3 memcpyd.d`). In my first attempt, the optimizer stripped out most of the code I was trying to measure.
Using the information at
https://stackoverflow.com/questions/40122141/preventing-compiler-optimizations-while-benchmarking, I've created this:
void use(void* p)
{
    version(LDC)
    {
        import ldc.llvmasm;
        __asm("", "r", p);
    }
}

void clobber()
{
    version(LDC)
    {
        import ldc.llvmasm;
        __asm("", "~{memory}");
    }
}
// `f` is the function I wish to benchmark. It's an
// implementation of memcpy in D.
Duration benchmark(T, alias f)(const T* src, T* dst)
{
    import core.time : Duration;
    import std.datetime.stopwatch : AutoStart, StopWatch;

    enum iterations = 10_000_000;
    Duration result;

    auto sw = StopWatch(AutoStart.yes);
    sw.reset();
    foreach (_; 0 .. iterations)
    {
        f(src, dst);
        use(dst);
        clobber();
    }
    result = sw.peek();

    return result;
}
This seems to work, but I'm not sure I've implemented it
properly, especially the `use` function. How would you write
this to achieve a realistic optimized measurement? What's the
equivalent of...
static void escape(void* p)
{
    asm volatile("" : : "g"(p) : "memory");
}
... in LDC inline assembly?
Thanks,
Mike
More information about the digitalmars-d-ldc mailing list