SIMD/intrinsics questions

Tue Nov 10 11:59:40 PST 2009

Chad J Wrote:

> Walter Bright wrote:
> > 
> > ... To generate the code directly, assuming the existence of SSE,
> > is to mean the code will only run on modern chips. Whether or not this
> > is a problem depends on your application.
> 
> If MMX/SSE/SSE2 optimizations are low-hanging fruit, I'd at least like to
> have an -sse (and maybe -sse2, -sse3, and -no-sse) switch for the
> compiler to determine whether the compiler emits those instructions or
> not.
> 
> I'm also wondering if a more ideal approach (and perhaps additional
> option to those above) would be to borrow the best of JIT compilation
> and emit multiple code paths.  Maybe the program would have a bootstrap
> phase when starting up where it would call cpuid, find out what it has
> available, rewrite the main binary to use the optimal paths, then
> execute the main binary.  That way feature detection doesn't happen
> while the program itself is running, and thus doesn't slow down the
> computations as they happen.  Then passing -sse* would cause it to not
> emit the bootstrap, but instead just assume that the instructions will
> be available.

Incidentally, if you use LLVM and compile to its bitcode, you can do exactly this sort of thing at runtime: detect the host hardware, select opt passes, and have it run codegen for your exact CPU.  That works out nicely as long as using a given intrinsic falls through to the right glue code where it isn't supported, or else you let the compiler deduce where to use the fancier instructions (which is less likely to happen).
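
For what it's worth, here is a minimal sketch in C (not D, and not LLVM's API) of the cheaper cousin of that idea: detect the CPU once at startup, pick a code path, and fall through to scalar glue code where SSE isn't available.  All the names (add_scalar, add_sse, init_dispatch, add_arrays, HAVE_SSE_PATH) are made up for illustration.  It assumes GCC or Clang on x86 for the __builtin_cpu_supports check; on anything else it just uses the scalar path.  A real build would also compile the SSE path separately so the compiler can't sneak SSE instructions into the fallback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch: only build the SSE path where the SSE intrinsics
   and the GCC/Clang CPU-detection builtins are both available. */
#if defined(__SSE__) && defined(__GNUC__) && \
    (defined(__x86_64__) || defined(__i386__))
#define HAVE_SSE_PATH 1
#include <xmmintrin.h>  /* _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

typedef void (*add_fn)(float *dst, const float *a, const float *b, size_t n);

/* Portable fallback ("glue code") that runs on any CPU. */
static void add_scalar(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

#ifdef HAVE_SSE_PATH
/* SSE path: four floats per iteration, scalar loop for the remainder. */
static void add_sse(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Ask the running CPU (via cpuid under the hood) whether it has SSE. */
static int host_has_sse(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse") != 0;
}
#endif

/* Chosen once at startup; the hot loop never repeats the feature check. */
static add_fn add_impl = NULL;

void init_dispatch(void)
{
    add_impl = add_scalar;
#ifdef HAVE_SSE_PATH
    if (host_has_sse())
        add_impl = add_sse;
#endif
}

void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    assert(add_impl != NULL);  /* init_dispatch() must run first */
    add_impl(dst, a, b, n);
}
```

You'd call init_dispatch() once at the top of main and then use add_arrays() everywhere.  The cost is one indirect call per invocation instead of a feature check, which is most of the benefit of the bootstrap idea without rewriting the binary.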

-Mike