> 3. Intrinsic with custom code generation

That is a viable approach. I've had good results by using pattern recognition rather than intrinsics, for things such as byte swapping.
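
To illustrate the byte-swapping case, here is a minimal sketch in C of the kind of plain-code idiom a compiler can pattern-match. The function name `bswap32` is hypothetical, and the claim that a given compiler lowers it to a single byte-swap instruction is an assumption: GCC and Clang commonly do so with optimization enabled, but that depends on the compiler, version, and target.

```c
#include <stdint.h>

/* Portable 32-bit byte swap written with ordinary shifts and masks.
   No intrinsic is used. Assumption: an optimizer that recognizes this
   pattern may replace the whole expression with one byte-swap
   instruction; otherwise it still computes the correct result. */
static inline uint32_t bswap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |  /* byte 0 moves to byte 3 */
           ((x & 0x0000FF00u) <<  8) |  /* byte 1 moves to byte 2 */
           ((x & 0x00FF0000u) >>  8) |  /* byte 2 moves to byte 1 */
           ((x & 0xFF000000u) >> 24);   /* byte 3 moves to byte 0 */
}
```

The appeal of this route is that the source stays portable: if the pattern is not recognized, the code is merely slower, not broken.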