Steven Schveighoffer wrote:
> I would guess this has something to do with
> the lack of inlining for algorithmic functions.

Yeah, this is almost certainly the problem. I rewrote the code using a traditional C-style loop with no external functions, and I'm getting roughly equal performance.
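
To illustrate the kind of difference being discussed, here is a minimal sketch in C. It is not the code from this thread (which isn't shown here); the names `reduce`, `add`, `sum_loop`, and `check_variants_agree` are hypothetical, and summing an array is just a stand-in workload. The first version goes through a generic helper that takes a function pointer, so each element costs a call unless the compiler manages to inline it. The second is the plain loop with the operation written out in place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical binary operation type used by the generic helper. */
typedef long (*binop)(long, long);

long add(long a, long b) { return a + b; }

/* "Algorithmic" style: the operation is passed in as a function pointer.
 * If the compiler does not inline reduce() and op, every element pays
 * for an indirect call. */
long reduce(const long *data, size_t n, long init, binop op) {
    long acc = init;
    for (size_t i = 0; i < n; ++i) {
        acc = op(acc, data[i]);
    }
    return acc;
}

/* Traditional loop: the addition is written inline, no calls in the body. */
long sum_loop(const long *data, size_t n) {
    long acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += data[i];
    }
    return acc;
}

/* Sanity check that both variants compute the same result. */
void check_variants_agree(void) {
    const long data[] = {1, 2, 3, 4, 5};
    size_t n = sizeof data / sizeof data[0];
    assert(reduce(data, n, 0, add) == 15);
    assert(sum_loop(data, n) == 15);
}
```

Both functions return the same value; any speed gap between them depends on whether the optimizer inlines the helper, so timings will vary by compiler and flags.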