I've been playing around with the 8-9s version posted earlier. The problem seems to lie in ray_sphere. Strangely, writing the subtraction as Vec v = void; Vec.sub(center, ray.orig, v); runs in 8.8s (and produces correct output once the printf at the bottom is fixed), whereas Vec v = center - ray.orig; runs in 11.1s. I'm still investigating why this happens. --downs
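
For anyone following along, here is a minimal sketch of the two forms being compared. The Vec layout, the signature of sub, and the operator overload are assumptions for illustration (written in current D syntax), not the benchmark's actual code, and the sketch says nothing about timings.

```d
// Hypothetical minimal Vec; the real struct in the benchmark may differ.
struct Vec
{
    double x, y, z;

    // Operator form: builds and returns the difference as a new value.
    Vec opBinary(string op)(const Vec rhs) const
        if (op == "-")
    {
        return Vec(x - rhs.x, y - rhs.y, z - rhs.z);
    }

    // Out-parameter form: writes the difference into an existing Vec.
    // `ref` (not `out`) is used so the target is not re-initialized first.
    static void sub(const ref Vec a, const ref Vec b, ref Vec result)
    {
        result.x = a.x - b.x;
        result.y = a.y - b.y;
        result.z = a.z - b.z;
    }
}

void main()
{
    auto center = Vec(1.0, 2.0, 3.0);
    auto orig = Vec(0.5, 0.5, 0.5); // stands in for ray.orig

    // Variant 1: skip default initialization, then fill every field via sub.
    Vec v1 = void;
    Vec.sub(center, orig, v1);

    // Variant 2: operator overload returning a temporary.
    Vec v2 = center - orig;

    // Both variants should compute the same vector.
    assert(v1 == v2);
}
```

Both variants compute the same result; the only differences are whether v is default-initialized and whether the difference is returned by value or written through a reference, which is where I would look first for the gap.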