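For what it's worth, I'm pretty sure the atomicity of 128-bit SSE stores is implementation-defined rather than architecturally guaranteed. Newer microarchitectures may perform an aligned 128-bit store atomically, but older designs implement it as two separate 64-bit operations, so another core can observe a half-written value.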
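
To make that concrete: as far as I recall, the current Intel and AMD manuals only promise atomic aligned 16-byte loads and stores (MOVAPS, MOVDQA and friends) on processors that enumerate AVX in CPUID leaf 1, ECX bit 28. Below is a minimal sketch of that check in C. It assumes GCC or Clang on x86 (for `<cpuid.h>` and `__get_cpuid`), and the helper names `ecx_has_avx` and `aligned_16b_store_documented_atomic` are made up for illustration. Treat a 0 result as "no documented guarantee", not as proof that stores tear.

```c
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

/* CPUID.01H:ECX bit 28 is the AVX feature flag. */
#define CPUID_ECX_AVX_BIT (UINT32_C(1) << 28)

/* Hypothetical helper: is the AVX bit set in an ECX value from CPUID leaf 1? */
int ecx_has_avx(uint32_t ecx)
{
    return (ecx & CPUID_ECX_AVX_BIT) != 0;
}

/*
 * Hypothetical helper: returns 1 if the CPU enumerates AVX, which is the
 * condition under which (as I understand the vendor manuals) an aligned
 * 16-byte SSE load/store is documented as atomic. Returns 0 otherwise,
 * including on non-x86 targets and compilers without <cpuid.h>.
 */
int aligned_16b_store_documented_atomic(void)
{
#if HAVE_X86_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;   /* leaf 1 not available: assume no guarantee */
    return ecx_has_avx(ecx);
#else
    return 0;
#endif
}
```

If this returns 0, the safe assumption is that a 128-bit store can be split, so fall back to a lock or a `lock cmpxchg16b`-based sequence where you need a 16-byte atomic update.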