I saw in the changelog that std.algorithm.swap now uses memcpy because it's faster than the old way. Why is memcpy faster here? And if it is faster, why doesn't DMD generate the same instructions for an ordinary assignment?