I ran into the same situation and ended up going with the malloc/free option. It's also often possible to get rid of allocations inside a loop by pre-allocating thread-local buffers and reusing them throughout (see std.parallelism.TaskPool.workerLocalStorage).
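
For what it's worth, here is a rough sketch of the buffer-reuse idea, not a drop-in solution. The function name fillParallel, the buffer size bufLen and the "work" done in the loop body are made up for illustration. The initial value passed to workerLocalStorage is lazy, so each worker thread should end up with its own array, allocated once before the loop starts.

```d
import std.parallelism : parallel, taskPool;
import std.range : iota;

// Hypothetical example: fills results[i] with 2*i using a per-thread scratch buffer.
double[] fillParallel(size_t n)
{
    enum bufLen = 1024; // assumed scratch size, pick whatever your loop body needs

    // The argument is lazy, so it is evaluated once per worker:
    // every thread gets its own buffer, allocated up front.
    auto scratch = taskPool.workerLocalStorage(new double[](bufLen));

    auto results = new double[](n);

    foreach (i; parallel(iota(n)))
    {
        // Fetch the current thread's buffer; nothing is allocated in the loop.
        double[] buf = scratch.get;

        buf[] = cast(double) i;            // stand-in for the real computation
        results[i] = buf[0] + buf[$ - 1];  // each iteration writes only its own slot
    }

    return results;
}

void main()
{
    auto r = fillParallel(8);
    assert(r.length == 8);
}
```

The buffers here still come from the GC, but only once per worker thread rather than once per iteration. If you need to avoid the GC entirely, the same pattern should work with malloc'd buffers, as long as you free them after the loop.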