OK, I rewrote the parallelism amap version to spawn and join threads instead, and the results are similar, except for notably fewer system calls (specifically, fewer futex calls).