# Parallel Merge Sort

bearophile via Digitalmars-d digitalmars-d at puremagic.com
Tue Mar 3 16:00:51 PST 2015

```Josh wrote:

> How can I make my parallel code more efficient. Currently, I am
> getting destroyed by the serial merge sort.
>
> http://pastebin.com/M0GKfTTX

That serial merge sort I've written is little more than a toy. I
suggest you to compare your parallel sort with a serial sort that
allocates better. Perhaps later I'll add it.

Here I have done a quick translation of some C code from
Wikipedia, this is wasteful for memory (not tested much!):

import std.stdio, std.algorithm;

/// A has the items to sort, array B is a work array.
void mergeSort(T)(T[] A) pure nothrow @safe
out {
assert(A.isSorted);
} body {
static void bottomUpMerge(T)(in T[] A, in size_t iLeft, in
size_t iRight, in size_t iEnd, T[] B)
pure nothrow @safe @nogc {
size_t i0 = iLeft;
size_t i1 = iRight;

// While there are elements in the left or right runs.
for (size_t j = iLeft; j < iEnd; j++) {
// If left run head exists and is <= existing right
if (i0 < iRight && (i1 >= iEnd || A[i0] <= A[i1])) {
B[j] = A[i0];
i0++;
} else {
B[j] = A[i1];
i1++;
}
}
}

immutable n = A.length;
auto B = new T[n];

// Each 1-element run in A is already "sorted".
// Make successively longer sorted runs of length 2, 4, 8,
16... until whole array is sorted.
for (size_t width = 1; width < n; width = 2 * width) {
// Array A is full of runs of length width.
for (size_t i = 0; i < n; i = i + 2 * width) {
// Merge two runs: A[i:i+width-1] and
A[i+width:i+2*width-1] to B[]
// or copy A[i:n-1] to B[] ( if(i+width >= n) ).
bottomUpMerge(A, i, min(i + width, n), min(i + 2 *
width, n), B);
}

// Now work array B is full of runs of length 2*width.
swap(A, B);
}
}

void main() {
auto a = [3,1,2,5,4,8,6,7,2,9,1,4,3];
a.mergeSort;
a.writeln;
}

Bye,
bearophile
```