Array reverse

bearophile bearophileHUGS at lycos.com
Tue Nov 27 16:29:59 PST 2012


monarch_dodra:

> Could you maybe also add this in your study?
>
> void reverseArr(T)(T[] arr) {
>     for (auto x = arr.ptr, y = &arr[$ - 1]; x < y; )
>         swap(*x++, *y--);
> }


This is the full program:

import core.stdc.stdio: printf;
import std.algorithm: reverse, swap;

void reverseArr(T)(T[] arr) {
     for (auto x = arr.ptr, y = &arr[$ - 1]; x < y; ) {
         auto aux = *x;
         *x++ = *y;
         *y-- = aux;
     }
}

void reverseArray(T)(T[] arr) {
     for (auto x = arr.ptr, y = &arr[$ - 1]; x < y; )
         swap(*x++, *y--);
}

void main() {
     auto a = [10, 20, 30, 40];
     foreach(x; a) printf("%d ", x); printf("\n");
     a.reverseArr();
     foreach(x; a) printf("%d ", x); printf("\n");
     a.reverse();
     foreach(x; a) printf("%d ", x); printf("\n");
     a.reverseArray();
     foreach(x; a) printf("%d ", x); printf("\n");
}


And this is the asm of the reverseArray:

reverseArray:
         push EBX
         mov ECX, 0Ch[ESP]
         mov EBX, 0Ch[ESP]
         push ESI
         mov ESI, 0Ch[ESP]
         lea ESI, -4[ESI*4][ECX]
         push EDI
         cmp ECX, ESI
         jae L30
L1A:    mov EAX, EBX
         mov EDI, ESI
         mov EDX, [EAX]
         mov ECX, [EDI]
         add EBX, 4
         sub ESI, 4
         mov [EAX], ECX
         cmp EBX, ESI
         mov [EDI], EDX
         jb  L1A
L30:    pop EDI
         pop ESI
         pop EBX
         ret 8


So the inner loop is 10 asm instructions instead of the 8 asm 
instructions of the loop of reverseArr(), that is 2 more register 
swaps, that are performed very quickly by the hardware register 
renaming in the modern CPUs. So this is a small difference, 
probably acceptable in many situations. While the difference 
between the performance of reverseArray() and 
std.algorithm.reverse is significant.


> Related, the starting condition, as written: "y = &arr[$ - 1]" 
> will choke on empty ranges.
>
> I'd have suggested: "y = arr.ptr + arr.length - 1", but that 
> will also choke if the arr.ptr is null. I'd say you should just 
> add an empty short-circuit test int there.

Right. But my study was mostly about what's inside the loop of 
those functions. What's outside the loop is not influencing 
performance significantly.

Bye,
bearophile


More information about the Digitalmars-d mailing list