Small part of a program : d and c versions performances diff.

Wed Jul 9 06:35:56 PDT 2014

On Wednesday, 9 July 2014 at 13:18:00 UTC, Larry wrote:
> On Wednesday, 9 July 2014 at 12:25:40 UTC, bearophile wrote:
>> Larry:
>>
>>> Now the performance :
>>> D : 12 µs
>>> C : < 1µs
>>>
>>> Where does the diff come from ? Is there a way to optimize 
>>> the D version ?
>>>
>>> Again, I am absolutely new to D and those are my very first 
>>> lines of code with it.
>>
>> Your C code is not equivalent to the D code, there are small 
>> differences, even the output is different. So I've cleaned up 
>> your C and D code:
>>
>> ------------------------
>>
>> // C code.
>> #include <stdio.h>
>> #include <string.h>
>> #include <stdlib.h>
>> #include <time.h>
>> #include <sys/time.h>
>> #include "jol.h"
>>
>> int main() {
>>    struct timeval s, e;
>>    gettimeofday(&s, NULL);
>>
>>    int pol = 5;
>>    tes(&pol);
>>
>>    int arr[] = {9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78, 
>> 985, 3215};
>>    int len = 13 - 1;
>>    int g = 0;
>>
>>    for (int x = 36; x >= 0; --x) {
>>        for (int y = len; y >= 0; --y) {
>>            ++g;
>>            arr[y]++;
>>        }
>>    }
>>
>>    gettimeofday(&e, NULL);
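>>    // Include tv_sec so the result stays valid across a second boundary.
>>    long usec = (long)(e.tv_sec - s.tv_sec) * 1000000L
>>              + (long)(e.tv_usec - s.tv_usec);
>>    printf("C: %d %ld %d %d %d\n", g, usec, arr[4], arr[9], pol);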
>>
>>    return 0;
>> }
>>
>> ------------------------
>>
>> D code ("final" functions have little meaning here, but the D 
>> compiler is very sloppy and doesn't complain):
>>
>>
>> module jol;
>>
>> void tes(ref int a) {
>>    a = 9;
>> }
>>
>>
>> ---------
>>
>> module maind;
>>
>> void main() {
>>    import std.stdio;
>>    import std.datetime;
>>    import jol;
>>
>>    StopWatch sw;
>>    sw.start;
>>
>>    int pol = 5;
>>    tes(pol);
>>
>>    int[] arr = [9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78, 
>> 985, 3215];
>>    int len = 13 - 1;
>>    int g = 0;
>>
>>    for (int x = 36; x >= 0; --x) {
>>        // Some code here erased for the test.
>>        for (int y = len; y >= 0; --y) {
>>            // Some other code here.
>>            ++g;
>>            arr[y]++;
>>        }
>>    }
>>
>>    sw.stop;
>>    writefln("D: %d %d %d %d %d",
>>             g, sw.peek.nsecs, arr[4], arr[9], pol);
>> }
>>
>> ----------------
>>
>> That D code is not fully idiomatic; this is closer to 
>> idiomatic D code:
>>
>>
>> module jol2;
>>
>> void test(ref int x) pure nothrow @safe {
>>    x = 9;
>> }
>>
>>
>>
>> module maind;
>>
>> void main() {
>>    import std.stdio, std.datetime;
>>    import jol2;
>>
>>    StopWatch sw;
>>    sw.start;
>>
>>    int pol = 5;
>>    test(pol);
>>
>>    int[13] arr = [9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78, 
>> 985, 3215];
>>    uint count = 0;
>>
>>    foreach_reverse (immutable _; 0 .. 37) {
>>        foreach_reverse (ref ai; arr) {
>>            count++;
>>            ai++;
>>        }
>>    }
>>
>>    sw.stop;
>>    writefln("D: %d %d %d %d %d",
>>             count, sw.peek.nsecs, arr[4], arr[9], pol);
>> }
>>
>> ----------------
>>
>> In my benchmarks I haven't used the more idiomatic D code; 
>> I have used the C-like code. But the run-time is essentially 
>> the same.
>>
>> I compile the C and D code with (on a 32 bit Windows):
>>
>> gcc -march=native -std=c11 -O2 main.c jol.c -o main
>>
>> ldmd2 -wi -O -release -inline -noboundscheck maind.d jol.d
>> strip maind.exe
>>
>> For the D code I've used the latest ldc2 compiler (V. 0.13.0, 
>> based on DMD v2.064 and LLVM 3.4.2), GCC is V.4.8.0 
>> (rubenvb-4.8.0).
>>
>> ----------------
>>
>> The C code gives as output:
>>
>> C: 481 0 105 602 9
>>
>>
>> The D code gives as output:
>>
>> D: 481 6076 105 602 9
>>
>> ----------------------
>>
>> If I slow down the CPU to half speed, the C code runs in about 
>> 0.05 seconds and the D code runs in about 0.07 seconds.
>>
>> Such run times are too small to allow a sufficiently 
>> meaningful comparison. You need a run-time of about 2 seconds 
>> to get meaningful timings.
>>
>> The difference between 0.05 and 0.07 is caused by initializing 
>> the D runtime (like the D GC). It takes about 0.015 seconds on 
>> my systems at full CPU speed to initialize the D runtime, and 
>> it's a constant time.
>>
>> Bye,
>> bearophile
>
> You are definitely right, I did mess up while translating !
>
> I ran the corrected code (the version I meant to provide :S) 
> and on a slow macbook I end up with :
> C : 2
> D : 15994
>
> Of course, when run on very high-end machines this diff is 
> almost nonexistent, but we want to run on very low-powered 
> hardware.
>
> Ok, even with longer code, there will always be a launch 
> penalty for D. So I cannot use it for very high performance 
> loops.
>
> Shame for us..
> :)
>
> Thanks and bye

Could you provide the exact code you are using for that 
benchmark? Once the program has started up you should be able to 
obtain performance parity between C and D. Situations where this 
isn't true are problems we would like to know about.

For the amount of work you are doing in the test program (almost 
nothing), the total runtime is probably dominated by the program 
load time etc. even when using C.
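
As a rough illustration of how to keep startup and timer resolution out of the number, here is a minimal C sketch that repeats the loop from the thread many times between two clock reads and reports the average per run. The helper names (`elapsed_ns`, `workload`, `avg_ns_per_run`) are made up for this example, and it assumes a C11 library that provides `timespec_get`; that is a wall-clock source, so a monotonic clock would be a steadier choice where one is available. The array is `unsigned int` here (not `int` as in the thread) so that many repetitions cannot cause signed overflow.

```c
#include <stdint.h>
#include <time.h>

/* Nanoseconds from start to end. The tv_sec term keeps the result
   correct when the interval crosses a second boundary. */
static int64_t elapsed_ns(struct timespec start, struct timespec end) {
    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (int64_t)(end.tv_nsec - start.tv_nsec);
}

/* The nested loop from the thread: 37 passes over the array, each
   incrementing every element. Returns the inner iteration count. */
static int workload(unsigned int *arr, int len) {
    int g = 0;
    for (int x = 36; x >= 0; --x) {
        for (int y = len - 1; y >= 0; --y) {
            ++g;
            arr[y]++;
        }
    }
    return g;
}

/* Hypothetical helper: runs the workload `reps` times between two clock
   reads, stores the total iteration count in *iterations, and returns
   the average nanoseconds per run. */
static double avg_ns_per_run(unsigned int *arr, int len, long reps,
                             unsigned long *iterations) {
    struct timespec s, e;
    unsigned long total = 0;

    timespec_get(&s, TIME_UTC);
    for (long r = 0; r < reps; ++r)
        total += (unsigned long)workload(arr, len);
    timespec_get(&e, TIME_UTC);

    *iterations = total;
    return (double)elapsed_ns(s, e) / (double)reps;
}
```

Calling `avg_ns_per_run` from `main` with enough repetitions to reach a total run time of a second or two, and printing both the average and a few array elements, keeps the result live so the compiler cannot drop the loop entirely. An optimizer may still simplify a loop this small, so the figure is only a rough guide. The same structure carries over to the D version with `StopWatch`.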