Small part of a program : d and c versions performances diff.

Larry via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Wed Jul 9 06:17:58 PDT 2014


On Wednesday, 9 July 2014 at 12:25:40 UTC, bearophile wrote:
> Larry:
>
>> Now the performance :
>> D : 12 µs
>> C : < 1µs
>>
>> Where does the diff come from? Is there a way to optimize 
>> the D version?
>>
>> Again, I am absolutely new to D and these are my very first 
>> lines of code with it.
>
> Your C code is not equivalent to the D code: there are small 
> differences, and even the output is different. So I've cleaned 
> up your C and D code:
>
> ------------------------
>
> // C code.
> #include <stdio.h>
> #include <string.h>
> #include <stdlib.h>
> #include <time.h>
> #include <sys/time.h>
> #include "jol.h"
>
> int main() {
>     struct timeval s, e;
>     gettimeofday(&s, NULL);
>
>     int pol = 5;
>     tes(&pol);
>
>     int arr[] = {9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78,
>                  985, 3215};
>     int len = 13 - 1;
>     int g = 0;
>
>     for (int x = 36; x >= 0; --x) {
>         for (int y = len; y >= 0; --y) {
>             ++g;
>             arr[y]++;
>         }
>     }
>
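>     gettimeofday(&e, NULL);
>     // Elapsed microseconds; tv_sec is included so that crossing a
>     // second boundary does not produce a wrapped value.
>     long us = (long)(e.tv_sec - s.tv_sec) * 1000000L
>             + (long)(e.tv_usec - s.tv_usec);
>     printf("C: %d %ld %d %d %d\n",
>            g, us, arr[4], arr[9], pol);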
>
>     return 0;
> }
>
> ------------------------
>
> D code ("final" on free functions has little meaning, but the D 
> compiler is very sloppy and doesn't complain):
>
>
> module jol;
>
> void tes(ref int a) {
>     a = 9;
> }
>
>
> ---------
>
> module maind;
>
> void main() {
>     import std.stdio;
>     import std.datetime;
>     import jol;
>
>     StopWatch sw;
>     sw.start;
>
>     int pol = 5;
>     tes(pol);
>
>     int[] arr = [9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78,
>                  985, 3215];
>     int len = 13 - 1;
>     int g = 0;
>
>     for (int x = 36; x >= 0; --x) {
>         // Some code here erased for the test.
>         for (int y = len; y >= 0; --y) {
>             // Some other code here.
>             ++g;
>             arr[y]++;
>         }
>     }
>
>     sw.stop;
>     writefln("D: %d %d %d %d %d",
>              g, sw.peek.nsecs, arr[4], arr[9], pol);
> }
>
> ----------------
>
> That D code is not fully idiomatic. This is closer to idiomatic 
> D code:
>
>
> module jol2;
>
> void test(ref int x) pure nothrow @safe {
>     x = 9;
> }
>
>
>
> module maind;
>
> void main() {
>     import std.stdio, std.datetime;
>     import jol2;
>
>     StopWatch sw;
>     sw.start;
>
>     int pol = 5;
>     test(pol);
>
>     int[13] arr = [9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78,
>                    985, 3215];
>     uint count = 0;
>
>     foreach_reverse (immutable _; 0 .. 37) {
>         foreach_reverse (ref ai; arr) {
>             count++;
>             ai++;
>         }
>     }
>
>     sw.stop;
>     writefln("D: %d %d %d %d %d",
>              count, sw.peek.nsecs, arr[4], arr[9], pol);
> }
>
> ----------------
>
> In my benchmarks I have not used the more idiomatic D code; I 
> have used the C-like code. But the run time is essentially the 
> same.
>
> I compiled the C and D code as follows (on 32-bit Windows):
>
> gcc -march=native -std=c11 -O2 main.c jol.c -o main
>
> ldmd2 -wi -O -release -inline -noboundscheck maind.d jol.d
> strip maind.exe
>
> For the D code I've used the latest ldc2 compiler (V. 0.13.0, 
> based on DMD v2.064 and LLVM 3.4.2), GCC is V.4.8.0 
> (rubenvb-4.8.0).
>
> ----------------
>
> The C code gives as output:
>
> C: 481 0 105 602 9
>
>
> The D code gives as output:
>
> D: 481 6076 105 602 9
>
> ----------------------
>
> If I slow the CPU down to half speed, the C program runs in about 
> 0.05 seconds and the D program in about 0.07 seconds.
>
> Such run times are too small for a sufficiently meaningful 
> comparison. You need a run time of about 2 seconds to get 
> meaningful timings.
>
> The difference between 0.05 and 0.07 is caused by initializing 
> the D runtime (such as the D GC). That takes about 0.015 seconds 
> on my system with the CPU at full speed, and it's a constant 
> cost.
>
> Bye,
> bearophile

You are definitely right, I did mess up while translating!

I ran the corrected code (the one I meant to provide :S) 
and on a slow MacBook I end up with:
C : 2
D : 15994

Of course, when run on very high-end machines this diff is almost 
nonexistent, but we want to run on very low-powered hardware.

OK, even with longer code there will always be a launch 
penalty for D, so I cannot use it for very high-performance loops.
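
For what it's worth, here is a minimal C sketch of the measurement bearophile suggests: repeat the same loop nest inside a single timed region until the total run time is long enough to mean something, then divide. It is only an illustration under assumptions. The names elapsed_ns, run_kernel, bench and report are hypothetical, the array is declared volatile unsigned so that the optimizer cannot delete the work and so that long runs wrap instead of overflowing, and C11 timespec_get is used for portability (it reads the wall clock; a monotonic clock such as POSIX clock_gettime with CLOCK_MONOTONIC would be preferable where available).

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Nanoseconds between two timespec readings, including the seconds field. */
static long long elapsed_ns(struct timespec start, struct timespec end) {
    return (long long)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (long long)(end.tv_nsec - start.tv_nsec);
}

/* The loop nest from the thread: `passes` passes over arr, last element
   first. volatile keeps the optimizer from removing the timed work. */
static long long run_kernel(volatile unsigned *arr, int len, int passes) {
    long long g = 0;
    for (int x = passes - 1; x >= 0; --x) {
        for (int y = len - 1; y >= 0; --y) {
            ++g;
            arr[y]++;
        }
    }
    return g;
}

/* Hypothetical result type for this sketch. */
struct bench_result {
    long long increments;  /* total number of arr[y]++ executed */
    long long total_ns;    /* wall-clock time for all repetitions */
};

/* Repeat the 37-pass kernel `reps` times inside one timed region. */
static struct bench_result bench(int reps) {
    assert(reps > 0);
    volatile unsigned arr[13] = {9, 16, 458, 2, 68, 5452, 98, 32, 4, 565,
                                 78, 985, 3215};
    struct bench_result r = {0, 0};
    struct timespec s, e;

    timespec_get(&s, TIME_UTC);
    for (int i = 0; i < reps; ++i)
        r.increments += run_kernel(arr, 13, 37);
    timespec_get(&e, TIME_UTC);

    r.total_ns = elapsed_ns(s, e);
    return r;
}

/* Print the total time and the average time per repetition. */
static void report(int reps) {
    struct bench_result r = bench(reps);
    printf("C: %lld increments, %lld ns total, %.1f ns per repetition\n",
           r.increments, r.total_ns, (double)r.total_ns / reps);
}
```

Calling report(reps) from main and raising reps until the total reaches a couple of seconds gives a per-repetition figure that is not dominated by timer resolution. Because the timed region starts after the program is already running, runtime initialization is excluded from the number, which is also what the StopWatch in the D version measures.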

Shame for us..
:)

Thanks and bye


