Inherent code performance advantages of D over C?
Joseph Rushton Wakeling
joseph.wakeling at webdrake.net
Sat Dec 7 01:45:37 PST 2013
On 07/12/13 09:14, Walter Bright wrote:
> There are several D projects which show faster runs than C. If your goal is to
> pragmatically write faster D code than in C, you can do it without too much
> effort. If your goal is to find problem(s) with D, you can certainly do that, too.
Well, as the author of a D library which outperforms the C library that inspired
it (at least within the limits of its much smaller range of functionality; it's
been a bit neglected of late and needs more input) ...
... the practical experience I've had is that, more than any outright performance
comparison, what it often comes down to is effort vs. results, and the
cleanliness and maintainability of the resulting code. This is particularly true
of C code that is designed to be "safe", with all the boilerplate that entails.
It's typically possible to match or exceed the performance of a C program with
much more concise and easier-to-follow D code.
Another factor that's important here is that C and D in general seem to lead to
different design solutions. Even if one has an exact example in C to compare
to, the natural thing to do in D is often something different, and that leads to
subtle and not-so-subtle implementation differences that in turn affect performance.
Example: in the C library that was my inspiration, there's a function which
requires the user to pass in a buffer, to which it writes a set of values
calculated from the underlying data. I didn't much like the idea of compelling
the user to pass a buffer, so when I wrote my D equivalent I used tools from
std.range and std.algorithm to make the function return a lazily-evaluated
range offering the same values that the C code stored in the buffer array.
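To make the contrast concrete, here's a rough sketch of the two API styles,
with hypothetical names and a placeholder calculation (not the actual library
code):

    import std.algorithm : map;

    // Hypothetical per-element calculation, standing in for whatever
    // the real library computes from its underlying data.
    double calc(double x) { return 2 * x + 1; }

    // C-style API: the caller must supply a buffer to be filled.
    void valuesIntoBuffer(const(double)[] data, double[] buf)
    {
        assert(buf.length >= data.length);
        foreach (i, x; data)
            buf[i] = calc(x);
    }

    // D-style alternative: return a lazily-evaluated range; the
    // same values are computed on demand, and no buffer is needed.
    auto lazyValues(const(double)[] data)
    {
        return data.map!calc;
    }

The range version computes each value only when it's asked for, and the caller
never has to think about buffer sizes or lifetimes.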
I assumed this might lead to a small overall performance hit, because the C
program could write once to a buffer and then re-use it, whereas I might be
lazily calculating and re-calculating the same values. Unfortunately, it turned
out that, for whatever reason, my lazily-calculated range was responsible for
lots of micro-allocations, which slowed things down a lot. (I tried it out
again earlier this morning, just to refresh my memory, and it looks like this
may no longer be the case; so perhaps something has been fixed here...)
So, that in turn led me to yet another solution, where instead of an external
buffer being passed in, I created an internal cache which could be written to
once and re-used again and again, never needing to be recalculated unless the
underlying data changed.
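In sketch form (again with hypothetical names and a placeholder calculation),
the idea is roughly:

    // The values are recalculated only when the underlying data
    // changes; every other call just returns the cached slice.
    struct Dataset
    {
        private double[] data_;
        private double[] cache_;
        private bool dirty_ = true;

        void setData(double[] data)
        {
            data_ = data;
            dirty_ = true;  // data changed: invalidate the cache
        }

        const(double)[] values()
        {
            if (dirty_)
            {
                cache_.length = data_.length;
                foreach (i, x; data_)
                    cache_[i] = 2 * x + 1;  // placeholder calculation
                dirty_ = false;
            }
            return cache_;
        }
    }

So the cost of the calculation is paid once per change to the data, not once
per call.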
Now, _that_ turned out to be significantly faster than the C program, which was
almost certainly doing unnecessary recalculation of the buffer: it recalculated
every time the function was called, whereas my program could rely on the cache,
calculate once, and after that just return the slice of calculated values. On
the other hand, if I tweaked the internals of the function so that every call
_always_ involved recalculating and rewriting the cache, it was slightly slower
than the C version -- presumably because now it was the C code doing less
recalculation: its callers called the function once and then worked from the
buffer, rather than calling it multiple times.
TL;DR: the point is that writing in D gave me the opportunity to spend mental
and programming time exploring these different choices and focusing on
algorithms and data structures, rather than on all the effort and extra LOC
required to get a _particular_ idea running in C. That's where the real edge
arises.