Benchmark of D against other languages

Laeeth Isharc via Digitalmars-d digitalmars-d at puremagic.com
Thu Apr 2 19:20:23 PDT 2015


On Tuesday, 31 March 2015 at 18:20:05 UTC, cym13 wrote:
> I found this repository (reddit!) that hosts common benchmarks 
> for many languages such as D, Nim, Go, Python, C, etc. It 
> uses only standard data structures so as not to skew the benchmarks.
>
> https://github.com/kostya/benchmarks

Thanks for this.

BTW, some of these benchmarks were taken from attractivechaos 
(blog linked below). He seems to work in bioinformatics and does 
not consider himself a programmer by profession, but he has 
written libraries that others use and is clearly bright and 
thoughtful. He is pro-D and saddened that people have not 
recognized its qualities.

https://attractivechaos.wordpress.com/

From 2010:
"Although I do not use D, I always see it as one of the most 
attractive programming languages, smartly balancing efficiency, 
simplicity and extensibility. At the same time, I keep getting 
frustrated when I see such an elegant thing fade away gradually 
given that a) D has dropped out of top 20 in TIOBE Programming 
Community Index and b) it was not evaluated by the Computer 
Language Benchmarks Game any more. Most programmers know why this 
happens. I am simply frustrated.

D is falling while Go is rising. I do appreciate the philosophy 
behind the design of Go and trust Rob Pike and Ken Thompson to 
deliver another great product, but right now I do not see Go as a 
replacement of any mainstream programming languages as long as it 
is way slower than Java, not to speak C/C++. To me, Go’s rising 
is merely due to the support from Google. It is good as a 
research project, but it needs time to reach the critical mass in 
practice.

While reading the Computer Language Benchmarks Game, I am really 
amazed by LuaJIT. Probably I am going to try it some day."


Some of his posts on the efficiency of data structures are worth 
reading, particularly those on hash tables, B-trees/red-black 
trees, and sorting (they might be old news to you, but there was 
some new material for me):

https://attractivechaos.wordpress.com/tag/benchmark/

Speed and memory. The larger the hash table, the fewer collisions 
occur and the faster lookups become. For a given hash library, 
spending more memory always buys more speed; when we compare two 
libraries, both speed and memory have to be considered.
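
As an aside of mine, not from the post: Knuth's classic 
approximations for linear probing make this trade-off concrete. 
At load factor a, a successful lookup costs roughly 
(1 + 1/(1-a))/2 probes and a miss roughly (1 + 1/(1-a)^2)/2, so a 
bigger (emptier) table means fewer probes:

  /* Expected probe counts for linear probing at several load factors
     (Knuth's classic approximations; an illustration, not from the post). */
  #include <stdio.h>

  int main(void) {
      double loads[] = {0.25, 0.50, 0.75, 0.90};
      for (int i = 0; i < 4; ++i) {
          double a = loads[i];
          double hit  = 0.5 * (1.0 + 1.0 / (1.0 - a));               /* successful search */
          double miss = 0.5 * (1.0 + 1.0 / ((1.0 - a) * (1.0 - a))); /* unsuccessful search */
          printf("load %.2f: ~%.1f probes (hit), ~%.1f probes (miss)\n", a, hit, miss);
      }
      return 0;
  }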

C vs. C++. All the C++ implementations have a similar API and are 
very easy to use with any type of key. Both C libraries, ghthash 
and glib, can only keep pointers to the keys, which complicates 
the API and increases memory use, especially on 64-bit systems 
where a pointer takes 8 bytes. In general, the C++ libraries are 
preferable to the C ones. Surprisingly, on 32-bit Mac OS X, glib 
outperforms TR1 and STL for string input. This might indicate 
that the glib implementation itself is very efficient, and that 
it is merely the lack of generic typing in C that hurts 
performance.
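
To make the pointer-key point concrete, here is a minimal sketch 
of mine (not from the post) against glib's GHashTable, whose API 
really does traffic in (void*); even integer keys must be boxed 
into pointers:

  /* Sketch: glib keeps (void*) keys and values, so integers go through
     GINT_TO_POINTER/GPOINTER_TO_INT. Build with
     pkg-config --cflags --libs glib-2.0. */
  #include <glib.h>
  #include <stdio.h>

  int main(void) {
      GHashTable *h = g_hash_table_new(g_direct_hash, g_direct_equal);
      g_hash_table_insert(h, GINT_TO_POINTER(42), GINT_TO_POINTER(7));
      gpointer v = g_hash_table_lookup(h, GINT_TO_POINTER(42));
      printf("42 -> %d\n", GPOINTER_TO_INT(v));
      g_hash_table_destroy(h);
      return 0;
  }
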
Generic programming in C. Except for my khash.h, all the other C 
hash libraries use (void*) to achieve generic typing. Using void* 
is fine for strings but adds overhead for integers. This is why 
all the C libraries except khash.h are slower than the C++ 
libraries on integer keys, yet close to them on string keys.
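
And the macro-based alternative, a sketch of mine using khash.h's 
actual API, where integer keys are stored unboxed in the buckets:

  /* Sketch: khash.h generates a complete, typed int -> int hash table
     at compile time via macros. Requires khash.h from klib. */
  #include <stdio.h>
  #include "khash.h"

  KHASH_MAP_INIT_INT(i2i, int)  /* instantiate an int -> int map named "i2i" */

  int main(void) {
      int ret;
      khash_t(i2i) *h = kh_init(i2i);
      khiter_t k = kh_put(i2i, h, 42, &ret);  /* insert key 42 */
      kh_value(h, k) = 7;
      k = kh_get(i2i, h, 42);
      if (k != kh_end(h))
          printf("42 -> %d\n", kh_value(h, k));
      kh_destroy(i2i, h);
      return 0;
  }
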
Open addressing vs. chaining hash. Khash and the Google hashes 
implement open addressing; the rest implement chaining. In an 
open-addressing hash, each bucket costs the size of a key plus 
0.25 byte. Google sparsehash further compresses unused buckets to 
one bit each, achieving high memory efficiency. In a chaining 
hash, the per-bucket overhead is at least 4 bytes on 32-bit 
machines, or 8 bytes on 64-bit machines. However, chaining is 
less affected when the hash table is nearly full. In practice, 
both schemes occupy similar memory at similar speed, khash having 
the lower peak memory mainly thanks to its rehashing strategy. As 
far as speed is concerned, chaining seems to involve fewer key 
comparisons: we can see this from the fact that chaining hashes 
approach the speed of open addressing on string keys, where 
comparisons are expensive, but fall well behind on integer keys.
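
That 0.25 byte works out to two metadata bits (empty/deleted) per 
bucket, packed 16 buckets to a 32-bit word. A sketch of mine with 
hypothetical helper macros, loosely modelled on what khash does:

  /* Sketch: pack two metadata bits per bucket into a uint32_t array --
     16 buckets per word, i.e. 0.25 byte per bucket. The macros are
     hypothetical illustrations, loosely modelled on khash's flags. */
  #include <stdint.h>
  #include <stdio.h>

  #define IS_EMPTY(f, i) (((f)[(i) >> 4] >> (((i) & 0xf) << 1)) & 1)
  #define IS_DEL(f, i)   (((f)[(i) >> 4] >> ((((i) & 0xf) << 1) + 1)) & 1)
  #define SET_USED(f, i) ((f)[(i) >> 4] &= ~(3u << (((i) & 0xf) << 1)))

  int main(void) {
      uint32_t flags[4];  /* metadata for 64 buckets */
      for (int i = 0; i < 4; ++i) flags[i] = 0x55555555u;  /* all empty, none deleted */
      SET_USED(flags, 10);  /* mark bucket 10 as occupied */
      printf("bucket 10 empty? %d; bucket 11 empty? %d\n",
             (int)IS_EMPTY(flags, 10), (int)IS_EMPTY(flags, 11));
      return 0;
  }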

Memory usage of search trees. The B-tree is the winner here: each 
element needs only one additional pointer. When there are enough 
elements, a B-tree is at least half full, and on average around 
75% full. So on a 64-bit system, a B-tree with N elements needs 
about N*8/0.75 ≈ 10.7N extra bytes. A splay tree needs 
N*8*2 = 16N extra bytes. The RB tree is the worst.
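
Spelling out the arithmetic (my recap; the red-black figure 
assumes a typical node with two child pointers plus a parent 
pointer):

  /* Per-element pointer overhead on a 64-bit system, as in the post. */
  #include <stdio.h>

  int main(void) {
      double btree = 8.0 / 0.75;  /* one pointer per slot at ~75% occupancy: ~10.7 bytes */
      double splay = 8.0 * 2;     /* two child pointers per node: 16 bytes */
      double rb    = 8.0 * 3;     /* two children + parent (typical layout): >= 24 bytes */
      printf("B-tree ~%.1fN, splay tree %.0fN, RB tree >= %.0fN extra bytes\n",
             btree, splay, rb);
      return 0;
  }
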
Other issues. a) Google hash becomes unbearably slow when I try 
to put a lot of strings in the table; none of the other libraries 
has this problem. b) Google hash performs more key comparisons 
than khash. We can see this from the fact that google-dense is 
clearly faster on integer keys yet only comparable to khash on 
string keys, where each extra comparison is expensive.

Concluding remarks

- C++ hash libraries are much easier to use than C ones. This is 
definitely a place where C++ is preferable to C.
- The TR1 hash implementation is no faster than the STL one; each 
may outperform the other under particular inputs or settings.
- SGI hash_map is faster and takes less memory than STL map. 
Unless ordering matters, hash_map is the better container.
- Google hash is a worthy choice once we understand why it is 
slow with many string keys.
- My khash library, a single-file C header that emulates 
templates with macros, strikes a good balance between speed and 
memory. All my source code is available on the Programs page.
Update

- A C interface can be elegant, too, if we implement it cleverly. 
See this post.
- I realized that we need just one lookup to achieve "insert if 
absent; delete otherwise". This further improves the speed of all 
the C++ libraries (a sketch follows this list).
- I have analyzed the Google dense hash table in this post, which 
explains why it is faster than khash on integer keys but close to 
or slower than khash on string keys.
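
The one-lookup toggle is easy to see with khash, since kh_put 
reports through its ret argument whether the key was already 
present (ret == 0). A sketch of mine against the real khash.h 
API:

  /* Sketch: "insert if absent, delete otherwise" with a single lookup.
     kh_put's ret tells us whether the key was already there, so no
     second search is needed. Requires khash.h from klib. */
  #include <stdio.h>
  #include "khash.h"

  KHASH_SET_INIT_INT(iset)  /* instantiate an int set named "iset" */

  int main(void) {
      int ret;
      khash_t(iset) *h = kh_init(iset);
      int keys[] = {1, 2, 1};  /* key 1 is inserted, then toggled out again */
      for (int i = 0; i < 3; ++i) {
          khiter_t k = kh_put(iset, h, keys[i], &ret);
          if (ret == 0) kh_del(iset, h, k);  /* already present: delete it */
      }
      printf("size after toggling: %d\n", (int)kh_size(h));  /* 1 (only key 2) */
      kh_destroy(iset, h);
      return 0;
  }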



