Benchmark of D against other languages
Laeeth Isharc via Digitalmars-d
digitalmars-d at puremagic.com
Thu Apr 2 19:20:23 PDT 2015
On Tuesday, 31 March 2015 at 18:20:05 UTC, cym13 wrote:
> I found this repository (reddit!) that hosts common benchmarks
> for many languages such as D, Nim, Go, Python, C, etc... It
> uses only standard data structures so as not to bias the
> benchmark.
>
> https://github.com/kostya/benchmarks
Thanks for this.
BTW, some of these benchmarks were taken from attractivechaos
here. He seems to work in bioinformatics and does not consider
himself a programmer by profession, but he has written some
libraries that others use, and he is clearly bright and
thoughtful. He is pro-D and sad that people have not recognized
its qualities.
https://attractivechaos.wordpress.com/
From 2010:
"Although I do not use D, I always see it as one of the most
attractive programming languages, smartly balancing efficiency,
simplicity and extensibility. At the same time, I keep getting
frustrated when I see such an elegant thing fade away gradually
given that a) D has dropped out of top 20 in TIOBE Programming
Community Index and b) it was not evaluated by the Computer
Language Benchmarks Game any more. Most programmers know why this
happens. I am simply frustrated.
D is falling while Go is rising. I do appreciate the philosophy
behind the design of Go and trust Rob Pike and Ken Thompson to
deliver another great product, but right now I do not see Go as a
replacement of any mainstream programming languages as long as it
is way slower than Java, not to speak C/C++. To me, Go’s rising
is merely due to the support from Google. It is good as a
research project, but it needs time to reach the critical mass in
practice.
While reading the Computer Language Benchmarks Game, I am really
amazed by LuaJIT. Probably I am going to try it some day."
Some of his posts on efficiency of data structures are worth
reading, particularly on hash tables, btrees/redblack
trees/sorting (might be old to you, but there was some new stuff
for me):
https://attractivechaos.wordpress.com/tag/benchmark/
Speed and memory. The larger the hash table, the fewer collisions
occur and the faster the lookups. For the same hash library,
increasing memory always increases speed. When we compare two
libraries, both speed and memory should be considered.
C vs. C++. All the C++ implementations have similar APIs and are
easy to use with any type of key. Both C libraries, ghthash and
glib, can only keep pointers to the keys, which complicates the
API and increases memory use, especially on 64-bit systems where
a pointer takes 8 bytes. In general, the C++ libraries are
preferred over the C ones. Surprisingly, on 32-bit Mac OS X, glib
outperforms TR1 and STL for string input. This might indicate
that the glib implementation itself is very efficient, and that
it is merely the lack of generic programming in C that hurts its
performance.
Generic programming in C. Except for my khash.h, all the other C
hash libraries use (void*) to achieve generic typing. Using void*
is okay for strings, but it causes overhead for integers. This is
why all the C libraries except khash.h are slower than the C++
libraries on integer keys, but close to them on string keys.
Open addressing vs. chaining. Khash and the Google hashes
implement open addressing, while the remaining libraries
implement chaining. In an open-addressing hash, the size of each
bucket equals the size of a key plus 0.25 byte. Google sparsehash
further compresses unused buckets to 1 bit, achieving high memory
efficiency. In a chaining hash, the memory overhead of each
bucket is at least 4 bytes on 32-bit machines, or 8 bytes on
64-bit machines. However, chaining is less affected when the hash
table is nearly full. In practice, open addressing and chaining
occupy similar memory at similar speed. Khash takes less peak
memory, mainly due to its rehashing technique. As far as speed is
concerned, chaining may need fewer comparisons between keys: we
can see this from the fact that the speed of chaining approaches
that of open addressing on string keys, but is much slower on
integer keys.
Memory usage of search trees. The B-tree is the winner here. Each
element in a B-tree needs only one additional pointer. When there
are enough elements, a B-tree is at least half full; on average
it should be around 75% full. So on 64-bit systems, a B-tree with
N elements needs about N*8/0.75 ≈ 10.7N bytes of additional
memory. A splay tree needs N*8*2 = 16N bytes of extra space. The
RB tree is the worst.
Other issues. a) Google hash becomes unbearably slow when I try
to put a lot of strings in the hash table; none of the other
libraries has this problem. b) Google hash performs more
comparisons between keys than khash. This can be inferred from
the fact that google-dense is clearly faster than khash on
integer keys, where comparisons are cheap, but only comparable on
string keys, where each comparison is expensive.
Concluding remarks
The C++ hash libraries are much easier to use than the C
libraries. This is definitely an area where C++ is preferred over
C.
The TR1 hash implementation is no faster than the STL
implementation. They may outperform each other under certain
inputs or settings.
SGI hash_map is faster and takes less memory than STL map. Unless
ordering is important, hash_map is a better container than map.
Google hash is a worthy choice when we understand why it is slow
for many string keys.
My khash library, a single-file C header that uses macros to
achieve generic typing, strikes a good balance between speed and
memory. All my source code is available on the Programs page.
Update
A C interface can be elegant, too, if implemented cleverly. See
this post.
I realized that we need just one lookup to achieve "insert if
absent; delete otherwise". This further improves the speed of all
the C++ libraries.
I have analyzed the Google dense hash table in this post, which
explains why it is faster than khash on integer keys but close
to, or slower than, khash on string keys.