D vs C++
Caligo
iteronvexor at gmail.com
Fri Dec 24 12:56:17 PST 2010
This is the page in question:
http://unthought.net/c++/c_vs_c++.html
I'm going to ignore the C version because it's ugly and uses a hash. I'm
also going to ignore the fastest C++ version because it uses a digital trie
(it's very fast but extremely memory hungry; lookup cost is constant in the
size of the input and linear in the length of the word being searched for).
I just wanted to focus on the language and the standard library and not
have to implement a data structure.
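To make that trade-off concrete, here is a minimal sketch of the digital-trie
idea in D (my own illustration, not the code from the linked page, and written
against a current Phobos, so module layout may differ from 2.051's). Each node
carries a 256-slot child table, about 2 KB on a 64-bit machine, which is where
the memory hunger comes from; a lookup touches one node per byte of the word,
regardless of how much input has been seen.

import std.array : split;
import std.stdio;
import std.string : strip;

// One node per prefix; the 256-way child table trades memory for speed.
class TrieNode
{
    TrieNode[256] next;
    bool terminal;  // true if a word ends at this node
}

// Walks/extends the trie one byte at a time: O(word.length),
// independent of how many words were inserted before.
bool insert(TrieNode root, in char[] word)
{
    auto node = root;
    foreach (c; cast(const(ubyte)[]) word)
    {
        if (node.next[c] is null)
            node.next[c] = new TrieNode;
        node = node.next[c];
    }
    immutable isNew = !node.terminal;
    node.terminal = true;
    return isNew;
}

void main()
{
    auto root = new TrieNode;
    size_t words;
    foreach (line; stdin.byLine())
        foreach (word; line.strip().split())
            if (insert(root, word))
                ++words;
    writeln("Words: ", words);
}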
Here is the C++ code:
#include <unordered_set>
#include <string>
#include <iostream>
#include <stdio.h>

int main(int argc, char* argv[])
{
    using namespace std;
    char buf[8192];
    unordered_set<string> wordcount;
    // %s reads one whitespace-delimited word at a time (note: no width
    // limit, so a token longer than 8191 chars would overflow buf).
    while (scanf("%s", buf) != EOF)
        wordcount.insert(buf);
    cout << "Words: " << wordcount.size() << endl;
    return 0;
}
For D I pretty much used the example from TDPL. The built-in associative
array is a hash table, so as far as I can tell it's closer to
std::unordered_map than to std::map or std::unordered_set, but I don't know
of any other data structure in D for this (I'm still learning); a set-style
alternative is sketched right after the code.
Here is the D code:
import std.stdio;
import std.string;

void main()
{
    size_t[string] dictionary;
    foreach (line; stdin.byLine())
    {
        // Break the line into whitespace-separated words.
        foreach (word; splitter(strip(line)))
        {
            if (word in dictionary) continue;
            // byLine reuses its buffer, so copy the word before keying on it.
            dictionary[word.idup] = 1;
        }
    }
    writeln("Words: ", dictionary.length);
}
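On the set question above: as far as I know Phobos (as of 2.051) has no
hash-set container, but since the built-in associative array is a hash table,
an AA with a throwaway value type works as a set. A minimal sketch along the
same lines as the code above (again assuming a current Phobos;
std.array.split is just the eager cousin of splitter):

import std.array : split;
import std.stdio;
import std.string : strip;

void main()
{
    bool[string] seen;  // the value is a dummy; the AA's keys form the set
    foreach (line; stdin.byLine())
        foreach (word; line.strip().split())
            if (word !in seen)
                seen[word.idup] = true;  // idup: byLine reuses its buffer
    writeln("Words: ", seen.length);
}

It behaves the same as the size_t[string] version; the dummy value just
makes the set intent explicit.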
Here are the measurements (average of 3 runs):
C++
===
Data size: 990K with 23K unique words
real 0m0.055s
user 0m0.046s
sys 0m0.000s
Data size: 9.7M with 23K unique words
real 0m0.492s
user 0m0.470s
sys 0m0.013s
Data size: 5.1M with 65K unique words
real 0m0.298s
user 0m0.277s
sys 0m0.013s
Data size: 51M with 65K unique words
real 0m2.589s
user 0m2.533s
sys 0m0.070s
DMD D 2.051
===
Data size: 990K with 23K unique words
real 0m0.064s
user 0m0.053s
sys 0m0.006s
Data size: 9.7M with 23K unique words
real 0m0.513s
user 0m0.487s
sys 0m0.013s
Data size: 5.1M with 65K unique words
real 0m0.305s
user 0m0.287s
sys 0m0.007s
Data size: 51M with 65K unique words
real 0m2.683s
user 0m2.590s
sys 0m0.103s
GDC D 2.051
===
Data size: 990K with 23K unique words
real 0m0.146s
user 0m0.140s
sys 0m0.000
Data size: 9.7M with 23K unique words
Segmentation fault
Data size: 5.1M with 65K unique words
Segmentation fault
Data size: 51M with 65K unique words
Segmentation fault
For some reason GDC segfaults with a large number of unique words and/or a
large data set. It also doesn't always give correct results; the word count
is usually off by a few hundred.
D and C++ are very close. Reading with cin instead of scanf() makes the C++
version almost twice as slow, and using std::unordered_set instead of
std::set almost doubles the performance.
I'd be interested to see a better D version than the one I posted.
P.S.
No flame wars please.