Here is the page in question: http://unthought.net/c++/c_vs_c++.html

I'm going to ignore the C version because it's ugly and rolls its own hash table, and I'm also going to ignore the fastest C++ version because it uses a digital trie (very fast but extremely memory hungry; the lookup cost is independent of the size of the input and linear in the length of the word being searched for). I just wanted to focus on the language and the standard library, not on implementing a data structure.
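(For context, the digital-trie idea looks roughly like this -- an untested sketch of mine, not the code from that page. Each node carries a full 256-way fanout, so a lookup touches one node per character no matter how much data is stored, and the memory cost is correspondingly brutal:)

class TrieNode
{
    TrieNode[256] next;   // one slot per possible byte value
    bool terminal;        // true if a word ends at this node
}

// Returns true if the word was not already present.
bool insertWord(TrieNode root, const(char)[] word)
{
    auto node = root;
    foreach (c; word)
    {
        if (node.next[c] is null)
            node.next[c] = new TrieNode;   // allocate nodes lazily
        node = node.next[c];
    }
    auto added = !node.terminal;
    node.terminal = true;
    return added;
}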
Here is the C++ code:

#include <unordered_set>
#include <string>
#include <iostream>
#include <stdio.h>

int main(int argc, char* argv[])
{
    using namespace std;

    char buf[8192];
    unordered_set<string> wordcount;
    // Read whitespace-delimited words until EOF.  (Note: a bare "%s"
    // can overflow buf on pathological input; "%8191s" would be safer.)
    while (scanf("%s", buf) != EOF)
        wordcount.insert(buf);
    cout << "Words: " << wordcount.size() << endl;

    return 0;
}

For D I pretty much used the example from TDPL. As far as I can tell, the built-in associative array is closer to std::map (or maybe std::unordered_map?) than to std::unordered_set, but I don't know of any other data structure in D for this (I'm still learning).
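As an aside, Phobos also has std.container.RedBlackTree, an ordered set that is the closer analogue of std::set -- though I'm not sure it was already in 2.051, so treat this as an untested sketch:

import std.stdio;
import std.string : strip;
import std.algorithm : splitter;
import std.container : RedBlackTree;

void main()
{
    // RedBlackTree rejects duplicates by default, so inserting every
    // word leaves exactly the set of unique words.
    auto words = new RedBlackTree!string;
    foreach (line; stdin.byLine())
        foreach (word; splitter(strip(line)))
            words.insert(word.idup);
    writeln("Words: ", words.length);
}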
Here is the D code:

import std.stdio;
import std.string;
import std.algorithm;   // for splitter (older Phobos pulled it in implicitly)

void main()
{
    size_t[string] dictionary;
    foreach (line; stdin.byLine())
    {
        foreach (word; splitter(strip(line)))
        {
            // byLine() reuses its buffer, so a word must be duplicated
            // (idup) before it can be stored as an immutable key.
            if (word in dictionary) continue;
            dictionary[word.idup] = 1;
        }
    }
    writeln("Words: ", dictionary.length);
}

Here are the measurements (average of 3 runs):

C++
===
Data size: 990K with 23K unique words
real 0m0.055s
user 0m0.046s
sys 0m0.000s

Data size: 9.7M with 23K unique words
real 0m0.492s
user 0m0.470s
sys 0m0.013s

Data size: 5.1M with 65K unique words
real 0m0.298s
user 0m0.277s
sys 0m0.013s

Data size: 51M with 65K unique words
real 0m2.589s
user 0m2.533s
sys 0m0.070s


DMD D 2.051
===
Data size: 990K with 23K unique words
real 0m0.064s
user 0m0.053s
sys 0m0.006s

Data size: 9.7M with 23K unique words
real 0m0.513s
user 0m0.487s
sys 0m0.013s

Data size: 5.1M with 65K unique words
real 0m0.305s
user 0m0.287s
sys 0m0.007s

Data size: 51M with 65K unique words
real 0m2.683s
user 0m2.590s
sys 0m0.103s

GDC D 2.051
===
Data size: 990K with 23K unique words
real 0m0.146s
user 0m0.140s
sys 0m0.000s

Data size: 9.7M with 23K unique words
Segmentation fault

Data size: 5.1M with 65K unique words
Segmentation fault

Data size: 51M with 65K unique words
Segmentation fault

GDC fails for some reason with a large number of unique words and/or a large data set. Also, GDC doesn't always give correct results; the word count is usually off by a few hundred.

D and C++ are very close. Without scanf() (reading via iostreams instead), the C++ version is almost twice as slow; and using std::unordered_set instead of std::set almost doubles the performance.

I'm interested to see a better D version than the one I posted.
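One tweak I'd expect to help (untested sketch; it assumes the input can be a file named on the command line instead of stdin): slurp the whole file as one immutable string and key the AA on slices into it, which gets rid of the per-word idup:

import std.stdio;
import std.array : split;
import std.file : readText;

void main(string[] args)
{
    // One big immutable buffer; every word below is a slice into it,
    // so no per-word copy is needed to build an immutable key.
    auto text = readText(args[1]);
    bool[string] seen;
    foreach (word; text.split())   // no-argument split() splits on whitespace
        seen[word] = true;
    writeln("Words: ", seen.length);
}

This trades memory (the whole file stays resident) for fewer allocations.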

P.S.
No flame wars, please.