D vs C++

Fri Dec 24 21:51:52 PST 2010

If there are, say, 14 unique words, the executable compiled with GDC
doesn't always output the correct result, and sometimes it gives a
segmentation fault.  In that case 14 is the correct result; something like 32
is not.  It seems to work fine with very small data sets, but things start to
go wrong with larger ones.

As for the system, it's 64-bit GNU/Linux with no multilib.  What else do you
need?

For GDC I used gcc-4.4.5 and the following command line:
'gdc -O2 -o count_d count.d'

I can't post the data because it's too large, but it shouldn't be too
difficult to generate.  A 1MB text file should work.
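A minimal sketch of such a generator, written in C++ since that is the other
language in this thread.  The function name make_corpus, the w0, w1, ... word
format, the 12-words-per-line layout and the fixed seed are all illustrative
assumptions, not what was used for the measurements quoted below.  Each
vocabulary word is written once before the random filler, so the correct
unique-word count is known in advance.

```cpp
#include <cstddef>
#include <random>
#include <string>

// Hypothetical helper: builds at least `target_bytes` of whitespace-separated
// text drawn from exactly `unique_words` distinct tokens "w0", "w1", ...
std::string make_corpus(std::size_t target_bytes, std::size_t unique_words,
                        unsigned seed = 42) {
    std::string out;
    out.reserve(target_bytes + 32);

    // Emit every vocabulary word once so each is guaranteed to appear.
    for (std::size_t i = 0; i < unique_words; ++i) {
        out += 'w';
        out += std::to_string(i);
        out += (i % 12 == 11) ? '\n' : ' ';
    }
    if (unique_words == 0) return out;

    // Fill the rest with random repeats until the size target is reached.
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, unique_words - 1);
    std::size_t on_line = 0;
    while (out.size() < target_bytes) {
        out += 'w';
        out += std::to_string(pick(rng));
        out += (++on_line % 12 == 0) ? '\n' : ' ';
    }
    return out;
}
```

Writing make_corpus(1 << 20, 14) to a file from a small main gives roughly
1MB of text with exactly 14 unique words, which matches the failing case
described above.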

On Fri, Dec 24, 2010 at 6:49 PM, Iain Buclaw <ibuclaw at ubuntu.com> wrote:

> == Quote from Caligo (iteronvexor at gmail.com)'s article
> > This is the page in question:
> > http://unthought.net/c++/c_vs_c++.html
> > I'm going to ignore the C version because it's ugly and uses a hash.  I'm
> > also going to ignore the fastest C++ version because it uses a digital trie
> > (it's very fast but extremely memory hungry; the complexity is constant over
> > the size of the input and linear over the length of the word being searched
> > for).  I just wanted to focus on the language and the std library and not
> > have to implement a data structure.
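For reference, a rough sketch of what a digital trie looks like, to make the
complexity claim concrete.  This is not the code from the unthought.net page;
the class name ByteTrie and the 256-way node layout are assumptions chosen to
keep it short.  An insert walks one node per byte of the word, so its cost
grows with word length rather than with the number of stored words, and every
node carries 256 child slots, which is where the memory appetite comes from.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative 256-way trie used as a set of words.
class ByteTrie {
public:
    ByteTrie() : nodes_(1) {}  // node 0 is the root

    // Returns true if `word` was not already present.
    bool insert(const std::string& word) {
        std::size_t cur = 0;
        for (unsigned char c : word) {
            std::size_t next = nodes_[cur].child[c];
            if (next == 0) {  // 0 means "no child": the root is never a child
                next = nodes_.size();
                nodes_.emplace_back();  // may reallocate, so re-index below
                nodes_[cur].child[c] = next;
            }
            cur = next;
        }
        if (nodes_[cur].terminal) return false;
        nodes_[cur].terminal = true;
        ++size_;
        return true;
    }

    std::size_t size() const { return size_; }

private:
    struct Node {
        std::array<std::size_t, 256> child{};  // one slot per byte value, zeroed
        bool terminal = false;                  // a word ends at this node
    };
    std::vector<Node> nodes_;
    std::size_t size_ = 0;
};
```

Counting unique words is then a matter of calling insert() for each token and
reading size() at the end.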
> > Here is the C++ code:
> > #include <cstdio>
> > #include <iostream>
> > #include <string>
> > #include <unordered_set>
> > int main(){
> >   using namespace std;
> >   char buf[8192];
> >   unordered_set<string> wordcount;
> >   // Read whitespace-delimited words; the field width keeps scanf from
> >   // writing past the end of buf.
> >   while( scanf("%8191s", buf) == 1 ) wordcount.insert(buf);
> >   cout << "Words: " << wordcount.size() << endl;
> >   return 0;
> > }
> > For D I pretty much used the example from TDPL.  As far as I can tell, the
> > associative array used is closer to std::map (or maybe std::unordered_map?)
> > than std::unordered_set, but I don't know of any other data structures in D
> > for this (I'm still learning).
> > Here is the D code:
> > import std.stdio;
> > import std.string;
> > void main(){
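> >   size_t[string] dictionary;   // the values are unused; only the keys matter
> >   foreach(line; stdin.byLine()){
> >     foreach(word; splitter(strip(line))){
> >       if(word in dictionary) continue;
> >       // byLine reuses its buffer, so store an immutable copy of the word.
> >       dictionary[word.idup] = 1;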
> >     }
> >   }
> >   writeln("Words: ", dictionary.length);
> > }
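To make the container question concrete: D's built-in associative arrays are
hash tables, so size_t[string] corresponds most directly to
std::unordered_map<std::string, std::size_t>.  Below is a hypothetical C++
transliteration of the D program (count_words_like_d is an illustrative
name).  The mapped value is written but never read, which is why
std::unordered_set is the closer match for plain word counting.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Mirrors the D loop: skip words already seen, otherwise store them.
std::size_t count_words_like_d(std::istream& in) {
    std::unordered_map<std::string, std::size_t> dictionary;
    std::string word;
    while (in >> word) {                          // whitespace-split tokens
        if (dictionary.count(word) != 0) continue;  // `word in dictionary`
        dictionary[word] = 1;                       // `dictionary[word.idup] = 1`
    }
    return dictionary.size();
}

// Convenience overload for in-memory text.
std::size_t count_words_like_d(const std::string& text) {
    std::istringstream in(text);
    return count_words_like_d(in);
}
```

The C++ version needs no counterpart to word.idup: the map stores its own copy
of each std::string key, whereas byLine() reuses its line buffer, so the D code
has to duplicate the slice before using it as a key.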
> > Here are the measurements (average of 3 runs):
> >
> > C++
> > ===
> > Data size  Unique words  real      user      sys
> > 990K       23K           0.055s    0.046s    0.000s
> > 9.7M       23K           0.492s    0.470s    0.013s
> > 5.1M       65K           0.298s    0.277s    0.013s
> > 51M        65K           2.589s    2.533s    0.070s
> >
> > DMD D 2.051
> > ===========
> > Data size  Unique words  real      user      sys
> > 990K       23K           0.064s    0.053s    0.006s
> > 9.7M       23K           0.513s    0.487s    0.013s
> > 5.1M       65K           0.305s    0.287s    0.007s
> > 51M        65K           2.683s    2.590s    0.103s
> >
> > GDC D 2.051
> > ===========
> > Data size  Unique words  real      user      sys
> > 990K       23K           0.146s    0.140s    0.000s
> > 9.7M       23K           Segmentation fault
> > 5.1M       65K           Segmentation fault
> > 51M        65K           Segmentation fault
> >
> > GDC fails for some reason with a large number of unique words and/or a large
> > data set.  Also, GDC doesn't always give correct results; the word count is
> > usually off by a few hundred.
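One way to pin down a result that is "off by a few hundred" is to compute the
expected count with a method that shares nothing with either associative
array.  A sketch under that assumption, using sort plus unique on a vector of
words (reference_unique_count is a hypothetical name).  It keeps every word in
memory, which is acceptable for a correctness check but not for the benchmark
itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Counts distinct whitespace-separated words without any hash table or tree:
// sort all words, then count the runs of equal neighbours.
std::size_t reference_unique_count(std::istream& in) {
    std::vector<std::string> words;
    std::string w;
    while (in >> w) words.push_back(w);
    std::sort(words.begin(), words.end());
    return static_cast<std::size_t>(
        std::unique(words.begin(), words.end()) - words.begin());
}

// Convenience overload for in-memory text.
std::size_t reference_unique_count(const std::string& text) {
    std::istringstream in(text);
    return reference_unique_count(in);
}
```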
> > D and C++ are very close.  Without scanf(), C++ is almost twice as slow.
> > Also, using std::unordered_set instead of std::set almost doubles the
> > performance.
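A sketch for repeating the container comparison on one's own data, assuming
the comparison is against std::set.  The counting loop is templated on the
container, and the timer runs over an in-memory copy of the input so that only
tokenising and insertion are measured.  count_unique and seconds_for are
illustrative names, and this reads with operator>> rather than scanf(), so its
absolute times will not match the table above.

```cpp
#include <chrono>
#include <cstddef>
#include <istream>
#include <set>            // for the std::set instantiation
#include <sstream>
#include <string>
#include <unordered_set>  // for the std::unordered_set instantiation

// Counts distinct words with any container offering insert() and size().
template <class Set>
std::size_t count_unique(std::istream& in) {
    Set words;
    std::string w;
    while (in >> w) words.insert(w);
    return words.size();
}

// Times one counting pass over `text`; the count is returned via `result`.
template <class Set>
double seconds_for(const std::string& text, std::size_t& result) {
    std::istringstream in(text);
    auto t0 = std::chrono::steady_clock::now();
    result = count_unique<Set>(in);
    std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
    return dt.count();
}
```

Calling seconds_for<std::set<std::string>> and
seconds_for<std::unordered_set<std::string>> on the same text compares the two
containers directly.  When reading std::cin instead of a string, calling
std::ios::sync_with_stdio(false) before the first read often removes much of
the iostream overhead relative to scanf(); whether it closes the "almost twice
as slow" gap here would need measuring.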
> > I'm interested to see a better D version than the one I posted.
> > P.S.
> > No flame wars please.
>
> System details, compiler flags and the test data you used would be helpful.
> Otherwise I can't be sure what you mean by "doesn't always give correct
> results". :~)
>