D vs C++
bearophile
bearophileHUGS at lycos.com
Fri Dec 24 22:21:11 PST 2010
Caligo:
> I'm going to ignore the C version because it's ugly and uses a hash.
Some of the others too use a hash. You can write nice looking code in C too, but you need more skills :-)
> I'm also going to ignore the fastest C++ version because it uses a digital trie
> (it's very fast but extremely memory hungry; the complexity is constant over
> the size of the input and linear over the length of the word being searched
> for).
The fastest C++ version uses more memory, but sometimes if you need more performance it may become the right choice.
> I just wanted to focus on the language and the std library and not
> have to implement a data structure.
One of the few advantages of D over Python is that in D you are able to implement efficient and custom data structures without leaving the D language itself :-)
> For D I pretty much used the example from TDPL. As far as I can tell, the
> associate array used is closer to std::map (or maybe std::unordered_map ?)
D built-in AAs are a hash map, but they use comparisons to resolve collisions. This makes D AAs strong against malicious attacks. Python dicts are faster but they are a pure hash map.
> than std::unordered_set, but I don't know of any other data structures in D
> for this (I'm still learning).
A unordered_set is not present in stc.collections yet.
> Here are the measurements (average of 3 runs):
Your timings lack information about the CPU, compilation switches used, and C++ compiler version used.
Are those really averages?
> I'm interested to see a better D version than the one I posted.
If you want to use only the built-ins and std lib I think you can't improve your code a lot. To go faster you need to go lower level.
Regarding your code, break and continue statements are not Structured Programming, so it's better to avoid them when possible. I write your code like this:
import std.stdio, std.string;
void main() {
size_t[string] dictionary;
foreach (line; stdin.byLine())
foreach (word; line.strip().splitter())
if (word !in dictionary)
dictionary[word.idup] = 1;
writeln("Words: ", dictionary.length);
}
This Python2 version is as fast as the D-DMD version with a 8.7 MB file that contains about 120_000 words:
from sys import stdin
import psyco
def main():
dictionary = {}
for line in stdin:
for word in line.split():
if word not in dictionary:
dictionary[word] = 1
print "Words:", len(dictionary)
psyco.bind(main)
main()
Bye,
bearophile
More information about the Digitalmars-d
mailing list