A file reading benchmark
bearophile
bearophileHUGS at lycos.com
Fri Feb 17 17:44:04 PST 2012
A tiny benchmark for reading a file line by line that I've just found on Reddit:
http://www.reddit.com/r/programming/comments/pub98/a_benchmark_for_reading_flat_files_into_memory/
http://steve.80cols.com/reading_flat_files_into_memory_benchmark.html
The Ruby code that (slowly) generates the test data:
https://raw.github.com/lorca/flat_file_benchmark/master/gen_data.rb
But for my timings I have used only about 40% of that file, the first 1_965_800 lines, because I have less memory.
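One way to cut the generated file down to its first 1_965_800 lines could look like the sketch below. The function name `head_lines` and the file names in the usage note are hypothetical, and this is not necessarily how the timings in this post were prepared.

```python
from itertools import islice


def head_lines(src_path, dst_path, n_lines):
    """Copy the first n_lines lines of src_path to dst_path.

    Returns the number of lines actually written, which is smaller
    than n_lines when the source file is shorter.
    """
    written = 0
    # Binary mode copies the bytes unchanged, including line endings.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # islice stops after n_lines without reading the rest of the file.
        for line in islice(src, n_lines):
            dst.write(line)
            written += 1
    return written
```

For example, `head_lines("data.tsv", "data_small.tsv", 1965800)` would write the reduced file, assuming the full generator output was saved as `data.tsv`.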
My Python-Psyco version runs in 2.46 seconds, the D version in 4.65 seconds (the D version runs in 13.20 seconds if I don't disable the GC).
From many other benchmarks I've seen that reading files line by line is slow in D.
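To compare load times with the GC enabled and disabled on the Python side, a minimal timing sketch could look like this. It assumes Python 3 and does not use Psyco, so its numbers are not comparable with the timings quoted above. The names `load_rows` and `timed_load` are hypothetical helpers.

```python
import gc
import time
from collections import deque


def load_rows(path):
    """Read a tab-separated file into a deque of lists of fields."""
    rows = deque()
    with open(path) as f:
        for line in f:
            # rstrip("\n") removes only a trailing newline, so a last
            # line without one keeps its final character.
            rows.append(line.rstrip("\n").split("\t"))
    return rows


def timed_load(path, disable_gc):
    """Load the file and return (rows, elapsed_seconds).

    When disable_gc is true the cyclic garbage collector is switched
    off for the duration of the load and restored afterwards.
    """
    was_enabled = gc.isenabled()
    if disable_gc:
        gc.disable()
    try:
        start = time.perf_counter()
        rows = load_rows(path)
        elapsed = time.perf_counter() - start
    finally:
        # Restore the collector only if it was on before the call.
        if was_enabled:
            gc.enable()
    return rows, elapsed
```

Calling `timed_load(path, True)` and `timed_load(path, False)` on the same file gives the two timings to compare. A single run is noisy, so repeating each measurement a few times is advisable.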
-------------------------
My D code:
import std.stdio, std.string, std.array;

void main(in string[] args) {
    Appender!(string[][]) rows;
    foreach (line; File(args[1]).byLine())
        rows.put(line.idup.split("\t"));
    writeln(rows.data[1].join(","));
}
-------------------------
My Python 2.6 code:
from sys import argv
from collections import deque
import gc
import psyco

def main():
    gc.disable()
    rows = deque()
    for line in open(argv[1]):
        rows.append(line[:-1].split("\t"))
    print ",".join(rows[1])

psyco.full()
main()
-------------------------
The test data generator in Ruby:
user_id=1
for user_id in (1..10000)
  payments = (rand * 1000).to_i
  for user_payment_id in (1..payments)
    payment_id = user_id.to_s + user_payment_id.to_s
    payment_amount = "%.2f" % (rand * 30);
    is_card_present = "N"
    created_at = (rand * 10000000).to_i
    if payment_id.to_i % 3 == 0
      is_card_present = "Y"
    end
    puts [user_id, payment_id, payment_amount, is_card_present, created_at].join("\t")
  end
end
-------------------------
Bye,
bearophile