Speed of csvReader
data pulverizer via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Thu Jan 21 01:39:30 PST 2016
I have been reading large text files with D's csvReader and have
found it slow compared to R's read.table function, which is not
known to be particularly fast. Here I am reading Fannie Mae
mortgage acquisition data, which is available (after registering) at
http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html:
D Code:
import std.algorithm;
import std.array;
import std.file;
import std.csv;
import std.stdio;
import std.typecons;
import std.datetime;
alias row_type = Tuple!(string, string, string, string, string, string,
                        string, string, string, string, string, string,
                        string, string, string, string, string, string,
                        string, string, string, string);
void main(){
    StopWatch sw;
    sw.start();
    // Read the whole file into memory, then parse it as '|'-delimited
    // CSV and eagerly collect all records into an array.
    auto buffer = std.file.readText("Acquisition_2009Q2.txt");
    auto records = csvReader!row_type(buffer, '|').array;
    sw.stop();
    double time = sw.peek().msecs;
    writeln("Time (s): ", time/1000);
}
Time (s): 13.478
R Code:
system.time(x <- read.table("Acquisition_2009Q2.txt", sep = "|",
                            colClasses = rep("character", 22)))
   user  system elapsed
  7.810   0.067   7.874
R takes about half as long to read the file, and both versions
read the data in the "equivalent" type format (all 22 columns as
strings). Am I doing something incorrect here?
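
For reference, here is a minimal hand-rolled parse that should
isolate csvReader's overhead. It is only a sketch: it assumes the
file contains no quoted fields or escaped '|' characters, and it
skips the per-field validation that csvReader performs.

import std.algorithm : filter, map, splitter;
import std.array : array;
import std.datetime : StopWatch;
import std.file : readText;
import std.stdio : writeln;

void main(){
    StopWatch sw;
    sw.start();
    auto buffer = readText("Acquisition_2009Q2.txt");
    // Split into lines, then split each line on '|'. No quoting,
    // escaping, or column-count checks are performed (an assumption
    // that holds for this pipe-delimited file as far as I can tell).
    auto records = buffer
        .splitter('\n')
        .filter!(line => line.length > 0)
        .map!(line => line.splitter('|').array)
        .array;
    sw.stop();
    writeln("Records: ", records.length);
    writeln("Time (s): ", sw.peek().msecs/1000.0);
}

If this version is much faster, the gap presumably comes from
csvReader's per-field parsing rather than from the file I/O itself.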