Speed of csvReader

data pulverizer via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Thu Jan 21 01:39:30 PST 2016


I have been reading large text files with D's csvReader and have found it slow compared to R's read.table function, which is not known to be particularly fast. Here I am reading Fannie Mae mortgage acquisition data, which can be downloaded from http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html after registering:

D Code:

import std.array;
import std.csv;
import std.datetime;
import std.file;
import std.stdio;
import std.typecons;

// 22 string columns, matching the acquisition file layout
alias row_type = Tuple!(string, string, string, string, string, string,
                        string, string, string, string, string, string,
                        string, string, string, string, string, string,
                        string, string, string, string);

void main(){
   StopWatch sw;
   sw.start();
   // Read the whole file into memory, then parse every record into a tuple.
   auto buffer = std.file.readText("Acquisition_2009Q2.txt");
   auto records = csvReader!row_type(buffer, '|').array;
   sw.stop();
   double time = sw.peek().msecs;
   writeln("Time (s): ", time/1000);
}

Time (s): 13.478

R Code:

system.time(x <- read.table("Acquisition_2009Q2.txt", sep = "|",
                            colClasses = rep("character", 22)))
    user  system elapsed
   7.810   0.067   7.874


R takes about half as long to read the file, even though both versions read the data into "equivalent" types (every column as a string). Am I doing something incorrect here?
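
For reference, here is a rough baseline I would expect to isolate csvReader's overhead. It reads the same file and splits each line on '|' by hand, with none of csvReader's quote handling or field validation, so it only makes sense under the assumption that the file contains no quoted fields (which I believe holds for this pipe-delimited data set). I have not benchmarked it; it is just a sketch of a minimal parse using the same StopWatch timing as above:

import std.algorithm : map;
import std.array : array, split;
import std.datetime;
import std.file : readText;
import std.stdio : writeln;
import std.string : lineSplitter;

void main(){
   StopWatch sw;
   sw.start();
   auto buffer = readText("Acquisition_2009Q2.txt");
   // Naive parse: split the buffer into lines, then split each line on '|'.
   // Unlike csvReader, this does no quote handling and no field validation.
   auto records = buffer.lineSplitter.map!(line => line.split('|')).array;
   sw.stop();
   double time = sw.peek().msecs;
   writeln("Time (s): ", time/1000);
}

If something like this runs much faster than the csvReader version, the gap would be csvReader's per-field work (validation and Tuple construction) rather than file I/O.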

