Speed of csvReader

Rikki Cattermole via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Thu Jan 21 02:20:12 PST 2016


On 21/01/16 10:39 PM, data pulverizer wrote:
> I have been reading large text files with D's csv file reader and have
> found it slow compared to R's read.table function which is not known to
> be particularly fast. Here I am reading Fannie Mae mortgage acquisition
> data which can be found here
> http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html
> after registering:
>
> D Code:
>
> import std.algorithm;
> import std.array;
> import std.file;
> import std.csv;
> import std.stdio;
> import std.typecons;
> import std.datetime;
>
> alias row_type = Tuple!(string, string, string, string, string, string,
>                         string, string, string, string, string, string,
>                         string, string, string, string, string, string,
>                         string, string, string, string);
>
> void main(){
>    StopWatch sw;
>    sw.start();
>    auto buffer = std.file.readText("Acquisition_2009Q2.txt");
>    auto records = csvReader!row_type(buffer, '|').array;
>    sw.stop();
>    double time = sw.peek().msecs;
>    writeln("Time (s): ", time/1000);
> }
>
> Time (s): 13.478
>
> R Code:
>
> system.time(x <- read.table("Acquisition_2009Q2.txt", sep = "|",
> colClasses = rep("character", 22)))
>     user  system elapsed
>    7.810   0.067   7.874
>
>
> R takes about half as long to read the file. Both read the data in the
> "equivalent" type format. Am I doing something incorrect here?

Okay, without registering I'm not going to get that data.

So, the usual things to think about: did you turn on release mode?
What about inlining?

Lastly, how about disabling the GC?

import core.memory : GC;
GC.disable(); // call inside main(), before the parsing starts

dmd -release -inline code.d
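
Putting those suggestions together, a minimal sketch might look like the following. It is not a tested fix for the original program: the three-column Row type, the parseRows helper and the inline sample data are all hypothetical stand-ins (the real file has 22 string columns), and it uses the newer std.datetime.stopwatch module rather than the StopWatch from std.datetime in the quoted code.

```d
import core.memory : GC;
import std.array : array;
import std.csv : csvReader;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writeln;
import std.typecons : Tuple;

// Hypothetical 3-column row; the quoted program uses 22 string columns.
alias Row = Tuple!(string, string, string);

// Hypothetical helper: parse '|'-separated text with automatic
// GC collections suppressed for the duration of the parse.
Row[] parseRows(string buffer)
{
    GC.disable();             // ask the GC not to run collections
    scope (exit) GC.enable(); // re-enable on every exit path
    return csvReader!Row(buffer, '|').array;
}

void main()
{
    // Tiny inline sample standing in for readText("Acquisition_2009Q2.txt").
    auto buffer = "a|b|c\nd|e|f";

    auto sw = StopWatch(AutoStart.yes);
    auto records = parseRows(buffer);
    sw.stop();

    writeln("Rows: ", records.length);
    writeln("Time (ms): ", sw.peek.total!"msecs");
}
```

Build it with the flags above; dmd also has an -O switch for optimization, which may be worth adding. Whether disabling the GC helps with the real file would have to be measured, since memory use grows while collections are off.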

