Speed of csvReader

Thu Jan 21 14:09:24 PST 2016

On Thursday, 21 January 2016 at 09:39:30 UTC, data pulverizer 
wrote:
> I have been reading large text files with D's csv file reader 
> and have found it slow compared to R's read.table function 
> which is not known to be particularly fast.

FWIW - I've been implementing a few programs that manipulate 
delimited files, e.g. tab-delimited. These are simpler than CSV 
files because there is no escaping inside the data. I've been 
trying to do this in relatively straightforward ways, e.g. using 
byLine rather than byChunk. (The goal is to explore the power of 
the D standard library.)

I've gotten significant speed-ups in several different ways:
* DMD libraries 2.068+  -  byLine is dramatically faster
* LDC 0.17 (alpha)  -  Based on DMD 2.068, and produces faster 
  code than the DMD compiler
* Avoid UTF-8 to dchar conversion - This conversion often occurs 
  silently when working with ranges, but is generally not needed 
  when manipulating data.
* Avoid unnecessary string copies, e.g. don't gratuitously 
  convert char[] to string.
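
To illustrate the last two points, here is a minimal sketch. It is not 
taken from the utilities described in this post; the names fieldEquals 
and countMatches are made up for the example. It assumes tab-delimited 
input with no escaping. std.string.representation views the char data 
as ubyte, so splitter works on raw bytes and no dchar decoding takes 
place, and each line is used directly from byLine's reused buffer 
rather than being copied into a string.

```d
import std.algorithm : splitter;
import std.stdio : File, stdin, writeln;
import std.string : representation;

/// Hypothetical helper: true if the zero-based field `fieldIndex` of a
/// tab-delimited `line` equals `key`. Returns false if the line has
/// too few fields.
bool fieldEquals(const(char)[] line, size_t fieldIndex, const(char)[] key)
{
    size_t i = 0;
    // representation reinterprets the chars as ubytes, so splitter
    // compares raw bytes instead of decoding UTF-8 into dchars.
    foreach (field; line.representation.splitter(cast(ubyte) '\t'))
    {
        if (i == fieldIndex)
            return field == key.representation;
        ++i;
    }
    return false;
}

/// Hypothetical helper: counts the lines of `input` whose field
/// `fieldIndex` equals `key`.
size_t countMatches(File input, size_t fieldIndex, const(char)[] key)
{
    size_t count = 0;
    // byLine reuses its buffer between iterations. `line` is a char[]
    // that is only valid until the next iteration, which is fine here
    // because nothing is kept. Calling .idup or to!string on every
    // line would allocate a new string each time.
    foreach (line; input.byLine)
    {
        if (fieldEquals(line, fieldIndex, key))
            ++count;
    }
    return count;
}

void main()
{
    // Example use: count stdin lines whose second field is "ok".
    writeln(countMatches(stdin, 1, "ok"));
}
```

If a field has to outlive the loop iteration, that is the point where 
a copy (e.g. .idup) is actually needed.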

At this point the performance of the utilities I've been writing 
is quite good. They don't have direct equivalents in other 
toolsets (such as GNU coreutils), so a head-to-head comparison is 
not appropriate, but generally the tools seem quite competitive 
without my needing to do my own buffer or memory management. And 
they are dramatically faster than the same tools written in Perl 
(which I was happy with).

--Jon