Speed of csvReader
Jon D via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Thu Jan 21 14:09:24 PST 2016
On Thursday, 21 January 2016 at 09:39:30 UTC, data pulverizer
wrote:
> I have been reading large text files with D's csv file reader
> and have found it slow compared to R's read.table function
> which is not known to be particularly fast.
FWIW - I've been implementing a few programs that manipulate
delimited files, e.g. tab-delimited. These are simpler than CSV
files because there is no escaping inside the data. I've been
trying to do this in relatively straightforward ways, e.g. using
byLine rather than byChunk. (The goal is to explore the power of
the D standard libraries.)
I've gotten significant speed-ups in several different ways:
* DMD libraries 2.068+ - byLine is dramatically faster
* LDC 0.17 (alpha) - Based on DMD 2.068, and generates faster
code than the DMD compiler
* Avoid UTF-8 to dchar conversion - This conversion often occurs
silently when working with ranges, but is generally not needed
when manipulating data.
* Avoid unnecessary string copies, e.g. don't gratuitously
convert char[] to string.
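The last two points can be illustrated with a minimal sketch. This is not the code from the utilities mentioned here. The names countFields and sumField are made up for the example, and the input is assumed to be tab-delimited with a numeric field at the requested index.

```d
import std.algorithm : count, splitter;
import std.conv : to;
import std.stdio : File, stdin, writeln;
import std.utf : byCodeUnit;

// Hypothetical helper: number of tab-delimited fields in a line.
// byCodeUnit iterates over raw chars, so no UTF-8 to dchar decoding occurs.
size_t countFields(const(char)[] line)
{
    return line.byCodeUnit.count('\t') + 1;
}

// Hypothetical helper: sum one zero-based field over a range of lines.
// splitter yields slices of the line, so no per-field copies are made.
double sumField(Lines)(Lines lines, size_t fieldIndex)
{
    double total = 0;
    foreach (line; lines)
    {
        size_t i = 0;
        foreach (field; line.splitter('\t'))
        {
            if (i == fieldIndex)
            {
                // to!double accepts a char[] slice directly.
                total += field.to!double;
                break;
            }
            ++i;
        }
    }
    return total;
}

void main(string[] args)
{
    // Read the file named on the command line, or stdin if none is given.
    File input = args.length > 1 ? File(args[1]) : stdin;

    // byLine reuses its buffer: each line is a char[] that is only valid
    // until the next iteration. Nothing is stored here, so no .idup or
    // to!string copy is needed.
    writeln(sumField(input.byLine, 1));
}
```

Whether this helps in a given program depends on the compiler and Phobos version, so it is worth measuring before and after.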
At this point performance of the utilities I've been writing is
quite good. They don't have direct equivalents among other tools
(such as GNU coreutils), so a head-to-head comparison is not
appropriate, but generally it seems the tools are quite
competitive without needing to do my own buffer or memory
management. And they are dramatically faster than the same tools
written in Perl (which I was happy with).
--Jon