Performant method for reading huge text files

Marco Leise Marco.Leise at gmx.de
Thu Feb 6 20:28:59 PST 2014


On Tue, 04 Feb 2014 00:04:22 +0000,
"Rene Zwanenburg" <renezwanenburg at gmail.com> wrote:

> On Monday, 3 February 2014 at 23:50:54 UTC, bearophile wrote:
> > Rene Zwanenburg:
> >
> >> The problem is speed. I'm using LockingTextReader in 
> >> std.stdio, but it's not nearly fast enough. On my system it 
> >> only reads about 3 MB/s with one core spending all its time 
> >> in IO calls.
> >
> > Are you reading the text by lines? In Bugzilla there is a 
> > byLineFast:
> > https://d.puremagic.com/issues/show_bug.cgi?id=11810
> >
> > Bye,
> > bearophile
> 
> Nope, I'm feeding it to csvReader which uses an input range of 
> characters. Come to think of it..
> 
> Well this is embarrassing, I've been sloppy with my profiling :). 
> It appears the time is actually spent converting strings to 
> doubles, done by csvReader to read a row into my Record struct. 
> No way to speed that up I suppose. Still I find it surprising 
> that parsing doubles is so slow.

Parsing textual representations of numbers is slow; formatting
a number as text is faster. The parser has to check all kinds
of things: is there a leading +/-, does the number start with
a dot, are all characters '0' to '9', is there an exponent, is
it "NaN" or "nan"? Accumulating the value with floating point
math is slow. If you instead keep the intermediate result in
an integer while parsing, you may run out of digits when the
number string is long. Repeated floating point math, on the
other hand, introduces some rounding error as you append
digits.
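
To make the trade-off concrete, here is a minimal sketch that runs
both accumulation strategies side by side. It is not the Phobos
implementation: `Parsed` and `parseBoth` are hypothetical names, and
the grammar is cut down to [+-]digits[.digits] (no exponent, no
"nan", no "inf").

```d
import std.math : pow;
import std.stdio : writefln;

// Hypothetical result type for this sketch.
struct Parsed
{
    bool ok;            // false if the text is not [+-]digits[.digits]
    double viaInteger;  // stays NaN if the digits did not fit into a ulong
    double viaFloat;
}

// Hypothetical helper, not Phobos code.
Parsed parseBoth(string s)
{
    Parsed p;           // ok = false, both doubles start out as NaN
    size_t i = 0;
    bool negative = false;
    if (s.length > 0 && (s[0] == '+' || s[0] == '-'))
    {
        negative = s[0] == '-';
        i = 1;
    }

    ulong mantissa = 0;     // strategy A: integer math while parsing
    int digits = 0;
    int fracDigits = 0;
    double running = 0.0;   // strategy B: floating point math per digit
    double scale = 0.1;
    bool seenDot = false;

    for (; i < s.length; ++i)
    {
        immutable char c = s[i];
        if (c == '.' && !seenDot)
        {
            seenDot = true;
            continue;
        }
        if (c < '0' || c > '9')
            return p;       // second dot or any other character
        immutable int d = c - '0';
        ++digits;
        if (digits <= 19)   // 19 decimal digits always fit into 64 bits
            mantissa = mantissa * 10 + d;
        if (seenDot)
        {
            ++fracDigits;
            running += d * scale;   // may round on every step
            scale /= 10;
        }
        else
            running = running * 10 + d;
    }
    if (digits == 0)
        return p;

    p.ok = true;
    immutable double sign = negative ? -1.0 : 1.0;
    // Strategy A rounds only here, in the conversion and the division.
    if (digits <= 19)
        p.viaInteger = sign * (mantissa / pow(10.0, fracDigits));
    p.viaFloat = sign * running;
    return p;
}

void main()
{
    foreach (text; ["3.14159", "-0.1", "12345678901234567890.5", "1.2.3"])
    {
        auto p = parseBoth(text);
        writefln("%-24s ok=%s  integer: %.17g  float: %.17g",
                 text, p.ok, p.viaInteger, p.viaFloat);
    }
}
```

The third input has more than 19 digits, so the integer strategy
gives up and leaves NaN, while the floating point strategy still
produces a value. For the other inputs the two columns may differ in
the last digits, which is the accumulated rounding error.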

Here is the ~400-line version in Phobos:
https://github.com/D-Programming-Language/phobos/blob/master/std/conv.d#L2250
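
For reference, a short sketch of how the std.conv parsing functions
are called from user code. I am assuming the linked lines belong to
the floating point overload of `parse`; `to!double` and
`parse!double` are the public entry points, and the input strings
are made up for illustration.

```d
import std.conv : ConvException, parse, to;
import std.stdio : writeln;

void main()
{
    // to!double expects the whole string to be one number.
    double a = to!double("3.25");
    writeln(a);

    // parse!double takes its argument by reference, consumes the
    // leading number and leaves the remainder in the string.
    string rest = "1.5e3,next";
    double b = parse!double(rest);
    writeln(b, " | remaining: ", rest);

    // Text that is not a number is reported with a ConvException.
    try
    {
        auto c = to!double("abc");
        writeln(c);
    }
    catch (ConvException e)
    {
        writeln("not a number: ", e.msg);
    }
}
```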

-- 
Marco



More information about the Digitalmars-d-learn mailing list