Making byLine faster: we should be able to delegate this

John Colvin via Digitalmars-d digitalmars-d at puremagic.com
Mon Mar 23 08:00:05 PDT 2015


On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu 
wrote:
> I just took a look at making byLine faster. It took less than 
> one evening:
>
> https://github.com/D-Programming-Language/phobos/pull/3089
>
> I confess I am a bit disappointed with the leadership being 
> unable to delegate this task to a trusty lieutenant in the 
> community. There's been a bug opened on this for a long time, 
> it gets regularly discussed here (with the wrong conclusions 
> ("we must redo D's I/O because FILE* is killing it!") about 
> performance bottlenecks drawn from unverified assumptions), and 
> the techniques used to get a marked improvement in the diff 
> above are trivial fare for any software engineer. The following 
> factors each had a significant impact on speed:
>
> * On OSX (which I happened to test with) getdelim() exists but 
> wasn't being used. I made the implementation use it.
>
> * There was one call to fwide() per line read. I used simple 
> caching (a stream's width cannot be changed once set, making it 
> a perfect candidate for caching).
>
> (As an aside there was some unreachable code in 
> ByLineImpl.empty, which didn't impact performance but was 
> overdue for removal.)
>
> * For each line read there was a call to malloc() and one to 
> free(). I set things up so that the buffer used for reading is 
> reused by simply making the buffer static.
>
> * assumeSafeAppend() was unnecessarily used once per line read. 
> Its removal led to a whopping 35% on top of everything else. 
> I'm not sure what it does, but boy it does take its sweet 
> time. Maybe someone should look into it.
>
> Destroy.
>
>
> Andrei
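
For anyone who wants to poke at the getdelim() and buffer-reuse points 
above outside of Phobos, here is a rough sketch in plain C. It is not the 
code from the pull request; count_lines is a made-up helper name, and the 
sketch assumes a POSIX.1-2008 libc that provides getdelim(). The one 
buffer is handed back to getdelim() on every call, so there is no 
malloc()/free() pair per line.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: POSIX.1-2008 libc for getdelim() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>           /* ssize_t */

/* Hypothetical helper: counts the lines in fp and, if total_bytes is
 * non-NULL, stores the number of bytes read (delimiters included).
 * A single buffer is reused for every line. */
size_t count_lines(FILE *fp, size_t *total_bytes)
{
    char *buf = NULL;   /* getdelim() allocates this on the first call */
    size_t cap = 0;     /* current capacity; getdelim() grows it on demand */
    size_t lines = 0;
    size_t bytes = 0;
    ssize_t len;

    assert(fp != NULL);

    /* Each call overwrites the same buffer instead of allocating anew. */
    while ((len = getdelim(&buf, &cap, '\n', fp)) != -1) {
        lines++;
        bytes += (size_t)len;
    }

    free(buf);          /* one free() for the whole stream */
    if (total_bytes != NULL)
        *total_bytes = bytes;
    return lines;
}
```

Calling count_lines(stdin, NULL) from a small main() and timing it on a 
large file gives a baseline to compare against.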

What would be really great is a performance test suite for 
Phobos. D is reaching a point where "It'll probably be fast 
because we did it right" or "I remember it being fast-ish 3 years 
ago when I wrote a small toy test" isn't going to cut it. Real 
data is needed, with comparisons to other languages where 
possible.

