Making byLine faster: we should be able to delegate this

Sun Mar 22 00:43:49 PDT 2015

On 3/22/15 12:03 AM, Andrei Alexandrescu wrote:
> I just took a look at making byLine faster. It took less than one evening:
>
> https://github.com/D-Programming-Language/phobos/pull/3089
>
> I confess I am a bit disappointed with the leadership being unable to
> delegate this task to a trusty lieutenant in the community. There's been
> a bug opened on this for a long time, it gets regularly discussed here
> (with the wrong conclusions ("we must redo D's I/O because FILE* is
> killing it!") about performance bottlenecks drawn from unverified
> assumptions), and the techniques used to get a marked improvement in the
> diff above are trivial fare for any software engineer. The following
> factors each had a significant impact on speed:
>
> * On OSX (which I happened to test with) getdelim() exists but wasn't
> being used. I made the implementation use it.
>
> * There was one call to fwide() per line read. I used simple caching (a
> stream's width cannot be changed once set, making it a perfect candidate
> for caching).
>
> (As an aside there was some unreachable code in ByLineImpl.empty, which
> didn't impact performance but was overdue for removal.)
>
> * For each line read there was a call to malloc() and one to free(). I
> set things up so that the buffer used for reading is reused by simply
> making the buffer static.
>
> * assumeSafeAppend() was unnecessarily used once per line read. Its
> removal led to a whopping 35% on top of everything else. I'm not sure
> what it does, but boy it does take its sweet time. Maybe someone should
> look into it.
>
> Destroy.
>
>
> Andrei

* Avoid most calls to GC.sizeOf.
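
For anyone who wants the shape of the getdelim(), fwide() and buffer-reuse items without reading the diff, here is a minimal sketch in plain C. It is not the code from the pull request: the line_reader struct and its functions are hypothetical names made up for illustration, and it assumes a POSIX system that provides getdelim(). It shows getdelim() growing and reusing one buffer instead of doing a malloc()/free() pair per line, and fwide() being queried only until the stream has an orientation.

```c
#define _POSIX_C_SOURCE 200809L   /* ask for getdelim() and ssize_t */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <wchar.h>

/* Hypothetical reader state; illustrative only, not Phobos internals. */
typedef struct {
    FILE  *fp;
    char  *buf;          /* allocated by getdelim(), reused on every call */
    size_t cap;          /* current capacity of buf */
    int    orientation;  /* cached fwide() result; 0 means "not known yet" */
} line_reader;

static void line_reader_init(line_reader *r, FILE *fp)
{
    r->fp = fp;
    r->buf = NULL;       /* getdelim() allocates on first use */
    r->cap = 0;
    r->orientation = 0;
}

/* Reads one line into r->buf. Returns its length including the
   delimiter, or -1 at end of file, on error, or for a wide stream. */
static ssize_t line_reader_next(line_reader *r, int delim)
{
    /* Once a stream has an orientation it cannot change, so fwide()
       only needs to be asked while the cached answer is still 0. */
    if (r->orientation == 0)
        r->orientation = fwide(r->fp, 0);   /* mode 0: query only */
    if (r->orientation > 0)
        return -1;       /* this sketch handles byte streams only */

    /* getdelim() reuses buf and reallocates only when a line does not
       fit, so there is no per-line malloc()/free() pair. */
    return getdelim(&r->buf, &r->cap, delim, r->fp);
}

static void line_reader_free(line_reader *r)
{
    free(r->buf);
    r->buf = NULL;
    r->cap = 0;
}
```

A caller initializes the reader once, loops on line_reader_next(&r, '\n') until it returns -1, and calls line_reader_free() at the end. The contents of r.buf are valid only until the next call, which is the same buffer-reuse contract byLine has.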

Andrei