Profiling graph library

Tue Jul 30 16:50:31 PDT 2013

On 07/31/2013 01:02 AM, bearophile wrote:
> There are many ways to design a graph data structure. All of them have
> tradeoffs, they can't be very good for everything.

Sure, it's just that in this case the reference code (igraph) is _much_ more
performant.  I'm keen to try and identify the major issues that are holding the
D code back.

I _could_ just copy the C algorithms from igraph into my own code, but nothing
would be learned from that, and it would seem to defeat the purpose of using a
more elegant language like D.

> For your benchmarks are you using the ldc2 compiler with correct compilation
> switches?

No, gdmd -O -release -inline -g -profile, and then gprof to generate the profile.

> map() should not allocate memory.
> 
> Sometimes lazy code is faster and sometimes it's slower. A good compiler (like
> Haskell GHC) should be able to optimize map well.

That's what I'd assumed, which was why I was so disappointed that in this case
performance was _very_ bad.  I particularly don't understand the major GC hit.

Or rather, I should say, I don't see why that GC hit should be there, unless
it's that there is a fault in the implementation of iota, map and/or chain.

> Also try to use a bitvector from Phobos instead of bool[].

I did consider that, I haven't tried yet.  Obviously it's superior from a
storage point of view, I wasn't sure if it would be worse or equivalent
performance-wise (I seem to recall reading that it's slightly slower than a
bool[]).  I'll try it. :-)

> Sometimes you can keep memory allocation low, performance almost acceptable, and
> code easy to read using a pre-allocated scratch space plus using map() and
> copy() from std.algorithm.

Could you expand a little on that point?

I don't mean to ask you to write my code for me, it's just that (like most
non-computer-science researchers) my knowledge of programming is somewhat
specialized, and there are blank spots on the map.  So when in your reply to
John you say, "Allocate a buffer somewhere, on the stack or heap", I'm still not
sure exactly what you mean or how you'd use it.  It'd really help to see a
concrete example (it doesn't need to be tailored to the use-case here).

In any case, thanks as ever for all the useful remarks and advice. :-)