D graph library -- update

Thu Jul 11 03:25:22 PDT 2013

On 07/11/2013 02:24 AM, Joseph Rushton Wakeling wrote:
> I know, and it's coming. :-)  The main memory-related issues will probably not
> show up in a situation like this where all we're doing is storing the graph
> data, but in the case where algorithms are being performed on the data.

For comparison, I performed the same tests but with a 10_000 node graph.  Here
we see similar memory use, but igraph outperforms dgraph by a factor of nearly
10 even with the insert of nodes one at a time.

Profiling shows that the time difference is accounted for by the sorting
algorithm used in the indexEdges() method:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 93.12    110.01   110.01    20000     0.01     0.01
_D6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda4VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ12schwartzSortMFAmZS6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv222__T11SortedRangeTAmS1986dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda4VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ11__lambda165Z11SortedRange561__T13quickSortImplS5066dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda4VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ12schwartzSortMFAmZS6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv222__T11SortedRangeTAmS1986dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda4VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ11__lambda165
Z11SortedRange11__lambda168TS3std5range14__T3ZipTAmTAmZ3ZipZ13quickSortImplMFS3std5range14__T3ZipTAmTAmZ3ZipZv
  3.88    114.59     4.58    20000     0.00     0.00
_D6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda2VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ12schwartzSortMFAmZS6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv222__T11SortedRangeTAmS1986dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda2VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ11__lambda165Z11SortedRange561__T13quickSortImplS5066dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda2VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ12schwartzSortMFAmZS6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv222__T11SortedRangeTAmS1986dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv132__T12schwartzSortS606dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv9__lambda2VAyaa5_61203c2062VE3std9algorithm12SwapStrategy0TAmZ11__lambda165
Z11SortedRange11__lambda168TS3std5range14__T3ZipTAmTAmZ3ZipZ13quickSortImplMFS3std5range14__T3ZipTAmTAmZ3ZipZv
  1.81    116.73     2.14 1124043790     0.00     0.00
_D3std5range14__T3ZipTAmTAmZ3Zip7opIndexMFNaNbNfmZS3std8typecons14__T5TupleTmTmZ5Tuple
  0.59    117.42     0.70 203164131     0.00     0.00
_D3std9algorithm43__T6swapAtTS3std5range14__T3ZipTAmTAmZ3ZipZ6swapAtFNaNbNfS3std5range14__T3ZipTAmTAmZ3ZipmmZv
  0.42    117.92     0.50    20000     0.00     0.01
_D6dgraph5graph13__T5GraphVb0Z5Graph10indexEdgesMFZv

By default, schwartzSort uses SwapStrategy.unstable, which means quicksort is
used as the sorting mechanism.  If we replace this with SwapStrategy.stable or
SwapStrategy.semistable, TimSort is used instead, and this dramatically cuts the
running time -- from almost 2 minutes to under 3 seconds (compared to 13 seconds
for igraph with one-by-one edge addition), and under 2 if ldmd2 is used for
compiling.

This comes at a cost of increasing memory usage from about 3.7 MB (almost
identical to igraph) to 5.6.

Probably the best way to secure optimal performance without the memory hit is to
use an insertion sort (or maybe a smoothsort?).  I guess that Timsort would be
best to use in the case of adding multiple edges in one go, unless no edges at
all have been added before, in which case quicksort would probably be optimal;
though quicksort would probably remain best if memory management is the priority.

So, the new D code is still competitive with igraph, but needs some smoothing
around the edges (quite literally!:-).