D graph library

Wed May 15 10:21:13 PDT 2013

On 05/15/2013 09:29 AM, w0rp wrote:
> I suggest building the graphs as structs with policy based design, possibly
> having separate policies for how the graphs are stored in memory, possibly not.
> So the basic graph structure would be like this.

What advantage does this have over just defining a few different structs, e.g.,

    struct Graph {
        // undirected graph
    }

    struct Digraph {
        // directed graph
    }

... which each define certain expected interfaces such as nodes(), edges(), etc.?

Not objecting, just curious. :-)

> The policies can then offer methods for accessing and mutating the graphs. So
> that would be stuff like bool addNode(Node), bool removeNode(Node), bool
> addEdge(Node, Node), NodeRange!Node neighbors(Node), etc. All of the internals
> of GraphImplementation could then access the very basic things through these
> methods, and then expose a few to the users by wrapping/aliasing the functions.
> Perhaps encapsluation + alias this is the way to go here.

... why returning bool?  Again, not objecting, just curious.

> I think doing things in this way makes it easy to get the ball rolling, but also
> easy to add in optimisations later. ("Aha, this is an undirected graph, I can
> assume this... static if") I think the most important thing is to get the graph
> basics right, then to worry about how storage works, etc. later.

I kind of agree, my inclination is that we need to be able to work out what are
the interfaces we need in order for different algorithms to work, the underlying
data structures can be refined later.

Just as an example, take a look at this piece of code in igraph, which
implements Brandes' algorithm for betweenness centrality:
https://bazaar.launchpad.net/~igraph/igraph/0.7-main/view/head:/src/centrality.c#L1714

(It's OK, the algorithm is public and any D implementation will look nothing
like this, so you're not really killing your clean-room opportunities by
glancing through this:-)

It's a bit horrible to read, with lots of the inevitable C boilerplate, lots of
macros, lots of custom data-structures etc., but in terms of the graph, what it
really boils down to is just two things, line 1720:
https://bazaar.launchpad.net/~igraph/igraph/0.7-main/view/head:/src/centrality.c#L1720

... which calls a function returning the number of vertices, and:
https://bazaar.launchpad.net/~igraph/igraph/0.7-main/view/head:/src/centrality.c#L1833

... where they ask for the neighbours of a given node.

Technically it's a little bit more complex than that (e.g. this function
allocates and creates from scratch an adjacency-list representation of the whole
graph, which might conceivably be expensive both memory-wise and time-wise) but
you get the general picture -- there's actually a relatively small number of
things you need to be able to get from a graph in order for different algorithms
to work effectively.

Their _actual_ underlying graph data structure (OK, OK, the room is starting to
look dirty...) is incredibly simple -- "from" and "to" arrays of vertex IDs,
which define the start- and endpoints of edges; and a few extra things built
around that to facilitate calculating other graph quantities.  And their actual
definition of "directed" is literally a bool.

If you think about that in D terms you might start with arrays from and to, and
e.g. edges() would return lockstep(from, to).  I suspect we could also do many
nice things with D based around those fundamental structures.