D graph library

Wed May 8 05:34:32 PDT 2013

On 05/06/2013 07:09 PM, H. S. Teoh wrote:
> So what I envision as far as graph libraries in D are concerned is
> something like the generic range API, where there is a fixed convention
> as to what constitutes a graph/digraph/etc., and the library itself is a
> template library that is able to handle any data type that conforms to
> the requirements.

So, let's have a bit of a think about what you really, really want in a graph
interface.

Something along the lines of the following:

	nodes   /* @property that returns a RandomAccessRange to the collection
                   of nodes.  You could also call this "vertices" or
                   "vertexes"; "nodes" is shorter.  Should be O(1). */

        edges   /* Returns a RandomAccessRange to the collection of edges, i.e.
                   a list of node pairs.  Should be O(1). */

Note that these properties should be O(1) to return, but obviously traversing
them is O(N) [O(V)] or O(E) respectively, where N [V] is the number of nodes
[vertices] and E is the number of edges.

Note also that both the above automatically define the count of nodes/edges, but
we could of course add courtesy functions to give them directly.

Finally, a reasonable question is whether they should return ranges of actual
nodes or edges, or ranges of node/edge IDs.

You'd also need to define various characteristics such as isDirected,
isMultiGraph, isMixedGraph, isSimpleGraph, isWeighted, etc.  Ideally most of
these would be compile-time constraints but might also be implemented as runtime
queries (e.g. isBipartiteGraph could be a compile-time constraint but equally
one could at runtime query a graph without this constraint and prove that it
could indeed be divided into two disjoint sets which satisfy the conditions of
bipartitivity [bipartiteness?]).

Then for nodes/vertices you'd want to define a number of properties:

       neighbours   /* OK, OK, neighbors.  Purely in order to save keystroke
                       time for programmers.  Returns a RandomAccessRange to
                       the collection of nodes that the given node links to.
                       For directed graphs you might want to add a template
                       qualifier of "out" (default) or "in" to allow getting
                       neighbours that this node points to, or neighbours that
                       point to this node.  Should be O(1) [but maybe needs to
                       be O(d)?] */

       edges        /* Returns RandomAccessRange of edges that connect to this
                       node.  For a directed graph you might want again a
                       template qualifier of "out" (default), "in" or "all".
                       Should be O(1) [but maybe needs to be O(d)?] */

Again, these should be ideally be O(1) to return but are obviously O(d) to
traverse where d is the degree of the node.  Note that degree (and in/out degree
for directed graphs) can be implicitly calculated from the above, but we could
add courtesy functions to this extent.

It should also be possible to define arbitrary collections of node properties,
such as colour, type, textual labels, numerical properties etc., and to query if
a node has them (preferably at compile-time).

You'd also want to define functions on node [node ID?] pairs, such as a function
that returns the edge [edge ID?] linking two nodes if they _are_ linked, and
null otherwise.

Similarly for edges you might want to define various properties:

      head, tail     /* the origin and destination nodes respectively */

      isDirected     /* is this a directed link? */

      isLoop         /* does this edge connect a node to itself? */

      isWeighted     /* does this edge have an associated weight? */

      weight         /* ... if it does ... */

And again, you'd want to be able to define arbitrary collections of edge
properties besides "weight", and be able to query (preferably at compile-time)
if an edge has them.

OK, those are some VERY provisional thoughts (on a very tentative proposal:-)
But I'd be happy to receive feedback, particularly on some of the choices (e.g.
should we consider it essential for the ranges in all circumstances to be
RandomAccessRanges or will it help to make some of them simpler?  Should the
ranges be of actual node/edge entities or node/edge IDs? etc.)