D graph library

Wed May 8 10:25:26 PDT 2013

On 05/08/2013 06:12 PM, H. S. Teoh wrote:
> Hmm. Is it necessary to provide a random access ranges? For some graph
> algorithms that's necessary, but for others, it may not be.

Agree that it's debatable.  It's probably desirable but not necessary.

>> Note also that both the above automatically define the count of
>> nodes/edges, but we could of course add courtesy functions to give
>> them directly.
> 
> Something along the lines of std.range's .hasLength, I guess? Makes
> sense.

If nodes() and edges() return a RandomAccessRanges then they will automatically
have the .length property.  So you could add a courtesy function like
nodecount() or edgecount() that in this case would just wrap nodes.length but
for other cases might calculate the node or edge count in a different way.

>> Finally, a reasonable question is whether they should return ranges of
>> actual nodes or edges, or ranges of node/edge IDs.
> 
> I think the distinction is mostly superfluous. One could always wrap IDs
> in a struct

Well, I was considering whether making it a range of IDs might be more flexible
with respect to underlying data structures.  I may be falling into "premature
optimization" here. :-)

> Another thought I have is whether we should require a method that
> enumerates all edges. The number of edges in a graph can grow very
> quickly w.r.t. the number of nodes, and some graphs may be so complex
> that the edges can't all be stored in memory at once, but only computed
> per-node. In such cases, requiring edge enumeration may be
> counterproductive. I'm inclined towards leaving out edge enumeration
> unless some specific algorithms require it (even then, we could just
> make it a requirement for those particular algorithms). I admit my
> knowledge of graph algorithms is rather limited, but AFAIK none of them
> require edge enumeration on the entire graph.

Well, if we accept that we're going to be offering the user a range that lists
all edges, the underlying code might handle that in many different ways -- just
reading from an array, cacheing input from a database, or whatever else is
appropriate.  That shouldn't make it difficult to get the overall number of
edges -- that's always going to be a stored number of one kind or another,
whether it's an array length or a record of the number of fields in a database,
or whatever.

> Does it make sense to have a common interface between, say, directed and
> non-directed graphs? That is, are there algorithms that can operate on
> both, or do most of the interesting algorithms only work on one or the
> other? I'm just wondering if it's worth the trouble of doing
> compile-time (or runtime) checks if we could just have two unrelated
> graph types.

Reasonable question.  As far as I can tell most graph libraries offer a broadly
common interface even though the underlying data structure may vary quite a bit.

The practical implementation of those checks might well be simply reading an
immutable boolean variable in the graph type.

> Also, I'm not sure about the usefulness of isBipartiteGraph, unless it
> entails some specific graph method (say a method that returns nodes
> grouped by which partition it lies in) that takes advantage of this
> fact. OTOH, maybe it does serve the purpose of letting the user promise
> that the input graph will be bipartite, so if it isn't and the algorithm
> blows up, it's not our fault. :-P

Yes, that would be the purpose.  The bipartite graph might also be just a
different data type.

> Personally, I'd shorten it to ngbrs, but most people here would
> disagree. :-P

I don't like those kinds of abbreviations in general, I like English. :-P

> I'm not so sure about providing both in and out neighbours / edges.  In
> some cases this may be very inefficient. I think it may be best simply
> to adopt the convention that a directed graph will always return out
> neighbours, and undirected graphs will always return all neighbours.  We
> can always provide an optional method for inverting a graph where this
> is useful / necessary.

Maybe, but in some cases you want to be able to quickly get the in-degree or
whatever.  Inverting the graph is going to be an expensive method that will
require a lot of calculation (and maybe also a lot more memory, if you wind up
storing both versions).

I'd much rather have the option to get in-links and say, "Hey, in some cases
this may be really expensive" than to have no option at all by design.

I'm getting dragged out of the office so will respond to your other points later
... :-P