[OT] Use case for a 4-D matrix
John Demme
jdd at cs.columbia.edu
Fri Sep 3 23:03:12 PDT 2010
Tomek Sowiński wrote:
> On 03-09-2010 at 23:37:18 John Demme <jdd at cs.columbia.edu> wrote:
>
>> (Yes, all of those exponents are 4, not 2. This is actually a 4
>> dimensional
>> matrix, but for the purpose of most parts of the computation, I can
>> treat it
>> like a typical 2-dim matrix. Not relevant, I suppose, but perhaps
>> interesting.)
>
> Very. What do you do with 4-D matrices? Tell us!
>
> Tomek
It's a graph similarity comparison, roughly based upon Similarity Flooding [1].
To summarize briefly: this large adjacency matrix assists in mapping nodes
from one graph to their most similar counterparts in another graph. It does
so approximately, in roughly n^5 time and n^4 space rather than exponential
time, and it allows me to tweak my definition of "similar" as much as I like.
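To give a rough idea of the flooding step, here's a minimal Python/NumPy sketch of one plausible fixpoint iteration (my actual code is in D, and the exact update rule and names here -- `flood`, `sigma0`, the normalization -- are illustrative assumptions, not my implementation): similarity values for node pairs are repeatedly propagated along the edges of the product graph until they stabilize.

```python
import numpy as np

def flood(M, sigma0, iters=100, tol=1e-9):
    """Hypothetical similarity-flooding loop: sigma_{k+1} =
    normalize(sigma0 + M @ sigma_k), where M is the (n^2 x n^2)
    propagation matrix from the product graph and sigma is a
    similarity value per node pair."""
    sigma = sigma0.copy()
    for _ in range(iters):
        nxt = sigma0 + M @ sigma            # propagate similarity along edges
        nxt = nxt / (np.abs(nxt).max() or 1.0)  # keep values bounded
        if np.abs(nxt - sigma).max() < tol:     # stop at the fixpoint
            break
        sigma = nxt
    return sigma
```

Each iteration is a matrix-vector product over the n^2-entry similarity vector, which is where the roughly n^4-per-iteration (n^5 overall) cost comes from.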
As for the graphs: I essentially take two input graphs, represented in
adjacency matrix form (two 2-D matrices of size n^2 each, assuming
equal-sized graphs). Then I compute the Kronecker tensor graph product [2],
which creates a matrix of size n^4. Depending on how you think about it, this
matrix is a simple (although large) 2-D adjacency matrix of the product
graph, and it can be treated as such for many operations. It can also be
inspected in four-dimensional space to examine the relationships between
possible node pairs, but I don't do this. Frankly, it hurts my brain a
little bit.
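For anyone who hasn't run into it, the product-graph construction is easy to play with in NumPy via `numpy.kron` (again, just an illustration in Python -- the toy graphs here are made up, not from my data):

```python
import numpy as np

# Two toy graphs as n x n (n = 3) 0/1 adjacency matrices.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
B = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])

# Kronecker (tensor) graph product: entry (i*n+j, k*n+l) is 1 exactly
# when i->k is an edge in A AND j->l is an edge in B. The result is an
# n^2 x n^2 adjacency matrix over node *pairs*, i.e. n^4 entries total.
P = np.kron(A, B)
print(P.shape)  # (9, 9)
```

So each row/column of P indexes a pair (node of A, node of B), which is exactly the 2-D view of the 4-D matrix I described above.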
Another interesting note: I originally implemented the algorithm in Haskell,
but the space overhead was far too high, and I wasn't able to parallelize
the bits that should have been easy to parallelize. Elegant as Haskell may
be, it turns out that writing efficient Haskell programs is a helluva black
art.
As a further aside, I run this comparison for m graphs, so we're talking
about m^2 computations of about n^5 time and n^4 space each. I'm hoping D
will help me parallelize many of the matrix computations, and it should also
make it easier to distribute the m^2 comparisons over many machines. (Up to
72 machines with 16 logical cores and 24 GB of RAM each.) I was doing this
with the Haskell version, but very inefficiently, via a number of hack-y,
poorly automated scripts. I need to increase the efficiency and level of
automation (via D) so that I can run this analysis many, many times -- I'm
planning on tuning the similarity-flooding parameters using machine
learning.
Anyway, it'll end up being a huge number of CPU hours no matter how I slice
it. So far it looks like D will help me cut a bunch of hours off (versus
the Haskell version) without adding too many hours to my work week. I
downloaded DMD a few days ago, and I've already ported most of my code and
cleaned it up a bit. It's nice to be back on D :) (BTW, I only decided to
come back after reading Andrei's new book.)
~John
[1] http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=994702
[2] http://en.wikipedia.org/wiki/Tensor_product_of_graphs