[OT] Use case for a 4-D matrix
John Demme
jdd at cs.columbia.edu
Fri Sep 3 23:03:12 PDT 2010
Tomek Sowiński wrote:
> On 03-09-2010 at 23:37:18 John Demme <jdd at cs.columbia.edu> wrote:
>
>> (Yes, all of those exponents are 4, not 2. This is actually a 4
>> dimensional
>> matrix, but for the purpose of most parts of the computation, I can
>> treat it
>> like a typical 2-dim matrix. Not relevant, I suppose, but perhaps
>> interesting.)
>
> Very. What do you do with 4-D matrices? Tell us!
>
> Tomek
It's a graph similarity comparison, roughly based upon Similarity Flooding [1].
To summarize briefly: this large adjacency matrix assists in mapping nodes
from one graph to their most similar counterparts in another graph. It does
so approximately, in roughly n^5 time and n^4 space rather than exponential
time, and it allows me to tweak my definition of "similar" as much as I like.
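To give a rough idea of the flooding step, here's a minimal Python/NumPy sketch of one plausible fixpoint iteration (my actual code is in D, and the exact update rule and names here -- `flood`, `sigma0`, the normalization -- are illustrative assumptions, not my implementation): similarity values for node pairs are repeatedly propagated along the edges of the product graph until they stabilize.

```python
import numpy as np

def flood(M, sigma0, iters=100, tol=1e-9):
    """Hypothetical similarity-flooding loop: sigma_{k+1} =
    normalize(sigma0 + M @ sigma_k), where M is the (n^2 x n^2)
    propagation matrix from the product graph and sigma is a
    similarity value per node pair."""
    sigma = sigma0.copy()
    for _ in range(iters):
        nxt = sigma0 + M @ sigma            # propagate similarity along edges
        nxt = nxt / (np.abs(nxt).max() or 1.0)  # keep values bounded
        if np.abs(nxt - sigma).max() < tol:     # stop at the fixpoint
            break
        sigma = nxt
    return sigma
```

Each iteration is a matrix-vector product over the n^2-entry similarity vector, which is where the roughly n^4-per-iteration (n^5 overall) cost comes from.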
As for the graphs: I essentially take two input graphs, represented in
adjacency matrix form (two 2-D matrices of size n^2 each, assuming
equal-sized graphs). Then I compute the Kronecker tensor graph product [2],
which creates a matrix of size n^4. Depending on how you think about it, this
matrix is a simple (although large) 2-D adjacency matrix of the product
graph, and it can be treated as such for many operations. It can also be
inspected in four-dimensional space to examine the relationships between
possible node pairs, but I don't do this. Frankly, it hurts my brain a
little bit.
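For anyone who hasn't run into it, the product-graph construction is easy to play with in NumPy via `numpy.kron` (again, just an illustration in Python -- the toy graphs here are made up, not from my data):

```python
import numpy as np

# Two toy graphs as n x n (n = 3) 0/1 adjacency matrices.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
B = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])

# Kronecker (tensor) graph product: entry (i*n+j, k*n+l) is 1 exactly
# when i->k is an edge in A AND j->l is an edge in B. The result is an
# n^2 x n^2 adjacency matrix over node *pairs*, i.e. n^4 entries total.
P = np.kron(A, B)
print(P.shape)  # (9, 9)
```

So each row/column of P indexes a pair (node of A, node of B), which is exactly the 2-D view of the 4-D matrix I described above.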
Another interesting note: I originally implemented the algorithm in Haskell,
but the space overhead was far too high, and I wasn't able to parallelize
the bits that should have been easy to parallelize. Elegant as Haskell may
be, it turns out that writing efficient Haskell programs is a helluva black
art.
As a further aside, I run this comparison for m graphs, so we're talking
about m^2 computations of about n^5 time and n^4 space each. I'm hoping D
will help me parallelize many of the matrix computations, and it should also
make it easier to distribute the m^2 comparisons over many machines. (Up to
72 machines with 16 logical cores and 24 GB of RAM each.) I was doing this
with the Haskell version, but very inefficiently, via a number of hack-y,
poorly automated scripts. I need to increase the efficiency and level of
automation (via D) so that I can run this analysis many, many times -- I'm
planning on tuning the similarity-flooding parameters using machine
learning.
Anyway, it'll end up being a huge number of CPU hours no matter how I slice
it. So far it looks like D will help me cut a bunch of hours off (versus
the Haskell version) without adding too many hours to my work week. I
downloaded DMD a few days ago, and I've already ported most of my code and
cleaned it up a bit. It's nice to be back on D :) (BTW, I only decided to
come back after reading Andrei's new book.)
~John
[1] http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=994702
[2] http://en.wikipedia.org/wiki/Tensor_product_of_graphs