Using SuiteSparse and the standard GAP benchmarks, I've loaded graphs with 6 billion edges into 256GB of RAM, and can BFS that graph in under a second. [2]
I wonder if this has any benefit of row vs column memory access, which I always forget to bother with unless suddenly my performance crawls.
- The article says adjacency matrices are "usually dense" but that's not true at all, most graph are sparse to very sparse. In a social network with billions of people, the average out degree might be 100. The internet is another example of a very sparse graph, billions of nodes but most nodes have at most one or maybe two direct connections.
- Storing a dense matrix means it can only work with very small graphs, a graph with one million nodes would require one-million-squared memory elements, not possible.
- Most of the elements in the matrix would be "zero", but you're still storing them, and when you do matrix multiplication (one step in a BFS across the graph) you're still wasting energy moving, caching, and multiplying/adding mostly zeros. It's very inefficient.
- Minor nit, it says the diagonal is empty because nodes are already connected to themselves, this isn't correct by theory, self edges are definitely a thing. There's a reason the main diagonal is called "the identity".
- Not every graph algebra uses the numeric "zero" to mean zero, for tropical algebras (min/max) the additive identity is positive/negative infinity. Zero is a valid value in those algebras.
I don't mean to diss on the idea, it's a good way to dip a toe into the math and computer science behind algebraic graph theory, but in production or for anything but the smallest (and densest) graphs, a sparse graph algebra library like SuiteSparse would be the most appropriate.
SuiteSparse is used in MATLAB (A .* B calls SuiteSparse), FalkorDB, python-graphblas, OneSparse (postgres library) and many other libraries. The author Tim Davis from TAMU is a leading expert in this field of research.
(I'm a GraphBLAS contributor and author of OneSparse)
But still true that dense growth is not linear but quadratic to the number of nodes.
> In graph theory, an adjacency matrix is a square matrix used to represent a finite (and usually dense) graph.
Many of these issues evaporate on the realization the article sets out to talk talk to the use of adjacency matrices for dense graphs, which it's trying to point out are the ones you'd commonly use an adjacency matrix for, rather than trying to claim all graphs are dense so you should always use an adjacency matrix.
E.g. a dense graph of 1,000,000 nodes would usually be considered "a pretty damn large dense graph" and so on. These are probably good things to have mentioned here though, as pulling in an article about adjacency matrices for conversation without context of knowing why you're using one already can lead to bad conclusions by folks.
munk-a•3h ago