In this post, we lay out the basics of spectral graph theory - the study of graph structure through the lens of linear algebra. Some of this material is based on a lecture for 6.854 given by Professor Aleksander Madry, and these notes by Professor Dan Spielman.

1. Motivation

There are many directions from which you can be naturally led to spectral graph theory - pseudorandomness, Markov chain Monte Carlo methods, modern graph algorithms, all your friends talking about it,....

Here's a simple concrete example. Suppose you have an undirected loopless graph $G=(V,E)$, and want to think about the random walk on $G$ for some reason. Maybe you want to compute pageranks, or maybe you have been partying too much and want to find out how long it'll take you to get back home if you just take random intersections in your town (of course, like always, you still have your mathematical wits about you). As we will see, this will naturally lead you to caring about the eigenvalues of the adjacency matrix of $G$.

We'll make a simplifying assumption first: suppose $G$ is $d$-regular. That's not crucial, but makes the motivation clearer.

If you start at some $v\in V$, the probability of ending up at some $u\in V$ in $t$ steps is precisely the fraction of all length-$t$ paths in $G$ starting at $s$ which also end up in $u$. But what's that?

Let's pass to a simpler question, when $t=1$. Then the next step takes you to any neighbor of $v$ with probability $1/\deg v=1/d$. Then at the step after that, there's some more complications because there might be multiple ways to get to a vertex. However, what remains true is that each vertex spreads its probability uniformly to its neighbors. Some poking around shows that if we use the adjacency matrix $A\in\mathbb{R}^{V\times V}$ of $G$ given by \begin{align*} A_{u,v} = \text{number of edges between $u$ and $v$ in $G$} \end{align*} then if the probability distribution on the vertices was $p$, it will be $(A/d)p$ after the next step. So, the probability distribution after $t$ steps will be \begin{align*} (A/d)^te_v \end{align*} where $e_v$ is the indicator of $v$. The key point now is that understanding the power $A^t$ is messy in the standard basis, but it's easy in the basis of eigenvectors. Letting $W=A/d$ be the random walk matrix, if we write the orthonormal eigendecomposition \begin{align*} W = \sum_{i=1}^{n}\lambda_i v_iv_i^T \quad \text{for $\lambda_i\in\mathbb{R}$ and orthonormal $v_i$} \end{align*} (which exists because $A$ is symmetric) then quite easily \begin{align*} W^t = \sum_{i=1}^{n} \lambda_i^t v_iv_i^T \end{align*} Many useful handles on the behavior of random walks and the number of walks with specific properties follow from that. For example, to address the two problems listed in the beginning, we have (try proving those):

• The second-largest in absolute value eigenvalue of $G$ controls the rate of convergence of the random walk to the uniform distribution
• The probability of ending up at $u$ having started at $v$ in $\leq t$ steps is \begin{align*} \sum_{i=1}^{n} \lambda_i^t \left<v_i,e_v\right>\left<v_i,e_u\right> \end{align*}

This is just a small sample of the many connections between combinatorial properties of $G$ and algebraic properties of $A$.

2. The Laplacian matrix

Now lift the assumption that $G$ is regular for a while. As it turns out, it's more convenient to consider the following matrix:

Definition 1.

For an undirected graph $G$, the Laplacian matrix of $G$ is \begin{align*} L = D-A \end{align*} where $D$ is matrix with diagonal the vector $(\ldots,\deg(v),\ldots)\in\mathbb{R}^V$

In other words, we have (recall we assume $G$ is loopless, though that's not crucial either) \begin{align*} L_{uv} = \begin{cases} -1, &\text{ if }(u,v)\in E\\ \deg(v), &\text{ if $u=v$}\\ 0, &\text{ otherwise} \end{cases} \end{align*} So as we see, the Laplacian is a close cousin of the adjacency matrix. In fact, when $G$ is regular, $D$ is just a multiple of the identity matrix $I_V$ on $V$, so we have a very good understanding of the operator $D-A$ and in particular its eigenvalues and eigenvectors (since everything is an eigenvector of $I_V$ with value 1).

Some motivation for that definition comes from physics. First, the following observation is in order:

Observation 1.

For any vector $x$, $x^TLx = \sum_{(u,v)\in E}^{} (x_u-x_v)^2$. It's easy to see that if we write \begin{align*} L = \sum_{e\in E}^{} \chi_e\chi_e^T \end{align*} where $\chi_e\in\mathbb{R}^V$ is the indicator' of the edge $(u,v)$, having $-1$ at $u$, $1$ at $v$ and $0$ elsewhere.

If we imagine that each edge is an ideal spring with , then this says that the quadratic form $x^TLx$ associated with the Laplacian gives us the total energy of the system when we place the vertices on the real line according to the vector $x$.

If we imagine that $G$ is a network of 1-Ohm resistors, applying voltages $\varphi$ to some of the vertices, the induced voltages will be trying to minimize the energy of the system, which (remember your physics...) will be again $\sum_{(u,v)\in E}^{} (\varphi_u - \varphi_v)^2$.

Why do we care, well, for lots of reasons. For example, the spring interpretation tells us that if we find two vectors $x,y\in\mathbb{R}^V$ that minimize the energy (subject to some constraints, because we can always trivially minimize by putting all vertices at the same point), and map $G$ into the plane $\mathbb{R}^2$ by $v\mapsto (x_v,y_v)$, we should get a nice picture of the graph $G$, and it's a real thing, people have done it and it's neat.

And here's a fun theorem from 1963 due to Bill Tutte: if we nail down a face of a planar graph and let the springs minimize the energy by figuring out where the other vertices go, we get a planar embedding of the graph.

3. Basic properties of the Laplacian

In plain words, as a linear operator, the Laplacian is a scaled by the degree deviation from the average over the neighbors. In math that is perhaps easier to process, \begin{align*} (Lx)_v = \deg(v)x_v - \sum_{u:(u,v)\in E}^{} x_u = \deg(v) \left(x_v - \frac{1}{\deg v}\sum_{u:(u,v)\in E}^{} x_u\right) \end{align*} This sort of measures the failure of $x$ to be a harmonic function on $G$. The Laplacian is a real symmetric matrix, so by a spectral theorem it eigendecomposes as \begin{align*} L = \sum_{i=1}^{n} \lambda_i v_iv_i^T \end{align*} for an orthonormal eigenbasis $\{v_i\}$. Moreover, it's positive semidefinite, since clearly $x^TLx = \sum_{(u,v)\in E} (x_u-x_v)^2\geq 0$. Thus, $0\leq \lambda_1\leq\ldots\leq \lambda_n$. What's more, if we let $x$ be the all-1s vector, then $Lx = 0$, hence $\lambda_1=0$. But even better,

Proposition 1.

The multiplicity of $0$ in the spectrum of $L$ is the number of connected components of $G$.

Proof 1.

If $G$ has $k$ connected components, consider $k$ independent vectors $u_1,\ldots,u_k$ each of which is constant on each component. Then by the interpretation of the Laplacian as deviation from the average, it's immediate that $Lu_i =0$ for all $i$; so the eigenspace of zero is $\geq k$ dimensional.

Conversely, if the eigenspace has $k$ dimensions, we get $k$ independent vectors $u_1,\ldots,u_k$ which all satisfy \begin{align*} Lu_i = 0 \implies u_i^TLu_i = 0 \implies \sum_{(u,v)\in E}^{} (x_u-x_v)^2=0 \implies x_u=x_v \text{ if }(u,v)\in E \end{align*} i.e. are constant on components. But if there are $<k$ components, we can't get that many such vectors; so there are $\geq k$ components.

However one shouldn't be too impressed by this, or at least one should want more. This is a hard' result, whereas we embedded $G$ in a continuous setting, which should in a natural way allow us to quantify statements of a more qualitative nature, like `$G$ is close to having $k$ connected components'. Spectral graph theory is remarkably successful in that arena, as we will (maybe) see later.

On a somewhat dual note, the spectrum also detect bipartiteness:

Proposition 2.

When $G$ is $d$-regular, we always have $\lambda_n\leq 2d$, and $G$ is bipartite iff $\lambda_n=2d$.

Proof 2.

Suppose $Lx = \lambda x$ is an eigenvector, then if its largest coordinate magnitude is realized at $v$, we have \begin{align*} \lambda \left| x_v \right|= \left| (Lx)_v \right| = \left| dx_v - \sum_{(u,v)\in E}^{} x_u \right|\leq d \left| x_v \right| + \sum_{(u,v)\in E}^{} \left| x_u \right|\leq 2d \left| x_v \right| \end{align*} so we see that $\lambda\leq 2d$. Now supposing that $\lambda_n=2d$, we get a vector $v_n$ where the above inequality is tight, meaning that $(u,v)\in E$ implies $v_u = -v_v$. Then a bipartition is given by the vertices where $v_n$ is positive on one side, and the remainder on the other.

Conversely, if $G$ is bipartite, the vector that is $1$ on one side and $-1$ on the other realizes $\lambda_n=2d$ by pretty much the above reasoning.

Again, the spectrum also detects being almost bipartite. What's more, the high eigenvalues turn out to be useful in detecting $k$-partiteness.

4. Is this the holy grail of graph theory?

No, because there are so-called cospectral graphs, which are graphs which are combinatorially not the same but have the same spectra. Not all information about $G$ is captured by the Laplacian, and sometimes that can play you an unfortunate trick.

Yet, spectral graph theory is remarkably close to the holy grail for many tasks, and has lead to the some of the fastest algorithms for a range of fundamental problems.