The main idea of spectral graph theory is to understand graph data by constructing natural matrix representations and studying their spectra (eigenvalues and eigenvectors).

Many datasets naturally appear as graphs:

  • Social networks like Facebook and Twitter.

  • Citation networks like Google Scholar.

  • Internet graphs like the World Wide Web.

For now, we will assume that a graph \(G=(V,E)\) is undirected and unweighted on \(n\) nodes.

There are two common matrix representations of a graph. The first is an \(n \times n\) adjacency matrix \(\mathbf{A}\) where \(A_{ij} = 1\) if \((i,j) \in E\) and \(A_{ij} = 0\) otherwise. The second is an \(n \times n\) Laplacian matrix \(\mathbf{L} = \mathbf{D} - \mathbf{A}\) where \(\mathbf{D}\) is the diagonal degree matrix with \(D_{ii} = \sum_{j=1}^n A_{ij}\).
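
As a concrete illustration, here is a minimal NumPy sketch that builds both matrices for a small hypothetical example graph (a triangle on nodes \(\{0,1,2\}\) plus a pendant edge \((2,3)\); the graph and all variable names are ours, not from any particular library):

```python
import numpy as np

# A small hypothetical example: 4 nodes, a triangle {0, 1, 2} plus a pendant edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Adjacency matrix: A[i, j] = 1 iff (i, j) is an edge (symmetric since G is undirected).
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1

# Diagonal degree matrix and Laplacian.
D = np.diag(A.sum(axis=1))
L = D - A
```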

It is also common to look at normalized versions of both matrices \[ \bar{\mathbf{A}} = \mathbf{D}^{-1/2} \mathbf{A} \mathbf{D}^{-1/2} \qquad \bar{\mathbf{L}} = \mathbf{I} - \bar{\mathbf{A}}. \]
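
Continuing the sketch above, the normalized matrices can be computed as follows (assuming the graph has no isolated nodes, so that \(\mathbf{D}^{-1/2}\) is well defined):

```python
# Normalized adjacency and Laplacian (assumes every node has degree >= 1).
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
A_bar = D_inv_sqrt @ A @ D_inv_sqrt
L_bar = np.eye(n) - A_bar
```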

The adjacency and Laplacian matrices contain a lot of information about the graph.

  • If \(\mathbf{L}\) has exactly \(k\) eigenvalues equal to \(0\) (counted with multiplicity), then the graph \(G\) has exactly \(k\) connected components.

  • The sum of the cubes of the adjacency matrix’s eigenvalues equals \(\operatorname{tr}(\mathbf{A}^3)\), which is six times the number of triangles in the graph (each triangle is counted once per choice of starting node and direction, \(3 \times 2 = 6\)). Both this fact and the previous one are checked numerically in the sketch after this list.

  • More generally, the sum of the eigenvalues raised to the power \(q\) equals \(\operatorname{tr}(\mathbf{A}^q)\), the number of closed walks of length \(q\) in the graph.
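
A hedged sketch checking the first two facts on the hypothetical example graph from above, which has one connected component and one triangle:

```python
# Number of (near-)zero Laplacian eigenvalues = number of connected components.
lap_eigs = np.linalg.eigvalsh(L)
print(np.sum(lap_eigs < 1e-8))  # 1: the example graph is connected

# Sum of cubed adjacency eigenvalues = 6 * (number of triangles).
adj_eigs = np.linalg.eigvalsh(A)
print(np.sum(adj_eigs**3) / 6)  # 1.0: the example graph has one triangle
```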

Today, we’ll see how eigenvectors are useful for clustering and visualizing graphs.

We’ll use the edge-incidence matrix \(\mathbf{B} \in \mathbb{R}^{m \times n}\), where \(m\) is the number of edges in the graph and each edge is assigned an arbitrary fixed orientation. For an edge \((i,j) \in E\) and a node \(k \in V\), \[ B_{(i,j),k} = \begin{cases} 1 & \text{if } k = i \\ -1 & \text{if } k = j \\ 0 & \text{otherwise.} \end{cases} \]
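
A minimal sketch of constructing \(\mathbf{B}\) for the example graph above, orienting each edge from its first listed endpoint to its second (an arbitrary choice; it will not affect \(\mathbf{B}^\top \mathbf{B}\)):

```python
# Edge-incidence matrix: one row per edge, +1 at the first endpoint, -1 at the second.
m = len(edges)
B = np.zeros((m, n))
for row, (i, j) in enumerate(edges):
    B[row, i] = 1
    B[row, j] = -1
```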

We can write the Laplacian as \[ \mathbf{L} = \mathbf{B}^\top \mathbf{B} = \mathbf{b}_1 \mathbf{b}_1^\top + \mathbf{b}_2 \mathbf{b}_2^\top + \ldots + \mathbf{b}_m \mathbf{b}_m^\top \] where \(\mathbf{b}_i\) is the \(i\)th row of \(\mathbf{B}\) (each row corresponds to a single edge).
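
Both factorizations can be verified numerically (a sketch, continuing the example):

```python
# L equals B^T B ...
assert np.allclose(L, B.T @ B)

# ... which is also the sum of rank-one terms b_i b_i^T over the edges.
outer_sum = sum(np.outer(B[row], B[row]) for row in range(m))
assert np.allclose(L, outer_sum)
```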

From this view, we can conclude the following (verified numerically in the sketch after this list):

  • For any vector \(\mathbf{x} \in \mathbb{R}^n\), \[ \mathbf{x}^\top \mathbf{L} \mathbf{x} = \mathbf{x}^\top \mathbf{B}^\top \mathbf{B} \mathbf{x} = \sum_{(i,j) \in E} (x_i - x_j)^2. \]

  • \(\mathbf{L}\) is positive semidefinite since \[ \mathbf{x}^\top \mathbf{L x} = \mathbf{x}^\top \mathbf{B}^\top \mathbf{B} \mathbf{x} = \| \mathbf{B} \mathbf{x} \|_2^2 \geq 0 \] for all \(\mathbf{x}\).

  • \(\mathbf{L} = \mathbf{V \Sigma^2 V}^\top\) where \(\mathbf{U \Sigma V}^\top\) is the SVD of \(\mathbf{B}\). In particular, the columns of \(\mathbf{V}\) are the eigenvectors of \(\mathbf{L}\).
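
A sketch checking all three claims on the example graph, using a random vector and NumPy’s SVD:

```python
# x^T L x equals the sum of squared differences across edges (hence L is PSD).
x = np.random.randn(n)
quad = x @ L @ x
assert np.isclose(quad, sum((x[i] - x[j]) ** 2 for i, j in edges))
assert quad >= 0

# The right singular vectors of B are eigenvectors of L, with eigenvalues sigma^2.
U, sigma, Vt = np.linalg.svd(B, full_matrices=False)
assert np.allclose(L, Vt.T @ np.diag(sigma**2) @ Vt)
```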

With these observations in mind, consider the function \(f(\mathbf{x}) = \mathbf{x}^\top \mathbf{L x}\) for a vector \(\mathbf{x} \in \mathbb{R}^n\). Notice that \(f(\mathbf{x}) = \sum_{(i,j) \in E} (x_i - x_j)^2\) is small when \(\mathbf{x}\) is smooth with respect to the graph, i.e., when \(x_i \approx x_j\) for most edges \((i,j) \in E\). In terms of our linear algebraic view, if we plug an eigenvector with a small eigenvalue into \(f\), we get a small value: for a unit-norm eigenvector \(\mathbf{v}\) with eigenvalue \(\lambda\), \(f(\mathbf{v}) = \mathbf{v}^\top \mathbf{L v} = \lambda\).
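
A quick sketch illustrating this on the example graph from above: eigenvectors with small eigenvalues vary little across edges, while those with large eigenvalues oscillate.

```python
# Eigenpairs of L, sorted by eigenvalue (eigh returns them in ascending order).
eigvals, eigvecs = np.linalg.eigh(L)

f = lambda x: x @ L @ x
print(f(eigvecs[:, 0]), eigvals[0])    # smoothest direction: f = smallest eigenvalue (0)
print(f(eigvecs[:, -1]), eigvals[-1])  # most oscillatory: f = largest eigenvalue
```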