10 Eigenvalues and Eigenvectors
Watch the following video from the Essence of Linear Algebra series by 3Blue1Brown.
In this section we study eigenvalues and eigenvectors.
10.1 Introduction
Definition 10.1 Let \(A\) be an \(n\times n\) matrix. A non-zero vector \(\mathbf{x}\in\mathbb{R}^n\) is an eigenvector of \(A\) if \(A\mathbf{x}=\lambda\mathbf{x}\) for some \(\lambda\in\mathbb{R}\). A scalar \(\lambda\in\mathbb{R}\) is an eigenvalue of \(A\) if there exists a non-zero vector \(\mathbf{x}\in\mathbb{R}^n\) such that \(A\mathbf{x}=\lambda\mathbf{x}\).
Checking whether a nonzero vector \(\mathbf{x}\) is an eigenvector of a matrix \(A\) is straightforward: we compute \(A\mathbf{x}\) and verify if the result is a scalar multiple of \(\mathbf{x}\).
Determining whether a scalar \(\lambda\) is an eigenvalue requires solving a system of equations. The key observation is that the equation \(A\mathbf{x} = \lambda\mathbf{x}\) is equivalent to
\[
(A - \lambda I)\mathbf{x} = \mathbf{0},
\] where \(I\) is the identity matrix. For \(\lambda\) to be an eigenvalue, this system must have a nonzero solution, which means \(A - \lambda I\) must be singular. To test a specific value of \(\lambda\), we therefore solve:
\[
(A - \lambda I)\mathbf{x} = \mathbf{0}.
\tag{10.1}\] If there are infinitely many solutions, \(\lambda\) is an eigenvalue, and all nonzero solutions are eigenvectors. If the system has only the trivial solution \(\mathbf{x} = \mathbf{0}\), then \(\lambda\) is not an eigenvalue.
Exercise: Suppose that \(A=\begin{bmatrix}7&-2\\5&0\end{bmatrix}\), \(\mathbf{x}_1=\begin{bmatrix}2\\1\end{bmatrix}\) and \(\mathbf{x}_2=\begin{bmatrix}1\\1\end{bmatrix}\). Answer the following problems, then check your work against the solutions below.
- Are \(\mathbf{x}_1\) or \(\mathbf{x}_2\) eigenvectors of \(A\)?
- Is \(\lambda=2\) an eigenvalue of \(A\)?
- Is \(\lambda=3\) an eigenvalue of \(A\)?
Remember that we need to check that \(A\mathbf{x}\) is a multiple of \(\mathbf{x}\).
We compute \(A\mathbf{x}_1=\begin{bmatrix}7&-2\\5&0\end{bmatrix}\begin{bmatrix}2\\1\end{bmatrix}=\begin{bmatrix}12\\10\end{bmatrix}\) and see that it is not a multiple of \(\begin{bmatrix}2\\1\end{bmatrix}\). Therefore, \(\mathbf{x}_1\) is not an eigenvector of \(A\).
Now we compute \(A\mathbf{x}_2=\begin{bmatrix}7&-2\\5&0\end{bmatrix}\begin{bmatrix}1\\1\end{bmatrix}=\begin{bmatrix}5\\5\end{bmatrix}\) and notice that this is \(5\begin{bmatrix}1\\1\end{bmatrix}\). Therefore \(\mathbf{x}_2\) is an eigenvector of \(A\) with eigenvalue \(5\).
From Equation 10.1, we find \(A-2 I=\begin{bmatrix}5&-2\\5&-2\end{bmatrix}\) and solve the augmented system \(\left[\begin{array}{cc|c}5&-2&0\\5&-2&0\end{array}\right]\). This system has infinitely many solutions and we conclude that \(\lambda=2\) is an eigenvalue of \(A\).
The solutions are multiples of \(\begin{bmatrix}\frac{2}{5}\\1\end{bmatrix}\).
From Equation 10.1, we find \(A-3 I=\begin{bmatrix}4&-2\\5&-3\end{bmatrix}\) and solve the augmented system \(\left[\begin{array}{cc|c}4&-2&0\\5&-3&0\end{array}\right]\). This system has a unique solution and we conclude that \(\lambda=3\) is not an eigenvalue of \(A\).
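These answers can also be verified numerically. Below is a minimal sketch using NumPy (introduced in more detail in the next section); the variable names are my own:
import numpy as np

A = np.array([[7, -2], [5, 0]])
x1 = np.array([2, 1])
x2 = np.array([1, 1])

# Eigenvector check: is A x a scalar multiple of x?
print(A @ x1)   # [12 10], not a multiple of [2, 1]
print(A @ x2)   # [5 5] = 5 * [1, 1], so x2 is an eigenvector with eigenvalue 5

# Eigenvalue check: A - lambda*I must be singular, i.e. its determinant is (approximately) 0
for lam in [2, 3]:
    print(lam, np.linalg.det(A - lam * np.eye(2)))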
10.2 Finding Eigenvalues
To find the eigenvalues of a matrix \(A\), we can use the following chain of equivalent statements:
\[\begin{align} \lambda\text{ is an eigenvalue of }A &\Leftrightarrow A\mathbf{x}=\lambda\mathbf{x}\text{ for some }\mathbf{x}\not=0\\ &\Leftrightarrow A\mathbf{x}-\lambda\mathbf{x}=\mathbf{0}\text{ for some }\mathbf{x}\not=0\\ &\Leftrightarrow (A-\lambda I)\mathbf{x}=\mathbf{0}\text{ for some }\mathbf{x}\not=0\\ &\Leftrightarrow (A-\lambda I) \text{ is not invertible }\\ &\Leftrightarrow \det(A-\lambda I)=0 \\ \end{align} \]
We compute \(\det(A-\lambda I)\), which for an \(n\times n\) matrix is a polynomial of degree \(n\) named the characteristic polynomial of \(A\), and then we find the roots.
Since the characteristic polynomial has degree \(n\), it has at most \(n\) roots. This implies that an \(n\times n\) matrix has at most \(n\) eigenvalues.
If \(A\) is an upper triangular matrix \[A=\begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22} & a_{23} & \cdots & a_{2n} \\ 0 & 0 & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn} \end{bmatrix}\] then \[\det(A-\lambda I)=(a_{11}-\lambda)(a_{22}-\lambda)\cdots (a_{nn}-\lambda)\] and the eigenvalues of \(A\) are the diagonal elements. The same is true if \(A\) is lower triangular.
However, in general, it is not so easy to find the roots of the characteristic polynomial, especially as the size of the matrix increases.
For a \(2 \times 2\) matrix, we can find the roots using the quadratic formula. For a \(3 \times 3\) matrix, we could potentially use the cubic formula, though it’s rather complicated. For a \(4 \times 4\) matrix, there is a quartic formula, but it’s extremely complex and rarely used in practice.
For matrices of size \(5 \times 5\) or larger, the Abel-Ruffini theorem from abstract algebra proves that there is no general algebraic formula for the roots of polynomials of degree five or higher. Therefore, for larger matrices, we typically rely on numerical methods such as the power method.
We can also graph the characteristic polynomial to get an idea where the eigenvalues are. Consider
\[A=\begin{bmatrix}0.325 & -0.075 & 0.075 & -0.075\\ 0.025 & 0.225 & -0.025 & -0.275\\0.15 & -0.05 & 0.25 & -0.05\\ -0.1 & -0.1 & 0.1 & 0.4\end{bmatrix} \]
We use sympy to find the characteristic polynomial \[\det(A-\lambda I)= x^{4} - \frac{6 x^{3}}{5} + \frac{49 x^{2}}{100} - \frac{39 x}{500} + \frac{1}{250}\] and then we plot it:
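The code that produced this polynomial and its plot is not shown above; the following is one possible way to reproduce it with SymPy and Matplotlib (a sketch under that assumption; the variable names are my own):
from sympy import Matrix, Rational, Symbol, det, eye, lambdify
import numpy as np
import matplotlib.pyplot as plt

# Build A with exact rational entries so the coefficients come out exactly
entries = ["0.325", "-0.075", "0.075", "-0.075",
           "0.025", "0.225", "-0.025", "-0.275",
           "0.15", "-0.05", "0.25", "-0.05",
           "-0.1", "-0.1", "0.1", "0.4"]
A = Matrix(4, 4, [Rational(e) for e in entries])

x = Symbol('x')
char_poly = det(A - x*eye(4))   # the characteristic polynomial in the variable x

# Plot the characteristic polynomial on an interval containing its roots
f = lambdify(x, char_poly)
xs = np.linspace(0, 0.6, 400)
plt.plot(xs, f(xs))
plt.axhline(0, color='gray', linewidth=0.5)
plt.xlabel('x')
plt.ylabel('det(A - xI)')
plt.show()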
In most cases, we use software like NumPy to find eigenvalues. The command np.linalg.eig(A) returns two objects:
- An array containing the eigenvalues of matrix \(A\)
- A 2D array where each column represents the eigenvector corresponding to the eigenvalue at the same index
The eigenvectors are normalized to have unit length. If you need only the eigenvalues, you can use the more efficient np.linalg.eigvals(A) function.
Let’s look at the eigenvalues of the matrix \(A\) defined above
# Calculate eigenvalues and eigenvectors of matrix A
eigenvalues, eigenvectors = np.linalg.eig(A)

# Display the eigenvalues
print("Eigenvalues of A:", eigenvalues)
Eigenvalues of A: [0.1 0.2 0.4 0.5]
Now let’s display the eigenvectors with better formatting using SymPy:
from sympy import *
init_printing()
Matrix(np.round(eigenvectors, 4))
\(\displaystyle \left[\begin{matrix}0.4082 & 0.0 & 0.7071 & 0.0\\0.8165 & -0.7071 & 0.0 & -0.7071\\0.0 & -0.7071 & 0.7071 & 0.0\\0.4082 & 0.0 & 0.0 & 0.7071\end{matrix}\right]\)
Exercise: Check that the columns of eigenvectors are eigenvectors of \(A\) corresponding to the eigenvalues in eigenvalues.
A Note of Caution About Numerical Calculations
When using NumPy to calculate eigenvalues and eigenvectors, it’s good to double-check the results, especially for certain types of matrices. For example, with a matrix like \[\begin{bmatrix}1&1\\0&1\end{bmatrix},\] we know that it has only one eigenvalue, which is 1, and only one eigenvector up to a non-zero multiple (see Example 10.2). However, NumPy will try to give us two eigenvectors anyway, and the second one isn’t really a new eigenvector.
eigenvalues, eigenvectors = np.linalg.eig(np.array([[1,1],[0,1]]))
print("The eigenvalues are:", eigenvalues)
print("The eigenvectors are:", "\n", eigenvectors)
The eigenvalues are: [1. 1.]
The eigenvectors are:
[[ 1.00000000e+00 -1.00000000e+00]
[ 0.00000000e+00 2.22044605e-16]]
This happens because NumPy uses numerical methods that sometimes have to make approximations. To check whether the results make sense, you can always verify that \(A\mathbf{v} = \lambda\mathbf{v}\) holds for each eigenvector and that the eigenvectors are linearly independent. In this example, the second eigenvector is \((-1,0)\), which is a multiple of the first eigenvector \((1,0)\).
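For instance, here is a quick way to run both checks on the matrix above (a short sketch; using matrix_rank for the independence check is my own choice):
import numpy as np

A = np.array([[1, 1], [0, 1]])
eigenvalues, eigenvectors = np.linalg.eig(A)

# Check A v = lambda v for every reported eigenpair at once
print(np.allclose(A @ eigenvectors, eigenvectors * eigenvalues))   # True

# Check linear independence of the reported eigenvectors
print(np.linalg.matrix_rank(eigenvectors))   # 1: the two columns are numerically dependent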
10.3 Examples
In this subsection we work through several examples of eigenvalues and eigenvectors.
10.3.1 Do eigenvalues always exist?
The answer is no when working in the real number system. However, in the complex number system, the answer is yes. This is guaranteed by the fundamental theorem of algebra, which states that all non-constant polynomials with complex coefficients have at least one complex root.
Example 10.1 A simple example of a matrix without real eigenvalues is the rotation matrix in \(\mathbb{R}^2\) that rotates vectors by \(\theta\) degrees. This matrix is represented as: \[\begin{bmatrix}\cos(\theta) & -\sin(\theta)\\\sin(\theta)&\cos(\theta) \end{bmatrix}\]
When \(\theta\) is not a multiple of \(180°\), this matrix has no real eigenvalues, only complex ones.
The characteristic polynomial of this matrix is \[\lambda^2-2\cos(\theta)\lambda+1\] and the quadratic formula tells us that the eigenvalues are \[\lambda = \cos(\theta)\pm i\sin(\theta),\] which is a real number only if \(\sin(\theta)=0\), which happens precisely when \(\theta\) is a multiple of \(180°\).
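We can confirm this numerically: for example, with \(\theta=90°\) the eigenvalues should be \(\pm i\). A minimal sketch with NumPy:
import numpy as np

theta = np.pi / 2   # 90 degrees, in radians
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# cos(theta) ± i sin(theta) = ±i when theta = 90°
print(np.linalg.eigvals(R))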
10.3.2 Algebraic vs. Geometric Multiplicity of Eigenvalues
The following example is instructive.
Example 10.2 Let \(A=\begin{bmatrix}1&1\\0&1\end{bmatrix}\). We see that \(A\) has only one eigenvalue \(\lambda=1\) (appearing with algebraic multiplicity 2, since the characteristic polynomial is \((1-\lambda)^2\)).
When we find the eigenvectors associated with \(\lambda=1\) by solving \((A-\lambda I)\mathbf{v} = \mathbf{0}\):
\[(A-I)\mathbf{v} = \begin{bmatrix}0&1\\0&0\end{bmatrix}\begin{bmatrix}v_1\\v_2\end{bmatrix} = \begin{bmatrix}0\\0\end{bmatrix}\]
We get \(v_2 = 0\) with \(v_1\) free to take any value. Thus, all eigenvectors are scalar multiples of \(\begin{bmatrix}1\\0\end{bmatrix}\).
Despite having an eigenvalue with algebraic multiplicity 2, we can only find one linearly independent eigenvector. The geometric multiplicity (dimension of the eigenspace) is only 1, which is less than the algebraic multiplicity.
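SymPy reports exactly this behavior: its eigenvects method returns each eigenvalue together with its algebraic multiplicity and a basis for the corresponding eigenspace. A short sketch:
from sympy import Matrix

A = Matrix([[1, 1], [0, 1]])

# Each tuple is (eigenvalue, algebraic multiplicity, basis of the eigenspace);
# here we get the eigenvalue 1 with multiplicity 2 but only one basis vector
print(A.eigenvects())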
Let’s look at a couple of examples.
Example 10.3 Consider the 3×3 matrix: \[A=\begin{bmatrix}3 & -1 & 3\\1 & 1 & 3\\1 & -1 & 5\end{bmatrix}\]
To diagonalize \(A\), we first need to find its eigenvalues and eigenvectors. We’ll use SymPy to calculate the characteristic polynomial:
# import required functions from SymPy
from sympy import Matrix, Symbol, eye, det, factor
# Define the matrix A
A = Matrix([[3,-1,3],[1,1,3],[1,-1,5]])

# Define the symbol for eigenvalues
x = Symbol('x')

# Calculate the characteristic polynomial and factored form
print("The characteristic polynomial of A and its factored version are:")
char_poly = det(A - x*eye(3))  # eye(3) is the 3x3 identity matrix
char_poly, factor(char_poly)
The characteristic polynomial of A and its factored version are:
\(\displaystyle \left( - x^{3} + 9 x^{2} - 24 x + 20, \ - \left(x - 5\right) \left(x - 2\right)^{2}\right)\)
The output tells us the eigenvalues are \(\lambda = 5\) (with multiplicity 1) and \(\lambda = 2\) (with multiplicity 2).
Next we find the eigenvectors associated with each eigenvalue. For \(\lambda = 2\), we find a basis for the null space of \((A-2I)\):
# Find eigenvectors for λ = 2
print("Eigenvectors for λ = 2:")
(A - 2*eye(3)).nullspace()
Eigenvectors for λ = 2:
\(\displaystyle \left[ \left[\begin{matrix}1\\1\\0\end{matrix}\right], \ \left[\begin{matrix}-3\\0\\1\end{matrix}\right]\right]\)
For \(\lambda = 5\), we find a basis for the null space of \((A- 5I)\):
# Find eigenvectors for λ = 5
print("Eigenvectors for λ = 5:")
(A - 5*eye(3)).nullspace()
Eigenvectors for λ = 5:
\(\displaystyle \left[ \left[\begin{matrix}1\\1\\1\end{matrix}\right]\right]\)
From these calculations, we obtain three linearly independent eigenvectors: \[\left\{\begin{bmatrix}1\\1\\0\end{bmatrix}, \begin{bmatrix}-3\\0\\1\end{bmatrix}, \begin{bmatrix}1\\1\\1\end{bmatrix}\right\}\] that form a basis for \(\mathbb{R}^3\).
Let’s look at another example with two distinct eigenvalues, where the outcome is different.
Example 10.4 Consider the 3×3 matrix: \[A = \begin{bmatrix}2 & 4 & 3\\-4 & -6 & -3\\3 & 3 & 1\end{bmatrix}\]
Using sympy, we find that the characteristic polynomial of \(A\) is: \[\det(A-\lambda I)=-\lambda^3-3\lambda^2+4=-(\lambda-1)(\lambda+2)^2\]
This tells us the eigenvalues are \(\lambda = 1\) (with multiplicity 1) and \(\lambda = -2\) (with multiplicity 2).
To find the eigenvectors for \(\lambda = -2\), we find a basis for \(\text{nul}(A+2I)\), which consists of the single vector \(\begin{bmatrix}-1\\1\\0\end{bmatrix}\).
For the eigenvalue \(\lambda = 1\), a basis for \(\text{nul}(A-I)\) consists of the single vector \(\begin{bmatrix}1\\-1\\1\end{bmatrix}\).
In this case we only obtained two linearly independent eigenvectors, so we do not have enough eigenvectors to form a basis of \(\mathbb{R}^3\).
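These computations can be checked with SymPy exactly as in Example 10.3 (a short sketch using the same nullspace approach):
from sympy import Matrix, eye

A = Matrix([[2, 4, 3], [-4, -6, -3], [3, 3, 1]])

# One basis vector for lambda = -2 and one for lambda = 1
print((A + 2*eye(3)).nullspace())
print((A - eye(3)).nullspace())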
10.4 Diagonalization
We now explore one of the most significant applications of eigenvalues and eigenvectors: determining when a matrix has enough eigenvectors to form a basis of \(\mathbb{R}^n\).
Definition 10.2 An \(n\times n\) matrix \(A\) is said to be diagonalizable if there exists a basis of \(\mathbb{R}^n\) consisting of eigenvectors of \(A\).
A key result supporting this definition is the following theorem:
Theorem 10.1 Let \(A\) be an \(n\times n\) matrix. Suppose that \(A\) has distinct eigenvalues \(\lambda_1,\lambda_2,\dots,\lambda_k\) with corresponding eigenvectors \(\mathbf{v}_1,\mathbf{v}_2,\dots,\mathbf{v}_k\). Then the set of eigenvectors \(\{\mathbf{v}_1,\mathbf{v}_2,\dots,\mathbf{v}_k\}\) is linearly independent.
Proof. We proceed by contradiction. Suppose that the eigenvalues are distinct but the set of eigenvectors \(\{\mathbf{v}_1,\mathbf{v}_2,\dots,\mathbf{v}_k\}\) is linearly dependent.
Let \(j\) be the smallest index such that \(\mathbf{v}_j\) can be expressed as a linear combination of the previous eigenvectors. This means \(\{\mathbf{v}_1,\mathbf{v}_2,\dots,\mathbf{v}_{j-1}\}\) is linearly independent, but \(\mathbf{v}_j\) can be written as: \[\mathbf{v}_j = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_{j-1}\mathbf{v}_{j-1}\] for some scalars \(c_1, c_2, \dots, c_{j-1}\).
Now, applying \(A\) to both sides of this equation: \[\begin{align} A\mathbf{v}_j &= A(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_{j-1}\mathbf{v}_{j-1}) \\ &= c_1 A\mathbf{v}_1 + c_2 A\mathbf{v}_2 + \cdots + c_{j-1} A\mathbf{v}_{j-1} \\ &= c_1 \lambda_1 \mathbf{v}_1 + c_2 \lambda_2 \mathbf{v}_2 + \cdots + c_{j-1} \lambda_{j-1} \mathbf{v}_{j-1} \end{align}\]
Since \(\mathbf{v}_j\) is an eigenvector of \(A\) with eigenvalue \(\lambda_j\), we also have: \[A\mathbf{v}_j = \lambda_j \mathbf{v}_j = \lambda_j(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_{j-1}\mathbf{v}_{j-1})\]
Comparing these two expressions for \(A\mathbf{v}_j\) and subtracting: \[\begin{align} \mathbf{0} &= c_1\lambda_1\mathbf{v}_1 + c_2\lambda_2\mathbf{v}_2 + \cdots + c_{j-1}\lambda_{j-1}\mathbf{v}_{j-1} - \lambda_j(c_1\mathbf{v}_1 + \cdots + c_{j-1}\mathbf{v}_{j-1}) \\ &= c_1(\lambda_1 - \lambda_j)\mathbf{v}_1 + c_2(\lambda_2 - \lambda_j)\mathbf{v}_2 + \cdots + c_{j-1}(\lambda_{j-1} - \lambda_j)\mathbf{v}_{j-1} \end{align}\]
Since \(\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_{j-1}\}\) is linearly independent, each coefficient must be zero: \[c_i(\lambda_i - \lambda_j) = 0 \quad \text{for all } i \in \{1, 2, \dots, j-1\}\]
But the eigenvalues are distinct, so \(\lambda_i - \lambda_j \neq 0\) for all \(i \neq j\). This forces \(c_i = 0\) for all \(i \in \{1, 2, \dots, j-1\}\). Therefore, \(\mathbf{v}_j = \mathbf{0}\), which contradicts the definition of an eigenvector. Hence, the set of eigenvectors must be linearly independent. \(\square\)
This leads to an immediate corollary.
Corollary: An \(n\times n\) matrix with \(n\) distinct eigenvalues is diagonalizable.
10.4.1 The Diagonalization Formula
Suppose that the \(n\times n\) matrix \(A\) has a basis of eigenvectors \(S=\{\mathbf{v}_1,\dots,\mathbf{v}_n\}\) with corresponding eigenvalues \(\lambda_1, \lambda_2, \ldots, \lambda_n\), such that \(A\mathbf{v}_i=\lambda_i\mathbf{v}_i\) for each \(i\).
Let \(\mathbf{x}\in\mathbb{R}^n\). We can express \(\mathbf{x}\) as a linear combination of the eigenvectors in \(S\): \[\begin{align} \mathbf{x}&=c_1\mathbf{v}_1+c_2\mathbf{v}_2+\cdots+c_n\mathbf{v}_n\\ &= \begin{bmatrix} \uparrow & \uparrow & \cdots & \uparrow \\ \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \\ \downarrow & \downarrow & \cdots & \downarrow \end{bmatrix}\begin{bmatrix}c_1\\c_2\\\vdots\\c_n\end{bmatrix}=P\mathbf{c} \end{align}\]
where \(P\) is the \(n\times n\) matrix with the eigenvectors as columns and \(\mathbf{c}=[\mathbf{x}]_S\) is the coordinate vector of \(\mathbf{x}\) with respect to basis \(S\). Since the columns of \(P\) are linearly independent, then \(P\) is invertible and \(\mathbf{c}=[\mathbf{x}]_S=P^{-1}\mathbf{x}\).
When we apply \(A\) to \(\mathbf{x}\), we get:
\[\begin{align} A\mathbf{x}&=A(c_1\mathbf{v}_1+c_2\mathbf{v}_2+\cdots+c_n\mathbf{v}_n)\\ &=c_1A\mathbf{v}_1+c_2A\mathbf{v}_2+\cdots+c_nA\mathbf{v}_n\\ &=c_1\lambda_1\mathbf{v}_1+c_2\lambda_2\mathbf{v}_2+\cdots+c_n\lambda_n\mathbf{v}_n\\ &= \begin{bmatrix} \uparrow & \uparrow & \cdots & \uparrow \\ \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \\ \downarrow & \downarrow & \cdots & \downarrow \end{bmatrix}\begin{bmatrix}\lambda_1c_1\\\lambda_2c_2\\\vdots\\\lambda_nc_n\end{bmatrix}\\ &= \begin{bmatrix} \uparrow & \uparrow & \cdots & \uparrow \\ \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \\ \downarrow & \downarrow & \cdots & \downarrow \end{bmatrix} \begin{bmatrix}\lambda_1&0&\cdots&0\\ 0&\lambda_2&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&\lambda_n\end{bmatrix} \begin{bmatrix}c_1\\c_2\\\vdots\\c_n\end{bmatrix}\\ &= P\Lambda\mathbf{c} = P\Lambda[\mathbf{x}]_S\\ \end{align}\] where \(\Lambda\) is the \(n\times n\) diagonal matrix of eigenvalues.
Combining both formulas, we get that for any \(\mathbf{x}\in\mathbb{R}^n\): \[AP[\mathbf{x}]_S=A\mathbf{x}=P\Lambda[\mathbf{x}]_S.\]
Since this holds for any vector \(\mathbf{x}\), and \([\mathbf{x}]_S\) represents any vector in \(\mathbb{R}^n\) (as \(S\) is a basis), we conclude that: \[AP=P\Lambda.\]
This important formula can be rewritten as: \[A=P\Lambda P^{-1}\quad\text{or equivalently}\quad P^{-1}AP=\Lambda.\]
This formula provides remarkable computational advantages when working with powers of \(A\). For example, computing the cube of \(A\): \[\begin{align} A^3 &= A \cdot A \cdot A \\ &= (P\Lambda P^{-1})(P\Lambda P^{-1})(P\Lambda P^{-1}) \\ &= P\Lambda P^{-1}P\Lambda P^{-1}P\Lambda P^{-1} \\ &= P\Lambda \underbrace{P^{-1}P}_{=I} \Lambda \underbrace{P^{-1}P}_{=I} \Lambda P^{-1} \\ &= P\Lambda \cdot I \cdot \Lambda \cdot I \cdot \Lambda P^{-1} \\ &= P\Lambda \Lambda \Lambda P^{-1} \\ &= P\Lambda^3 P^{-1} \end{align}\]
More generally, for any positive integer \(k\): \[A^k = P\Lambda^k P^{-1}\]
where \(\Lambda^k\) is the diagonal matrix with entries \(\lambda_i^k\): \[\Lambda^k = \begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix}\]
This result dramatically simplifies calculations involving matrix powers, making diagonalization a powerful technique in linear algebra.
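As a concrete check, we can carry this out for the matrix of Example 10.3, using the eigenvectors found there as the columns of \(P\). The following is a minimal sketch with NumPy (the variable names are my own):
import numpy as np

A = np.array([[3, -1, 3], [1, 1, 3], [1, -1, 5]])

# Columns of P are the eigenvectors from Example 10.3; Lam holds the corresponding eigenvalues
P = np.array([[1, -3, 1],
              [1,  0, 1],
              [0,  1, 1]])
Lam = np.diag([2, 2, 5])

# Verify A = P Λ P^{-1}
print(np.allclose(A, P @ Lam @ np.linalg.inv(P)))   # True

# A^3 computed directly and via P Λ^3 P^{-1} agree
print(np.allclose(np.linalg.matrix_power(A, 3),
                  P @ np.diag([2**3, 2**3, 5**3]) @ np.linalg.inv(P)))   # True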
10.4.2 The Spectral Theorem
While the diagonalization formula \(A = P\Lambda P^{-1}\) applies when a matrix has a complete set of eigenvectors, the spectral theorem provides an even more powerful result for a special class of matrices: symmetric matrices.
Recall that a matrix \(A\) is symmetric if \(A = A^T\). The spectral theorem states that every symmetric matrix \(A \in \mathbb{R}^{n \times n}\) is diagonalizable in a very strong sense:
- \(A\) has \(n\) real eigenvalues (counting multiplicities)
- \(A\) has an orthonormal basis of eigenvectors
- \(A\) can be orthogonally diagonalized as \(A = QDQ^T\) where \(Q\) is an orthogonal matrix (\(Q^T = Q^{-1}\))
Eigenvectors of symmetric matrices corresponding to distinct eigenvalues are orthogonal, which helps explain why the spectral theorem works.
Theorem: Let \(A\) be a symmetric matrix. If \(\lambda_i\) and \(\lambda_j\) are distinct eigenvalues with corresponding eigenvectors \(\mathbf{v}_i\) and \(\mathbf{v}_j\), then \(\mathbf{v}_i \cdot \mathbf{v}_j = 0\).
Proof: \[\begin{align} \lambda_i(\mathbf{v}_i \cdot \mathbf{v}_j) &= \lambda_i\mathbf{v}_i^T\mathbf{v}_j = (A\mathbf{v}_i)^T\mathbf{v}_j = \mathbf{v}_i^TA^T\mathbf{v}_j\\ &= \mathbf{v}_i^TA\mathbf{v}_j = \mathbf{v}_i^T(\lambda_j\mathbf{v}_j) = \lambda_j(\mathbf{v}_i \cdot \mathbf{v}_j) \end{align}\]
Since \(\lambda_i \neq \lambda_j\), we must have \(\mathbf{v}_i \cdot \mathbf{v}_j = 0\). \(\square\)
If we select orthogonal eigenvectors of a symmetric matrix \(A\) (choosing an orthogonal basis within each eigenspace when an eigenvalue is repeated) and normalize them, we obtain an orthonormal basis \(\{\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_n\}\) where:
- \(\|\mathbf{q}_i\| = 1\) for all \(i\)
- \(\mathbf{q}_i \cdot \mathbf{q}_j = 0\) for all \(i \neq j\)
- \(A\mathbf{q}_i = \lambda_i\mathbf{q}_i\) for all \(i\)
Let \(Q = [\mathbf{q}_1 \; \mathbf{q}_2 \; \cdots \; \mathbf{q}_n]\) be the matrix with these orthonormal eigenvectors as columns. Then \(Q\) is an orthogonal matrix, meaning \(Q^TQ = QQ^T = I\), and thus \(Q^{-1} = Q^T\).
The spectral decomposition of \(A\) is then: \[A = QDQ^T \tag{10.2}\]
where \(D\) is the diagonal matrix with eigenvalues \(\lambda_1, \lambda_2, \ldots, \lambda_n\) on the diagonal.
We can also write the spectral decomposition in the following form: \[A = \lambda_1\mathbf{q}_1\mathbf{q}_1^T + \lambda_2\mathbf{q}_2\mathbf{q}_2^T + \cdots + \lambda_n\mathbf{q}_n\mathbf{q}_n^T = \sum_{i=1}^n \lambda_i\mathbf{q}_i\mathbf{q}_i^T\]
This representation expresses \(A\) as a sum of rank-one matrices, each scaled by an eigenvalue. This form is particularly useful for applications in principal component analysis, computer graphics, and quantum mechanics.
10.4.3 Finding the spectral decomposition in NumPy
The spectral theorem guarantees that every symmetric matrix can be orthogonally diagonalized. In practical applications, we need efficient and reliable computational methods to find this decomposition. NumPy provides exactly what we need through its linear algebra module.
We use the command np.linalg.eigh() in NumPy to find the eigenvalues and eigenvectors of symmetric matrices. Symmetric matrices are the real case of Hermitian matrices, and the “h” in “eigh” stands for Hermitian. This function is specifically optimized for symmetric matrices and guarantees real eigenvalues and orthogonal eigenvectors, while np.linalg.eig() is the general function for arbitrary matrices and doesn’t ensure these properties.
Let’s implement the spectral decomposition step by step:
import numpy as np
# Create a symmetric matrix
A = np.array([
    [4, 1, 1],
    [1, 3, 2],
    [1, 2, 6]
])
print("Original symmetric matrix A:")
print(A)
print()
# Check that A is symmetric
print("Is A symmetric?", np.allclose(A, A.T))
print()
Original symmetric matrix A:
[[4 1 1]
[1 3 2]
[1 2 6]]
Is A symmetric? True
The spectral theorem only applies to symmetric matrices, so it’s always good practice to verify symmetry before proceeding. The np.allclose() function accounts for potential floating-point errors when checking equality.
Now we find the eigenvalues and eigenvectors of \(A\). The spectral theorem guarantees we’ll get real eigenvalues and a complete set of orthogonal eigenvectors:
# Find eigenvalues and eigenvectors using NumPy
eigenvalues, eigenvectors = np.linalg.eigh(A)  # eigh is for symmetric matrices
print("Eigenvalues:", eigenvalues)
print("\nEigenvectors (as columns):")
print(eigenvectors)
Eigenvalues: [1.88646239 3.59642438 7.51711323]
Eigenvectors (as columns):
[[ 0.24556026 -0.90025898 -0.35949122]
[-0.89393687 -0.06686651 -0.44317687]
[ 0.37493604 0.43018908 -0.82119445]]
Notice that in NumPy, the eigenvectors are returned as columns of the matrix. The eigh function automatically normalizes the eigenvectors to have unit length, forming an orthonormal basis. By the spectral theorem, these eigenvectors form an orthogonal matrix \(Q\).
A key property of orthogonal matrices is that their transpose equals their inverse: \(Q^T = Q^{-1}\). Let’s verify this property:
# Verify orthogonality of eigenvectors
Q = eigenvectors
print("Q^T Q = (should be identity)")
print(np.round(Q.T @ Q, 10))  # Should be identity
Q^T Q = (should be identity)
[[ 1. 0. -0.]
[ 0. 1. 0.]
[-0. 0. 1.]]
The result should be the identity matrix (within numerical precision), confirming the orthogonality of our eigenvectors.
For the spectral decomposition, we need a diagonal matrix \(D\) containing the eigenvalues:
# Create diagonal matrix of eigenvalues
D = np.diag(eigenvalues)
print("Diagonal matrix of eigenvalues D:")
print(D)
Diagonal matrix of eigenvalues D:
[[1.88646239 0. 0. ]
[0. 3.59642438 0. ]
[0. 0. 7.51711323]]
Now we can reconstruct the original matrix using the spectral decomposition formula \(A = QDQ^T\):
# Reconstruct A using spectral decomposition: A = Q D Q^T
A_reconstructed = Q @ D @ Q.T
print("Reconstructed A from spectral decomposition:")
print(np.round(A_reconstructed, 10))  # Round to handle floating-point errors
print()

# Verify reconstruction error
error = np.max(np.abs(A - A_reconstructed))
print(f"Reconstruction error: {error:.2e}")
print()
Reconstructed A from spectral decomposition:
[[4. 1. 1.]
[1. 3. 2.]
[1. 2. 6.]]
Reconstruction error: 5.33e-15
The reconstruction error should be extremely small (near machine precision), confirming that our spectral decomposition is correct.
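We can also verify the rank-one form of the spectral decomposition from Section 10.4.2, \(A = \sum_{i} \lambda_i\mathbf{q}_i\mathbf{q}_i^T\), reusing the Q and eigenvalues computed above (a short sketch):
# Build A as a sum of rank-one matrices lambda_i * q_i q_i^T
# (Q, eigenvalues, and A come from the cells above)
A_sum = sum(lam * np.outer(Q[:, i], Q[:, i])
            for i, lam in enumerate(eigenvalues))

print(np.allclose(A, A_sum))   # should print True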