11  Advanced Topics of Linear Maps

The diagonalization process we explored previously—expressing a matrix as \(PDP^{-1}\) with eigenvector matrix \(P\) and diagonal eigenvalue matrix \(D\)—takes on deeper meaning when viewed through the lens of linear transformations.

This perspective reveals that the same linear transformation can look different depending on our coordinate system. By changing basis, we gain the flexibility to choose the most convenient representation for a given problem, bridging the gap between abstract properties and concrete matrix representations. This powerful viewpoint is essential in both theoretical contexts and applications ranging from computer graphics to quantum mechanics.

11.1 Representation of Vectors using Bases

When working with a vector space \(V\) (often \(\mathbb{R}^n\)), we can represent vectors using different bases. Let’s explore how the coordinate representation of a vector changes when we switch between bases.

Consider a vector space \(V\) with two different bases:

  • \(S_1 = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}\)
  • \(S_2 = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n\}\)

For any vector \(\mathbf{v} \in V\), we can express it as a linear combination using either basis:

Using basis \(S_1\): \[\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \ldots + c_n\mathbf{v}_n\]

The coordinate vector with respect to \(S_1\) is: \[[\mathbf{v}]_{S_1} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}\]

Similarly, using basis \(S_2\): \[\mathbf{v} = d_1\mathbf{w}_1 + d_2\mathbf{w}_2 + \ldots + d_n\mathbf{w}_n\]

The coordinate vector with respect to \(S_2\) is: \[[\mathbf{v}]_{S_2} = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{bmatrix}\]

The fundamental question is: What is the relationship between \([\mathbf{v}]_{S_1}\) and \([\mathbf{v}]_{S_2}\)?

Theorem 11.1 Suppose that \(V\) is an \(n\)-dimensional vector space with bases \(S_1\) and \(S_2\). Then there exists a unique invertible matrix \(P=P_{S_1 \leftarrow S_2}\) such that for every vector \(\mathbf{v} \in V\) \[[\mathbf{v}]_{S_1} = P[\mathbf{v}]_{S_2}\]

Moreover, this change of basis matrix can be constructed as: \[P_{S_1 \leftarrow S_2} = \begin{bmatrix} \uparrow & \uparrow & \cdots & \uparrow\\ [\mathbf{w}_1]_{S_1} & [\mathbf{w}_2]_{S_1} & \cdots & [\mathbf{w}_n]_{S_1} \\ \downarrow & \downarrow & \cdots&\downarrow \end{bmatrix}, \tag{11.1}\] where \(S_2=\{\mathbf{w}_1,\dots,\mathbf{w}_n\}\).

Proof. Let \(\mathbf{v} \in V\) with representation in basis \(S_2\):

\[\mathbf{v} = d_1\mathbf{w}_1 + d_2\mathbf{w}_2 + \ldots + d_n\mathbf{w}_n \quad \text{where} \quad [\mathbf{v}]_{S_2} = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{bmatrix}\]

Now, we need to find \([\mathbf{v}]_{S_1}\). To do that we use Theorem 6.1, which states that taking coordinates is a linear map, and Equation 3.1, which allows us to express a linear combination in \(\mathbb{R}^n\) as a product of a matrix and a vector: \[\begin{align} [\mathbf{v}]_{S_1} & = [d_1\mathbf{w}_1 + \ldots + d_n\mathbf{w}_n]_{S_1}\\ & = d_1[\mathbf{w}_1]_{S_1} + \ldots + d_n[\mathbf{w}_n]_{S_1}\\ & = \begin{bmatrix} \uparrow & \uparrow & \cdots & \uparrow\\ [\mathbf{w}_1]_{S_1} & [\mathbf{w}_2]_{S_1} & \cdots & [\mathbf{w}_n]_{S_1} \\ \downarrow & \downarrow & \cdots&\downarrow \end{bmatrix} \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{bmatrix} \end{align}\] Therefore: \[[\mathbf{v}]_{S_1} = P_{S_1 \leftarrow S_2} [\mathbf{v}]_{S_2},\] and \(P_{S_1\leftarrow S_2}\) is given by Equation 11.1. \(\square\)

If \(P=P_{S_1\leftarrow S_2}\) is the change of basis matrix from \(S_2\) to \(S_1\), then \(P^{-1}=P_{S_2\leftarrow S_1}\) is the change of basis matrix from \(S_1\) to \(S_2\). We see this easily: from the equation \([\mathbf{v}]_{S_1} = P[\mathbf{v}]_{S_2}\), we deduce that \[[\mathbf{v}]_{S_2} = P^{-1}[\mathbf{v}]_{S_1}\]

11.1.1 Computational Problems

Theorem 11.1 provides not just a theoretical connection but a concrete algorithm for computing the change of basis matrix \(P\). The formula \([\mathbf{v}]_{S_1}=P[\mathbf{v}]_{S_2}\) looks simple, but it encodes every step needed to convert coordinates between bases; each of the computational problems below can be solved by reading the required steps from it.

Find the change of basis matrix

Suppose \(S_1\) and \(S_2\) are given. Find the change of basis matrix from \(S_2\) to \(S_1\), \(P_{S_1\leftarrow S_2}\).

Example: Suppose \(S_1 = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}\) and \(S_2 = \{\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3\}\) are bases of \(\mathbb{R}^3\) and we are asked to find \(P_{S_1\leftarrow S_2}\).

From formula Equation 11.1, we know that we need to find \([\mathbf{w}_1]_{S_1}\), \([\mathbf{w}_2]_{S_1}\), and \([\mathbf{w}_3]_{S_1}\).

To find \([\mathbf{w}_i]_{S_1}\), we solve the vector equation: \(x_1 \mathbf{v}_1 + x_2 \mathbf{v}_2 + x_3 \mathbf{v}_3 = \mathbf{w}_i\). We form the augmented matrix and row reduce it: \[ \left[ \begin{array}{ccc|c} \uparrow & \uparrow & \uparrow & \uparrow \\ \mathbf{v}_1 & \mathbf{v}_2 & \mathbf{v}_3 & \mathbf{w}_i \\ \downarrow & \downarrow & \downarrow & \downarrow \end{array}\right] \xrightarrow{\text{RREF}} \left[\begin{array}{ccc|c} 1&0&0&\uparrow \\ 0&1&0 & [\mathbf{w}_i]_{S_1} \\ 0&0&1 &\downarrow \end{array}\right]. \] Since \(\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}\) is a basis, the first three columns of the reduced matrix form the identity and the last column gives us the coordinates. We can combine all of them:

\[ \left[ \begin{array}{ccc|ccc} \uparrow & \uparrow & \uparrow & \uparrow & \uparrow & \uparrow\\ \mathbf{v}_1 & \mathbf{v}_2 & \mathbf{v}_3 & \mathbf{w}_1 & \mathbf{w}_2 & \mathbf{w}_3 \\ \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow \end{array}\right] \xrightarrow{\text{RREF}} \left[\begin{array}{ccc|ccc} 1&0&0&\uparrow &\uparrow & \uparrow\\ 0&1&0 & [\mathbf{w}_1]_{S_1} & [\mathbf{w}_2]_{S_1} & [\mathbf{w}_3]_{S_1} \\ 0&0&1 &\downarrow & \downarrow & \downarrow \end{array}\right] \] Therefore, in general, to find \(P_{S_1\leftarrow S_2}\) we row reduce the augmented matrix \([S_1|S_2]\) to get \([I|P_{S_1\leftarrow S_2}]\):

\[[S_1|S_2]\xrightarrow{\text{RREF}} [I|P_{S_1\leftarrow S_2}]\]

Where:

  • \(S_1\) is the matrix with columns being the vectors of the first basis
  • \(S_2\) is the matrix with columns being the vectors of the second basis
  • \(P_{S_1\leftarrow S_2}\) is the change of basis matrix from \(S_2\) to \(S_1\)
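As a concrete check of this recipe, here is a short SymPy sketch for a pair of made-up bases of \(\mathbb{R}^2\); the bases are illustrative assumptions, not taken from an exercise:

from sympy import Matrix

# Hypothetical bases of R^2, stored as columns
S1 = Matrix([[1, 1],
             [0, 1]])              # columns: v1, v2
S2 = Matrix([[2, 1],
             [1, 1]])              # columns: w1, w2

# Row reduce [S1 | S2] to [I | P], where P = P_{S1 <- S2}
R, _ = Matrix.hstack(S1, S2).rref()
P = R[:, 2:]                       # the last two columns form P
print(P)                           # same result as S1.inv() * S2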

Find coordinates

Suppose \(S_1\), \(S_2\) and \(P=P_{S_1\leftarrow S_2}\) are given. If we know the coordinates of a vector \(\mathbf{v}\in V\) with respect to one basis, find the coordinates with respect to the other basis.

This type of problem is simpler. We just need to pay attention to whether we multiply by \(P\) or by \(P^{-1}\). Let \(\mathbf{v}\in V\). If we know \([\mathbf{v}]_{S_2}\), then we find \([\mathbf{v}]_{S_1}\) using: \[[\mathbf{v}]_{S_1}=P[\mathbf{v}]_{S_2}\] On the other hand, if we know \([\mathbf{v}]_{S_1}\) then we find \([\mathbf{v}]_{S_2}\) using: \[[\mathbf{v}]_{S_2}=P^{-1}[\mathbf{v}]_{S_1}\]
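In SymPy, with a made-up change of basis matrix and made-up coordinates (both purely illustrative), the two directions read:

from sympy import Matrix

# Hypothetical change of basis matrix P = P_{S1 <- S2} and known coordinates
P = Matrix([[1, 0],
            [1, 1]])
v_S2 = Matrix([3, -2])         # [v]_{S2}, known

v_S1 = P * v_S2                # [v]_{S1} = P [v]_{S2}
assert P.inv() * v_S1 == v_S2  # [v]_{S2} = P^{-1} [v]_{S1}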

Find elements of a basis

Suppose \(S_1\) and \(P=P_{S_1\leftarrow S_2}\) are given. Find \(S_2\).

Example: Suppose that

  • \(S_1 =\left\{ \begin{bmatrix}1\\1\\1\end{bmatrix}, \begin{bmatrix}1\\-1\\0\end{bmatrix}, \begin{bmatrix}1\\1\\-2\end{bmatrix} \right\}\)
  • \(S_2 =\{\mathbf{w}_1,\mathbf{w}_2, \mathbf{w}_3\}\)

are bases of \(\mathbb{R}^3\) and that \[ P=P_{S_1\leftarrow S_2}=\begin{bmatrix}1&0&0\\1&1&0\\1&1&1\end{bmatrix} \] is the change of basis matrix from \(S_2\) to \(S_1\). Our task is to find the elements of \(S_2\).

From Equation 11.1 we know that \([\mathbf{w}_1]_{S_1}=(1,1,1)\). Then \[\mathbf{w}_1=1\begin{bmatrix}1\\1\\1\end{bmatrix}+1\begin{bmatrix}1\\-1\\0\end{bmatrix}+1\begin{bmatrix}1\\1\\-2 \end{bmatrix}=\begin{bmatrix}3\\1\\-1\end{bmatrix}\] Notice that we can write this as \[\begin{align}\mathbf{w}_1&=1\begin{bmatrix}1\\1\\1\end{bmatrix}+1\begin{bmatrix}1\\-1\\0\end{bmatrix}+1\begin{bmatrix}1\\1\\-2 \end{bmatrix}\\ &=\begin{bmatrix}1&1&1\\1&-1&1\\1&0&-2\end{bmatrix}\begin{bmatrix}1\\1\\1\end{bmatrix} =[S_1][\mathbf{w}_1]_{S_1} \end{align}\] where \([S_1]\) is the \(3\times 3\) matrix with columns being the vectors of \(S_1\).

We can find all the vectors at once \[[S_1]P=[S_1] \begin{bmatrix} \uparrow&\uparrow&\uparrow\\ [\mathbf{w}_1]_{S_1}&[\mathbf{w}_2]_{S_1}&[\mathbf{w}_3]_{S_1}\\ \downarrow&\downarrow&\downarrow\\ \end{bmatrix}= \begin{bmatrix} \uparrow &\uparrow &\uparrow\\ \mathbf{w}_1&\mathbf{w}_2&\mathbf{w}_3\\ \downarrow &\downarrow &\downarrow\\ \end{bmatrix}=[S_2]\] and this works in general.
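We can confirm this identity in SymPy with the matrices of this example:

from sympy import Matrix

# Columns of S1 and the given change of basis matrix P
S1 = Matrix([[1,  1,  1],
             [1, -1,  1],
             [1,  0, -2]])
P  = Matrix([[1, 0, 0],
             [1, 1, 0],
             [1, 1, 1]])

S2 = S1 * P                   # columns of S2 are w1, w2, w3
print(S2.col(0))              # (3, 1, -1), matching the computation above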

11.2 Matrix Representations of Linear Maps

In Theorem 4.1 we showed that a linear map \(T:\mathbb{R}^n\to\mathbb{R}^n\) can be written as \(T(\mathbf{x})=A\mathbf{x}\), where \(A\) is the matrix \[A=\begin{bmatrix} \uparrow & \uparrow & \cdots & \uparrow \\ T(\mathbf{e}_1) & T(\mathbf{e}_2) & \cdots & T(\mathbf{e}_n) \\ \downarrow & \downarrow & \cdots & \downarrow \\ \end{bmatrix}\] and \(\{\mathbf{e}_1,\dots,\mathbf{e}_n\}\) is the canonical basis for \(\mathbb{R}^n\).

In this section we show that a similar result works for general linear maps on general vector spaces with a fixed basis. The idea of the proof is the same but the result is presented in terms of coordinates.

Theorem 11.2 Suppose that \(V\) is an \(n\)-dimensional vector space with basis \(S=\{\mathbf{v}_1,\dots,\mathbf{v}_n\}\). If \(T:V\to V\) is a linear map, there exists a unique matrix \(A\) such that for every \(\mathbf{v}\in V\), \[[T(\mathbf{v})]_S=A[\mathbf{v}]_S\] Moreover, the matrix representation of \(T\) with respect to the basis \(S\) is given by \[A = \begin{bmatrix} \uparrow & \uparrow & \cdots & \uparrow\\ [T(\mathbf{v}_1)]_S & [T(\mathbf{v}_2)]_{S} & \cdots & [T(\mathbf{v}_n)]_{S} \\ \downarrow & \downarrow & \cdots&\downarrow \end{bmatrix}. \tag{11.2}\]

Proof. Let \(\mathbf{v} \in V\) with representation in basis \(S\): \[\mathbf{v} = x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + \ldots + x_n\mathbf{v}_n \quad \text{where} \quad [\mathbf{v}]_{S} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}\] Since \(T\) is linear \[ \begin{align}T(\mathbf{v}) &=T(x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + \ldots + x_n\mathbf{v}_n)\\ &= x_1T(\mathbf{v}_1) + x_2T(\mathbf{v}_2) + \ldots + x_nT(\mathbf{v}_n). \end{align} \] Then we find the coordinates of \(T(\mathbf{v})\) using Theorem 6.1 (finding coordinates is a linear map) and Equation 3.1 to find the matrix representation: \[ \begin{align} [T(\mathbf{v})]_S &= [x_1T(\mathbf{v}_1) + x_2T(\mathbf{v}_2) + \ldots + x_nT(\mathbf{v}_n)]_S\\ &=x_1[T(\mathbf{v}_1)]_S + x_2[T(\mathbf{v}_2)]_S + \ldots + x_n[T(\mathbf{v}_n)]_S\\ &= \begin{bmatrix} \uparrow & \uparrow & \cdots & \uparrow\\ [T(\mathbf{v}_1)]_{S} & [T(\mathbf{v}_2)]_{S} & \cdots & [T(\mathbf{v}_n)]_{S} \\ \downarrow & \downarrow & \cdots&\downarrow \end{bmatrix} \begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}=A[\mathbf{v}]_S. \end{align} \] This concludes the proof \(\square\)

11.2.1 Computational Problems

Like before, Theorem 11.2 provides not just a theoretical connection but a concrete algorithm for computing the matrix representation \(A\). The formula \([T(\mathbf{v})]_{S}=A[\mathbf{v}]_{S}\) again encodes the whole procedure; the computational problems below can all be solved by reading the required steps from it.

Find the matrix representation

Suppose that a linear map \(T:\mathbb{R}^n\to\mathbb{R}^n\) and a basis \(S=\{\mathbf{v}_1,\dots,\mathbf{v}_n\}\) of \(\mathbb{R}^n\) are given. Find the matrix representation \(A\).

Example: Suppose \(S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}\) is a basis of \(\mathbb{R}^3\) and that \(T:\mathbb{R}^3\to\mathbb{R}^3\) is a linear map. We are asked to find the matrix representation of \(T\) with respect to \(S\), which we call \(A\).

From Theorem 11.2, we know that we need to find \([T(\mathbf{v}_1)]_{S}\), \([T(\mathbf{v}_2)]_{S}\), and \([T(\mathbf{v}_3)]_{S}\).

To find \([T(\mathbf{v}_i)]_{S}\), we solve the vector equation: \(x_1 \mathbf{v}_1 + x_2 \mathbf{v}_2 + x_3 \mathbf{v}_3 = T(\mathbf{v}_i)\). We form the augmented matrix and row reduce it: \[ \left[ \begin{array}{ccc|c} \uparrow & \uparrow & \uparrow & \uparrow \\ \mathbf{v}_1 & \mathbf{v}_2 & \mathbf{v}_3 & T(\mathbf{v}_i) \\ \downarrow & \downarrow & \downarrow & \downarrow \end{array}\right] \xrightarrow{\text{RREF}} \left[\begin{array}{ccc|c} 1&0&0&\uparrow \\ 0&1&0 & [T(\mathbf{v}_i)]_{S} \\ 0&0&1 &\downarrow \end{array}\right]. \] Since \(\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}\) is a basis, the first three columns of the reduced matrix form the identity and the last column gives us the coordinates of \(T(\mathbf{v}_i)\). We can combine all of them:

\[ \left[ \begin{array}{ccc|ccc} \uparrow & \uparrow & \uparrow & \uparrow & \uparrow & \uparrow\\ \mathbf{v}_1 & \mathbf{v}_2 & \mathbf{v}_3 & T(\mathbf{v}_1) & T(\mathbf{v}_2) & T(\mathbf{v}_3) \\ \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow \end{array}\right] \xrightarrow{\text{RREF}} \left[\begin{array}{ccc|ccc} 1&0&0&\uparrow &\uparrow & \uparrow\\ 0&1&0 & [T(\mathbf{v}_1)]_{S} & [T(\mathbf{v}_2)]_{S} & [T(\mathbf{v}_3)]_{S} \\ 0&0&1 &\downarrow & \downarrow & \downarrow \end{array}\right] \] Therefore, in general, to find \(A\) we row reduce the augmented matrix \([S|T(S)]\) to get \([I|A]\):

\[[S|T(S)]\xrightarrow{\text{RREF}} [I|A]\]

Where:

  • \(S\) is the matrix with columns being the vectors of the basis
  • \(T(S)\) is the matrix with columns being \(T\) applied to the vectors of the basis
  • \(A\) is the matrix representation of \(T\) with respect to \(S\)
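The following SymPy sketch applies this recipe to a hypothetical map \(T(\mathbf{x})=M\mathbf{x}\), where the standard matrix \(M\) is made up for illustration and \(S\) is the basis used in the next example:

from sympy import Matrix

# Basis S (as columns) and a hypothetical standard matrix M of T
S = Matrix([[1,  1,  1],
            [1, -1,  1],
            [1,  0, -2]])
M = Matrix([[2, 0, 0],
            [0, 1, 0],
            [0, 0, 3]])           # illustrative assumption

# Row reduce [S | T(S)] = [S | M*S] to [I | A]
R, _ = Matrix.hstack(S, M * S).rref()
A = R[:, 3:]                      # matrix representation of T with respect to S
print(A)                          # same result as S.inv() * M * S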

Find \(T(\mathbf{v})\)

Suppose \(S\) is a basis of \(\mathbb{R}^n\) and that \(A\) is the matrix representation of a linear map \(T:\mathbb{R}^n\to\mathbb{R}^n\). Given \(\mathbf{v}\in\mathbb{R}^n\), we need to find \(T(\mathbf{v})\).

Example: Suppose that \[S =\left\{ \begin{bmatrix}1\\1\\1\end{bmatrix}, \begin{bmatrix}1\\-1\\0\end{bmatrix}, \begin{bmatrix}1\\1\\-2\end{bmatrix} \right\}\] is a basis of \(\mathbb{R}^3\) and that \(T:\mathbb{R}^3\to\mathbb{R}^3\) is a linear map with matrix representation with respect to \(S\) given by \[ A=\begin{bmatrix}1&0&0\\1&1&0\\1&1&1\end{bmatrix}.\] Let \(\mathbf{v}=(1,1,4)\). We need to find \(T(1,1,4)\).

From Theorem 11.2 we know that \([T(\mathbf{v})]_S=A[\mathbf{v}]_S\). This formula outlines the steps that we need to take; rather than memorizing them, we can read them off the formula. Starting from \(\mathbf{v}\):

  • First, we need to find \([\mathbf{v}]_S\)
  • Then, multiplying \([\mathbf{v}]_S\) by \(A\) we get \([T(\mathbf{v})]_S\)
  • Finally we find \(T(\mathbf{v})\)

To find \([\mathbf{v}]_S\), we form the augmented matrix and row reduce it \[\left[\begin{array}{ccc|c} 1&1&1&1\\1&-1&1&1\\1&0&-2&4 \end{array}\right] \xrightarrow{\text{RREF}} \left[\begin{array}{ccc|c} 1&0&0&2\\0&1&0&0\\0&0&1&-1 \end{array}\right] \] and we get that \([(1,1,4)]_S=(2,0,-1)\). Now we multiply the coordinates by \(A\) to find the coordinates of \(T(\mathbf{v})\) \[ [T(\mathbf{v})]_S=A[\mathbf{v}]_S =\begin{bmatrix}1&0&0\\1&1&0\\1&1&1\end{bmatrix} \begin{bmatrix}2\\0\\-1\end{bmatrix}= \begin{bmatrix}2\\2\\1\end{bmatrix} \] Then \[T(\mathbf{v})= 2 \begin{bmatrix}1\\1\\1\end{bmatrix}+ 2 \begin{bmatrix}1\\-1\\0\end{bmatrix}+ \begin{bmatrix}1\\1\\-2\end{bmatrix} =\begin{bmatrix}5\\1\\0\end{bmatrix} \]
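A short SymPy check of these three steps:

from sympy import Matrix

S = Matrix([[1,  1,  1],
            [1, -1,  1],
            [1,  0, -2]])         # columns: the basis vectors
A = Matrix([[1, 0, 0],
            [1, 1, 0],
            [1, 1, 1]])
v = Matrix([1, 1, 4])

v_S  = S.solve(v)                 # [v]_S = (2, 0, -1)
Tv_S = A * v_S                    # [T(v)]_S = (2, 2, 1)
print(S * Tv_S)                   # T(v) = (5, 1, 0)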

11.2.2 Diagonalization of Linear Transformations

When studying linear transformations on vector spaces, we often seek simple matrix representations. In some cases, we can find bases that yield diagonal matrices, which significantly simplifies calculations. The following Theorem characterizes the transformations that allow for such representations:


Theorem: A linear transformation \(T: V \to V\) is diagonalizable if and only if there exists a basis for \(V\) consisting entirely of eigenvectors of \(T\).

Proof. (\(\Rightarrow\)) Suppose that the matrix representation of \(T:V\to V\) with respect to the basis \(S=\{\mathbf{v}_1,\dots,\mathbf{v}_n\}\) is diagonal: \[\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}\]

From Equation 11.2, we know that the \(j\)th column of this matrix gives the coordinates of \(T(\mathbf{v}_j)\) in terms of the basis \(S\). Thus, \([T(\mathbf{v}_j)]_S\) has \(\lambda_j\) in the \(j\)th position and zeros elsewhere, which means \(T(\mathbf{v}_j)=\lambda_j\mathbf{v}_j\) for each \(j=1,2,\ldots,n\). Therefore, each basis vector \(\mathbf{v}_j\) is an eigenvector of \(T\) with corresponding eigenvalue \(\lambda_j\).

(\(\Leftarrow\)) Conversely, if \(S=\{\mathbf{v}_1,\dots,\mathbf{v}_n\}\) is a basis of eigenvectors with \(T(\mathbf{v}_j)=\lambda_j\mathbf{v}_j\), then the matrix representation with respect to this basis must be diagonal with the eigenvalues on the main diagonal. \(\square\)
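As a concrete sketch, SymPy can produce such an eigenvector basis when one exists; the matrix below is a made-up standard representation of \(T\), chosen only for illustration:

from sympy import Matrix

M = Matrix([[4, 1],
            [2, 3]])              # hypothetical, chosen to be diagonalizable

# Collect a basis eigenvector for each eigenvalue (with multiplicity)
vecs = [v for lam, mult, basis in M.eigenvects() for v in basis]
P = Matrix.hstack(*vecs)          # eigenvector basis as columns
D = P.inv() * M * P               # representation of T in the eigenvector basis
print(D)                          # diagonal, with the eigenvalues on the diagonal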

11.3 Matrices that Represent the Same Linear Map

Let \(V\) be a vector space with bases \(S_1 = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}\) and \(S_2 = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n\}\) and let \(T:V\to V\) be a linear map. From Theorem 11.2 there exist unique matrices \(A\) and \(B\) such that for every \(\mathbf{v}\in V\) \[[T(\mathbf{v})]_{S_1}=A[\mathbf{v}]_{S_1}\quad\text{ and }\quad [T(\mathbf{v})]_{S_2}=B[\mathbf{v}]_{S_2}.\]

Since \(A\) and \(B\) represent the same linear map with respect to different bases, they must be related. In this section we study this relationship.

From Theorem 11.1 there exists a unique matrix \(P=P_{S_1\leftarrow S_2}\) such that for every \(\mathbf{z}\in V\) \[[\mathbf{z}]_{S_1}=P[\mathbf{z}]_{S_2}.\] Applying this formula to \(\mathbf{v}\) and \(T(\mathbf{v})\) and multiplying by \(P^{-1}\) we get: \[\begin{align} [T(\mathbf{v})]_{S_1} &= A[\mathbf{v}]_{S_1}\\ P[T(\mathbf{v})]_{S_2} &= AP[\mathbf{v}]_{S_2}\\ [T(\mathbf{v})]_{S_2} &= P^{-1}AP[\mathbf{v}]_{S_2}. \end{align}\] Since \(B\) is unique we obtain that \[B=P^{-1}AP\quad\text{ or equivalently }\quad A=PBP^{-1} \tag{11.3}\]

This motivates the following definition:

Definition 11.1 Let \(A\) and \(B\) be \(n\times n\) matrices. We say that \(A\) is similar to \(B\) if there exists an invertible matrix \(P\) such that \(B=P^{-1}AP\).

Similarity forms an equivalence relation between square matrices. First, the relation is symmetric: if \(B=P^{-1}AP\), then by multiplying both sides by \(P\) on the right and \(P^{-1}\) on the left, we get \(A=PBP^{-1}\), showing that \(B\) is also similar to \(A\).

The relation is reflexive because every square matrix \(A\) is similar to itself via the identity matrix: \(A=I^{-1}AI\).

Finally, similarity is transitive: if \(A\) is similar to \(B\) with \(B=P^{-1}AP\), and \(B\) is similar to \(C\) with \(C=Q^{-1}BQ\), then substituting gives \(C=Q^{-1}(P^{-1}AP)Q=(PQ)^{-1}A(PQ)\). This shows that \(A\) is similar to \(C\) via the invertible matrix \(PQ\), thereby establishing similarity as an equivalence relation.

Our explanation also provided a proof of the following theorem:

Theorem 11.3 Let \(V\) be a vector space with bases \(S_1\) and \(S_2\) and let \(T:V\to V\) be a linear map. Then the matrix representations of \(T\) with respect to \(S_1\) and \(S_2\) are similar. More precisely, if \(P\) is the change of basis matrix from \(S_2\) to \(S_1\), \(A\) is the matrix representation of \(T\) with respect to \(S_1\) and \(B\) is the matrix representation of \(T\) with respect to \(S_2\), then \(B=P^{-1}AP\).
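Here is a numerical check of Theorem 11.3, assuming a made-up matrix \(A\) (the representation with respect to the canonical basis \(S_1\)) and taking \(S_2\) to be the basis from the earlier examples, so that the columns of \(P=P_{S_1\leftarrow S_2}\) are simply the vectors of \(S_2\):

from sympy import Matrix

A = Matrix([[2, 1, 0],
            [0, 2, 0],
            [0, 0, 3]])           # hypothetical representation w.r.t. S1
P = Matrix([[1,  1,  1],
            [1, -1,  1],
            [1,  0, -2]])         # columns: the vectors of S2

B = P.inv() * A * P               # representation of T with respect to S2
assert P * B * P.inv() == A       # Equation 11.3: A = P B P^{-1}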

11.3.1 Rotations in \(\mathbb{R}^3\)

We will illustrate Theorem 11.3 by finding matrix representations of 3D rotations.

Let’s start with a simpler example in \(\mathbb{R}^2\) that we have discussed before.

Example 1: Let \(T\) be the rotation by an angle \(\theta\) in \(\mathbb{R}^2\). The matrix representation with respect to the canonical basis is \[\begin{bmatrix}\cos(\theta) &-\sin(\theta)\\\sin(\theta)&\cos(\theta)\end{bmatrix}\]

This matrix represents a counterclockwise rotation by the angle \(\theta\) around the origin in the plane. When we multiply this matrix by any vector in \(\mathbb{R}^2\), it rotates the vector by the angle \(\theta\) while preserving its length.

The canonical basis in \(\mathbb{R}^2\) consists of the unit vectors \(\mathbf{e}_1 = (1,0)\) and \(\mathbf{e}_2 = (0,1)\). When rotated by the angle \(\theta\) (verified in the sketch after this list):

  • \(\mathbf{e}_1\) becomes \((\cos(\theta), \sin(\theta))\), which forms the first column of the matrix
  • \(\mathbf{e}_2\) becomes \((-\sin(\theta), \cos(\theta))\), which forms the second column
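We can verify both columns symbolically:

from sympy import Matrix, cos, sin, pi, Symbol

θ = Symbol('θ')
R = Matrix([[cos(θ), -sin(θ)],
            [sin(θ),  cos(θ)]])

assert R * Matrix([1, 0]) == Matrix([cos(θ), sin(θ)])    # image of e1
assert R * Matrix([0, 1]) == Matrix([-sin(θ), cos(θ)])   # image of e2
print(R.subs(θ, pi/2))            # quarter turn: [[0, -1], [1, 0]]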

Example 2: Rotation by an angle \(\theta\) about the x-axis in \(\mathbb{R}^3\) has matrix representation: \[\begin{bmatrix}1 & 0 & 0 \\ 0 & \cos(\theta) & -\sin(\theta) \\ 0 & \sin(\theta) & \cos(\theta)\end{bmatrix}\]

This matrix keeps the x-coordinate unchanged while rotating the yz-plane by angle \(\theta\).

Example 3: Rotation by an angle \(\theta\) about the y-axis in \(\mathbb{R}^3\) has matrix representation: \[\begin{bmatrix}\cos(\theta) & 0 & \sin(\theta) \\ 0 & 1 & 0 \\ -\sin(\theta) & 0 & \cos(\theta)\end{bmatrix}\]

This transformation preserves the y-coordinate while rotating the xz-plane by angle \(\theta\).

Example 4: Let \(T\) be the rotation of \(\mathbb{R}^3\) by an angle \(\theta\) about the axis \((1,1,1)\). Find the matrix representation of \(T\) with respect to the standard basis.

This is a more challenging problem because we can’t immediately determine how the rotation affects the standard basis vectors. A better approach is to use a change of basis strategy: first select a new coordinate system where one axis aligns with the rotation axis \((1,1,1)\), compute the rotation matrix in this convenient basis, and then transform it back to the standard basis using a change of basis matrix.

In the aligned coordinate system, the rotation will have a simple form since the vector \((1,1,1)\) remains fixed while vectors in the perpendicular plane rotate. We can use the similarity transformation \(A = PBP^{-1}\) where \(P\) is the change of basis matrix, \(B\) represents the rotation in the aligned coordinates, and \(A\) is our desired rotation matrix in the standard basis.

This technique is widely used in computer graphics, where rotations around arbitrary axes are essential for realistic 3D animations, camera movements, and object manipulations. While modern graphics engines often implement these operations using quaternions for better computational efficiency and to avoid issues such as gimbal lock, the underlying mathematical foundation remains the same.

Step 1: Find the new coordinate system: We first find an orthonormal basis where one vector aligns with the rotation axis \((1,1,1)\): \[S_2=\left\{ \begin{bmatrix}\frac{1}{\sqrt{3}}\\\frac{1}{\sqrt{3}}\\\frac{1}{\sqrt{3}}\end{bmatrix}, \begin{bmatrix}\frac{1}{\sqrt{2}}\\\frac{-1}{\sqrt{2}}\\0\end{bmatrix}, \begin{bmatrix}\frac{1}{\sqrt{6}}\\\frac{1}{\sqrt{6}}\\\frac{-2}{\sqrt{6}}\end{bmatrix} \right\}\] The last two vectors form an orthonormal basis of the plane perpendicular to \((1,1,1)\); they can be found, for example, by Gram–Schmidt.

In this case we see that the matrix representation of \(T\) with respect to \(S_2\) is \[B=\begin{bmatrix}1 & 0 & 0 \\ 0 & \cos(\theta) & -\sin(\theta) \\ 0 & \sin(\theta) & \cos(\theta)\end{bmatrix}\] This matrix keeps the first vector fixed and rotates the perpendicular plane by the angle \(\theta\), which is exactly what we want for a rotation around the \((1,1,1)\) axis.

This approach works precisely because \(S_2\) is an orthonormal basis. The orthonormality ensures that the transformation preserves angles and distances in the new coordinate system, which is essential for a proper rotation. Without orthonormality, the transformation would introduce distortion, and we could no longer interpret it as a pure rotation.

Step 2: Find the change of basis matrix: We now find \(Q=P_{S_1\leftarrow S_2}\), the change of basis matrix from \(S_2\) to the canonical basis \(S_1=\{\mathbf{e}_1,\mathbf{e}_2,\mathbf{e}_3\}\). Since we are transitioning to the canonical basis, for every element of \(S_2\), \([\mathbf{w}_i]_{S_1}=\mathbf{w}_i\), and from Equation 11.1 we see that \[Q=\begin{bmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}}\\ \frac{1}{\sqrt{3}} & \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{6}}\\ \frac{1}{\sqrt{3}} & 0 & \frac{-2}{\sqrt{6}}\\ \end{bmatrix}\]

The orthonormality of \(S_2\) makes the change of basis matrix orthogonal, since all the columns are orthonormal. This means that the inverse of \(Q\) is its transpose, which significantly simplifies the calculations when transforming back to the standard basis.
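This can be confirmed directly in SymPy:

from sympy import Matrix, sqrt, eye

V1 = Matrix([1, 1, 1]) / sqrt(3)
V2 = Matrix([1, -1, 0]) / sqrt(2)
V3 = Matrix([1, 1, -2]) / sqrt(6)
Q = Matrix.hstack(V1, V2, V3)

assert Q.T * Q == eye(3)          # orthonormal columns: Q^{-1} = Q^T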

Step 3: Transform back to the standard basis: Theorem 11.3 says that the matrix representation of \(T\) with respect to the canonical basis \(S_1\) is \(A=QBQ^T\): \[A=\begin{bmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}}\\ \frac{1}{\sqrt{3}} & \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{6}}\\ \frac{1}{\sqrt{3}} & 0 & \frac{-2}{\sqrt{6}}\\ \end{bmatrix} \begin{bmatrix}1 & 0 & 0 \\ 0 & \cos(\theta) & -\sin(\theta) \\ 0 & \sin(\theta) & \cos(\theta)\end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}}\\ \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} & 0\\ \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & \frac{-2}{\sqrt{6}} \end{bmatrix} \] When we simplify this expression using SymPy we get that \[A= \begin{bmatrix}\frac{2 \cos{\left(\theta \right)}}{3} + \frac{1}{3} & \frac{1}{3} - \frac{2 \sin{\left(\theta + \frac{\pi}{6} \right)}}{3} & \frac{1}{3} - \frac{2 \cos{\left(\theta + \frac{\pi}{3} \right)}}{3}\\\frac{1}{3} - \frac{2 \cos{\left(\theta + \frac{\pi}{3} \right)}}{3} & \frac{2 \cos{\left(\theta \right)}}{3} + \frac{1}{3} & \frac{1}{3} - \frac{2 \sin{\left(\theta + \frac{\pi}{6} \right)}}{3}\\\frac{1}{3} - \frac{2 \sin{\left(\theta + \frac{\pi}{6} \right)}}{3} & \frac{1}{3} - \frac{2 \cos{\left(\theta + \frac{\pi}{3} \right)}}{3} & \frac{2 \cos{\left(\theta \right)}}{3} + \frac{1}{3}\end{bmatrix}.\]

This is the SymPy code:

# Import libraries
from sympy import *
init_printing()

# Define the orthonormal vectors of S2
V1=Matrix([1,1,1])/sqrt(3)
V2=Matrix([1,-1,0])/sqrt(2)
V3=Matrix([1,1,-2])/sqrt(6)

# Define the change of basis matrix Q
Q = Matrix.hstack(V1,V2,V3)

# Define B, matrix representation of T with respect to S2
θ = Symbol('θ')
B = Matrix([[1,0,0],[0,cos(θ),-sin(θ)],[0,sin(θ),cos(θ)]])

# Find the matrix representation with respect to the standard basis and simplify
simplify(Q*B*Q.T)

\(\displaystyle \left[\begin{matrix}\frac{2 \cos{\left(θ \right)}}{3} + \frac{1}{3} & \frac{1}{3} - \frac{2 \sin{\left(θ + \frac{\pi}{6} \right)}}{3} & \frac{1}{3} - \frac{2 \cos{\left(θ + \frac{\pi}{3} \right)}}{3}\\\frac{1}{3} - \frac{2 \cos{\left(θ + \frac{\pi}{3} \right)}}{3} & \frac{2 \cos{\left(θ \right)}}{3} + \frac{1}{3} & \frac{1}{3} - \frac{2 \sin{\left(θ + \frac{\pi}{6} \right)}}{3}\\\frac{1}{3} - \frac{2 \sin{\left(θ + \frac{\pi}{6} \right)}}{3} & \frac{1}{3} - \frac{2 \cos{\left(θ + \frac{\pi}{3} \right)}}{3} & \frac{2 \cos{\left(θ \right)}}{3} + \frac{1}{3}\end{matrix}\right]\)
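As a final sanity check (continuing from the code above), the computed matrix should fix the rotation axis \((1,1,1)\) and be orthogonal, as a pure rotation must be:

# Continuing from the previous code block
A = simplify(Q*B*Q.T)
axis = Matrix([1, 1, 1])
print(simplify(A * axis))         # expect (1, 1, 1): the axis is fixed
print(simplify(A.T * A))          # expect the identity: A is orthogonal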