4  Linear Transformations in \(\mathbb{R}^n\)

4.1 Motivating Example

If \(A\) is an \(m\times n\) matrix, Equation 3.1 tells us that \(A\) induces a map \(A:\mathbb{R}^n\to\mathbb{R}^m\) defined by the matrix-vector product \(\mathbf{x}\mapsto A\mathbf{x}\). This map has two important properties:

  1. For every \(\mathbf{x},\mathbf{y}\in\mathbb{R}^n\), \(A(\mathbf{x}+\mathbf{y})=A\mathbf{x}+A\mathbf{y}\), and
  2. For every \(\mathbf{x}\in\mathbb{R}^n\) and every \(c\in\mathbb{R}\), \(A(c\mathbf{x})=cA\mathbf{x}\).

These properties combined yield a more general result. For any vectors \(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k \in \mathbb{R}^n\) and scalars \(c_1, c_2, \ldots, c_k \in \mathbb{R}\): \[A(c_1\mathbf{x}_1+c_2\mathbf{x}_2+\cdots+c_k\mathbf{x}_k)=c_1(A\mathbf{x}_1)+c_2(A\mathbf{x}_2)+\cdots+c_k(A\mathbf{x}_k) \tag{4.1}\]
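Here is a quick numerical sanity check of the two properties, written as a minimal NumPy sketch; the matrix, vectors, and scalar below are arbitrary examples, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # an arbitrary 3x4 example matrix
x = rng.standard_normal(4)
y = rng.standard_normal(4)
c = 2.5

# Property 1: A(x + y) = Ax + Ay
print(np.allclose(A @ (x + y), A @ x + A @ y))   # True

# Property 2: A(cx) = c(Ax)
print(np.allclose(A @ (c * x), c * (A @ x)))     # True
```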

Let’s check Property 1. Let \(\mathbf{x}=(x_1,\dots,x_n)\) and \(\mathbf{y}=(y_1,\dots,y_n)\) be two arbitrary elements of \(\mathbb{R}^n\). Then writing the product in terms of the columns of \(A\) and using standard operations on vectors we get: \[\begin{aligned} A(\mathbf{x}+\mathbf{y}) &= \begin{bmatrix} \uparrow & \uparrow & \cdots & \uparrow \\ \mathbf{c}_1 & \mathbf{c}_2 & \cdots & \mathbf{c}_n \\ \downarrow & \downarrow & \cdots & \downarrow \end{bmatrix}\left( \begin{bmatrix}x_1\\x_2\\\vdots\\x_n \end{bmatrix} + \begin{bmatrix}y_1\\y_2\\\vdots\\y_n \end{bmatrix} \right) \\ &=\begin{bmatrix} \uparrow & \uparrow & \cdots & \uparrow \\ \mathbf{c}_1 & \mathbf{c}_2 & \cdots & \mathbf{c}_n \\ \downarrow & \downarrow & \cdots & \downarrow \end{bmatrix} \begin{bmatrix}x_1+y_1\\x_2+y_2\\\vdots\\x_n+y_n \end{bmatrix} \\ &= (x_1+y_1)\mathbf{c}_1+\cdots +(x_n+y_n)\mathbf{c}_n\\ &=(x_1\mathbf{c}_1+\cdots +x_n\mathbf{c}_n)+(y_1\mathbf{c}_1+\cdots + y_n\mathbf{c}_n)\\ &=A\mathbf{x}+A\mathbf{y}. \end{aligned} \]
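The pivotal identity in this computation is that \(A\mathbf{x}\) is the linear combination \(x_1\mathbf{c}_1+\cdots+x_n\mathbf{c}_n\) of the columns of \(A\). A minimal NumPy sketch of that identity (the matrix and vector are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
x = rng.standard_normal(4)

# A @ x equals the linear combination x_1*c_1 + ... + x_n*c_n of A's columns
combo = sum(x[j] * A[:, j] for j in range(A.shape[1]))
print(np.allclose(A @ x, combo))   # True
```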

Exercise: Check property 2. That is, for every \(\mathbf{x}\in\mathbb{R}^n\) and every \(c\in\mathbb{R}\), \(A(c\mathbf{x})=cA\mathbf{x}\).

Let \(\mathbf{x}=(x_1,\dots,x_n)\) be an arbitrary element in \(\mathbb{R}^n\) and let \(c\in\mathbb{R}\) be a scalar. Writing the product in terms of the columns of \(A\) and using standard operations on vectors we get: \[ \begin{aligned} A(c\mathbf{x}) &= \begin{bmatrix} \uparrow & \uparrow & \cdots & \uparrow \\ \mathbf{c}_1 & \mathbf{c}_2 & \cdots & \mathbf{c}_n \\ \downarrow & \downarrow & \cdots & \downarrow \end{bmatrix} \left( c\begin{bmatrix}x_1\\ x_2\\\vdots\\ x_n \end{bmatrix} \right) \\ &= \begin{bmatrix} \uparrow & \uparrow & \cdots & \uparrow \\ \mathbf{c}_1 & \mathbf{c}_2 & \cdots & \mathbf{c}_n \\ \downarrow & \downarrow & \cdots & \downarrow \end{bmatrix}\left( \begin{bmatrix}cx_1\\ cx_2\\ \vdots\\ cx_n \end{bmatrix} \right) \\ &= (cx_1)\mathbf{c}_1+\cdots +(cx_n)\mathbf{c}_n\\ &= c(x_1\mathbf{c}_1+\cdots +x_n\mathbf{c}_n)\\ &= c(A\mathbf{x}) \end{aligned} \]

As a consequence of these properties we obtain the following lemma.

Lemma 4.1 Suppose that \(A\) is an \(m\times n\) matrix and that \(B\) is an \(n\times p\) matrix. Then for every \(\mathbf{x}\in\mathbb{R}^p\), \[(AB)\mathbf{x}=A(B\mathbf{x})\]

Proof. Express \(B\) in terms of its columns: \(B=\begin{bmatrix}\mathbf{b}_1&\mathbf{b}_2&\cdots&\mathbf{b}_p\end{bmatrix}\), where each \(\mathbf{b}_i\) is an \(n\)-dimensional column vector. By Equation 3.4, the product \(AB\) is defined by: \[ AB=\begin{bmatrix} \uparrow & \uparrow & \cdots & \uparrow \\ A\mathbf{b}_1 & A\mathbf{b}_2 & \cdots & A\mathbf{b}_p \\ \downarrow & \downarrow & \cdots & \downarrow \end{bmatrix} \] Now, for any vector \(\mathbf{x}=(x_1,\ldots,x_p)\in\mathbb{R}^p\), we use Equation 3.1 to compute \[\begin{align} (AB)\mathbf{x}&=x_1(A\mathbf{b}_1)+x_2(A\mathbf{b}_2)+\cdots+x_p(A\mathbf{b}_p),\quad\text{and}\\ B\mathbf{x}&=x_1\mathbf{b}_1+x_2\mathbf{b}_2+\cdots+x_p\mathbf{b}_p. \end{align}\] Therefore, using Equation 4.1 we obtain \[\begin{align*} A(B\mathbf{x}) &= A(x_1\mathbf{b}_1+x_2\mathbf{b}_2+\cdots+x_p\mathbf{b}_p) \\ &= x_1(A\mathbf{b}_1)+x_2(A\mathbf{b}_2)+\cdots+x_p(A\mathbf{b}_p) \\ &= (AB)\mathbf{x}\quad\square \end{align*}\]
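A quick numerical check of Lemma 4.1, assuming NumPy; the sizes \(m=3\), \(n=4\), \(p=5\) are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4))   # an m x n matrix
B = rng.standard_normal((4, 5))   # an n x p matrix
x = rng.standard_normal(5)        # a vector in R^p

# Lemma 4.1: (AB)x = A(Bx)
print(np.allclose((A @ B) @ x, A @ (B @ x)))   # True
```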

4.2 Linear Transformations

Definition: A function \(T:\mathbb{R}^n\to\mathbb{R}^m\) is a linear transformation if it satisfies two properties:

  1. Additivity: For every \(\mathbf{x}_1,\mathbf{x}_2\in\mathbb{R}^n\), \(T(\mathbf{x}_1+\mathbf{x}_2)=T(\mathbf{x}_1)+T(\mathbf{x}_2)\), and
  2. Scalar Multiplication: For every \(\mathbf{x}\in\mathbb{R}^n\) and every \(c\in\mathbb{R}\), \(T(c\mathbf{x})=cT(\mathbf{x})\).

These properties naturally extend to any finite collection of vectors. For vectors \(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_k\in\mathbb{R}^n\) and scalars \(c_1,c_2,\ldots,c_k\in\mathbb{R}\), we have \[T(c_1\mathbf{x}_1+\cdots+c_k\mathbf{x}_k)=c_1T(\mathbf{x}_1)+\cdots+c_kT(\mathbf{x}_k) \tag{4.2}\] This is an important formula that we will use many times.
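As a sanity check, here is Equation 4.2 verified numerically for a transformation induced by a matrix (the only kind of linear map we have constructed so far); the sizes, the coefficients, and the choice \(k=3\) are arbitrary examples.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 4))

def T(v):
    # a linear transformation R^4 -> R^3 induced by the matrix A
    return A @ v

xs = [rng.standard_normal(4) for _ in range(3)]
cs = [1.0, -2.0, 0.5]

# Equation 4.2: T(c_1 x_1 + ... + c_k x_k) = c_1 T(x_1) + ... + c_k T(x_k)
lhs = T(sum(c * x for c, x in zip(cs, xs)))
rhs = sum(c * T(x) for c, x in zip(cs, xs))
print(np.allclose(lhs, rhs))   # True
```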

Notice that we just established that an \(m\times n\) matrix \(A\) induces a linear transformation \(A:\mathbb{R}^n\to\mathbb{R}^m\) defined by \(\mathbf{x}\mapsto A\mathbf{x}\). In this section we will demonstrate the converse: any linear transformation from \(\mathbb{R}^n\) to \(\mathbb{R}^m\) can be expressed as matrix multiplication. The following simple example illustrates this fundamental property.

Example: Let \(T:\mathbb{R}^2\to\mathbb{R}^3\) be defined by \(T\left( \begin{bmatrix}a\\ b\end{bmatrix} \right)=\begin{bmatrix}a+b\\ b-2a\\ a\end{bmatrix}\)

The first step is to show that the function is a linear transformation. Try to do it using the definition before reading the detailed proof below.

Exercise: Prove that \(T:\mathbb{R}^2\to\mathbb{R}^3\) is a linear transformation.

Proof. To prove that \(T\) is a linear transformation, we must verify both properties from the definition:

  1. Additivity: \(T(\mathbf{x}+\mathbf{y})=T(\mathbf{x})+T(\mathbf{y})\) for all \(\mathbf{x},\mathbf{y}\in\mathbb{R}^2\)
  2. Scalar multiplication: \(T(c\mathbf{x})=cT(\mathbf{x})\) for all \(\mathbf{x}\in\mathbb{R}^2\) and \(c\in\mathbb{R}\)

Property 1 (Additivity): Let \(\mathbf{x}=\begin{bmatrix}x_1\\ x_2\end{bmatrix}\) and \(\mathbf{y}=\begin{bmatrix}y_1\\ y_2\end{bmatrix}\) be arbitrary vectors in \(\mathbb{R}^2\).

First, let’s compute \(T(\mathbf{x}+\mathbf{y})\): \[\begin{align*} T(\mathbf{x}+\mathbf{y}) &= T\left(\begin{bmatrix}x_1+y_1\\ x_2+y_2\end{bmatrix}\right)\\ &= \begin{bmatrix}(x_1+y_1)+(x_2+y_2)\\ (x_2+y_2)-2(x_1+y_1)\\ x_1+y_1\end{bmatrix}\\ &= \begin{bmatrix}x_1+x_2+y_1+y_2\\ x_2+y_2-2x_1-2y_1\\ x_1+y_1\end{bmatrix} \end{align*}\]

Now, let’s compute \(T(\mathbf{x})+T(\mathbf{y})\): \[\begin{align*} T(\mathbf{x})+T(\mathbf{y}) &= \begin{bmatrix}x_1+x_2\\ x_2-2x_1\\ x_1\end{bmatrix} + \begin{bmatrix}y_1+y_2\\ y_2-2y_1\\ y_1\end{bmatrix}\\ &= \begin{bmatrix}x_1+x_2+y_1+y_2\\ x_2-2x_1+y_2-2y_1\\ x_1+y_1\end{bmatrix} \end{align*}\]

Since these are equal for any choice of \(\mathbf{x}\) and \(\mathbf{y}\), Property 1 is verified.

Property 2 (Scalar Multiplication): Let \(\mathbf{x}=\begin{bmatrix}x_1\\ x_2\end{bmatrix}\) be an arbitrary vector in \(\mathbb{R}^2\) and \(c\in\mathbb{R}\) be an arbitrary scalar.

First, let’s compute \(T(c\mathbf{x})\): \[\begin{align*} T(c\mathbf{x}) &= T\left(\begin{bmatrix}cx_1\\ cx_2\end{bmatrix}\right)\\ &= \begin{bmatrix}cx_1+cx_2\\ cx_2-2(cx_1)\\ cx_1\end{bmatrix}\\ &= \begin{bmatrix}c(x_1+x_2)\\ c(x_2-2x_1)\\ cx_1\end{bmatrix} \end{align*}\]

Now, let’s compute \(cT(\mathbf{x})\): \[\begin{align*} cT(\mathbf{x}) &= c\begin{bmatrix}x_1+x_2\\ x_2-2x_1\\ x_1\end{bmatrix}\\ &= \begin{bmatrix}c(x_1+x_2)\\ c(x_2-2x_1)\\ cx_1\end{bmatrix} \end{align*}\]

Since these are equal for any choice of \(\mathbf{x}\) and \(c\), Property 2 is verified.

Therefore, since both properties hold, \(T\) is indeed a linear transformation.

Now that we know that \(T:\mathbb{R}^2\to\mathbb{R}^3\) is linear, we use elementary vector operations, Equation 4.2, and Equation 3.1 to obtain: \[\begin{aligned} T\left( \begin{bmatrix}a\\ b\end{bmatrix} \right) &=T\left( a\begin{bmatrix}1\\ 0\end{bmatrix} + b\begin{bmatrix}0\\ 1\end{bmatrix}\right) \\ &=aT\left( \begin{bmatrix}1\\ 0\end{bmatrix}\right) + bT\left(\begin{bmatrix}0\\ 1\end{bmatrix}\right) \\ &=a\begin{bmatrix}1\\ -2\\ 1\end{bmatrix} + b\begin{bmatrix}1\\ 1\\0\end{bmatrix} = \begin{bmatrix}1 & 1\\ -2 &1\\ 1&0\end{bmatrix}\begin{bmatrix}a\\ b\end{bmatrix}. \end{aligned}\] This means that for every \(\mathbf{x}\in\mathbb{R}^2\), \(T(\mathbf{x})=A\mathbf{x}\) for \(A= \begin{bmatrix}1 & 1\\ -2 &1\\ 1&0\end{bmatrix}\).
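A short check of this computation, assuming NumPy; the test vector is an arbitrary example.

```python
import numpy as np

def T(v):
    # the example transformation T(a, b) = (a + b, b - 2a, a)
    a, b = v
    return np.array([a + b, b - 2 * a, a])

# the matrix derived above, with columns T(e_1) and T(e_2)
A = np.array([[ 1, 1],
              [-2, 1],
              [ 1, 0]])

x = np.array([3.0, -1.0])
print(np.allclose(T(x), A @ x))   # True
```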

Theorem 4.1 Let \(T:\mathbb{R}^n\to\mathbb{R}^m\) be a linear transformation. Then there exists an \(m\times n\) matrix \(A\) such that for every \(\mathbf{x}\in\mathbb{R}^n\), \(T(\mathbf{x})=A\mathbf{x}\).

Proof. The proof consists of the following steps:

  1. First, let’s identify the canonical basis vectors of \(\mathbb{R}^n\). Let \(\mathbf{e}_i\) denote the \(i\)-th canonical basis vector: \[\mathbf{e}_1=\begin{bmatrix}1\\0\\ \vdots\\ 0\end{bmatrix},\quad \mathbf{e}_2=\begin{bmatrix}0\\1\\ \vdots\\ 0\end{bmatrix},\quad \ldots, \quad \mathbf{e}_n=\begin{bmatrix}0\\0\\ \vdots\\ 1\end{bmatrix}\]

  2. Any vector \(\mathbf{x}\in\mathbb{R}^n\) can be written uniquely as a linear combination of these basis vectors: \[\mathbf{x}=\begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{bmatrix}=x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_n\mathbf{e}_n\]

  3. Now, using the linearity of \(T\) (see Equation 4.2), we have: \[\begin{aligned} T(\mathbf{x})&=T(x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_n\mathbf{e}_n)\\ &=x_1T(\mathbf{e}_1) + x_2T(\mathbf{e}_2) + \cdots + x_nT(\mathbf{e}_n) \end{aligned} \]

  4. Define the matrix \(A\) using the transformed basis vectors as its columns: \[A=\begin{bmatrix} \uparrow & \uparrow & \cdots & \uparrow \\ T(\mathbf{e}_1) & T(\mathbf{e}_2) & \cdots & T(\mathbf{e}_n) \\ \downarrow & \downarrow & \cdots & \downarrow \\ \end{bmatrix}\]

  5. Then by the definition of matrix multiplication (see Equation 3.1): \[\begin{aligned} A\mathbf{x}&= \begin{bmatrix} \uparrow & \uparrow & \cdots & \uparrow \\ T(\mathbf{e}_1) & T(\mathbf{e}_2) & \cdots & T(\mathbf{e}_n) \\ \downarrow & \downarrow & \cdots & \downarrow \\ \end{bmatrix} \begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{bmatrix} \\ &=x_1T(\mathbf{e}_1) + x_2T(\mathbf{e}_2) + \cdots + x_nT(\mathbf{e}_n)=T(\mathbf{x}). \end{aligned} \]

Therefore, we have constructed a matrix \(A\) such that \(T(\mathbf{x})=A\mathbf{x}\) for all \(\mathbf{x}\in\mathbb{R}^n\). Note that \(A\) is \(m\times n\) since it has \(n\) columns, and each column \(T(\mathbf{e}_i)\in\mathbb{R}^m\). \(\square\)


Key Algorithm

The definition of the matrix \(A\) provides an algorithm for finding the matrix representation of any linear transformation \(T:\mathbb{R}^n\to\mathbb{R}^m\). Simply compute \(T(\mathbf{e}_i)\) for each canonical basis vector and use these vectors as the columns of \(A\): \[A=\begin{bmatrix} \uparrow & \uparrow & \cdots & \uparrow \\ T(\mathbf{e}_1) & T(\mathbf{e}_2) & \cdots & T(\mathbf{e}_n) \\ \downarrow & \downarrow & \cdots & \downarrow \end{bmatrix}\] In other words, the \(j\)-th column of \(A\) is the output of the transformation \(T\) when applied to the \(j\)-th canonical basis vector \(\mathbf{e}_j\).
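A minimal sketch of this algorithm, assuming NumPy; the helper name matrix_of is chosen here for illustration, and the example reuses the transformation \(T\) from the previous section.

```python
import numpy as np

def matrix_of(T, n):
    """Return the matrix whose columns are T(e_1), ..., T(e_n)."""
    # np.eye(n)[:, j] is the j-th canonical basis vector e_j of R^n
    columns = [T(np.eye(n)[:, j]) for j in range(n)]
    return np.column_stack(columns)

def T(v):
    a, b = v
    return np.array([a + b, b - 2 * a, a])

A = matrix_of(T, 2)
print(A)   # [[ 1.  1.]
           #  [-2.  1.]
           #  [ 1.  0.]]
```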

4.3 Composition of Linear Transformations

Recall that if \(T_1:\mathbb{R}^n\to\mathbb{R}^m\) and \(T_2:\mathbb{R}^m\to \mathbb{R}^p\) are functions, the composition \(T_2\circ T_1:\mathbb{R}^n\to\mathbb{R}^p\) is defined by: \[T_2\circ T_1(\mathbf{x})=T_2(T_1(\mathbf{x}))\quad\text{for any}\quad \mathbf{x}\in\mathbb{R}^n.\]

In this section we will prove that the composition of linear maps is linear and that matrix multiplication corresponds to composition of linear maps.

Theorem: Suppose that \(T_1:\mathbb{R}^n\to\mathbb{R}^m\) and \(T_2:\mathbb{R}^m\to \mathbb{R}^p\) are linear maps. Then \(T_2\circ T_1:\mathbb{R}^n\to\mathbb{R}^p\) is linear.

Proof. To prove \(T_2\circ T_1\) is linear, we need to show it satisfies two properties:

  1. Additivity: \((T_2\circ T_1)(\mathbf{x} + \mathbf{y}) = (T_2\circ T_1)(\mathbf{x}) + (T_2\circ T_1)(\mathbf{y})\) for all \(\mathbf{x},\mathbf{y} \in \mathbb{R}^n\)
  2. Scalar multiplication: \((T_2\circ T_1)(c\mathbf{x}) = c(T_2\circ T_1)(\mathbf{x})\) for all \(\mathbf{x} \in \mathbb{R}^n\) and \(c \in \mathbb{R}\)

Let’s prove each property:

  1. First, let’s prove additivity. Let \(\mathbf{x},\mathbf{y} \in \mathbb{R}^n\). Then: \[\begin{aligned} (T_2\circ T_1)(\mathbf{x} + \mathbf{y}) &= T_2(T_1(\mathbf{x} + \mathbf{y})) && \text{definition of composition}\\ &= T_2(T_1(\mathbf{x}) + T_1(\mathbf{y})) && T_1 \text{ is linear}\\ &= T_2(T_1(\mathbf{x})) + T_2(T_1(\mathbf{y})) && T_2 \text{ is linear}\\ &= (T_2\circ T_1)(\mathbf{x}) + (T_2\circ T_1)(\mathbf{y}). && \text{definition of composition} \end{aligned}\]

  2. Next, let’s prove the scalar multiplication property. Let \(\mathbf{x} \in \mathbb{R}^n\) and \(c \in \mathbb{R}\). Then: \[\begin{aligned} (T_2\circ T_1)(c\mathbf{x}) &= T_2(T_1(c\mathbf{x})) && \text{definition of composition}\\ &= T_2(cT_1(\mathbf{x})) && T_1 \text{ is linear}\\ &= cT_2(T_1(\mathbf{x})) && T_2 \text{ is linear}\\ &= c(T_2\circ T_1)(\mathbf{x}). && \text{definition of composition} \end{aligned}\]

Since both properties hold, we conclude that \(T_2\circ T_1\) is linear. \(\square\)

We now prove that matrix multiplication corresponds to composition of linear maps.

Theorem: Let \(T_1:\mathbb{R}^n\to\mathbb{R}^m\) and \(T_2:\mathbb{R}^m\to \mathbb{R}^p\) be linear maps with matrix representations \(B\) and \(A\) respectively. Then \(AB\) is the matrix representation of \(T_2\circ T_1\).

Proof. Let \(\mathbf{x}\in\mathbb{R}^n\). Since \(B\) is the matrix representation of \(T_1:\mathbb{R}^n\to\mathbb{R}^m\), it follows from Theorem 4.1 that \(T_1(\mathbf{x})=B\mathbf{x}\). Similarly, since \(A\) is the matrix representation of \(T_2:\mathbb{R}^m\to \mathbb{R}^p\) and \(B\mathbf{x}\in\mathbb{R}^m\) we have that \(T_2(B\mathbf{x})=A(B{\mathbf{x}})\). Combining these two facts with Lemma 4.1, we get \[(T_2\circ T_1)(\mathbf{x})=T_2(T_1(\mathbf{x}))=T_2(B\mathbf{x})=A(B{\mathbf{x}})=(AB)\mathbf{x}\] which implies that \(AB\) is the matrix representation of \(T_2\circ T_1\). \(\square\)
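Here is the theorem verified numerically for two matrix-induced linear maps, assuming NumPy; the sizes \(n=4\), \(m=3\), \(p=2\) are arbitrary examples.

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((3, 4))   # matrix of T1 : R^4 -> R^3
A = rng.standard_normal((2, 3))   # matrix of T2 : R^3 -> R^2

def T1(v):
    return B @ v

def T2(v):
    return A @ v

x = rng.standard_normal(4)
# (T2 ∘ T1)(x) = (AB)x
print(np.allclose(T2(T1(x)), (A @ B) @ x))   # True
```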

Since composition of functions is associative, the previous theorem implies that multiplication of matrices is associative.

Corollary: Suppose that \(A, B\) and \(C\) are matrices of sizes \(m\times n\), \(n\times p\) and \(p\times q\), respectively. Then \[(AB)C=A(BC)\]

Associativity can also be proved directly: either using Lemma 4.1 together with the column-wise definition of the product (Equation 3.4), or using the entrywise formula (Equation 3.5) to show that \(((AB)C)_{ij}=(A(BC))_{ij}\).
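A final numerical sanity check of the corollary, assuming NumPy; the sizes below are arbitrary examples of compatible dimensions \(m\times n\), \(n\times p\), and \(p\times q\).

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((2, 3))   # m x n
B = rng.standard_normal((3, 4))   # n x p
C = rng.standard_normal((4, 5))   # p x q

# Associativity: (AB)C = A(BC)
print(np.allclose((A @ B) @ C, A @ (B @ C)))   # True
```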