1 Vectors in \(\mathbb{R}^n\)
The following videos from the Essence of Linear Algebra series by 3Blue1Brown are exceptionally good. Watch them carefully.
1.1 Addition of vectors and scalar product of vectors
Let \(\mathbf{v}=(v_1,\dots,v_n)\) and \(\mathbf{w}=(w_1,\dots,w_n)\) be two vectors in \(\mathbb{R}^n\) and let \(c\in\mathbb{R}\) be a scalar. The sum of \(\mathbf{v}\) and \(\mathbf{w}\) is defined by: \[\mathbf{v}+\mathbf{w}=\begin{bmatrix}v_1\\v_2\\\vdots \\ v_n\end{bmatrix}+\begin{bmatrix}w_1\\w_2\\\vdots\\ w_n\end{bmatrix}=\begin{bmatrix}v_1+w_1\\v_2+w_2\\\vdots\\ v_n+w_n\end{bmatrix}.\]
The scalar product of \(c\) and \(\mathbf{v}\) is defined by: \[c\mathbf{v}=c\begin{bmatrix}v_1\\v_2\\\vdots\\ v_n\end{bmatrix}=\begin{bmatrix}cv_1\\cv_2\\\vdots\\ cv_n\end{bmatrix}.\]
In \(\mathbb{R}^n\), we represent vectors as column vectors. For convenience in inline text, we sometimes write \(\mathbf{v}=(v_1,\dots,v_n)\), but this notation should be understood to represent a column vector. This is distinct from a row vector, which we explicitly denote as \(\mathbf{v}=\begin{bmatrix}v_1& v_2 &\cdots &v_n\end{bmatrix}\). The distinction is important because row and column vectors behave differently under matrix operations.
Properties of addition and scalar product
Let \(\mathbf{u}\), \(\mathbf{v}\), and \(\mathbf{w}\) be vectors in \(\mathbb{R}^n\), and let \(c\) and \(d\) be scalars.
- \(\mathbf{u} + \mathbf{v} \in \mathbb{R}^n\) (Closed under addition)
- \(c\mathbf{u} \in \mathbb{R}^n\) (Closed under scalar multiplication)
- \(\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}\) (Commutative property of addition)
- \((\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})\) (Associative property of addition)
- \(\exists \, \mathbf{0} \in \mathbb{R}^n\) such that \(\mathbf{u} + \mathbf{0} = \mathbf{u}\) (Existence of an additive identity)
- \(\forall \, \mathbf{u} \in \mathbb{R}^n, \, \exists \, -\mathbf{u}\) such that \(\mathbf{u} + (-\mathbf{u}) = \mathbf{0}\) (Existence of additive inverses)
- \(1\mathbf{u} = \mathbf{u}\) (Identity element of scalar multiplication)
- \((cd)\mathbf{u} = c(d\mathbf{u})\) (Associative property of scalar multiplication)
- \(c(\mathbf{u} + \mathbf{v}) = c\mathbf{u} + c\mathbf{v}\) (Distributive property)
- \((c + d)\mathbf{u} = c\mathbf{u} + d\mathbf{u}\) (Distributive property)
These properties are precisely the axioms of a vector space: any set equipped with an addition and a scalar multiplication satisfying them is a vector space. In particular, they describe not only \(\mathbb{R}^n\) but also more abstract vector spaces, whose elements need not be geometric vectors and whose scalars may belong to fields other than \(\mathbb{R}\), such as \(\mathbb{C}\) or finite fields.
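As a quick sanity check, these properties can also be verified numerically for particular vectors. The sketch below is a minimal illustration using NumPy (introduced properly in Section 1.5) with arbitrarily chosen vectors and scalars; np.allclose compares floating-point results.
import numpy as np

u = np.array([1.0, -2.0, 3.0])
v = np.array([0.5, 4.0, -1.0])
c, d = 2.0, -3.0

# Commutativity of addition: u + v equals v + u
print(np.allclose(u + v, v + u))                 # True
# Distributivity of scalar addition: (c + d)u equals cu + du
print(np.allclose((c + d) * u, c * u + d * u))   # True
# Distributivity over vector addition: c(u + v) equals cu + cv
print(np.allclose(c * (u + v), c * u + c * v))   # True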
1.2 Linear Combinations
A linear combination of vectors \(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\) in \(\mathbb{R}^n\) is a sum of scalar multiples of these vectors:
\[a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_k\mathbf{v}_k\]
where \(a_1, a_2, \ldots, a_k\) are scalars (real numbers).
Let’s illustrate this with two vectors in \(\mathbb{R}^2\): \(\mathbf{v}_1=\begin{bmatrix}1\\1\end{bmatrix}\) and \(\mathbf{v}_2=\begin{bmatrix}-1\\1\end{bmatrix}\). We can create different vectors through linear combinations of \(\mathbf{v}_1\) and \(\mathbf{v}_2\):
Combining with positive coefficients: \[2\mathbf{v}_1 + \mathbf{v}_2 = 2\begin{bmatrix}1\\1\end{bmatrix} + \begin{bmatrix}-1\\1\end{bmatrix} = \begin{bmatrix}1\\3\end{bmatrix}\]
Using a negative coefficient: \[\mathbf{v}_1 - 3\mathbf{v}_2 = \begin{bmatrix}1\\1\end{bmatrix} - 3\begin{bmatrix}-1\\1\end{bmatrix} = \begin{bmatrix}4\\-2\end{bmatrix}\]
Working with fractions: \[\frac{1}{2}\mathbf{v}_1 + \frac{1}{2}\mathbf{v}_2 = \frac{1}{2}\begin{bmatrix}1\\1\end{bmatrix} + \frac{1}{2}\begin{bmatrix}-1\\1\end{bmatrix} = \begin{bmatrix}0\\1\end{bmatrix}\]
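These computations can also be checked with NumPy, which is introduced properly in Section 1.5; the short sketch below simply recomputes the three combinations above.
import numpy as np

v1 = np.array([1, 1])
v2 = np.array([-1, 1])

print(2 * v1 + v2)          # [1 3]
print(v1 - 3 * v2)          # [ 4 -2]
print(0.5 * v1 + 0.5 * v2)  # [0. 1.]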
Many questions in linear algebra reduce to solving systems of linear equations. Questions about linear combinations are a prime example, as we’ll see in the following exercise:
Exercise: Can we write \(\begin{bmatrix}0\\1\\0\end{bmatrix}\) as a linear combination of the vectors \(\begin{bmatrix}1\\2\\3\end{bmatrix}\), \(\begin{bmatrix}4\\5\\6\end{bmatrix}\), \(\begin{bmatrix}7\\8\\9\end{bmatrix}\)?
We want scalars \(c_1\), \(c_2\), \(c_3\) such that: \[c_1\begin{bmatrix}1\\2\\3\end{bmatrix} + c_2\begin{bmatrix}4\\5\\6\end{bmatrix} + c_3\begin{bmatrix}7\\8\\9\end{bmatrix} = \begin{bmatrix}0\\1\\0\end{bmatrix}\] This gives the system of equations: \[\begin{align*} c_1 + 4c_2 + 7c_3 &= 0\\ 2c_1 + 5c_2 + 8c_3 &= 1\\ 3c_1 + 6c_2 + 9c_3 &= 0. \end{align*}\] We can use substitution, or elimination, to show that this system has no solution, so \(\begin{bmatrix}0\\1\\0\end{bmatrix}\) is not a linear combination of the given vectors. We’ll cover solutions to such systems extensively later in the course.
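We can confirm this conclusion with SymPy (also introduced in Section 1.5). The sketch below is one possible check: it feeds the augmented matrix of the system to linsolve, which returns the empty set, meaning that no choice of \(c_1, c_2, c_3\) works.
from sympy import Matrix, symbols, linsolve

c1, c2, c3 = symbols('c1 c2 c3')

# Augmented matrix: columns are the three given vectors, last column is the target (0, 1, 0)
A = Matrix([[1, 4, 7, 0],
            [2, 5, 8, 1],
            [3, 6, 9, 0]])

print(linsolve(A, (c1, c2, c3)))  # EmptySet: the system has no solution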
Linear combinations are fundamental in linear algebra and have numerous applications, such as:
- Expressing a vector in terms of other vectors
- Solving systems of linear equations
- Describing lines, planes, and hyperplanes in \(\mathbb{R}^n\)
- Analyzing linear transformations and matrices
1.3 Span
The set of all possible linear combinations of a given set of vectors is known as the span of those vectors, and it has important properties.
Let’s start with a precise definition. If \(\mathbf{v}_1,\dots,\mathbf{v}_k\) are vectors in \(\mathbb{R}^n\), then \[\text{span}\left(\{\mathbf{v}_1,\dots,\mathbf{v}_k\}\right) =\{ c_1\mathbf{v}_1+\cdots+c_k\mathbf{v}_k:c_1,\dots,c_k\in\mathbb{R} \}\]
Notice that \(\text{span}\left(\{\mathbf{v}_1,\dots,\mathbf{v}_k\}\right)\) is a subset of \(\mathbb{R}^n\). When working with sets, we typically focus on two key questions:
- How do we verify if an element belongs to the set?
- What properties can we deduce when we know an element belongs to the set?
For the span of vectors \(\{\mathbf{v}_1,\dots,\mathbf{v}_k\}\):
Verification: To check if \(\mathbf{v}\) is in the span, we solve a system of equations. Consider:
If \(\mathbf{v}_1 = \begin{bmatrix}1\\2\\1\end{bmatrix}\), \(\mathbf{v}_2 = \begin{bmatrix}0\\1\\1\end{bmatrix}\), and \(\mathbf{v} = \begin{bmatrix}2\\5\\3\end{bmatrix}\)
To check if \(\mathbf{v}\) is in span\((\{\mathbf{v}_1,\mathbf{v}_2\})\), we ask: do there exist \(c_1,c_2\) such that \(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 = \mathbf{v}\)?
This gives us: \[\begin{align*} c_1(1) + c_2(0) &= 2\\ c_1(2) + c_2(1) &= 5\\ c_1(1) + c_2(1) &= 3 \end{align*}\]
If we find values for \(c_1,c_2\) satisfying all equations, then \(\mathbf{v}\) is in the span. If no such values exist, \(\mathbf{v}\) is not in the span.
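As an illustration (again using SymPy, which appears in Section 1.5), the system above can be solved directly; since a solution exists, \(\mathbf{v}\) is in the span.
from sympy import Matrix, symbols, linsolve

c1, c2 = symbols('c1 c2')

# Augmented matrix [v1 v2 | v] for the system c1*v1 + c2*v2 = v
A = Matrix([[1, 0, 2],
            [2, 1, 5],
            [1, 1, 3]])

print(linsolve(A, (c1, c2)))  # {(2, 1)}, so v = 2*v1 + v2 is in the span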
Properties: If a vector \(\mathbf{w}\) is in \(\text{span}\left(\{\mathbf{v}_1,\dots,\mathbf{v}_k\}\right)\), we know that there exist constants \(c_1,\dots,c_k\in\mathbb{R}\) such that \(\mathbf{w}=c_1\mathbf{v}_1+\cdots+c_k\mathbf{v}_k\).
We use these properties to deduce important results:
Theorem 1.1 Suppose that \(\mathbf{v}_1,\dots,\mathbf{v}_k\) are vectors in \(\mathbb{R}^n\). Then
- If \(\mathbf{w}_1\) and \(\mathbf{w}_2\) are in \(\text{span}\left(\{\mathbf{v}_1,\dots,\mathbf{v}_k\}\right)\), then \(\mathbf{w}_1+\mathbf{w}_2\) is also in \(\text{span}\left(\{\mathbf{v}_1,\dots,\mathbf{v}_k\}\right)\).
- If \(\mathbf{w}\in \text{span}\left(\{\mathbf{v}_1,\dots,\mathbf{v}_k\}\right)\), and \(c\in\mathbb{R}\), then \(c\mathbf{w}\in \text{span}\left(\{\mathbf{v}_1,\dots,\mathbf{v}_k\}\right)\).
Proof. We check only the first statement; the proof of the second is similar. Since \(\mathbf{w}_1\) is in the span, there exist \(a_1,\dots,a_k\) with \(\mathbf{w}_1 = a_1\mathbf{v}_1+\cdots+a_k\mathbf{v}_k\). Similarly, there exist \(b_1,\dots,b_k\) with \(\mathbf{w}_2 = b_1\mathbf{v}_1+\cdots+b_k\mathbf{v}_k\). Then: \[\mathbf{w}_1 + \mathbf{w}_2 = (a_1+b_1)\mathbf{v}_1+\cdots+(a_k+b_k)\mathbf{v}_k\] showing \(\mathbf{w}_1 + \mathbf{w}_2\) is in the span. \(\square\)
Visualizing Vector Spans in \(\mathbb{R}^3\)
- Span of a Single Vector: Given \(\mathbf{v} \in \mathbb{R}^3\), \(\text{span}(\{\mathbf{v}\})\) is:
- A single point (the origin) if \(\mathbf{v} = \mathbf{0}\)
- A line through the origin if \(\mathbf{v} \neq \mathbf{0}\), containing all scalar multiples of \(\mathbf{v}\)
- Span of Two Vectors: For nonzero vectors \(\mathbf{v}, \mathbf{w} \in \mathbb{R}^3\), \(\text{span}(\{\mathbf{v}, \mathbf{w}\})\) is:
- A line through the origin if the vectors are parallel (one is a scalar multiple of the other)
- A plane through the origin otherwise, containing all linear combinations \(s\mathbf{v} + t\mathbf{w}\) with \(s,t \in \mathbb{R}\)
- Span of Multiple Vectors: Consider the set of vectors \(\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}\): \[ \mathbf{v}_1 = \begin{bmatrix}1\\1\\3\end{bmatrix}, \; \mathbf{v}_2 = \begin{bmatrix}-2\\-2\\-6\end{bmatrix}, \; \mathbf{v}_3 = \begin{bmatrix}1\\-2\\5\end{bmatrix}, \; \mathbf{v}_4 = \begin{bmatrix}0\\3\\-2\end{bmatrix} \]
The span of these vectors is the set of all possible linear combinations: \[\text{span}(\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}) = \{t_1\mathbf{v}_1 + t_2\mathbf{v}_2 + t_3\mathbf{v}_3 + t_4\mathbf{v}_4 : t_1,t_2,t_3,t_4 \in \mathbb{R}\}.\]
When these four vectors are plotted in \(\mathbb{R}^3\), we see that they all lie in a single plane through the origin, and their span is exactly that plane.
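One way to confirm this computationally (a sketch, not something we need yet) is to check the rank of the matrix whose columns are \(\mathbf{v}_1,\dots,\mathbf{v}_4\): rank 2 means the columns span a plane through the origin.
import numpy as np

# Columns of A are v1, v2, v3, v4
A = np.column_stack([[1, 1, 3],
                     [-2, -2, -6],
                     [1, -2, 5],
                     [0, 3, -2]])

print(np.linalg.matrix_rank(A))  # 2, so the span is a plane through the origin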
The span of a set of vectors in \(\mathbb{R}^3\) must be one of exactly four geometric objects:
- A single point (specifically, the origin \((0,0,0)\))
- A line passing through the origin
- A plane containing the origin
- All of \(\mathbb{R}^3\) (the entire three-dimensional space)
1.4 Dot Product in \(\mathbb{R}^n\)
Let \(\mathbf{v}=(v_1,\dots,v_n)\) and \(\mathbf{w}=(w_1,\dots,w_n)\) be two vectors in \(\mathbb{R}^n\). The dot product of \(\mathbf{v}\) and \(\mathbf{w}\) is defined by: \[\mathbf{v}\cdot\mathbf{w}=\begin{bmatrix}v_1\\v_2\\\vdots\\ v_n\end{bmatrix}\cdot \begin{bmatrix}w_1\\w_2\\\vdots\\ w_n\end{bmatrix}= \sum_{i=1}^nv_iw_i.\]
1.4.1 Properties of the Dot Product in \(\mathbb{R}^n\)
The dot product has three fundamental properties that can be verified directly from its definition. These properties form the foundation for many calculations and proofs in linear algebra.
Let \(\mathbf{u}\), \(\mathbf{v}\) and \(\mathbf{w}\) be vectors in \(\mathbb{R}^n\), and let \(a\in\mathbb{R}\) and \(b\in\mathbb{R}\) be scalars.
- \(\mathbf{v} \cdot \mathbf{v} \geq 0\) and \(\mathbf{v} \cdot \mathbf{v} = 0\) if and only if \(\mathbf{v} = \mathbf{0}\) (Positive Definite)
- \(\mathbf{v} \cdot \mathbf{w} = \mathbf{w} \cdot \mathbf{v}\) (Symmetric)
- \(\mathbf{u}\cdot(a\mathbf{v} + b\mathbf{w}) = a(\mathbf{u} \cdot \mathbf{v}) + b(\mathbf{u} \cdot \mathbf{w})\) and \((a\mathbf{u} + b\mathbf{v}) \cdot \mathbf{w} = a(\mathbf{u} \cdot \mathbf{w}) + b(\mathbf{v} \cdot \mathbf{w})\) (Linear in each Variable)
Exercise: Suppose that \(\mathbf{v}_1,\mathbf{v}_2,\mathbf{w}_1,\mathbf{w}_2\in\mathbb{R}^n\) and that we know the values of \(\mathbf{v}_i\cdot\mathbf{w}_j\). Find \[(2\mathbf{v}_1+3\mathbf{v}_2)\cdot(\mathbf{w}_1-2\mathbf{w}_2)\] in terms of the \(\mathbf{v}_i\cdot\mathbf{w}_j\)’s.
- First, recall the key linearity properties:
- Linear in first variable: \((c_1\mathbf{a}_1 + c_2\mathbf{a}_2)\cdot\mathbf{b} = c_1(\mathbf{a}_1\cdot\mathbf{b}) + c_2(\mathbf{a}_2\cdot\mathbf{b})\)
- Linear in second variable: \(\mathbf{a}\cdot(c_1\mathbf{b}_1 + c_2\mathbf{b}_2) = c_1(\mathbf{a}\cdot\mathbf{b}_1) + c_2(\mathbf{a}\cdot\mathbf{b}_2)\)
- Let’s start with the first variable using linearity:
- \((2\mathbf{v}_1+3\mathbf{v}_2)\cdot(\mathbf{w}_1-2\mathbf{w}_2)\)
- \(= 2\mathbf{v}_1\cdot(\mathbf{w}_1-2\mathbf{w}_2) + 3\mathbf{v}_2\cdot(\mathbf{w}_1-2\mathbf{w}_2)\)
- Now apply linearity in the second variable for each term:
- \(= 2(\mathbf{v}_1\cdot\mathbf{w}_1 - 2\mathbf{v}_1\cdot\mathbf{w}_2) + 3(\mathbf{v}_2\cdot\mathbf{w}_1 - 2\mathbf{v}_2\cdot\mathbf{w}_2)\)
- Expand this:
- \(= 2\mathbf{v}_1\cdot\mathbf{w}_1 - 4\mathbf{v}_1\cdot\mathbf{w}_2 + 3\mathbf{v}_2\cdot\mathbf{w}_1 - 6\mathbf{v}_2\cdot\mathbf{w}_2\)
Now we can use the given values of \(\mathbf{v}_i\cdot\mathbf{w}_j\) to compute the final result by substituting those values into this expression.
The final formula in terms of the known dot products is: \[2(\mathbf{v}_1\cdot\mathbf{w}_1) - 4(\mathbf{v}_1\cdot\mathbf{w}_2) + 3(\mathbf{v}_2\cdot\mathbf{w}_1) - 6(\mathbf{v}_2\cdot\mathbf{w}_2).\]
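A quick numerical sanity check of this expansion, with randomly generated vectors (an arbitrary illustration, not a proof):
import numpy as np

rng = np.random.default_rng(0)
v1, v2, w1, w2 = (rng.standard_normal(4) for _ in range(4))

lhs = np.dot(2 * v1 + 3 * v2, w1 - 2 * w2)
rhs = (2 * np.dot(v1, w1) - 4 * np.dot(v1, w2)
       + 3 * np.dot(v2, w1) - 6 * np.dot(v2, w2))

print(np.isclose(lhs, rhs))  # True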
1.4.2 Norm
The dot product induces a norm on \(\mathbb{R}^n\). The norm of a vector \(\mathbf{v} = (v_1, v_2, \ldots, v_n)\in\mathbb{R}^n\) is given by:
\[\|\mathbf{v}\| = \sqrt{\mathbf{v}\cdot\mathbf{v}} = \sqrt{\sum_{i=1}^n |v_i|^2}\]
The norm is also known as the magnitude or length of a vector. When we compute the norm or when we check properties, we often look at \(\|\mathbf{v}\|^2=\mathbf{v}\cdot\mathbf{v}\) to avoid the square root.
The norm satisfies the following properties:
- Non-negativity: \(\|\mathbf{v}\| \geq 0\) for all \(\mathbf{v} \in \mathbb{R}^n\)
- Definiteness: \(\|\mathbf{v}\| = 0\) if and only if \(\mathbf{v} = \mathbf{0}\)
- Homogeneity: \(\|c\mathbf{v}\| = |c| \|\mathbf{v}\|\) for all \(c \in \mathbb{R}\) and \(\mathbf{v} \in \mathbb{R}^n\)
- Triangle inequality: \(\|\mathbf{v} + \mathbf{w}\| \leq \|\mathbf{v}\| + \|\mathbf{w}\|\) for all \(\mathbf{v}, \mathbf{w} \in \mathbb{R}^n\)
The first three properties are easy to verify. The Triangle Inequality can be proved using the Cauchy-Schwarz Inequality, which states that \(|\mathbf{v} \cdot \mathbf{w}| \leq \|\mathbf{v}\| \|\mathbf{w}\|\).
Exercise: Use Cauchy-Schwarz Inequality to prove the Triangle Inequality of the norm.
We start with the squared norm of \(\mathbf{v} + \mathbf{w}\):
\[\|\mathbf{v} + \mathbf{w}\|^2 = (\mathbf{v} + \mathbf{w}) \cdot (\mathbf{v} + \mathbf{w})\]
Expanding the right-hand side using the properties of the dot product, we get:
\[\|\mathbf{v} + \mathbf{w}\|^2 = \mathbf{v} \cdot \mathbf{v} + 2(\mathbf{v} \cdot \mathbf{w}) + \mathbf{w} \cdot \mathbf{w} = \|\mathbf{v}\|^2 + 2(\mathbf{v} \cdot \mathbf{w}) + \|\mathbf{w}\|^2\]
Now, we apply the Cauchy-Schwarz Inequality to the term \(2(\mathbf{v} \cdot \mathbf{w})\):
\[2(\mathbf{v}\cdot\mathbf{w})\leq2|\mathbf{v} \cdot \mathbf{w}| \leq 2\|\mathbf{v}\| \|\mathbf{w}\|\]
Substituting this into the previous equation, we obtain:
\[\|\mathbf{v} + \mathbf{w}\|^2 \leq \|\mathbf{v}\|^2 + 2\|\mathbf{v}\| \|\mathbf{w}\| + \|\mathbf{w}\|^2 = (\|\mathbf{v}\| + \|\mathbf{w}\|)^2\]
Taking the square root of both sides (which is valid since both sides are non-negative) yields:
\[\|\mathbf{v} + \mathbf{w}\| \leq \|\mathbf{v}\| + \|\mathbf{w}\|\]
which is the Triangle Inequality for the norm in \(\mathbb{R}^n\). \(\square\)
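A quick numerical check of the Triangle Inequality with arbitrarily chosen vectors (an illustration, not a proof):
import numpy as np

v = np.array([1.0, -2.0, 2.0])
w = np.array([3.0, 0.0, -4.0])

lhs = np.linalg.norm(v + w)                  # ||v + w|| = sqrt(24), about 4.899
rhs = np.linalg.norm(v) + np.linalg.norm(w)  # ||v|| + ||w|| = 3 + 5 = 8
print(lhs <= rhs)                            # True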
1.4.3 Orthogonality and Cauchy-Schwarz Inequality
Two vectors \(\mathbf{v}\) and \(\mathbf{w}\) in \(\mathbb{R}^n\) are orthogonal if \(\mathbf{v}\cdot\mathbf{w}=0\). Orthogonality is a central topic in linear algebra and has numerous applications in various fields, such as:
- Coordinate systems and basis vectors
- Least squares approximation and regression analysis
- Fourier series and signal processing
- Quantum mechanics and Hilbert spaces
Theorem 1.2 (Pythagorean Theorem) If \(\mathbf{v}, \mathbf{w}\in \mathbb{R}^n\) are orthogonal vectors, then \(\|\mathbf{v} + \mathbf{w}\|^2= \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2\)
Proof. Let \(\mathbf{v} = (v_1, v_2, \ldots, v_n)\) and \(\mathbf{w} = (w_1, w_2, \ldots, w_n)\) be orthogonal vectors in \(\mathbb{R}^n\). We start with the squared norm of \(\mathbf{v} + \mathbf{w}\):
\[\|\mathbf{v} + \mathbf{w}\|^2 = (\mathbf{v} + \mathbf{w}) \cdot (\mathbf{v} + \mathbf{w})\]
Expanding the right-hand side using the properties of the dot product, we get:
\[\|\mathbf{v} + \mathbf{w}\|^2 = \mathbf{v} \cdot \mathbf{v} + 2(\mathbf{v} \cdot \mathbf{w}) + \mathbf{w} \cdot \mathbf{w}\]
Since \(\mathbf{v}\) and \(\mathbf{w}\) are orthogonal, \(\mathbf{v} \cdot \mathbf{w} = 0\). Substituting this into the equation above, we obtain:
\[\|\mathbf{v} + \mathbf{w}\|^2 = \mathbf{v} \cdot \mathbf{v} + \mathbf{w} \cdot \mathbf{w} = \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2\]
which is the Pythagorean Theorem for orthogonal vectors in \(\mathbb{R}^n\). \(\square\)
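For a concrete check, take the orthogonal vectors \(\mathbf{v}=(1,2,0)\) and \(\mathbf{w}=(-2,1,3)\) (chosen only for illustration):
import numpy as np

v = np.array([1.0, 2.0, 0.0])
w = np.array([-2.0, 1.0, 3.0])

print(np.dot(v, w))                                      # 0.0, so v and w are orthogonal
print(np.linalg.norm(v + w) ** 2)                        # approximately 19.0
print(np.linalg.norm(v) ** 2 + np.linalg.norm(w) ** 2)   # approximately 19.0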
A related result is the Parallelogram Law, which states that for any two vectors \(\mathbf{v}, \mathbf{w}\in \mathbb{R}^n\):
\[\|\mathbf{v} + \mathbf{w}\|^2 + \|\mathbf{v} - \mathbf{w}\|^2 = 2(\|\mathbf{v}\|^2 + \|\mathbf{w}\|^2) \tag{1.1}\]
When \(\mathbf{v}\) and \(\mathbf{w}\) are orthogonal, \(\|\mathbf{v} + \mathbf{w}\|^2 = \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2\) and \(\|\mathbf{v} - \mathbf{w}\|^2 = \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2\). Then the Parallelogram Law reduces to the Pythagorean Theorem.
Exercise: Prove the Parallelogram Law
- Expand both squared norms using the dot product definition: \(\|\mathbf{u}\|^2 = \mathbf{u} \cdot \mathbf{u}\)
- For the left side, you’ll get terms with \(\mathbf{v} \cdot \mathbf{v}\), \(\mathbf{w} \cdot \mathbf{w}\), and \(\mathbf{v} \cdot \mathbf{w}\)
- Pay attention to the signs of the cross terms \(\mathbf{v} \cdot \mathbf{w}\) in both expansions
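Independently of the proof, Equation 1.1 is easy to spot-check numerically for particular vectors (an illustration only):
import numpy as np

v = np.array([2.0, -1.0, 3.0])
w = np.array([1.0, 4.0, 0.0])

lhs = np.linalg.norm(v + w) ** 2 + np.linalg.norm(v - w) ** 2
rhs = 2 * (np.linalg.norm(v) ** 2 + np.linalg.norm(w) ** 2)
print(np.isclose(lhs, rhs))  # True (both sides equal 62)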
Theorem 1.3 (Cauchy-Schwarz Inequality) Let \(\mathbf{v} = (v_1, v_2, \ldots, v_n)\) and \(\mathbf{w} = (w_1, w_2, \ldots, w_n)\) be two vectors in \(\mathbb{R}^n\). Then \(|\mathbf{v} \cdot \mathbf{w}| \leq \|\mathbf{v}\| \|\mathbf{w}\|\)
Proof. If \(\mathbf{w} = \mathbf{0}\), both sides of the inequality are zero, so assume \(\mathbf{w} \neq \mathbf{0}\). Let \(t \in \mathbb{R}\) be a scalar and consider the non-negative quantity \(\|\mathbf{v} - t\mathbf{w}\|^2\):
\[\|\mathbf{v} - t\mathbf{w}\|^2 \geq 0\]
Expanding the left-hand side using the properties of the dot product, we get:
\[(\mathbf{v} - t\mathbf{w}) \cdot (\mathbf{v} - t\mathbf{w}) = \mathbf{v} \cdot \mathbf{v} - 2t(\mathbf{v} \cdot \mathbf{w}) + t^2(\mathbf{w} \cdot \mathbf{w}) \geq 0\]
This inequality holds for all values of \(t\). The left-hand side is a quadratic in \(t\), so we choose the value of \(t\) at its vertex, which minimizes it:
\[t = \frac{\mathbf{v} \cdot \mathbf{w}}{\|\mathbf{w}\|^2}\]
Substituting this value of \(t\) into the inequality, we obtain:
\[\|\mathbf{v}\|^2 - 2\frac{(\mathbf{v} \cdot \mathbf{w})^2}{\|\mathbf{w}\|^2} + \frac{(\mathbf{v} \cdot \mathbf{w})^2}{\|\mathbf{w}\|^2} \geq 0\]
Simplifying the left-hand side:
\[\|\mathbf{v}\|^2 - \frac{(\mathbf{v} \cdot \mathbf{w})^2}{\|\mathbf{w}\|^2} \geq 0\]
Multiplying both sides by \(\|\mathbf{w}\|^2\) yields:
\[\|\mathbf{v}\|^2\|\mathbf{w}\|^2 \geq (\mathbf{v} \cdot \mathbf{w})^2\]
Taking the square root of both sides (which is valid since both sides are non-negative) gives:
\[\|\mathbf{v}\| \|\mathbf{w}\| \geq |\mathbf{v} \cdot \mathbf{w}|\]
which is the Cauchy-Schwarz Inequality. \(\square\)
Here is an alternative, more geometric argument. Suppose that neither \(\mathbf{v}\) nor \(\mathbf{w}\) is zero and that neither is a scalar multiple of the other.
For any scalar \(t\in\mathbb{R}\), we can write \(\mathbf{w}\) as the sum of two vectors: \(\mathbf{w} = t\mathbf{v}+(\mathbf{w}-t\mathbf{v})\). Our goal is to find \(t\in\mathbb{R}\) such that \(t\mathbf{v}\) and \(\mathbf{w}-t\mathbf{v}\) are orthogonal. For such a \(t\), the Pythagorean Theorem gives \[\|\mathbf{w}\|^2 = t^2\|\mathbf{v}\|^2+\|\mathbf{w}-t\mathbf{v}\|^2.\] In particular, \[t^2\|\mathbf{v}\|^2\leq\|\mathbf{w}\|^2.\] From the equation \(\mathbf{v}\cdot (\mathbf{w}-t\mathbf{v})=0\), we find that \(t=\frac{\mathbf{v}\cdot \mathbf{w}}{\mathbf{v}\cdot \mathbf{v}}\). Substituting this value into \[t^2\|\mathbf{v}\|^2\leq\|\mathbf{w}\|^2\] and simplifying, we again obtain the Cauchy-Schwarz Inequality.
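As with the Triangle Inequality, Cauchy-Schwarz is easy to check numerically for any particular pair of vectors; the sketch below uses random vectors purely as an illustration.
import numpy as np

rng = np.random.default_rng(1)
v = rng.standard_normal(5)
w = rng.standard_normal(5)

lhs = abs(np.dot(v, w))
rhs = np.linalg.norm(v) * np.linalg.norm(w)
print(lhs <= rhs)  # True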
1.4.4 Geometric Interpretation of the Dot Product
The dot product of two vectors \(\mathbf{v}\) and \(\mathbf{w}\) in \(\mathbb{R}^n\) can be expressed in terms of their norms and the angle between them: \[\mathbf{v} \cdot \mathbf{w} = \|\mathbf{v}\| \|\mathbf{w}\| \cos(\theta)\] where \(\theta\) is the angle between \(\mathbf{v}\) and \(\mathbf{w}\). This relationship highlights the geometric interpretation of the dot product. When \(\theta = 0°\), the vectors point in the same direction, and the dot product equals the product of their norms. When \(\theta = 90°\), the vectors are orthogonal, and the dot product is zero. The Cauchy-Schwarz Inequality follows directly from this relationship, since \(|\cos(\theta)| \leq 1\).
We can verify this easily in \(\mathbb{R}^2\). Consider two vectors \(\mathbf{v}\) and \(\mathbf{w}\) in \(\mathbb{R}^2\). Let the angle that vector \(\mathbf{v}\) makes with the positive x-axis be \(\alpha\), and the angle that vector \(\mathbf{w}\) makes with the positive x-axis be \(\alpha + \beta\), where \(\beta\) is the angle between vectors \(\mathbf{v}\) and \(\mathbf{w}\).
The vectors can be expressed in terms of their magnitudes and angles: \(\mathbf{v} = (\|\mathbf{v}\| \cos(\alpha), \|\mathbf{v}\| \sin(\alpha))\) and \(\mathbf{w} = (\|\mathbf{w}\| \cos(\alpha + \beta), \|\mathbf{w}\| \sin(\alpha + \beta))\). The dot product of these vectors is:
\[\mathbf{v} \cdot \mathbf{w} = \|\mathbf{v}\| \|\mathbf{w}\| (\cos(\alpha) \cos(\alpha + \beta) + \sin(\alpha) \sin(\alpha + \beta))\]
Using the angle addition formulas:
\[\cos(\alpha + \beta) = \cos(\alpha) \cos(\beta) - \sin(\alpha) \sin(\beta)\] \[\sin(\alpha + \beta) = \sin(\alpha) \cos(\beta) + \cos(\alpha) \sin(\beta)\]
We show that \(\cos(\alpha) \cos(\alpha + \beta) + \sin(\alpha) \sin(\alpha + \beta) = \cos(\beta)\) and we conclude that \[\mathbf{v} \cdot \mathbf{w} = \|\mathbf{v}\| \|\mathbf{w}\| \cos(\beta).\]
Exercise: Prove that \(\cos(\alpha) \cos(\alpha + \beta) + \sin(\alpha) \sin(\alpha + \beta) = \cos(\beta)\)
\[\begin{aligned} \cos(\alpha) \cos(\alpha + \beta) + \sin(\alpha) \sin(\alpha + \beta) &= \cos(\alpha) (\cos(\alpha) \cos(\beta) - \sin(\alpha) \sin(\beta)) \\ &\quad + \sin(\alpha) (\sin(\alpha) \cos(\beta) + \cos(\alpha) \sin(\beta)) \\ &= \cos^2(\alpha) \cos(\beta) - \cos(\alpha) \sin(\alpha) \sin(\beta) \\ &\quad + \sin^2(\alpha) \cos(\beta) + \cos(\alpha) \sin(\alpha) \sin(\beta) \\ &= \cos^2(\alpha) \cos(\beta) + \sin^2(\alpha) \cos(\beta) \\ &= (\cos^2(\alpha) + \sin^2(\alpha)) \cos(\beta) \\ &= \cos(\beta) \end{aligned}\]
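In practice the formula is often used in the other direction, to compute the angle between two given vectors; here is a minimal sketch (the vectors are arbitrary examples):
import numpy as np

v = np.array([1.0, 0.0])
w = np.array([1.0, 1.0])

cos_theta = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))  # clip guards against round-off outside [-1, 1]
print(np.degrees(theta))  # approximately 45.0 degrees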
1.4.5 Distance
The norm induces a distance (or metric) on \(\mathbb{R}^n\): the distance between two vectors \(\mathbf{v} = (v_1, v_2, \ldots, v_n)\) and \(\mathbf{w} = (w_1, w_2, \ldots, w_n)\) is given by:
\[d(\mathbf{v}, \mathbf{w}) =\|\mathbf{v}-\mathbf{w}\| = \sqrt{\sum_{i=1}^n |v_i - w_i|^2}\]
This distance is known as the Euclidean distance. It satisfies the following properties:
- Non-negativity: \(d(\mathbf{v}, \mathbf{w}) \geq 0\) for all \(\mathbf{v}, \mathbf{w} \in \mathbb{R}^n\)
- Definiteness: \(d(\mathbf{v}, \mathbf{w}) = 0\) if and only if \(\mathbf{v} = \mathbf{w}\)
- Symmetry: \(d(\mathbf{v}, \mathbf{w}) = d(\mathbf{w}, \mathbf{v})\) for all \(\mathbf{v}, \mathbf{w} \in \mathbb{R}^n\)
- Triangle inequality: \(d(\mathbf{v}, \mathbf{z}) \leq d(\mathbf{v}, \mathbf{w}) + d(\mathbf{w}, \mathbf{z})\) for all \(\mathbf{v}, \mathbf{w}, \mathbf{z} \in \mathbb{R}^n\)
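Computationally, the Euclidean distance is just the norm of the difference; a minimal NumPy sketch with arbitrary vectors:
import numpy as np

v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 6.0, 3.0])

d = np.linalg.norm(v - w)
print(d)  # 5.0, since v - w = (-3, -4, 0)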
The induced distance has numerous applications in various fields, such as:
- Clustering and classification in machine learning
- Measuring similarity or dissimilarity between objects or data points
- Optimization problems in operations research
- Error analysis and approximation theory in numerical analysis
Understanding the relationships between dot product, norm, and distance is crucial in applications in mathematics, physics, computer science, and engineering.
1.5 Vector Computation in Python
Python provides various ways to work with vectors and mathematical computations. Here’s an overview of the main approaches:
- Built-in Python Lists:
- Basic vector operations require explicit formulas or list comprehensions
- Useful for understanding the underlying computations
- Not optimized for large-scale numerical calculations
- NumPy (Numerical Python):
- Industry-standard library for numerical computing
- Provides efficient array operations and mathematical functions
- Optimized for performance with vectorized operations
- Essential for scientific computing and data analysis
- SymPy (Symbolic Python):
- Computer algebra system for symbolic mathematics
- Handles mathematical expressions with variables and symbols
- Perfect for mathematical proofs and algebraic manipulations
- Useful for verifying theoretical results
In this class, we will primarily use NumPy and SymPy. For solving linear systems, NumPy relies on optimized, numerically stable matrix factorizations (via LAPACK) rather than the naive Gaussian elimination we carry out by hand. While Gaussian elimination is a foundational algorithm taught in linear algebra courses for its theoretical importance and intuitive approach, it can be numerically unstable in practice. SymPy provides a direct implementation of Gaussian elimination with exact arithmetic, making it useful for understanding the algorithm and verifying theoretical results.
1.5.1 Representing Vectors
In Python, we can represent vectors using lists; in NumPy we use arrays; and in SymPy we use column matrices, created with the Matrix constructor. Notice that SymPy displays vectors as column vectors.
import numpy as np
from sympy import Matrix

# Using Python lists
v = [1, 2, 3]
w = [4, 5, 6]
print(v)  # Output: [1, 2, 3]

# Using NumPy arrays
v_np = np.array([1, 2, 3])
w_np = np.array([4, 5, 6])
print(v_np)  # Output: [1 2 3]

# Using SymPy vectors
v_sp = Matrix([1, 2, 3])
w_sp = Matrix([4, 5, 6])
[1, 2, 3]
[1 2 3]
# Showing v_sp
v_sp
\(\displaystyle \left[\begin{matrix}1\\2\\3\end{matrix}\right]\)
1.5.2 Vector Addition
The + operator is already implemented for NumPy arrays and SymPy matrices.
# Using Python lists
result = [v[i] + w[i] for i in range(len(v))]
print(result)  # Output: [5, 7, 9]

# Using NumPy arrays
result_np = v_np + w_np
print(result_np)  # Output: [5 7 9]
[5, 7, 9]
[5 7 9]
# Using SymPy vectors
v_sp + w_sp
\(\displaystyle \left[\begin{matrix}5\\7\\9\end{matrix}\right]\)
1.5.3 Scalar Multiplication
Scalar multiplication is already implemented in NumPy and SymPy.
scalar = 2

# Using Python lists
result = [scalar * x for x in v]
print(result)  # Output: [2, 4, 6]

# Using NumPy arrays
result_np = scalar * v_np
print(result_np)  # Output: [2 4 6]
[2, 4, 6]
[2 4 6]
# Using SymPy vectors
scalar * v_sp
\(\displaystyle \left[\begin{matrix}2\\4\\6\end{matrix}\right]\)
1.5.4 Dot Product
The dot product is already implemented in NumPy and SymPy.
# Using Python lists
dot_product = sum([v[i] * w[i] for i in range(len(v))])
print(dot_product)  # Output: 32

# Using NumPy arrays
dot_product_np = np.dot(v_np, w_np)
print(dot_product_np)  # Output: 32

# Using SymPy vectors
dot_product_sp = v_sp.dot(w_sp)
print(dot_product_sp)  # Output: 32
32
32
32
1.5.5 Vector Norms
The norm is already implemented in NumPy and SymPy; in NumPy it is found in the linalg submodule.
import math

# Using Python lists
norm = math.sqrt(sum([x**2 for x in v]))
print(norm)  # Output: 3.7416573867739413

# Using NumPy arrays
norm_np = np.linalg.norm(v_np)
print(norm_np)  # Output: 3.7416573867739413

# Using SymPy vectors
norm_sp = v_sp.norm()
print(norm_sp)  # Output: sqrt(14), which is approximately 3.7416573867739413
3.7416573867739413
3.7416573867739413
sqrt(14)