1 Vectors in \(\mathbb{R}^n\)
The following videos from the Essence of Linear Algebra series by 3Blue1Brown are exceptionally good. Watch them carefully.
1.1 Addition of vectors and scalar product of vectors
Let \(\mathbf{v}=(v_1,\dots,v_n)\) and \(\mathbf{w}=(w_1,\dots,w_n)\) be two vectors in \(\mathbb{R}^n\) and let \(c\in\mathbb{R}\) be a scalar. The sum of \(\mathbf{v}\) and \(\mathbf{w}\) is defined by: \[\mathbf{v}+\mathbf{w}=\begin{bmatrix}v_1\\v_2\\\vdots \\ v_n\end{bmatrix}+\begin{bmatrix}w_1\\w_2\\\vdots\\ w_n\end{bmatrix}=\begin{bmatrix}v_1+w_1\\v_2+w_2\\\vdots\\ v_n+w_n\end{bmatrix}.\]
The scalar product of \(c\) and \(\mathbf{v}\) is defined by: \[c\mathbf{v}=c\begin{bmatrix}v_1\\v_2\\\vdots\\ v_n\end{bmatrix}=\begin{bmatrix}cv_1\\cv_2\\\vdots\\ cv_n\end{bmatrix}.\]
In \(\mathbb{R}^n\), we represent vectors as column vectors. For convenience in inline text, we sometimes write \(\mathbf{v}=(v_1,\dots,v_n)\), but this notation should be understood to represent a column vector. This is distinct from a row vector, which we explicitly denote as \(\mathbf{v}=\begin{bmatrix}v_1& v_2 &\cdots &v_n\end{bmatrix}\). The distinction is important because row and column vectors behave differently under matrix operations.
Properties of addition and scalar product
Let \(\mathbf{u}\), \(\mathbf{v}\), and \(\mathbf{w}\) be vectors in \(\mathbb{R}^n\), and let \(c\) and \(d\) be scalars.
- \(\mathbf{u} + \mathbf{v} \in \mathbb{R}^n\) (Closed under addition)
- \(c\mathbf{u} \in \mathbb{R}^n\) (Closed under scalar multiplication)
- \(\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}\) (Commutative property of addition)
- \((\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})\) (Associative property of addition)
- \(\exists \, \mathbf{0} \in \mathbb{R}^n\) such that \(\mathbf{u} + \mathbf{0} = \mathbf{u}\) (Existence of an additive identity)
- \(\forall \, \mathbf{u} \in \mathbb{R}^n, \, \exists \, -\mathbf{u}\) such that \(\mathbf{u} + (-\mathbf{u}) = \mathbf{0}\) (Existence of additive inverses)
- \(1\mathbf{u} = \mathbf{u}\) (Identity element of scalar multiplication)
- \((cd)\mathbf{u} = c(d\mathbf{u})\) (Associative property of scalar multiplication)
- \(c(\mathbf{u} + \mathbf{v}) = c\mathbf{u} + c\mathbf{v}\) (Distributive property)
- \((c + d)\mathbf{u} = c\mathbf{u} + d\mathbf{u}\) (Distributive property)
These properties are precisely the axioms of a vector space: any set equipped with an addition and a scalar multiplication satisfying them is a vector space. In particular, they describe not only \(\mathbb{R}^n\) but also more abstract vector spaces, whose elements need not be geometric vectors and whose scalars may belong to fields other than \(\mathbb{R}\), such as \(\mathbb{C}\) or finite fields.
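As a quick sanity check, these properties can also be verified numerically for particular vectors. The sketch below is a minimal illustration using NumPy (introduced properly in Section 1.5) with arbitrarily chosen vectors and scalars; np.allclose compares floating-point results.
import numpy as np

u = np.array([1.0, -2.0, 3.0])
v = np.array([0.5, 4.0, -1.0])
c, d = 2.0, -3.0

# Commutativity of addition: u + v equals v + u
print(np.allclose(u + v, v + u))                 # True
# Distributivity of scalar addition: (c + d)u equals cu + du
print(np.allclose((c + d) * u, c * u + d * u))   # True
# Distributivity over vector addition: c(u + v) equals cu + cv
print(np.allclose(c * (u + v), c * u + c * v))   # True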
1.2 Linear Combinations
A linear combination of vectors \(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\) in \(\mathbb{R}^n\) is a sum of scalar multiples of these vectors:
\[a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_k\mathbf{v}_k\]
where \(a_1, a_2, \ldots, a_k\) are scalars (real numbers).
Let’s illustrate this with two vectors in \(\mathbb{R}^2\): \(\mathbf{v}_1=\begin{bmatrix}1\\1\end{bmatrix}\) and \(\mathbf{v}_2=\begin{bmatrix}-1\\1\end{bmatrix}\). We can create different vectors through linear combinations of \(\mathbf{v}_1\) and \(\mathbf{v}_2\):
Combining with positive coefficients: \[2\mathbf{v}_1 + \mathbf{v}_2 = 2\begin{bmatrix}1\\1\end{bmatrix} + \begin{bmatrix}-1\\1\end{bmatrix} = \begin{bmatrix}1\\3\end{bmatrix}\]
Using a negative coefficient: \[\mathbf{v}_1 - 3\mathbf{v}_2 = \begin{bmatrix}1\\1\end{bmatrix} - 3\begin{bmatrix}-1\\1\end{bmatrix} = \begin{bmatrix}4\\-2\end{bmatrix}\]
Working with fractions: \[\frac{1}{2}\mathbf{v}_1 + \frac{1}{2}\mathbf{v}_2 = \frac{1}{2}\begin{bmatrix}1\\1\end{bmatrix} + \frac{1}{2}\begin{bmatrix}-1\\1\end{bmatrix} = \begin{bmatrix}0\\1\end{bmatrix}\]
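These computations can also be checked with NumPy, which is introduced properly in Section 1.5; the short sketch below simply recomputes the three combinations above.
import numpy as np

v1 = np.array([1, 1])
v2 = np.array([-1, 1])

print(2 * v1 + v2)          # [1 3]
print(v1 - 3 * v2)          # [ 4 -2]
print(0.5 * v1 + 0.5 * v2)  # [0. 1.]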
Many questions in linear algebra reduce to solving systems of linear equations. Questions about linear combinations are a prime example, as we’ll see in the following exercise:
Exercise: Can we write \(\begin{bmatrix}0\\1\\0\end{bmatrix}\) as a linear combination of the vectors \(\begin{bmatrix}1\\2\\3\end{bmatrix}\), \(\begin{bmatrix}4\\5\\6\end{bmatrix}\), \(\begin{bmatrix}7\\8\\9\end{bmatrix}\)?
We want scalars \(c_1\), \(c_2\), \(c_3\) such that: \[c_1\begin{bmatrix}1\\2\\3\end{bmatrix} + c_2\begin{bmatrix}4\\5\\6\end{bmatrix} + c_3\begin{bmatrix}7\\8\\9\end{bmatrix} = \begin{bmatrix}0\\1\\0\end{bmatrix}\] This gives the system of equations: \[\begin{align*} c_1 + 4c_2 + 7c_3 &= 0\\ 2c_1 + 5c_2 + 8c_3 &= 1\\ 3c_1 + 6c_2 + 9c_3 &= 0. \end{align*}\] We can use substitution, or elimination, to show that this system has no solution, so \(\begin{bmatrix}0\\1\\0\end{bmatrix}\) is not a linear combination of the given vectors. We’ll cover solutions to such systems extensively later in the course.
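We can confirm this conclusion with SymPy (also introduced in Section 1.5). The sketch below is one possible check: it feeds the augmented matrix of the system to linsolve, which returns the empty set, meaning that no choice of \(c_1, c_2, c_3\) works.
from sympy import Matrix, symbols, linsolve

c1, c2, c3 = symbols('c1 c2 c3')

# Augmented matrix: columns are the three given vectors, last column is the target (0, 1, 0)
A = Matrix([[1, 4, 7, 0],
            [2, 5, 8, 1],
            [3, 6, 9, 0]])

print(linsolve(A, (c1, c2, c3)))  # EmptySet: the system has no solution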
Linear combinations are fundamental in linear algebra and have numerous applications, such as:
- Expressing a vector in terms of other vectors
- Solving systems of linear equations
- Describing lines, planes, and hyperplanes in \(\mathbb{R}^n\)
- Analyzing linear transformations and matrices
1.3 Span
The set of all possible linear combinations of a given set of vectors is known as the span of those vectors, and it has important properties.
Let’s start with a precise definition. If \(\mathbf{v}_1,\dots,\mathbf{v}_k\) are vectors in \(\mathbb{R}^n\), then \[\text{span}\left(\{\mathbf{v}_1,\dots,\mathbf{v}_k\}\right) =\{ c_1\mathbf{v}_1+\cdots+c_k\mathbf{v}_k:c_1,\dots,c_k\in\mathbb{R} \}\]
Notice that \(\text{span}\left(\{\mathbf{v}_1,\dots,\mathbf{v}_k\}\right)\) is a subset of \(\mathbb{R}^n\). When working with sets, we typically focus on two key questions:
- How do we verify if an element belongs to the set?
- What properties can we deduce when we know an element belongs to the set?
For the span of vectors \(\{\mathbf{v}_1,\dots,\mathbf{v}_k\}\):
Verification: To check if \(\mathbf{v}\) is in the span, we solve a system of equations. Consider:
If \(\mathbf{v}_1 = \begin{bmatrix}1\\2\\1\end{bmatrix}\), \(\mathbf{v}_2 = \begin{bmatrix}0\\1\\1\end{bmatrix}\), and \(\mathbf{v} = \begin{bmatrix}2\\5\\3\end{bmatrix}\)
To check if \(\mathbf{v}\) is in span\((\{\mathbf{v}_1,\mathbf{v}_2\})\), we ask: do there exist \(c_1,c_2\) such that \(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 = \mathbf{v}\)?
This gives us: \[\begin{align*} c_1(1) + c_2(0) &= 2\\ c_1(2) + c_2(1) &= 5\\ c_1(1) + c_2(1) &= 3 \end{align*}\]
If we find values for \(c_1,c_2\) satisfying all equations, then \(\mathbf{v}\) is in the span. If no such values exist, \(\mathbf{v}\) is not in the span.
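As an illustration (again using SymPy, which appears in Section 1.5), the system above can be solved directly; since a solution exists, \(\mathbf{v}\) is in the span.
from sympy import Matrix, symbols, linsolve

c1, c2 = symbols('c1 c2')

# Augmented matrix [v1 v2 | v] for the system c1*v1 + c2*v2 = v
A = Matrix([[1, 0, 2],
            [2, 1, 5],
            [1, 1, 3]])

print(linsolve(A, (c1, c2)))  # {(2, 1)}, so v = 2*v1 + v2 is in the span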
Properties: If a vector \(\mathbf{w}\) is in \(\text{span}\left(\{\mathbf{v}_1,\dots,\mathbf{v}_k\}\right)\), we know that there exist constants \(c_1,\dots,c_k\in\mathbb{R}\) such that \(\mathbf{w}=c_1\mathbf{v}_1+\cdots+c_k\mathbf{v}_k\).
We use these properties to deduce important results:
Theorem 1.1 Suppose that \(\mathbf{v}_1,\dots,\mathbf{v}_k\) are vectors in \(\mathbb{R}^n\). Then
- If \(\mathbf{w}_1\) and \(\mathbf{w}_2\) are in \(\text{span}\left(\{\mathbf{v}_1,\dots,\mathbf{v}_k\}\right)\), then \(\mathbf{w}_1+\mathbf{w}_2\) is also in \(\text{span}\left(\{\mathbf{v}_1,\dots,\mathbf{v}_k\}\right)\).
- If \(\mathbf{w}\in \text{span}\left(\{\mathbf{v}_1,\dots,\mathbf{v}_k\}\right)\), and \(c\in\mathbb{R}\), then \(c\mathbf{w}\in \text{span}\left(\{\mathbf{v}_1,\dots,\mathbf{v}_k\}\right)\).
Proof. We check only the first statement; the proof of the second is similar. Since \(\mathbf{w}_1\) is in the span, there exist \(a_1,\dots,a_k\) with \(\mathbf{w}_1 = a_1\mathbf{v}_1+\cdots+a_k\mathbf{v}_k\). Similarly, there exist \(b_1,\dots,b_k\) with \(\mathbf{w}_2 = b_1\mathbf{v}_1+\cdots+b_k\mathbf{v}_k\). Then: \[\mathbf{w}_1 + \mathbf{w}_2 = (a_1+b_1)\mathbf{v}_1+\cdots+(a_k+b_k)\mathbf{v}_k\] showing \(\mathbf{w}_1 + \mathbf{w}_2\) is in the span. \(\square\)
Visualizing Vector Spans in \(\mathbb{R}^3\)
- Span of a Single Vector: Given \(\mathbf{v} \in \mathbb{R}^3\), \(\text{span}(\{\mathbf{v}\})\) is:
- A single point (the origin) if \(\mathbf{v} = \mathbf{0}\)
- A line through the origin if \(\mathbf{v} \neq \mathbf{0}\), containing all scalar multiples of \(\mathbf{v}\)
- Span of Two Vectors: For nonzero vectors \(\mathbf{v}, \mathbf{w} \in \mathbb{R}^3\), \(\text{span}(\{\mathbf{v}, \mathbf{w}\})\) is:
- A line through the origin if the vectors are parallel (one is a scalar multiple of the other)
- A plane through the origin otherwise, containing all linear combinations \(s\mathbf{v} + t\mathbf{w}\) with \(s,t \in \mathbb{R}\)
- Span of Multiple Vectors: Consider the set of vectors \(\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}\): \[ \mathbf{v}_1 = \begin{bmatrix}1\\1\\3\end{bmatrix}, \; \mathbf{v}_2 = \begin{bmatrix}-2\\-2\\-6\end{bmatrix}, \; \mathbf{v}_3 = \begin{bmatrix}1\\-2\\5\end{bmatrix}, \; \mathbf{v}_4 = \begin{bmatrix}0\\3\\-2\end{bmatrix} \]
The span of these vectors is the set of all possible linear combinations: \[\text{span}(\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}) = \{t_1\mathbf{v}_1 + t_2\mathbf{v}_2 + t_3\mathbf{v}_3 + t_4\mathbf{v}_4 : t_1,t_2,t_3,t_4 \in \mathbb{R}\}.\]
When these four vectors are plotted in \(\mathbb{R}^3\), we see that they all lie in a single plane through the origin, and their span is exactly that plane.
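One way to confirm this computationally (a sketch, not something we need yet) is to check the rank of the matrix whose columns are \(\mathbf{v}_1,\dots,\mathbf{v}_4\): rank 2 means the columns span a plane through the origin.
import numpy as np

# Columns of A are v1, v2, v3, v4
A = np.column_stack([[1, 1, 3],
                     [-2, -2, -6],
                     [1, -2, 5],
                     [0, 3, -2]])

print(np.linalg.matrix_rank(A))  # 2, so the span is a plane through the origin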
The span of a set of vectors in \(\mathbb{R}^3\) must be one of exactly four geometric objects:
- A single point (specifically, the origin \((0,0,0)\))
- A line passing through the origin
- A plane containing the origin
- All of \(\mathbb{R}^3\) (the entire three-dimensional space)
1.4 Dot Product in \(\mathbb{R}^n\)
Let \(\mathbf{v}=(v_1,\dots,v_n)\) and \(\mathbf{w}=(w_1,\dots,w_n)\) be two vectors in \(\mathbb{R}^n\). The dot product of \(\mathbf{v}\) and \(\mathbf{w}\) is defined by: \[\mathbf{v}\cdot\mathbf{w}=\begin{bmatrix}v_1\\v_2\\\vdots\\ v_n\end{bmatrix}\cdot \begin{bmatrix}w_1\\w_2\\\vdots\\ w_n\end{bmatrix}= \sum_{i=1}^nv_iw_i.\]
1.4.1 Properties of the Dot Product in \(\mathbb{R}^n\)
The dot product has three fundamental properties that can be verified directly from its definition. These properties form the foundation for many calculations and proofs in linear algebra.
Let \(\mathbf{u}\), \(\mathbf{v}\) and \(\mathbf{w}\) be vectors in \(\mathbb{R}^n\), and let \(a\in\mathbb{R}\) and \(b\in\mathbb{R}\) be scalars.
- \(\mathbf{v} \cdot \mathbf{v} \geq 0\) and \(\mathbf{v} \cdot \mathbf{v} = 0\) if and only if \(\mathbf{v} = \mathbf{0}\) (Positive Definite)
- \(\mathbf{v} \cdot \mathbf{w} = \mathbf{w} \cdot \mathbf{v}\) (Symmetric)
- \(\mathbf{u}\cdot(a\mathbf{v} + b\mathbf{w}) = a(\mathbf{u} \cdot \mathbf{v}) + b(\mathbf{u} \cdot \mathbf{w})\) and \((a\mathbf{u} + b\mathbf{v}) \cdot \mathbf{w} = a(\mathbf{u} \cdot \mathbf{w}) + b(\mathbf{v} \cdot \mathbf{w})\) (Linear in each Variable)
Exercise: Suppose that \(\mathbf{v}_1,\mathbf{v}_2,\mathbf{w}_1,\mathbf{w}_2\in\mathbb{R}^n\) and that we know the values of \(\mathbf{v}_i\cdot\mathbf{w}_j\). Find \[(2\mathbf{v}_1+3\mathbf{v}_2)\cdot(\mathbf{w}_1-2\mathbf{w}_2)\] in terms of the \(\mathbf{v}_i\cdot\mathbf{w}_j\)’s.
- First, recall the key linearity properties:
- Linear in first variable: \((c_1\mathbf{a}_1 + c_2\mathbf{a}_2)\cdot\mathbf{b} = c_1(\mathbf{a}_1\cdot\mathbf{b}) + c_2(\mathbf{a}_2\cdot\mathbf{b})\)
- Linear in second variable: \(\mathbf{a}\cdot(c_1\mathbf{b}_1 + c_2\mathbf{b}_2) = c_1(\mathbf{a}\cdot\mathbf{b}_1) + c_2(\mathbf{a}\cdot\mathbf{b}_2)\)
- Let’s start with the first variable using linearity:
- \((2\mathbf{v}_1+3\mathbf{v}_2)\cdot(\mathbf{w}_1-2\mathbf{w}_2)\)
- \(= 2\mathbf{v}_1\cdot(\mathbf{w}_1-2\mathbf{w}_2) + 3\mathbf{v}_2\cdot(\mathbf{w}_1-2\mathbf{w}_2)\)
- Now apply linearity in the second variable for each term:
- \(= 2(\mathbf{v}_1\cdot\mathbf{w}_1 - 2\mathbf{v}_1\cdot\mathbf{w}_2) + 3(\mathbf{v}_2\cdot\mathbf{w}_1 - 2\mathbf{v}_2\cdot\mathbf{w}_2)\)
- Expand this:
- \(= 2\mathbf{v}_1\cdot\mathbf{w}_1 - 4\mathbf{v}_1\cdot\mathbf{w}_2 + 3\mathbf{v}_2\cdot\mathbf{w}_1 - 6\mathbf{v}_2\cdot\mathbf{w}_2\)
Now we can use the given values of \(\mathbf{v}_i\cdot\mathbf{w}_j\) to compute the final result by substituting those values into this expression.
The final formula in terms of the known dot products is: \[2(\mathbf{v}_1\cdot\mathbf{w}_1) - 4(\mathbf{v}_1\cdot\mathbf{w}_2) + 3(\mathbf{v}_2\cdot\mathbf{w}_1) - 6(\mathbf{v}_2\cdot\mathbf{w}_2).\]
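A quick numerical sanity check of this expansion, with randomly generated vectors (an arbitrary illustration, not a proof):
import numpy as np

rng = np.random.default_rng(0)
v1, v2, w1, w2 = (rng.standard_normal(4) for _ in range(4))

lhs = np.dot(2 * v1 + 3 * v2, w1 - 2 * w2)
rhs = (2 * np.dot(v1, w1) - 4 * np.dot(v1, w2)
       + 3 * np.dot(v2, w1) - 6 * np.dot(v2, w2))

print(np.isclose(lhs, rhs))  # True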
1.4.2 Norm
The dot product induces a norm on \(\mathbb{R}^n\). The norm of a vector \(\mathbf{v} = (v_1, v_2, \ldots, v_n)\in\mathbb{R}^n\) is given by:
\[\|\mathbf{v}\| = \sqrt{\mathbf{v}\cdot\mathbf{v}} = \sqrt{\sum_{i=1}^n |v_i|^2}\]
The norm is also known as the magnitude or length of a vector. When we compute the norm or when we check properties, we often look at \(\|\mathbf{v}\|^2=\mathbf{v}\cdot\mathbf{v}\) to avoid the square root.
The norm satisfies the following properties:
- Non-negativity: \(\|\mathbf{v}\| \geq 0\) for all \(\mathbf{v} \in \mathbb{R}^n\)
- Definiteness: \(\|\mathbf{v}\| = 0\) if and only if \(\mathbf{v} = \mathbf{0}\)
- Homogeneity: \(\|c\mathbf{v}\| = |c| \|\mathbf{v}\|\) for all \(c \in \mathbb{R}\) and \(\mathbf{v} \in \mathbb{R}^n\)
- Triangle inequality: \(\|\mathbf{v} + \mathbf{w}\| \leq \|\mathbf{v}\| + \|\mathbf{w}\|\) for all \(\mathbf{v}, \mathbf{w} \in \mathbb{R}^n\)
The first three properties are easy to verify. The Triangle Inequality can be proved using the Cauchy-Schwarz Inequality, which states that \(|\mathbf{v} \cdot \mathbf{w}| \leq \|\mathbf{v}\| \|\mathbf{w}\|\).
Exercise: Use Cauchy-Schwarz Inequality to prove the Triangle Inequality of the norm.
We start with the squared norm of \(\mathbf{v} + \mathbf{w}\):
\[\|\mathbf{v} + \mathbf{w}\|^2 = (\mathbf{v} + \mathbf{w}) \cdot (\mathbf{v} + \mathbf{w})\]
Expanding the right-hand side using the properties of the dot product, we get:
\[\|\mathbf{v} + \mathbf{w}\|^2 = \mathbf{v} \cdot \mathbf{v} + 2(\mathbf{v} \cdot \mathbf{w}) + \mathbf{w} \cdot \mathbf{w} = \|\mathbf{v}\|^2 + 2(\mathbf{v} \cdot \mathbf{w}) + \|\mathbf{w}\|^2\]
Now, we apply the Cauchy-Schwarz Inequality to the term \(2(\mathbf{v} \cdot \mathbf{w})\):
\[2(\mathbf{v}\cdot\mathbf{w})\leq2|\mathbf{v} \cdot \mathbf{w}| \leq 2\|\mathbf{v}\| \|\mathbf{w}\|\]
Substituting this into the previous equation, we obtain:
\[\|\mathbf{v} + \mathbf{w}\|^2 \leq \|\mathbf{v}\|^2 + 2\|\mathbf{v}\| \|\mathbf{w}\| + \|\mathbf{w}\|^2 = (\|\mathbf{v}\| + \|\mathbf{w}\|)^2\]
Taking the square root of both sides (which is valid since both sides are non-negative) yields:
\[\|\mathbf{v} + \mathbf{w}\| \leq \|\mathbf{v}\| + \|\mathbf{w}\|\]
which is the Triangle Inequality for the norm in \(\mathbb{R}^n\). \(\square\)
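A quick numerical check of the Triangle Inequality with arbitrarily chosen vectors (an illustration, not a proof):
import numpy as np

v = np.array([1.0, -2.0, 2.0])
w = np.array([3.0, 0.0, -4.0])

lhs = np.linalg.norm(v + w)                  # ||v + w|| = sqrt(24), about 4.899
rhs = np.linalg.norm(v) + np.linalg.norm(w)  # ||v|| + ||w|| = 3 + 5 = 8
print(lhs <= rhs)                            # True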
1.4.3 Orthogonality and Cauchy-Schwarz Inequality
Two vectors \(\mathbf{v}\) and \(\mathbf{w}\) in \(\mathbb{R}^n\) are orthogonal if \(\mathbf{v}\cdot\mathbf{w}=0\). Orthogonality is a central topic in linear algebra and has numerous applications in various fields, such as:
- Coordinate systems and basis vectors
- Least squares approximation and regression analysis
- Fourier series and signal processing
- Quantum mechanics and Hilbert spaces
Theorem 1.2 (Pythagorean Theorem) If \(\mathbf{v}, \mathbf{w}\in \mathbb{R}^n\) are orthogonal vectors, then \(\|\mathbf{v} + \mathbf{w}\|^2= \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2\)
Proof. Let \(\mathbf{v} = (v_1, v_2, \ldots, v_n)\) and \(\mathbf{w} = (w_1, w_2, \ldots, w_n)\) be orthogonal vectors in \(\mathbb{R}^n\). We start with the squared norm of \(\mathbf{v} + \mathbf{w}\):
\[\|\mathbf{v} + \mathbf{w}\|^2 = (\mathbf{v} + \mathbf{w}) \cdot (\mathbf{v} + \mathbf{w})\]
Expanding the right-hand side using the properties of the dot product, we get:
\[\|\mathbf{v} + \mathbf{w}\|^2 = \mathbf{v} \cdot \mathbf{v} + 2(\mathbf{v} \cdot \mathbf{w}) + \mathbf{w} \cdot \mathbf{w}\]
Since \(\mathbf{v}\) and \(\mathbf{w}\) are orthogonal, \(\mathbf{v} \cdot \mathbf{w} = 0\). Substituting this into the equation above, we obtain:
\[\|\mathbf{v} + \mathbf{w}\|^2 = \mathbf{v} \cdot \mathbf{v} + \mathbf{w} \cdot \mathbf{w} = \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2\]
which is the Pythagorean Theorem for orthogonal vectors in \(\mathbb{R}^n\). \(\square\)
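For a concrete check, take the orthogonal vectors \(\mathbf{v}=(1,2,0)\) and \(\mathbf{w}=(-2,1,3)\) (chosen only for illustration):
import numpy as np

v = np.array([1.0, 2.0, 0.0])
w = np.array([-2.0, 1.0, 3.0])

print(np.dot(v, w))                                      # 0.0, so v and w are orthogonal
print(np.linalg.norm(v + w) ** 2)                        # approximately 19.0
print(np.linalg.norm(v) ** 2 + np.linalg.norm(w) ** 2)   # approximately 19.0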
A related result is the Parallelogram Law, which states that for any two vectors \(\mathbf{v}, \mathbf{w}\in \mathbb{R}^n\):
\[\|\mathbf{v} + \mathbf{w}\|^2 + \|\mathbf{v} - \mathbf{w}\|^2 = 2(\|\mathbf{v}\|^2 + \|\mathbf{w}\|^2) \tag{1.1}\]
When \(\mathbf{v}\) and \(\mathbf{w}\) are orthogonal, \(\|\mathbf{v} + \mathbf{w}\|^2 = \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2\) and \(\|\mathbf{v} - \mathbf{w}\|^2 = \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2\). Then the Parallelogram Law reduces to the Pythagorean Theorem.
Exercise: Prove the Parallelogram Law
- Expand both squared norms using the dot product definition: \(\|\mathbf{u}\|^2 = \mathbf{u} \cdot \mathbf{u}\)
- For the left side, you’ll get terms with \(\mathbf{v} \cdot \mathbf{v}\), \(\mathbf{w} \cdot \mathbf{w}\), and \(\mathbf{v} \cdot \mathbf{w}\)
- Pay attention to the signs of the cross terms \(\mathbf{v} \cdot \mathbf{w}\) in both expansions
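Independently of the proof, Equation 1.1 is easy to spot-check numerically for particular vectors (an illustration only):
import numpy as np

v = np.array([2.0, -1.0, 3.0])
w = np.array([1.0, 4.0, 0.0])

lhs = np.linalg.norm(v + w) ** 2 + np.linalg.norm(v - w) ** 2
rhs = 2 * (np.linalg.norm(v) ** 2 + np.linalg.norm(w) ** 2)
print(np.isclose(lhs, rhs))  # True (both sides equal 62)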
Theorem 1.3 (Cauchy-Schwarz Inequality) Let \(\mathbf{v} = (v_1, v_2, \ldots, v_n)\) and \(\mathbf{w} = (w_1, w_2, \ldots, w_n)\) be two vectors in \(\mathbb{R}^n\). Then \(|\mathbf{v} \cdot \mathbf{w}| \leq \|\mathbf{v}\| \|\mathbf{w}\|\)
Proof. If \(\mathbf{w} = \mathbf{0}\), both sides of the inequality are zero, so assume \(\mathbf{w} \neq \mathbf{0}\). Let \(t \in \mathbb{R}\) be a scalar and consider the non-negative quantity \(\|\mathbf{v} - t\mathbf{w}\|^2\):
\[\|\mathbf{v} - t\mathbf{w}\|^2 \geq 0\]
Expanding the left-hand side using the properties of the dot product, we get:
\[(\mathbf{v} - t\mathbf{w}) \cdot (\mathbf{v} - t\mathbf{w}) = \mathbf{v} \cdot \mathbf{v} - 2t(\mathbf{v} \cdot \mathbf{w}) + t^2(\mathbf{w} \cdot \mathbf{w}) \geq 0\]
This inequality holds for all values of \(t\). The left-hand side is a quadratic in \(t\), so we choose the value of \(t\) at its vertex, which minimizes it:
\[t = \frac{\mathbf{v} \cdot \mathbf{w}}{\|\mathbf{w}\|^2}\]
Substituting this value of \(t\) into the inequality, we obtain:
\[\|\mathbf{v}\|^2 - 2\frac{(\mathbf{v} \cdot \mathbf{w})^2}{\|\mathbf{w}\|^2} + \frac{(\mathbf{v} \cdot \mathbf{w})^2}{\|\mathbf{w}\|^2} \geq 0\]
Simplifying the left-hand side:
\[\|\mathbf{v}\|^2 - \frac{(\mathbf{v} \cdot \mathbf{w})^2}{\|\mathbf{w}\|^2} \geq 0\]
Multiplying both sides by \(\|\mathbf{w}\|^2\) yields:
\[\|\mathbf{v}\|^2\|\mathbf{w}\|^2 \geq (\mathbf{v} \cdot \mathbf{w})^2\]
Taking the square root of both sides (which is valid since both sides are non-negative) gives:
\[\|\mathbf{v}\| \|\mathbf{w}\| \geq |\mathbf{v} \cdot \mathbf{w}|\]
which is the Cauchy-Schwarz Inequality. \(\square\)
Here is an alternative, more geometric argument. Suppose that neither \(\mathbf{v}\) nor \(\mathbf{w}\) is zero and that neither is a scalar multiple of the other.
For any scalar \(t\in\mathbb{R}\), we can write \(\mathbf{w}\) as the sum of two vectors: \(\mathbf{w} = t\mathbf{v}+(\mathbf{w}-t\mathbf{v})\). Our goal is to find \(t\in\mathbb{R}\) such that \(t\mathbf{v}\) and \(\mathbf{w}-t\mathbf{v}\) are orthogonal. For such a \(t\), the Pythagorean Theorem gives \[\|\mathbf{w}\|^2 = t^2\|\mathbf{v}\|^2+\|\mathbf{w}-t\mathbf{v}\|^2.\] In particular, \[t^2\|\mathbf{v}\|^2\leq\|\mathbf{w}\|^2.\] From the equation \(\mathbf{v}\cdot (\mathbf{w}-t\mathbf{v})=0\), we find that \(t=\frac{\mathbf{v}\cdot \mathbf{w}}{\mathbf{v}\cdot \mathbf{v}}\). Substituting this value into \[t^2\|\mathbf{v}\|^2\leq\|\mathbf{w}\|^2\] and simplifying, we again obtain the Cauchy-Schwarz Inequality.
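As with the Triangle Inequality, Cauchy-Schwarz is easy to check numerically for any particular pair of vectors; the sketch below uses random vectors purely as an illustration.
import numpy as np

rng = np.random.default_rng(1)
v = rng.standard_normal(5)
w = rng.standard_normal(5)

lhs = abs(np.dot(v, w))
rhs = np.linalg.norm(v) * np.linalg.norm(w)
print(lhs <= rhs)  # True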
1.4.4 Geometric Interpretation of the Dot Product
The dot product of two vectors \(\mathbf{v}\) and \(\mathbf{w}\) in \(\mathbb{R}^n\) can be expressed in terms of their norms and the angle between them: \[\mathbf{v} \cdot \mathbf{w} = \|\mathbf{v}\| \|\mathbf{w}\| \cos(\theta)\] where \(\theta\) is the angle between \(\mathbf{v}\) and \(\mathbf{w}\). This relationship highlights the geometric interpretation of the dot product. When \(\theta = 0°\), the vectors point in the same direction, and the dot product equals the product of their norms. When \(\theta = 90°\), the vectors are orthogonal, and the dot product is zero. The Cauchy-Schwarz Inequality follows directly from this relationship, since \(|\cos(\theta)| \leq 1\).
We can verify this easily in \(\mathbb{R}^2\). Consider two vectors \(\mathbf{v}\) and \(\mathbf{w}\) in \(\mathbb{R}^2\). Let the angle that vector \(\mathbf{v}\) makes with the positive x-axis be \(\alpha\), and the angle that vector \(\mathbf{w}\) makes with the positive x-axis be \(\alpha + \beta\), where \(\beta\) is the angle between vectors \(\mathbf{v}\) and \(\mathbf{w}\).
The vectors can be expressed in terms of their magnitudes and angles: \(\mathbf{v} = (\|\mathbf{v}\| \cos(\alpha), \|\mathbf{v}\| \sin(\alpha))\) and \(\mathbf{w} = (\|\mathbf{w}\| \cos(\alpha + \beta), \|\mathbf{w}\| \sin(\alpha + \beta))\). The dot product of these vectors is:
\[\mathbf{v} \cdot \mathbf{w} = \|\mathbf{v}\| \|\mathbf{w}\| (\cos(\alpha) \cos(\alpha + \beta) + \sin(\alpha) \sin(\alpha + \beta))\]
Using the angle addition formulas:
\[\cos(\alpha + \beta) = \cos(\alpha) \cos(\beta) - \sin(\alpha) \sin(\beta)\] \[\sin(\alpha + \beta) = \sin(\alpha) \cos(\beta) + \cos(\alpha) \sin(\beta)\]
We show that \(\cos(\alpha) \cos(\alpha + \beta) + \sin(\alpha) \sin(\alpha + \beta) = \cos(\beta)\) and we conclude that \[\mathbf{v} \cdot \mathbf{w} = \|\mathbf{v}\| \|\mathbf{w}\| \cos(\beta).\]
Exercise: Prove that \(\cos(\alpha) \cos(\alpha + \beta) + \sin(\alpha) \sin(\alpha + \beta) = \cos(\beta)\)
\[\begin{aligned} \cos(\alpha) \cos(\alpha + \beta) + \sin(\alpha) \sin(\alpha + \beta) &= \cos(\alpha) (\cos(\alpha) \cos(\beta) - \sin(\alpha) \sin(\beta)) \\ &\quad + \sin(\alpha) (\sin(\alpha) \cos(\beta) + \cos(\alpha) \sin(\beta)) \\ &= \cos^2(\alpha) \cos(\beta) - \cos(\alpha) \sin(\alpha) \sin(\beta) \\ &\quad + \sin^2(\alpha) \cos(\beta) + \cos(\alpha) \sin(\alpha) \sin(\beta) \\ &= \cos^2(\alpha) \cos(\beta) + \sin^2(\alpha) \cos(\beta) \\ &= (\cos^2(\alpha) + \sin^2(\alpha)) \cos(\beta) \\ &= \cos(\beta) \end{aligned}\]
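In practice the formula is often used in the other direction, to compute the angle between two given vectors; here is a minimal sketch (the vectors are arbitrary examples):
import numpy as np

v = np.array([1.0, 0.0])
w = np.array([1.0, 1.0])

cos_theta = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))  # clip guards against round-off outside [-1, 1]
print(np.degrees(theta))  # approximately 45.0 degrees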
1.4.5 Distance
The norm induces a distance (or metric) on \(\mathbb{R}^n\): the distance between two vectors \(\mathbf{v} = (v_1, v_2, \ldots, v_n)\) and \(\mathbf{w} = (w_1, w_2, \ldots, w_n)\) is given by:
\[d(\mathbf{v}, \mathbf{w}) =\|\mathbf{v}-\mathbf{w}\| = \sqrt{\sum_{i=1}^n |v_i - w_i|^2}\]
This distance is known as the Euclidean distance. It satisfies the following properties:
- Non-negativity: \(d(\mathbf{v}, \mathbf{w}) \geq 0\) for all \(\mathbf{v}, \mathbf{w} \in \mathbb{R}^n\)
- Definiteness: \(d(\mathbf{v}, \mathbf{w}) = 0\) if and only if \(\mathbf{v} = \mathbf{w}\)
- Symmetry: \(d(\mathbf{v}, \mathbf{w}) = d(\mathbf{w}, \mathbf{v})\) for all \(\mathbf{v}, \mathbf{w} \in \mathbb{R}^n\)
- Triangle inequality: \(d(\mathbf{v}, \mathbf{z}) \leq d(\mathbf{v}, \mathbf{w}) + d(\mathbf{w}, \mathbf{z})\) for all \(\mathbf{v}, \mathbf{w}, \mathbf{z} \in \mathbb{R}^n\)
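Computationally, the Euclidean distance is just the norm of the difference; a minimal NumPy sketch with arbitrary vectors:
import numpy as np

v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 6.0, 3.0])

d = np.linalg.norm(v - w)
print(d)  # 5.0, since v - w = (-3, -4, 0)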
The induced distance has numerous applications in various fields, such as:
- Clustering and classification in machine learning
- Measuring similarity or dissimilarity between objects or data points
- Optimization problems in operations research
- Error analysis and approximation theory in numerical analysis
Understanding the relationships between dot product, norm, and distance is crucial in applications in mathematics, physics, computer science, and engineering.
1.5 Vector Computation in Python
Python provides various ways to work with vectors and mathematical computations. Here’s an overview of the main approaches:
- Built-in Python Lists:
- Basic vector operations require explicit formulas or list comprehensions
- Useful for understanding the underlying computations
- Not optimized for large-scale numerical calculations
- NumPy (Numerical Python):
- Industry-standard library for numerical computing
- Provides efficient array operations and mathematical functions
- Optimized for performance with vectorized operations
- Essential for scientific computing and data analysis
- SymPy (Symbolic Python):
- Computer algebra system for symbolic mathematics
- Handles mathematical expressions with variables and symbols
- Perfect for mathematical proofs and algebraic manipulations
- Useful for verifying theoretical results
In this class, we will primarily use NumPy and SymPy. For solving linear systems, NumPy relies on optimized, numerically stable matrix factorizations (via LAPACK) rather than the naive Gaussian elimination we carry out by hand. While Gaussian elimination is a foundational algorithm taught in linear algebra courses for its theoretical importance and intuitive approach, it can be numerically unstable in practice. SymPy provides a direct implementation of Gaussian elimination with exact arithmetic, making it useful for understanding the algorithm and verifying theoretical results.
1.5.1 Representing Vectors
In Python, we can represent vectors using lists; in NumPy we use arrays; and in SymPy we use column matrices, created with the Matrix constructor. Notice that SymPy displays vectors as column vectors.
import numpy as np
from sympy import Matrix

# Using Python lists
v = [1, 2, 3]
w = [4, 5, 6]
print(v)  # Output: [1, 2, 3]

# Using NumPy arrays
v_np = np.array([1, 2, 3])
w_np = np.array([4, 5, 6])
print(v_np)  # Output: [1 2 3]

# Using SymPy vectors
v_sp = Matrix([1, 2, 3])
w_sp = Matrix([4, 5, 6])
[1, 2, 3]
[1 2 3]
# Showing v_sp
v_sp
\(\displaystyle \left[\begin{matrix}1\\2\\3\end{matrix}\right]\)
1.5.2 Vector Addition
The + operator is already implemented for NumPy arrays and SymPy matrices.
# Using Python lists
result = [v[i] + w[i] for i in range(len(v))]
print(result)  # Output: [5, 7, 9]

# Using NumPy arrays
result_np = v_np + w_np
print(result_np)  # Output: [5 7 9]
[5, 7, 9]
[5 7 9]
# Using SymPy vectors
v_sp + w_sp
\(\displaystyle \left[\begin{matrix}5\\7\\9\end{matrix}\right]\)
1.5.3 Scalar Multiplication
Scalar multiplication is already implemented in NumPy and SymPy.
scalar = 2

# Using Python lists
result = [scalar * x for x in v]
print(result)  # Output: [2, 4, 6]

# Using NumPy arrays
result_np = scalar * v_np
print(result_np)  # Output: [2 4 6]
[2, 4, 6]
[2 4 6]
# Using SymPy vectors
scalar * v_sp
\(\displaystyle \left[\begin{matrix}2\\4\\6\end{matrix}\right]\)
1.5.4 Dot Product
The dot product is already implemented in NumPy and SymPy.
# Using Python lists
dot_product = sum([v[i] * w[i] for i in range(len(v))])
print(dot_product)  # Output: 32

# Using NumPy arrays
dot_product_np = np.dot(v_np, w_np)
print(dot_product_np)  # Output: 32

# Using SymPy vectors
dot_product_sp = v_sp.dot(w_sp)
print(dot_product_sp)  # Output: 32
32
32
32
1.5.5 Vector Norms
The norm is already implemented in NumPy and SymPy; in NumPy it is found in the linalg submodule.
import math

# Using Python lists
norm = math.sqrt(sum([x**2 for x in v]))
print(norm)  # Output: 3.7416573867739413

# Using NumPy arrays
norm_np = np.linalg.norm(v_np)
print(norm_np)  # Output: 3.7416573867739413

# Using SymPy vectors
norm_sp = v_sp.norm()
print(norm_sp)  # Output: sqrt(14), which is approximately 3.7416573867739413
3.7416573867739413
3.7416573867739413
sqrt(14)