The Little Book of Linear Algebra
A concise, beginner-friendly introduction to the core ideas of linear algebra.
Chapter 1. Vectors
1.1 Scalars and Vectors
A scalar is a single numerical quantity, most often taken from the real numbers, denoted by $\mathbb{R}$ . Scalars are the fundamental building blocks of arithmetic: they can be added, subtracted, multiplied, and, except in the case of zero, divided. In linear algebra, scalars play the role of coefficients, scaling factors, and entries of larger structures such as vectors and matrices. They provide the weights by which more complex objects are measured and combined. A vector is an ordered collection of scalars, arranged either in a row or a column. When the scalars are real numbers, the vector is said to belong to real $n$ -dimensional space, written
$$ \mathbb{R}^n = \{ (x_1, x_2, \dots, x_n) \mid x_i \in \mathbb{R} \}. $$
An element of $\mathbb{R}^n$ is called a vector of dimension $n$ or an n-vector. The number $n$ is called the dimension of the vector space. Thus $\mathbb{R}^2$ is the space of all ordered pairs of real numbers, $\mathbb{R}^3$ the space of all ordered triples, and so on.
Example 1.1.1.
A 2-dimensional vector: $(3, -1) \in \mathbb{R}^2$ .
A 3-dimensional vector: $(2, 0, 5) \in \mathbb{R}^3$ .
A 1-dimensional vector: $(7) \in \mathbb{R}^1$ , which corresponds to the scalar $7$ itself.
Vectors are often written vertically in column form, which emphasizes their role in matrix multiplication:
$$ \mathbf{v} = \begin{bmatrix} 2 \\ 0 \\ 5 \end{bmatrix} \in \mathbb{R}^3. $$
The vertical layout makes the structure clearer when we consider linear combinations or multiply matrices by vectors.
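These objects are easy to experiment with on a computer. As a minimal sketch (assuming Python with NumPy, which the book itself does not require), a vector in $\mathbb{R}^3$ is simply an array of three real numbers:

```python
import numpy as np

# The 3-vector (2, 0, 5) from the example above.
v = np.array([2.0, 0.0, 5.0])

print(v.shape)   # (3,) -- an element of R^3
print(v[0])      # 2.0  -- NumPy indexes entries from 0, while the text indexes from 1
```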
Geometric Interpretation
In $\mathbb{R}^2$ , a vector $(x_1, x_2)$ can be visualized as an arrow starting at the origin $(0,0)$ and ending at the point $(x_1, x_2)$ . Its length corresponds to the distance from the origin, and its orientation gives a direction in the plane. In $\mathbb{R}^3$ , the same picture extends into three dimensions: a vector is an arrow from the origin to $(x_1, x_2, x_3)$ . Beyond three dimensions, direct visualization is no longer possible, but the algebraic rules of vectors remain identical. Even though we cannot draw a vector in $\mathbb{R}^{10}$ , it behaves under addition, scaling, and transformation exactly as a 2- or 3-dimensional vector does. This abstract point of view is what allows linear algebra to apply to data science, physics, and machine learning, where data often lives in very high-dimensional spaces. Thus a vector may be regarded in three complementary ways:
1. As a point in space, described by its coordinates.
2. As a displacement or arrow, described by a direction and a length.
3. As an abstract element of a vector space, whose properties follow algebraic rules independent of geometry.
Notation
Vectors are written in boldface lowercase letters: $\mathbf{v}, \mathbf{w}, \mathbf{x}$ .
The $i$-th entry of a vector $\mathbf{v}$ is written $v_i$ , where indices begin at 1.
The set of all $n$-dimensional vectors over $\mathbb{R}$ is denoted $\mathbb{R}^n$ .
Column vectors will be the default form unless otherwise stated.
Why begin here?
Scalars and vectors form the atoms of linear algebra. Every structure we will build (vector spaces, linear transformations, matrices, eigenvalues) relies on the basic notions of number and ordered collection of numbers. Once vectors are understood, we can define operations such as addition and scalar multiplication, then generalize to subspaces, bases, and coordinate systems. Eventually, this framework grows into the full theory of linear algebra, with powerful applications to geometry, computation, and data.
Exercises 1.1
1. Write three different vectors in $\mathbb{R}^2$ and sketch them as arrows from the origin. Identify their coordinates explicitly.
2. Give an example of a vector in $\mathbb{R}^4$ . Can you visualize it directly? Explain why high-dimensional visualization is challenging.
3. Let $\mathbf{v} = (4, -3, 2)$ . Write $\mathbf{v}$ in column form and state $v_1, v_2, v_3$ .
4. In what sense is the set $\mathbb{R}^1$ both a line and a vector space? Illustrate with examples.
5. Consider the vector $\mathbf{u} = (1,1,\dots,1) \in \mathbb{R}^n$ . What is special about this vector when $n$ is large? What might it represent in applications?
1.2 Vector Addition and Scalar Multiplication
Vectors in linear algebra are not static objects; their power comes from the operations we can perform on them. Two fundamental operations define the structure of vector spaces: addition and scalar multiplication. These operations satisfy simple but far-reaching rules that underpin the entire subject.
Vector Addition
Given two vectors of the same dimension, their sum is obtained by adding corresponding entries. Formally, if
$$ \mathbf{u} = (u_1, u_2, \dots, u_n), \quad \mathbf{v} = (v_1, v_2, \dots, v_n), $$
then their sum is
$$ \mathbf{u} + \mathbf{v} = (u_1+v_1, u_2+v_2, \dots, u_n+v_n). $$
Example 1.2.1. Let $\mathbf{u} = (2, -1, 3)$ and $\mathbf{v} = (4, 0, -5)$ . Then
$$ \mathbf{u} + \mathbf{v} = (2+4, -1+0, 3+(-5)) = (6, -1, -2). $$
Geometrically, vector addition corresponds to the parallelogram rule. If we draw both vectors as arrows from the origin, then placing the tail of one vector at the head of the other produces the sum. The diagonal of the parallelogram they form represents the resulting vector.
Scalar Multiplication
Multiplying a vector by a scalar stretches or shrinks the vector while preserving its direction, unless the scalar is negative, in which case the vector is also reversed. If $c \in \mathbb{R}$ and
$$ \mathbf{v} = (v_1, v_2, \dots, v_n), $$
then
$$ c \mathbf{v} = (c v_1, c v_2, \dots, c v_n). $$
Example 1.2.2. Let $\mathbf{v} = (3, -2)$ and $c = -2$ . Then
$$ c\mathbf{v} = -2(3, -2) = (-6, 4). $$
This corresponds to flipping the vector through the origin and doubling its length.
Linear Combinations
The interaction of addition and scalar multiplication allows us to form linear combinations. A linear combination of vectors $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k$ is any vector of the form
$$ c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k, \quad c_i \in \mathbb{R}. $$
Linear combinations are the mechanism by which we generate new vectors from existing ones. The span of a set of vectors, the collection of all their linear combinations, will later lead us to the idea of a subspace.
Example 1.2.3. Let $\mathbf{v}_1 = (1,0)$ and $\mathbf{v}_2 = (0,1)$ . Then any vector $(a,b)\in\mathbb{R}^2$ can be expressed as
$$ a\mathbf{v}_1 + b\mathbf{v}_2. $$
Thus $(1,0)$ and $(0,1)$ form the basic building blocks of the plane.
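As a quick numerical sketch (again assuming NumPy; the values repeat Examples 1.2.1 and 1.2.2), addition, scaling, and linear combinations are all componentwise operations:

```python
import numpy as np

u = np.array([2.0, -1.0, 3.0])
v = np.array([4.0, 0.0, -5.0])

print(u + v)                          # [ 6. -1. -2.]  (Example 1.2.1)
print(-2 * np.array([3.0, -2.0]))     # [-6.  4.]      (Example 1.2.2)

# A linear combination c1*v1 + c2*v2 in R^2.
v1, v2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
a, b = 7.0, -3.0
print(a * v1 + b * v2)                # [ 7. -3.]
```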
Notation
Addition: $\mathbf{u} + \mathbf{v}$ means component-wise addition.
Scalar multiplication: $c\mathbf{v}$ scales each entry of $\mathbf{v}$ by $c$ .
Linear combination: a sum of the form $c_1 \mathbf{v}_1 + \cdots + c_k \mathbf{v}_k$ .
Why this matters
Vector addition and scalar multiplication are the defining operations of linear algebra. They give structure to vector spaces, allow us to describe geometric phenomena like translation and scaling, and provide the foundation for solving systems of equations. Everything that follows (basis, dimension, transformations) builds on these simple but profound rules.
Exercises 1.2
1. Compute $\mathbf{u} + \mathbf{v}$ where $\mathbf{u} = (1,2,3)$ and $\mathbf{v} = (4, -1, 0)$ .
2. Find $3\mathbf{v}$ where $\mathbf{v} = (-2,5)$ . Sketch both vectors to illustrate the scaling.
3. Show that $(5,7)$ can be written as a linear combination of $(1,0)$ and $(0,1)$ .
4. Write $(4,4)$ as a linear combination of $(1,1)$ and $(1,-1)$ .
5. Prove that if $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$ , then $(c+d)(\mathbf{u}+\mathbf{v}) = c\mathbf{u} + c\mathbf{v} + d\mathbf{u} + d\mathbf{v}$ for scalars $c,d \in \mathbb{R}$ .
1.3 Dot Product, Norms, and Angles
The dot product is the fundamental operation that links algebra and geometry in vector spaces. It allows us to measure lengths, compute angles, and determine orthogonality. From this single definition flow the notions of norm and angle, which give geometry to abstract vector spaces.
The Dot Product
For two vectors in $\mathbb{R}^n$ , the dot product (also called the inner product) is defined by
$$ \mathbf{u} \cdot \mathbf{v} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n. $$
Equivalently, in matrix notation:
$$ \mathbf{u} \cdot \mathbf{v} = \mathbf{u}^T \mathbf{v}. $$
Example 1.3.1. Let $\mathbf{u} = (2, -1, 3)$ and $\mathbf{v} = (4, 0, -2)$ . Then
$$ \mathbf{u} \cdot \mathbf{v} = 2\cdot 4 + (-1)\cdot 0 + 3\cdot (-2) = 8 - 6 = 2. $$
The dot product outputs a single scalar, not another vector.
Norms (Length of a Vector)
The Euclidean norm of a vector is the square root of its dot product with itself:
$$ |\mathbf{v}| = \sqrt{\mathbf{v} \cdot \mathbf{v}} = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}. $$
This generalizes the Pythagorean theorem to arbitrary dimensions.
Example 1.3.2. For $\mathbf{v} = (3, 4)$ ,
$$ |\mathbf{v}| = \sqrt{3^2 + 4^2} = \sqrt{25} = 5. $$
This is exactly the length of the vector as an arrow in the plane.
Angles Between Vectors
The dot product also encodes the angle between two vectors. For nonzero vectors $\mathbf{u}, \mathbf{v}$ ,
$$ \mathbf{u} \cdot \mathbf{v} = |\mathbf{u}| \, |\mathbf{v}| \cos \theta, $$
where $\theta$ is the angle between them. Thus,
$$ \cos \theta = \frac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}||\mathbf{v}|}. $$
Example 1.3.3. Let $\mathbf{u} = (1,0)$ and $\mathbf{v} = (0,1)$ . Then
$$ \mathbf{u} \cdot \mathbf{v} = 0, \quad |\mathbf{u}| = 1, \quad |\mathbf{v}| = 1. $$
Hence
$$ \cos \theta = \frac{0}{1\cdot 1} = 0 \quad \Rightarrow \quad \theta = \frac{\pi}{2}. $$
The vectors are perpendicular.
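These formulas translate directly into a few lines of code. The following sketch (assuming NumPy) reproduces Examples 1.3.1 through 1.3.3:

```python
import numpy as np

u = np.array([2.0, -1.0, 3.0])
v = np.array([4.0, 0.0, -2.0])

dot = np.dot(u, v)                               # 2.0  (Example 1.3.1)
norm_v = np.linalg.norm(np.array([3.0, 4.0]))    # 5.0  (Example 1.3.2)

# Angle between (1,0) and (0,1): cos(theta) = u.v / (|u| |v|).
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
theta = np.arccos(cos_theta)                     # pi/2, i.e. 90 degrees
print(dot, norm_v, theta)
```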
Orthogonality
Two vectors are said to be orthogonal if their dot product is zero:
$$ \mathbf{u} \cdot \mathbf{v} = 0. $$
Orthogonality generalizes the idea of perpendicularity from geometry to higher dimensions.
Notation
Dot product: $\mathbf{u} \cdot \mathbf{v}$ .
Norm (length): $|\mathbf{v}|$ .
Orthogonality: $\mathbf{u} \perp \mathbf{v}$ if $\mathbf{u} \cdot \mathbf{v} = 0$ .
Why this matters
The dot product turns vector spaces into geometric objects: vectors gain lengths, angles, and notions of perpendicularity. This foundation will later support the study of orthogonal projections, Gram–Schmidt orthogonalization, eigenvectors, and least squares problems.
Exercises 1.3
1. Compute $\mathbf{u} \cdot \mathbf{v}$ for $\mathbf{u} = (1,2,3)$ , $\mathbf{v} = (4,5,6)$ .
2. Find the norm of $\mathbf{v} = (2, -2, 1)$ .
3. Determine whether $\mathbf{u} = (1,1,0)$ and $\mathbf{v} = (1,-1,2)$ are orthogonal.
4. Let $\mathbf{u} = (3,4)$ , $\mathbf{v} = (4,3)$ . Compute the angle between them.
5. Prove that $|\mathbf{u} + \mathbf{v}|^2 = |\mathbf{u}|^2 + |\mathbf{v}|^2 + 2\mathbf{u}\cdot \mathbf{v}$ . This identity is the algebraic version of the Law of Cosines.
1.4 Orthogonality
Orthogonality captures the notion of perpendicularity in vector spaces. It is one of the most important geometric ideas in linear algebra, allowing us to decompose vectors, define projections, and construct special bases with elegant properties.
Definition
Two vectors $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$ are said to be orthogonal if their dot product is zero:
$$ \mathbf{u} \cdot \mathbf{v} = 0. $$
This condition ensures that the angle between them is $\pi/2$ radians (90 degrees).
Example 1.4.1. In $\mathbb{R}^2$ , the vectors $(1,2)$ and $(2,-1)$ are orthogonal since
$$ (1,2) \cdot (2,-1) = 1\cdot 2 + 2\cdot (-1) = 0. $$
Orthogonal Sets
A collection of vectors is called orthogonal if every distinct pair of vectors in the set is orthogonal. If, in addition, each vector has norm 1, the set is called orthonormal.
Example 1.4.2. In $\mathbb{R}^3$ , the standard basis vectors
$$ \mathbf{e}_1 = (1,0,0), \quad \mathbf{e}_2 = (0,1,0), \quad \mathbf{e}_3 = (0,0,1) $$
form an orthonormal set: each has length 1, and their dot products vanish when the indices differ.
Projections
Orthogonality makes possible the decomposition of a vector into two components: one parallel to another vector, and one orthogonal to it. Given a nonzero vector $\mathbf{u}$ and any vector $\mathbf{v}$ , the projection of $\mathbf{v}$ onto $\mathbf{u}$ is
$$ \text{proj}_{\mathbf{u}}(\mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\mathbf{u} \cdot \mathbf{u}} \mathbf{u}. $$
The difference
$$ \mathbf{v} - \text{proj}_{\mathbf{u}}(\mathbf{v}) $$
is orthogonal to $\mathbf{u}$ . Thus every vector can be decomposed uniquely into a parallel and perpendicular part with respect to another vector.
Example 1.4.3. Let $\mathbf{u} = (1,0)$ , $\mathbf{v} = (2,3)$ . Then
$$ \text{proj}_{\mathbf{u}}(\mathbf{v}) = \frac{(1,0)\cdot(2,3)}{(1,0)\cdot(1,0)} (1,0) = \frac{2}{1}(1,0) = (2,0). $$
Thus
$$ \mathbf{v} = (2,3) = (2,0) + (0,3), $$
where $(2,0)$ is parallel to $(1,0)$ and $(0,3)$ is orthogonal to it.
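The projection formula is short enough to check numerically. A minimal sketch (assuming NumPy; `proj` is a helper name introduced here, not part of the text):

```python
import numpy as np

def proj(u, v):
    """Projection of v onto a nonzero vector u: ((u . v) / (u . u)) u."""
    return (np.dot(u, v) / np.dot(u, u)) * u

u = np.array([1.0, 0.0])
v = np.array([2.0, 3.0])

p = proj(u, v)               # [2. 0.] -- the parallel part (Example 1.4.3)
r = v - p                    # [0. 3.] -- the orthogonal part
print(p, r, np.dot(u, r))    # the last value is 0: r is orthogonal to u
```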
Orthogonal Decomposition
In general, if $\mathbf{u} \neq \mathbf{0}$ and $\mathbf{v} \in \mathbb{R}^n$ , then
$$ \mathbf{v} = \text{proj}_{\mathbf{u}}(\mathbf{v}) + \big(\mathbf{v} - \text{proj}_{\mathbf{u}}(\mathbf{v})\big), $$
where the first term is parallel to $\mathbf{u}$ and the second term is orthogonal. This decomposition underlies methods such as least squares approximation and the Gram–Schmidt process.
Notation
$\mathbf{u} \perp \mathbf{v}$ : vectors $\mathbf{u}$ and $\mathbf{v}$ are orthogonal.
An orthogonal set: vectors pairwise orthogonal.
An orthonormal set: pairwise orthogonal, each of norm 1.
Why this matters
Orthogonality gives structure to vector spaces. It provides a way to separate independent directions cleanly, simplify computations, and minimize errors in approximations. Many powerful algorithms in numerical linear algebra and data science (QR decomposition, least squares regression, PCA) rely on orthogonality.
Exercises 1.4
1. Verify that the vectors $(1,2,2)$ and $(2,0,-1)$ are orthogonal.
2. Find the projection of $(3,4)$ onto $(1,1)$ .
3. Show that any two distinct standard basis vectors in $\mathbb{R}^n$ are orthogonal.
4. Decompose $(5,2)$ into components parallel and orthogonal to $(2,1)$ .
5. Prove that if $\mathbf{u}, \mathbf{v}$ are nonzero vectors with $|\mathbf{u}| = |\mathbf{v}|$ , then $(\mathbf{u}+\mathbf{v})\cdot(\mathbf{u}-\mathbf{v}) = 0$ .
Chapter 2. Matrices
2.1 Definition and Notation
Matrices are the central objects of linear algebra, providing a compact way to represent and manipulate linear transformations, systems of equations, and structured data. A matrix is a rectangular array of numbers arranged in rows and columns.
Formal Definition
An $m \times n$ matrix is an array with $m$ rows and $n$ columns, written
$$ A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}. $$
Each entry $a_{ij}$ is a scalar, located in the i-th row and j-th column. The size (or dimension) of the matrix is denoted by $m \times n$ .
If $m = n$ , the matrix is square.
If $m = 1$ , the matrix is a row vector.
If $n = 1$ , the matrix is a column vector.
Thus, vectors are simply special cases of matrices.
Examples
Example 2.1.1. A $2 \times 3$ matrix:
$$ A = \begin{bmatrix} 1 & -2 & 4 \\ 0 & 3 & 5 \end{bmatrix}. $$
Here, $a_{12} = -2$ , $a_{23} = 5$ , and the matrix has 2 rows, 3 columns.
Example 2.1.2. A $3 \times 3$ square matrix:
$$ B = \begin{bmatrix} 2 & 0 & 1 \\ -1 & 3 & 4 \\ 0 & 5 & -2 \end{bmatrix}. $$
This will later serve as the representation of a linear transformation on $\mathbb{R}^3$ .
Indexing and Notation
Matrices are denoted by uppercase bold letters: $A, B, C$ .
Entries are written as $a_{ij}$ , with the row index first, column index second.
The set of all real $m \times n$ matrices is denoted $\mathbb{R}^{m \times n}$ .
Thus, a matrix is a function $A: \{1,\dots,m\} \times \{1,\dots,n\} \to \mathbb{R}$ , assigning a scalar to each row-column position.
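Concretely (assuming NumPy, as in the earlier sketches), a matrix is stored as a 2-dimensional array, and the entry $a_{ij}$ is read off by row and column index:

```python
import numpy as np

# The 2 x 3 matrix A from Example 2.1.1.
A = np.array([[1.0, -2.0, 4.0],
              [0.0,  3.0, 5.0]])

print(A.shape)   # (2, 3): 2 rows, 3 columns
print(A[0, 1])   # -2.0, i.e. a_12 (NumPy indices start at 0, the text's start at 1)
print(A[1, 2])   # 5.0,  i.e. a_23
```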
Why this matters
Matrices generalize vectors and give us a language for describing linear operations systematically. They encode systems of equations, rotations, projections, and transformations of data. With matrices, algebra and geometry come together: a single compact object can represent both numerical data and functional rules.
Exercises 2.1
1. Write a $3 \times 2$ matrix of your choice and identify its entries $a_{ij}$ .
2. Is every vector a matrix? Is every matrix a vector? Explain.
3. Which of the following are square matrices: $A \in \mathbb{R}^{4\times4}$ , $B \in \mathbb{R}^{3\times5}$ , $C \in \mathbb{R}^{1\times1}$ ?
4. Let $D = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$. What kind of matrix is this?
5. Consider the matrix $E = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. Express $e_{11}, e_{12}, e_{21}, e_{22}$ explicitly.
2.2 Matrix Addition and Multiplication
Once matrices are defined, the next step is to understand how they combine. Just as vectors gain meaning through addition and scalar multiplication, matrices become powerful through two operations: addition and multiplication.
Matrix Addition
Two matrices of the same size are added by adding corresponding entries. If
$$ A = [a_{ij}] \in \mathbb{R}^{m \times n}, \quad B = [b_{ij}] \in \mathbb{R}^{m \times n}, $$
then
$$ A + B = [a_{ij} + b_{ij}] \in \mathbb{R}^{m \times n}. $$
Example 2.2.1. Let
$$ A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} -1 & 0 \\ 5 & 2 \end{bmatrix}. $$
Then
$$ A + B = \begin{bmatrix} 1 + (-1) & 2 + 0 \\ 3 + 5 & 4 + 2 \end{bmatrix} = \begin{bmatrix} 0 & 2 \\ 8 & 6 \end{bmatrix}. $$
Matrix addition is commutative ( $A+B = B+A$ ) and associative ($(A+B)+C = A+(B+C)$). The zero matrix, with all entries 0, acts as the additive identity.
Scalar Multiplication
For a scalar $c \in \mathbb{R}$ and a matrix $A = [a_{ij}]$ , we define
$$ cA = [c \cdot a_{ij}]. $$
This stretches or shrinks all entries of the matrix uniformly.
Example 2.2.2. If
$$ A = \begin{bmatrix} 2 & -1 \\ 0 & 3 \end{bmatrix}, \quad c = -2, $$
then
$$ cA = \begin{bmatrix} -4 & 2 \\ 0 & -6 \end{bmatrix}. $$
Matrix Multiplication
The defining operation of matrices is multiplication. If
$$ A \in \mathbb{R}^{m \times n}, \quad B \in \mathbb{R}^{n \times p}, $$
then their product is the $m \times p$ matrix
$$ AB = C = [c_{ij}], \quad c_{ij} = \sum_{k=1}^n a_{ik} b_{kj}. $$
Thus, the entry in the $i$ -th row and $j$ -th column of $AB$ is the dot product of the $i$ -th row of $A$ with the $j$ -th column of $B$ .
Example 2.2.3. Let
$$ A = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}, \quad B = \begin{bmatrix} 4 & -1 \\ 2 & 5 \end{bmatrix}. $$
Then
$$ AB = \begin{bmatrix} 1\cdot4 + 2\cdot2 & 1\cdot(-1) + 2\cdot5 \\ 0\cdot4 + 3\cdot2 & 0\cdot(-1) + 3\cdot5 \end{bmatrix} = \begin{bmatrix} 8 & 9 \\ 6 & 15 \end{bmatrix}. $$
Notice that matrix multiplication is not commutative in general: $AB \neq BA$ . Sometimes $BA$ may not even be defined if dimensions do not align.
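A short numerical check (assuming NumPy) of Example 2.2.3, including the failure of commutativity:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
B = np.array([[4.0, -1.0],
              [2.0,  5.0]])

print(A @ B)   # [[ 8.  9.]
               #  [ 6. 15.]]  matches Example 2.2.3
print(B @ A)   # [[ 4.  5.]
               #  [ 2. 19.]]  a different matrix, so AB != BA
```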
Geometric Meaning
Matrix multiplication corresponds to the composition of linear transformations. If $A$ transforms vectors in $\mathbb{R}^n$ and $B$ transforms vectors in $\mathbb{R}^p$ , then $AB$ represents applying $B$ first, then $A$ . This makes matrices the algebraic language of transformations.
Notation
Matrix sum: $A+B$ .
Scalar multiple: $cA$ .
Product: $AB$ , defined only when the number of columns of $A$ equals the number of rows of $B$ .
Why this matters
Matrix multiplication is the core mechanism of linear algebra: it encodes how transformations combine, how systems of equations are solved, and how data flows in modern algorithms. Addition and scalar multiplication make matrices into a vector space, while multiplication gives them an algebraic structure rich enough to model geometry, computation, and networks.
Exercises 2.2
1. Compute $A+B$ for
$$ A = \begin{bmatrix} 2 & 3 \\ -1 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 4 & -2 \\ 5 & 7 \end{bmatrix}. $$
2. Find $3A$ where
$$ A = \begin{bmatrix} 1 & -4 \\ 2 & 6 \end{bmatrix}. $$
3. Multiply
$$ A = \begin{bmatrix} 1 & 0 & 2 \\ -1 & 3 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & 1 \\ 0 & -1 \\ 3 & 4 \end{bmatrix}. $$
4. Verify with an explicit example that $AB \neq BA$ .
5. Prove that matrix multiplication is distributive: $A(B+C) = AB + AC$ .
2.3 Transpose and Inverse
Two special operations on matrices, the transpose and the inverse, give rise to deep algebraic and geometric properties. The transpose rearranges a matrix by flipping it across its main diagonal, while the inverse, when it exists, acts as the undo operation for matrix multiplication.
The Transpose
The transpose of an $m \times n$ matrix $A = [a_{ij}]$ is the $n \times m$ matrix $A^T = [a_{ji}]$ , obtained by swapping rows and columns.
Formally,
$$ (A^T)_{ij} = a_{ji}. $$
Example 2.3.1. If
$$ A = \begin{bmatrix} 1 & 4 & -2 \\ 0 & 3 & 5 \end{bmatrix}, $$
then
$$ A^T = \begin{bmatrix} 1 & 0 \\ 4 & 3 \\ -2 & 5 \end{bmatrix}. $$
Properties of the Transpose.
1. $(A^T)^T = A$.
2. $(A+B)^T = A^T + B^T$.
3. $(cA)^T = cA^T$, for scalar $c$ .
4. $(AB)^T = B^T A^T$.
The last rule is crucial: the order reverses.
The Inverse
A square matrix $A \in \mathbb{R}^{n \times n}$ is said to be invertible (or nonsingular) if there exists another matrix $A^{-1}$ such that
$$ AA^{-1} = A^{-1}A = I_n, $$
where $I_n$ is the $n \times n$ identity matrix. In this case, $A^{-1}$ is called the inverse of $A$ .
Not every matrix is invertible. A necessary condition is that $\det(A) \neq 0$ , a fact that will be developed in Chapter 6.
Example 2.3.2. Let
$$ A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}. $$
Its determinant is $\det(A) = (1)(4) - (2)(3) = -2 \neq 0$ . The inverse is
$$ A^{-1} = \frac{1}{\det(A)} \begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ 1.5 & -0.5 \end{bmatrix}. $$
Verification:
$$ AA^{-1} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} -2 & 1 \\ 1.5 & -0.5 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. $$
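The same computation can be verified in a few lines (assuming NumPy):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

print(A.T)                 # transpose: [[1. 3.], [2. 4.]]
A_inv = np.linalg.inv(A)   # [[-2.   1. ], [ 1.5 -0.5]], as in Example 2.3.2
print(A @ A_inv)           # the 2 x 2 identity (up to rounding error)
```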
Geometric Meaning
The transpose corresponds to reflecting a linear transformation across the diagonal. For vectors, it switches between row and column forms.
The inverse, when it exists, corresponds to reversing a linear transformation. For example, if $A$ scales and rotates vectors, $A^{-1}$ rescales and rotates them back.
Notation
Transpose: $A^T$ .
Inverse: $A^{-1}$ , defined only for invertible square matrices.
Identity: $I_n$ , acts as the multiplicative identity.
Why this matters
The transpose allows us to define symmetric and orthogonal matrices, central to geometry and numerical methods. The inverse underlies the solution of linear systems, encoding the idea of undoing a transformation. Together, these operations set the stage for determinants, eigenvalues, and orthogonalization.
Exercises 2.3
1. Compute the transpose of
$$ A = \begin{bmatrix} 2 & -1 & 3 \\ 0 & 4 & 5 \end{bmatrix}. $$
2. Verify that $(AB)^T = B^T A^T$ for
$$ A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 3 & 4 \\ 5 & 6 \end{bmatrix}. $$
3. Determine whether
$$ C = \begin{bmatrix} 2 & 1 \\ 4 & 2 \end{bmatrix} $$
is invertible. If so, find $C^{-1}$ .
4. Find the inverse of
$$ D = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, $$
and explain its geometric action on vectors in the plane.
5. Prove that if $A$ is invertible, then so is $A^T$ , and $(A^T)^{-1} = (A^{-1})^T$ .
2.4 Special Matrices
Certain matrices occur so frequently in theory and applications that they are given special names. Recognizing their properties allows us to simplify computations and understand the structure of linear transformations more clearly.
The Identity Matrix
The identity matrix $I_n$ is the $n \times n$ matrix with ones on the diagonal and zeros elsewhere:
$$ I_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}. $$
It acts as the multiplicative identity:
$$ AI_n = I_nA = A, \quad \text{for all } A \in \mathbb{R}^{n \times n}. $$
Geometrically, $I_n$ represents the transformation that leaves every vector unchanged.
Diagonal Matrices
A diagonal matrix has all off-diagonal entries zero:
$$ D = \begin{bmatrix} d_{11} & 0 & \cdots & 0 \\ 0 & d_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_{nn} \end{bmatrix}. $$
Multiplication by a diagonal matrix scales each coordinate independently:
$$ D\mathbf{x} = (d_{11}x_1, d_{22}x_2, \dots, d_{nn}x_n). $$
Example 2.4.1. Let
$$ D = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} 1 \\ 4 \\ -2 \end{bmatrix}. $$
Then
$$ D\mathbf{x} = \begin{bmatrix} 2 \\ 12 \\ 2 \end{bmatrix}. $$
Permutation Matrices
A permutation matrix is obtained by permuting the rows of the identity matrix. Multiplying a vector by a permutation matrix reorders its coordinates.
Example 2.4.2. Let
$$ P = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}. $$
Then
$$ P\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} b \\ a \\ c \end{bmatrix}. $$
Thus, $P$ swaps the first two coordinates.
Permutation matrices are always invertible; their inverses are simply their transposes.
Symmetric and Skew-Symmetric Matrices
A matrix is symmetric if
$$ A^T = A, $$
and skew-symmetric if
$$ A^T = -A. $$
Symmetric matrices appear in quadratic forms and optimization, while skew-symmetric matrices describe rotations and cross products in geometry.
Orthogonal Matrices
A square matrix $Q$ is orthogonal if
$$ Q^T Q = QQ^T = I. $$
Equivalently, the rows (and columns) of $Q$ form an orthonormal set. Orthogonal matrices preserve lengths and angles; they represent rotations and reflections.
Example 2.4.3. The rotation matrix in the plane:
$$ R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} $$
is orthogonal, since
$$ R(\theta)^T R(\theta) = I_2. $$
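A quick numerical verification (assuming NumPy) that a rotation matrix is orthogonal and preserves length:

```python
import numpy as np

theta = 0.7  # any angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(R.T @ R, np.eye(2)))           # True: R^T R = I
v = np.array([3.0, 4.0])
print(np.linalg.norm(R @ v), np.linalg.norm(v))  # both 5.0: length is preserved
```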
Why this matters
Special matrices serve as the building blocks of linear algebra. Identity matrices define the neutral element, diagonal matrices simplify computations, permutation matrices reorder data, symmetric and orthogonal matrices describe fundamental geometric structures. Much of modern applied mathematics reduces complex problems to operations involving these simple forms.
Exercises 2.4
1. Show that the product of two diagonal matrices is diagonal, and compute an example.
2. Find the permutation matrix that cycles $(a,b,c)$ into $(b,c,a)$ .
3. Prove that every permutation matrix is invertible and its inverse is its transpose.
4. Verify that
$$ Q = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} $$
is orthogonal. What geometric transformation does it represent?
5. Determine whether
$$ A = \begin{bmatrix} 2 & 3 \\ 3 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 5 \\ -5 & 0 \end{bmatrix} $$
are symmetric, skew-symmetric, or neither.
Chapter 3. Systems of Linear Equations
3.1 Linear Systems and Solutions
One of the central motivations for linear algebra is solving systems of linear equations. These systems arise naturally in science, engineering, and data analysis whenever multiple constraints interact. Matrices provide a compact language for expressing and solving them.
Linear Systems
A linear system consists of equations where each unknown appears only to the first power and with no products between variables. A general system of $m$ equations in $n$ unknowns can be written as:
$$ \begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1, \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2, \\ &\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m. \end{aligned} $$
Here the coefficients $a_{ij}$ and constants $b_i$ are scalars, and the unknowns are $x_1, x_2, \dots, x_n$ .
Matrix Form
The system can be expressed compactly as:
$$ A\mathbf{x} = \mathbf{b}, $$
where
$A \in \mathbb{R}^{m \times n}$ is the coefficient matrix $[a_{ij}]$ ,
$\mathbf{x} \in \mathbb{R}^n$ is the column vector of unknowns,
$\mathbf{b} \in \mathbb{R}^m$ is the column vector of constants.
This formulation turns the problem of solving equations into analyzing the action of a matrix.
Example 3.1.1. The system
$$ \begin{cases} x + 2y = 5, \\ 3x - y = 4 \end{cases} $$
can be written as
$$ \begin{bmatrix} 1 & 2 \\ 3 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 5 \\ 4 \end{bmatrix}. $$
Types of Solutions
A linear system may have:
1. No solution (inconsistent): The equations conflict. Example: $\begin{cases} x + y = 1 \\ x + y = 2 \end{cases}$ has no solution.
2. Exactly one solution (unique): The system’s equations intersect at a single point. Example: the system above with coefficient matrix $\begin{bmatrix} 1 & 2 \\ 3 & -1 \end{bmatrix}$ has a unique solution.
3. Infinitely many solutions: The equations describe overlapping constraints (e.g., multiple equations representing the same line or plane).
The nature of the solution depends on the rank of $A$ and its relation to the augmented matrix $(A|\mathbf{b})$ , which we will study later.
Geometric Interpretation
In $\mathbb{R}^2$ , each linear equation represents a line. Solving a system means finding intersection points of lines.
In $\mathbb{R}^3$ , each equation represents a plane. A system may have no solution (parallel planes), one solution (a unique intersection point), or infinitely many (a line of intersection).
In higher dimensions, the picture generalizes: solutions form intersections of hyperplanes.
Why this matters
Linear systems are the practical foundation of linear algebra. They appear in balancing chemical reactions, circuit analysis, least-squares regression, optimization, and computer graphics. Understanding how to represent and classify their solutions is the first step toward systematic solution methods like Gaussian elimination.
Exercises 3.1
1. Write the following system in matrix form: $\begin{cases} 2x + 3y - z = 7, \\ x - y + 4z = 1, \\ 3x + 2y + z = 5. \end{cases}$
2. Determine whether the system $\begin{cases} x + y = 1, \\ 2x + 2y = 2 \end{cases}$ has no solution, one solution, or infinitely many solutions.
3. Geometrically interpret the system $\begin{cases} x + y = 3, \\ x - y = 1 \end{cases}$ in the plane.
4. Solve the system $\begin{cases} 2x + y = 1, \\ x - y = 4 \end{cases}$ and check your solution.
5. In $\mathbb{R}^3$ , describe the solution set of $\begin{cases} x + y + z = 0, \\ 2x + 2y + 2z = 0. \end{cases}$ What geometric object does it represent?
3.2 Gaussian Elimination
To solve linear systems efficiently, we use Gaussian elimination: a systematic method of transforming a system into a simpler equivalent one whose solutions are easier to see. The method relies on elementary row operations that preserve the solution set.
Elementary Row Operations
On an augmented matrix $(A|\mathbf{b})$ , we are allowed three operations:
1. Row swapping: interchange two rows.
2. Row scaling: multiply a row by a nonzero scalar.
3. Row replacement: replace one row by itself plus a multiple of another row.
These operations correspond to re-expressing equations in different but equivalent forms.
Row Echelon Form
A matrix is in row echelon form (REF) if:
1. All nonzero rows are above any zero rows.
2. Each leading entry (the first nonzero number from the left in a row) is to the right of the leading entry in the row above.
3. All entries below a leading entry are zero.
Further, if each leading entry is 1 and is the only nonzero entry in its column, the matrix is in reduced row echelon form (RREF).
Algorithm of Gaussian Elimination
1. Write the augmented matrix for the system.
2. Use row operations to create zeros below each pivot (the leading entry in a row).
3. Continue column by column until the matrix is in echelon form.
4. Solve by back substitution, starting from the last pivot equation and working upward.
If we continue to RREF, the solution can be read off directly.
Example
Example 3.2.1. Solve
$$ \begin{cases} x + 2y - z = 3, \\ 2x + y + z = 7, \\ 3x - y + 2z = 4. \end{cases} $$
Step 1. Augmented matrix
$$ \left[\begin{array}{ccc|c} 1 & 2 & -1 & 3 \\ 2 & 1 & 1 & 7 \\ 3 & -1 & 2 & 4 \end{array}\right]. $$
Step 2. Eliminate below the first pivot
Subtract 2 times row 1 from row 2, and 3 times row 1 from row 3:
$$ \left[\begin{array}{ccc|c} 1 & 2 & -1 & 3 \\ 0 & -3 & 3 & 1 \\ 0 & -7 & 5 & -5 \end{array}\right]. $$
Step 3. Pivot in column 2
Divide row 2 by -3:
$$ \left[\begin{array}{ccc|c} 1 & 2 & -1 & 3 \\ 0 & 1 & -1 & -\tfrac{1}{3} \\ 0 & -7 & 5 & -5 \end{array}\right]. $$
Add 7 times row 2 to row 3:
$$ \left[\begin{array}{ccc|c} 1 & 2 & -1 & 3 \\ 0 & 1 & -1 & -\tfrac{1}{3} \\ 0 & 0 & -2 & -\tfrac{22}{3} \end{array}\right]. $$
Step 4. Pivot in column 3
Divide row 3 by -2:
$$ \left[\begin{array}{ccc|c} 1 & 2 & -1 & 3 \\ 0 & 1 & -1 & -\tfrac{1}{3} \\ 0 & 0 & 1 & \tfrac{11}{3} \end{array}\right]. $$
Step 5. Back substitution
From the last row: $ z = \tfrac{11}{3}. $
Second row: $ y - z = -\tfrac{1}{3} \implies y = -\tfrac{1}{3} + \tfrac{11}{3} = \tfrac{10}{3}. $
First row: $ x + 2y - z = 3 \implies x + 2\cdot\tfrac{10}{3} - \tfrac{11}{3} = 3. $
So $ x + \tfrac{20}{3} - \tfrac{11}{3} = 3 \implies x + 3 = 3 \implies x = 0. $
Solution: $ (x,y,z) = \big(0, \tfrac{10}{3}, \tfrac{11}{3}\big). $
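The hand computation can be confirmed with a solver (assuming NumPy; `np.linalg.solve` uses an elimination-based factorization internally):

```python
import numpy as np

A = np.array([[1.0,  2.0, -1.0],
              [2.0,  1.0,  1.0],
              [3.0, -1.0,  2.0]])
b = np.array([3.0, 7.0, 4.0])

x = np.linalg.solve(A, b)
print(x)          # approximately [0.     3.3333 3.6667], i.e. (0, 10/3, 11/3)
print(A @ x - b)  # residual close to the zero vector
```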
Why this matters
Gaussian elimination is the foundation of computational linear algebra. It reduces complex systems to a form where solutions are visible, and it forms the basis for algorithms used in numerical analysis, scientific computing, and machine learning.
Exercises 3.2
1. Solve by Gaussian elimination: $\begin{cases} x + y = 2, \\ 2x - y = 0. \end{cases}$
2. Reduce the following augmented matrix to REF: $\left[\begin{array}{ccc|c} 1 & 1 & 1 & 6 \\ 2 & -1 & 3 & 14 \\ 1 & 4 & -2 & -2 \end{array}\right].$
3. Show that Gaussian elimination always produces either a unique solution, infinitely many solutions, or a contradiction (no solution).
4. Use Gaussian elimination to find all solutions of $\begin{cases} x + y + z = 0, \\ 2x + y + z = 1. \end{cases}$
5. Explain why pivoting (choosing the largest available pivot element) is useful in numerical computation.
3.3 Rank and Consistency
Gaussian elimination not only provides solutions but also reveals the structure of a linear system. Two key ideas are the rank of a matrix and the consistency of a system. Rank measures the amount of independent information in the equations, while consistency determines whether the system has at least one solution.
Rank of a Matrix
The rank of a matrix is the number of leading pivots in its row echelon form. Equivalently, it is the maximum number of linearly independent rows or columns.
Formally,
$$ \text{rank}(A) = \dim(\text{row space of } A) = \dim(\text{column space of } A). $$
The rank tells us the effective dimension of the space spanned by the rows (or columns).
Example 3.3.1. For
$$ A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 6 & 9 \end{bmatrix}, $$
row reduction gives
$$ \begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}. $$
Thus, $\text{rank}(A) = 1$ , since all rows are multiples of the first.
Consistency of Linear Systems
Consider the system $A\mathbf{x} = \mathbf{b}$ . The system is consistent (has at least one solution) if and only if
$$ \text{rank}(A) = \text{rank}(A|\mathbf{b}), $$
where $(A|\mathbf{b})$ is the augmented matrix. If the ranks differ, the system is inconsistent.
If $\text{rank}(A) = \text{rank}(A|\mathbf{b}) = n$ (the number of unknowns), the system has a unique solution.
If $\text{rank}(A) = \text{rank}(A|\mathbf{b}) < n$ , the system has infinitely many solutions.
Example
Example 3.3.2. Consider
$$ \begin{cases} x + y + z = 1, \\ 2x + 2y + 2z = 2, \\ x + y + z = 3. \end{cases} $$
The augmented matrix is
$$ \left[\begin{array}{ccc|c} 1 & 1 & 1 & 1 \\ 2 & 2 & 2 & 2 \\ 1 & 1 & 1 & 3 \end{array}\right]. $$
Row reduction gives
$$ \left[\begin{array}{ccc|c} 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 \end{array}\right]. $$
Here, $\text{rank}(A) = 1$ , but $\text{rank}(A|\mathbf{b}) = 2$ . Since the ranks differ, the system is inconsistent: no solution exists.
Example with Infinite Solutions
Example 3.3.3. For
$$ \begin{cases} x + y = 2, \\ 2x + 2y = 4, \end{cases} $$
the augmented matrix reduces to
$$ \left[\begin{array}{cc|c} 1 & 1 & 2 \\ 0 & 0 & 0 \end{array}\right]. $$
Here, $\text{rank}(A) = \text{rank}(A|\mathbf{b}) = 1 < 2$ . Thus, infinitely many solutions exist, forming a line.
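Rank comparisons are easy to automate. A sketch (assuming NumPy; `np.linalg.matrix_rank` estimates the rank numerically via the singular value decomposition):

```python
import numpy as np

# Example 3.3.2: an inconsistent system.
A = np.array([[1.0, 1.0, 1.0],
              [2.0, 2.0, 2.0],
              [1.0, 1.0, 1.0]])
b = np.array([[1.0], [2.0], [3.0]])
Ab = np.hstack([A, b])              # augmented matrix (A|b)

print(np.linalg.matrix_rank(A))     # 1
print(np.linalg.matrix_rank(Ab))    # 2 -> the ranks differ, so no solution exists
```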
Why this matters
Rank is a measure of independence: it tells us how many truly distinct equations or directions are present. Consistency explains when equations align versus when they contradict. These concepts connect linear systems to vector spaces and prepare for the ideas of dimension, basis, and the Rank–Nullity Theorem.
Exercises 3.3
1. Compute the rank of
$$ A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & -1 \\ 2 & 5 & -1 \end{bmatrix}. $$
2. Determine whether the system
$$ \begin{cases} x + y + z = 1, \\ 2x + 3y + z = 2, \\ 3x + 5y + 2z = 3 \end{cases} $$
is consistent.
3. Show that the rank of the identity matrix $I_n$ is $n$ .
4. Give an example of a system in $\mathbb{R}^3$ with infinitely many solutions, and explain why it satisfies the rank condition.
5. Prove that for any matrix $A \in \mathbb{R}^{m \times n}$ , $\text{rank}(A) \leq \min(m,n)$ .
3.4 Homogeneous Systems
A homogeneous system is a linear system in which all constant terms are zero:
$$ A\mathbf{x} = \mathbf{0}, $$
where $A \in \mathbb{R}^{m \times n}$ , and $\mathbf{0}$ is the zero vector in $\mathbb{R}^m$ .
The Trivial Solution
Every homogeneous system has at least one solution:
$$ \mathbf{x} = \mathbf{0}. $$
This is called the trivial solution. The interesting question is whether nontrivial solutions (nonzero vectors) exist.
Existence of Nontrivial Solutions
Nontrivial solutions exist precisely when the number of unknowns exceeds the rank of the coefficient matrix:
$$ \text{rank}(A) < n. $$
In this case, there are infinitely many solutions, forming a subspace of $\mathbb{R}^n$ . The dimension of this solution space is
$$ \dim(\text{null}(A)) = n - \text{rank}(A), $$
where null(A) is the set of all solutions to $A\mathbf{x} = 0$ . This set is called the null space or kernel of $A$ .
Example
Example 3.4.1. Consider
$$ \begin{cases} x + y + z = 0, \\ 2x + y - z = 0. \end{cases} $$
The augmented matrix is
$$ \left[\begin{array}{ccc|c} 1 & 1 & 1 & 0 \\ 2 & 1 & -1 & 0 \end{array}\right]. $$
Row reduction:
$$ \left[\begin{array}{ccc|c} 1 & 1 & 1 & 0 \\ 0 & -1 & -3 & 0 \end{array}\right] \quad\to\quad \left[\begin{array}{ccc|c} 1 & 1 & 1 & 0 \\ 0 & 1 & 3 & 0 \end{array}\right]. $$
So the system is equivalent to:
$$ \begin{cases} x + y + z = 0, \\ y + 3z = 0. \end{cases} $$
From the second equation, $y = -3z$ . Substituting into the first: $ x - 3z + z = 0 \implies x = 2z. $
Thus solutions are:
$$ (x,y,z) = z(2, -3, 1), \quad z \in \mathbb{R}. $$
The null space is the line spanned by the vector $(2, -3, 1)$ .
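The null space can also be computed symbolically. A minimal sketch (assuming SymPy, a Python library for exact arithmetic, which is an added tool rather than part of the text):

```python
from sympy import Matrix

# Coefficient matrix of Example 3.4.1.
A = Matrix([[1, 1,  1],
            [2, 1, -1]])

basis = A.nullspace()
print(basis)           # [Matrix([[2], [-3], [1]])] -- the line spanned by (2, -3, 1)
print(A * basis[0])    # the zero vector, confirming A x = 0
```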
Geometric Interpretation
The solution set of a homogeneous system is always a subspace of $\mathbb{R}^n$ .
If $\text{rank}(A) = n$ , the only solution is the zero vector.
If $\text{rank}(A) = n-1$ , the solution set is a line through the origin.
If $\text{rank}(A) = n-2$ , the solution set is a plane through the origin.
More generally, the null space has dimension $n - \text{rank}(A)$ , known as the nullity.
Why this matters
Homogeneous systems are central to understanding vector spaces, subspaces, and dimension. They lead directly to the concepts of kernel, null space, and linear dependence. In applications, homogeneous systems appear in equilibrium problems, eigenvalue equations, and computer graphics transformations.
Exercises 3.4
1. Solve the homogeneous system
$$ \begin{cases} x + 2y - z = 0, \\ 2x + 4y - 2z = 0. \end{cases} $$
What is the dimension of its solution space?
2. Find all solutions of
$$ \begin{cases} x - y + z = 0, \\ 2x + y - z = 0. \end{cases} $$
3. Show that the solution set of any homogeneous system is a subspace of $\mathbb{R}^n$ .
4. Suppose $A$ is a $3 \times 3$ matrix with $\text{rank}(A) = 2$ . What is the dimension of the null space of $A$ ?
5. For
$$ A = \begin{bmatrix} 1 & 2 & -1 \\ 0 & 1 & 3 \end{bmatrix}, $$
compute a basis for the null space of $A$ .
Chapter 4. Vector Spaces
4.1 Definition of a Vector Space
Up to now we have studied vectors and matrices concretely in $\mathbb{R}^n$ . The next step is to move beyond coordinates and define vector spaces in full generality. A vector space is an abstract setting where the familiar rules of addition and scalar multiplication hold, regardless of whether the elements are geometric vectors, polynomials, functions, or other objects.
Formal Definition
A vector space over the real numbers $\mathbb{R}$ is a set $V$ equipped with two operations:
1. Vector addition: For any $\mathbf{u}, \mathbf{v} \in V$ , there is a vector $\mathbf{u} + \mathbf{v} \in V$ .
2. Scalar multiplication: For any scalar $c \in \mathbb{R}$ and any $\mathbf{v} \in V$ , there is a vector $c\mathbf{v} \in V$ .
These operations must satisfy the following axioms (for all $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$ and all scalars $a,b \in \mathbb{R}$ ):
1. Commutativity of addition: $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$ .
2. Associativity of addition: $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$ .
3. Additive identity: There exists a zero vector $\mathbf{0} \in V$ such that $\mathbf{v} + \mathbf{0} = \mathbf{v}$ .
4. Additive inverses: For each $\mathbf{v} \in V$ , there exists $-\mathbf{v} \in V$ such that $\mathbf{v} + (-\mathbf{v}) = \mathbf{0}$ .
5. Compatibility of scalar multiplication: $a(b\mathbf{v}) = (ab)\mathbf{v}$ .
6. Identity element of scalars: $1 \cdot \mathbf{v} = \mathbf{v}$ .
7. Distributivity over vector addition: $a(\mathbf{u} + \mathbf{v}) = a\mathbf{u} + a\mathbf{v}$ .
8. Distributivity over scalar addition: $(a+b)\mathbf{v} = a\mathbf{v} + b\mathbf{v}$ .
If a set $V$ with operations satisfies all eight axioms, we call it a vector space.
Examples
Example 4.1.1. Standard Euclidean space $\mathbb{R}^n$ with ordinary addition and scalar multiplication is a vector space. This is the model case from which the axioms are abstracted.
Example 4.1.2. Polynomials The set of all polynomials with real coefficients, denoted $\mathbb{R}[x]$ , forms a vector space. Addition and scalar multiplication are defined term by term.
Example 4.1.3. Functions The set of all real-valued functions on an interval, e.g. $f: [0,1] \to \mathbb{R}$ , forms a vector space, since functions can be added and scaled pointwise.
Not every set with operations qualifies. For instance, the set of positive real numbers under usual addition is not a vector space, because additive inverses (negative numbers) are missing. The axioms must all hold.
Geometric Interpretation
In familiar cases like $\mathbb{R}^2$ or $\mathbb{R}^3$ , vector spaces provide the stage for geometry: vectors can be added, scaled, and combined to form lines, planes, and higher-dimensional structures. In abstract settings like function spaces, the same algebraic rules let us apply geometric intuition to infinite-dimensional problems.
Why this matters
The concept of vector space unifies seemingly different mathematical objects under a single framework. Whether dealing with forces in physics, signals in engineering, or data in machine learning, the common language of vector spaces allows us to use the same techniques everywhere.
Exercises 4.1
1. Verify that $\mathbb{R}^2$ with standard addition and scalar multiplication satisfies all eight vector space axioms.
2. Show that the set of integers $\mathbb{Z}$ with ordinary operations is not a vector space over $\mathbb{R}$ . Which axiom fails?
3. Consider the set of all polynomials of degree at most 3. Show it forms a vector space over $\mathbb{R}$ . What is its dimension?
4. Give an example of a vector space where the vectors are not geometric objects.
5. Prove that in any vector space, the zero vector is unique.
4.2 Subspaces
A subspace is a smaller vector space living inside a larger one. Just as lines and planes naturally sit inside three-dimensional space, subspaces generalize these ideas to higher dimensions and more abstract settings.
Definition
Let $V$ be a vector space. A subset $W \subseteq V$ is called a subspace of $V$ if:
1. $\mathbf{0} \in W$ (contains the zero vector),
2. For all $\mathbf{u}, \mathbf{v} \in W$ , the sum $\mathbf{u} + \mathbf{v} \in W$ (closed under addition),
3. For all scalars $c \in \mathbb{R}$ and vectors $\mathbf{v} \in W$ , the product $c\mathbf{v} \in W$ (closed under scalar multiplication).
If these hold, then $W$ is itself a vector space with the inherited operations.
Examples
Example 4.2.1. Line through the origin in $\mathbb{R}^2$ The set
$$ W = \{ (t, 2t) \mid t \in \mathbb{R} \} $$
is a subspace of $\mathbb{R}^2$ . It contains the zero vector, is closed under addition, and is closed under scalar multiplication.
Example 4.2.2. The x–y plane in $\mathbb{R}^3$ The set
$$ W = \{ (x, y, 0) \mid x,y \in \mathbb{R} \} $$
is a subspace of $\mathbb{R}^3$ . It is the collection of all vectors lying in the plane through the origin parallel to the x–y plane.
Example 4.2.3. Null space of a matrix For a matrix $A \in \mathbb{R}^{m \times n}$ , the null space
$$ \{ \mathbf{x} \in \mathbb{R}^n \mid A\mathbf{x} = \mathbf{0} \} $$
is a subspace of $\mathbb{R}^n$ . This subspace represents all solutions to the homogeneous system.
Not every subset is a subspace.
The set $\{ (x,y) \in \mathbb{R}^2 \mid x \geq 0 \}$ is not a subspace: it is not closed under scalar multiplication (a negative scalar breaks the condition).
Any line in $\mathbb{R}^2$ that does not pass through the origin is not a subspace, because it does not contain $\mathbf{0}$ .
Geometric Interpretation
Subspaces are the linear structures inside vector spaces.
In $\mathbb{R}^2$ , the subspaces are: the zero vector, any line through the origin, or the entire plane.
In $\mathbb{R}^3$ , the subspaces are: the zero vector, any line through the origin, any plane through the origin, or the entire space.
In higher dimensions, the same principle applies: subspaces are the flat linear pieces through the origin.
Why this matters
Subspaces capture the essential structure of linear problems. Column spaces, row spaces, and null spaces are all subspaces. Much of linear algebra consists of understanding how these subspaces intersect, span, and complement each other.
Exercises 4.2
1. Prove that the set $W = \{ (x,0) \mid x \in \mathbb{R} \} \subseteq \mathbb{R}^2$ is a subspace.
2. Show that the line $\{ (1+t, 2t) \mid t \in \mathbb{R} \}$ is not a subspace of $\mathbb{R}^2$ . Which condition fails?
3. Determine whether the set of all vectors $(x,y,z) \in \mathbb{R}^3$ satisfying $x+y+z=0$ is a subspace.
4. For the matrix
$$ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}, $$
describe the null space of $A$ as a subspace of $\mathbb{R}^3$ .
5. List all possible subspaces of $\mathbb{R}^2$ .
4.3 Span, Basis, Dimension
The ideas of span, basis, and dimension provide the language for describing the size and structure of subspaces. Together, they tell us how a vector space is generated, how many building blocks it requires, and how those blocks can be chosen.
Span
Given a set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k\} \subseteq V$ , the span is the collection of all linear combinations:
$$ \text{span}\{\mathbf{v}_1, \dots, \mathbf{v}_k\} = \{ c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k \mid c_i \in \mathbb{R} \}. $$
The span is always a subspace of $V$ , namely the smallest subspace containing those vectors.
Example 4.3.1. In $\mathbb{R}^2$ , $\text{span}\{(1,0)\} = \{(x,0) \mid x \in \mathbb{R}\}$ , the x-axis. Similarly, $\text{span}\{(1,0),(0,1)\} = \mathbb{R}^2$ .
Basis
A basis of a vector space $V$ is a set of vectors that:
1. Span $V$ .
2. Are linearly independent (no vector in the set is a linear combination of the others).
If either condition fails, the set is not a basis.
Example 4.3.2. In $\mathbb{R}^3$ , the standard unit vectors
$$ \mathbf{e}_1 = (1,0,0), \quad \mathbf{e}_2 = (0,1,0), \quad \mathbf{e}_3 = (0,0,1) $$
form a basis. Every vector $(x,y,z)$ can be uniquely written as
$$ x\mathbf{e}_1 + y\mathbf{e}_2 + z\mathbf{e}_3. $$
Dimension
The dimension of a vector space $V$ , written $\dim(V)$ , is the number of vectors in any basis of $V$ . This number is well-defined: all bases of a vector space have the same cardinality.
Examples 4.3.3.
$\dim(\mathbb{R}^2) = 2$ , with basis $(1,0), (0,1)$ .
$\dim(\mathbb{R}^3) = 3$ , with basis $(1,0,0), (0,1,0), (0,0,1)$ .
The set of polynomials of degree at most 3 has dimension 4, with basis $1, x, x^2, x^3$ .
Geometric Interpretation
The span is like the reach of a set of vectors.
A basis is the minimal set of directions needed to reach everything in the space.
The dimension is the count of those independent directions.
Lines, planes, and higher-dimensional flats can all be described in terms of span, basis, and dimension.
Why this matters
These concepts classify vector spaces and subspaces in terms of size and structure. Many theorems in linear algebra, such as the Rank–Nullity Theorem, are consequences of understanding span, basis, and dimension. In practical terms, bases are how we encode data in coordinates, and dimension tells us how much freedom a system truly has.
Exercises 4.3
1. Show that $(1,0,0)$ , $(0,1,0)$ , $(1,1,0)$ span the $xy$ -plane in $\mathbb{R}^3$ . Are they a basis?
2. Find a basis for the line $\{(2t,-3t,t) : t \in \mathbb{R}\}$ in $\mathbb{R}^3$ .
3. Determine the dimension of the subspace of $\mathbb{R}^3$ defined by $x+y+z=0$ .
4. Prove that any two different bases of $\mathbb{R}^n$ must contain exactly $n$ vectors.
5. Give a basis for the set of polynomials of degree $\leq 2$ . What is its dimension?
4.4 Coordinates
Once a basis for a vector space is chosen, every vector can be expressed uniquely as a linear combination of the basis vectors. The coefficients in this combination are called the coordinates of the vector relative to that basis. Coordinates allow us to move between the abstract world of vector spaces and the concrete world of numbers.
Coordinates Relative to a Basis
Let $V$ be a vector space, and let
$$ \mathcal{B} = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\} $$
be an ordered basis for $V$ . Every vector $\mathbf{u} \in V$ can be written uniquely as
$$ \mathbf{u} = c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_n \mathbf{v}_n. $$
The scalars $(c_1, c_2, \dots, c_n)$ are the coordinates of $\mathbf{u}$ relative to $\mathcal{B}$ , written
$$ [\mathbf{u}]_{\mathcal{B}} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}. $$
Example in $\mathbb{R}^2$
Example 4.4.1. Let the basis be
$$ \mathcal{B} = \{ (1,1), (1,-1) \}. $$
To find the coordinates of $\mathbf{u} = (3,1)$ relative to $\mathcal{B}$ , solve
$$ (3,1) = c_1(1,1) + c_2(1,-1). $$
This gives the system
$$ \begin{cases} c_1 + c_2 = 3, \\ c_1 - c_2 = 1. \end{cases} $$
Adding: $2c_1 = 4 \implies c_1 = 2$ . Then $c_2 = 1$ .
So,
$$ [\mathbf{u}]_{\mathcal{B}} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}. $$
Standard Coordinates
In $\mathbb{R}^n$ , the standard basis is
$$ \mathbf{e}_1 = (1,0,\dots,0), \quad \mathbf{e}_2 = (0,1,0,\dots,0), \dots, \mathbf{e}_n = (0,\dots,0,1). $$
Relative to this basis, the coordinates of a vector are simply its entries. Thus, column vectors are coordinate representations by default.
Change of Basis
If $\mathcal{B} = \{\mathbf{v}_1, \dots, \mathbf{v}_n\}$ is a basis of $\mathbb{R}^n$ , the change of basis matrix is
$$ P = \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \end{bmatrix}, $$
with basis vectors as columns. For any vector $\mathbf{u}$ ,
$$ \mathbf{u} = P[\mathbf{u}]_{\mathcal{B}}, \qquad [\mathbf{u}]_{\mathcal{B}} = P^{-1}\mathbf{u}. $$
Thus, switching between bases reduces to matrix multiplication.
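Example 4.4.1 can be redone with the change of basis matrix. A short sketch (assuming NumPy):

```python
import numpy as np

# Basis vectors (1,1) and (1,-1) as the columns of P.
P = np.array([[1.0,  1.0],
              [1.0, -1.0]])
u = np.array([3.0, 1.0])

coords = np.linalg.solve(P, u)  # same as P^{-1} u
print(coords)                   # [2. 1.]: u = 2*(1,1) + 1*(1,-1), as in Example 4.4.1
print(P @ coords)               # [3. 1.]: recovers u from its coordinates
```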
Geometric Interpretation
Coordinates are the address of a vector relative to a chosen set of directions. Different bases are like different coordinate systems: Cartesian, rotated, skewed, or scaled. The same vector may look very different numerically depending on the basis, but its geometric identity is unchanged.
Why this matters
Coordinates turn abstract vectors into concrete numerical data. Changing basis is the algebraic language for rotations of axes, diagonalization of matrices, and principal component analysis in data science. Mastery of coordinates is essential for moving fluidly between geometry, algebra, and computation.
Exercises 4.4
1. Express $(4,2)$ in terms of the basis $(1,1), (1,-1)$ .
2. Find the coordinates of $(1,2,3)$ relative to the standard basis of $\mathbb{R}^3$ .
3. If $\mathcal{B} = \{(2,0), (0,3)\}$ , compute $[ (4,6) ]_{\mathcal{B}}$ .
4. Construct the change of basis matrix from the standard basis of $\mathbb{R}^2$ to $\mathcal{B} = \{(1,1), (1,-1)\}$ .
5. Prove that coordinate representation with respect to a basis is unique.
Chapter 5. Linear Transformations
5.1 Functions that Preserve Linearity
A central theme of linear algebra is understanding linear transformations: functions between vector spaces that preserve their algebraic structure. These transformations generalize the idea of matrix multiplication and capture the essence of linear behavior.
Definition
Let $V$ and $W$ be vector spaces over $\mathbb{R}$ . A function
$$ T : V \to W $$
is called a linear transformation (or linear map) if for all vectors $\mathbf{u}, \mathbf{v} \in V$ and all scalars $c \in \mathbb{R}$ :
1. Additivity:
$$ T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v}), $$
2. Homogeneity:
$$ T(c\mathbf{u}) = cT(\mathbf{u}). $$
If both conditions hold, then $T$ automatically respects linear combinations:
$$ T(c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k) = c_1 T(\mathbf{v}_1) + \cdots + c_k T(\mathbf{v}_k). $$
Examples
Example 5.1.1. Scaling in $\mathbb{R}^2$ . Let $T:\mathbb{R}^2 \to \mathbb{R}^2$ be defined by
$$ T(x,y) = (2x, 2y). $$
This doubles the length of every vector, preserving direction. It is linear.
Example 5.1.2. Rotation. Let $R_\theta: \mathbb{R}^2 \to \mathbb{R}^2$ be
$$ R_\theta(x,y) = (x\cos\theta - y\sin\theta, \; x\sin\theta + y\cos\theta). $$
This rotates vectors by angle $\theta$ . It satisfies additivity and homogeneity, hence is linear.
Example 5.1.3. Differentiation. Let $D: \mathbb{R}[x] \to \mathbb{R}[x]$ be differentiation: $D(p(x)) = p'(x)$ . Since derivatives respect addition and scalar multiples, differentiation is a linear transformation.
The map $S:\mathbb{R}^2 \to \mathbb{R}^2$ defined by
$$ S(x,y) = (x^2, y^2) $$
is not linear, because $S(\mathbf{u} + \mathbf{v}) \neq S(\mathbf{u}) + S(\mathbf{v})$ in general.
Geometric Interpretation
Linear transformations are exactly those that preserve the origin, lines through the origin, and proportions along those lines. They include familiar operations: scaling, rotations, reflections, shears, and projections. Nonlinear transformations bend or curve space, breaking these properties.
Why this matters
Linear transformations unify geometry, algebra, and computation. They explain how matrices act on vectors, how data can be rotated or projected, and how systems evolve under linear rules. Much of linear algebra is devoted to understanding these transformations, their representations, and their invariants.
Exercises 5.1
1. Verify that $T(x,y) = (3x-y, 2y)$ is a linear transformation on $\mathbb{R}^2$ .
2. Show that $T(x,y) = (x+1, y)$ is not linear. Which axiom fails?
3. Prove that if $T$ and $S$ are linear transformations, then so is $T+S$ .
4. Give an example of a linear transformation from $\mathbb{R}^3$ to $\mathbb{R}^2$ .
5. Let $T:\mathbb{R}[x] \to \mathbb{R}[x]$ be integration:
$$ T(p(x)) = \int_0^x p(t)\,dt. $$
Prove that $T$ is a linear transformation.
5.2 Matrix Representation of Linear Maps
Every linear transformation between finite-dimensional vector spaces can be represented by a matrix. This correspondence is one of the central insights of linear algebra: it lets us use the tools of matrix arithmetic to study abstract transformations.
From Linear Map to Matrix
Let $T: \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. Choose the standard basis $\{ \mathbf{e}_1, \dots, \mathbf{e}_n \}$ of $\mathbb{R}^n$ , where $\mathbf{e}_i$ has a 1 in the $i$ -th position and 0 elsewhere.
The action of $T$ on each basis vector determines the entire transformation:
$$ T(\mathbf{e}_j) = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix}. $$
Placing these outputs as columns gives the matrix of $T$ :
$$ [T] = A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}. $$
Then for any vector $\mathbf{x} \in \mathbb{R}^n$ :
$$ T(\mathbf{x}) = A\mathbf{x}. $$
Examples
Example 5.2.1. Scaling in $\mathbb{R}^2$ . Let $T(x,y) = (2x, 3y)$ . Then
$$ T(\mathbf{e}_1) = (2,0), \quad T(\mathbf{e}_2) = (0,3). $$
So the matrix is
$$ [T] = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}. $$
Example 5.2.2. Rotation in the plane. The rotation transformation $R_\theta(x,y) = (x\cos\theta - y\sin\theta,\; x\sin\theta + y\cos\theta)$ has matrix
$$ [R_\theta] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}. $$
Example 5.2.3. Projection onto the x-axis. The map $P(x,y) = (x,0)$ corresponds to
$$ [P] = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}. $$
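Since a linear map is determined by its values on the standard basis, the matrix $[T]$ can be assembled mechanically. Here is a small NumPy sketch for the scaling map of Example 5.2.1; the helper function `T` is just an illustrative stand-in for any linear map given as code.

```python
import numpy as np

def T(v):
    # The scaling map of Example 5.2.1: T(x, y) = (2x, 3y)
    return np.array([2.0 * v[0], 3.0 * v[1]])

n = 2
columns = [T(e) for e in np.eye(n)]   # T(e_1), T(e_2), ... become the columns
A = np.column_stack(columns)
print(A)                              # [[2. 0.]
                                      #  [0. 3.]]

x = np.array([1.0, -4.0])
print(np.allclose(T(x), A @ x))       # True: T(x) = A x for any x
```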
Change of Basis
Matrix representations depend on the chosen basis. If $\mathcal{B}$ and $\mathcal{C}$ are bases of $\mathbb{R}^n$ and $\mathbb{R}^m$ , then the matrix of $T: \mathbb{R}^n \to \mathbb{R}^m$ with respect to these bases is obtained by expressing $T(\mathbf{v}_j)$ in terms of $\mathcal{C}$ for each $\mathbf{v}_j \in \mathcal{B}$ . Changing bases corresponds to conjugating the matrix by the appropriate change-of-basis matrices.
Geometric Interpretation
Matrices are not just convenient notation-they are linear maps once a basis is fixed. Every rotation, reflection, projection, shear, or scaling corresponds to multiplying by a specific matrix. Thus, studying linear transformations reduces to studying their matrices.
Why this matters
Matrix representations make linear transformations computable. They connect abstract definitions to explicit calculations, enabling algorithms for solving systems, finding eigenvalues, and performing decompositions. Applications from graphics to machine learning depend on this translation.
Exercises 5.2
Find the matrix representation of $T:\mathbb{R}^2 \to \mathbb{R}^2$, $T(x,y) = (x+y, x-y)$.
Determine the matrix of the linear transformation $T:\mathbb{R}^3 \to \mathbb{R}^2$, $T(x,y,z) = (x+z, y-2z)$.
What matrix represents reflection across the line $y=x$ in $\mathbb{R}^2$?
Show that the matrix of the identity transformation on $\mathbb{R}^n$ is $I_n$.
For the differentiation map $D:\mathbb{R}_2[x] \to \mathbb{R}_1[x]$, where $\mathbb{R}_k[x]$ is the space of polynomials of degree at most $k$, find the matrix of $D$ relative to the bases ${1,x,x^2}$ and ${1,x}$.
5.3 Kernel and Image
To understand a linear transformation deeply, we must examine what it kills and what it produces. These ideas are captured by the kernel and the image, two fundamental subspaces associated with any linear map.
The Kernel
The kernel (or null space) of a linear transformation $T: V \to W$ is the set of all vectors in $V$ that map to the zero vector in $W$ :
$$ \ker(T) = { \mathbf{v} \in V \mid T(\mathbf{v}) = \mathbf{0} }. $$
The kernel is always a subspace of $V$ . It measures the degeneracy of the transformation-directions that collapse to nothing.
Example 5.3.1. Let $T:\mathbb{R}^3 \to \mathbb{R}^2$ be defined by
$$ T(x,y,z) = (x+y, y+z). $$
In matrix form,
$$ [T] = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}. $$
To find the kernel, solve
$$ \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. $$
This gives the equations $x + y = 0$ , $y + z = 0$ . Hence $x = -y, z = -y$ . The kernel is
$$ \ker(T) = { (-t, t, -t) \mid t \in \mathbb{R} }, $$
a line in $\mathbb{R}^3$ .
The Image
The image (or range) of a linear transformation $T: V \to W$ is the set of all outputs:
$$ \text{im}(T) = { T(\mathbf{v}) \mid \mathbf{v} \in V } \subseteq W. $$
Equivalently, it is the span of the columns of the representing matrix. The image is always a subspace of $W$ .
Example 5.3.2. For the same transformation as above,
$$ [T] = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}, $$
the columns are $(1,0)$ , $(1,1)$ , and $(0,1)$ . Since $(1,1) = (1,0) + (0,1)$ , the image is
$$ \text{im}(T) = \text{span}{ (1,0), (0,1) } = \mathbb{R}^2. $$
Dimension Formula (Rank–Nullity Theorem)
For a linear transformation $T: V \to W$ with $V$ finite-dimensional,
$$ \dim(\ker(T)) + \dim(\text{im}(T)) = \dim(V). $$
This fundamental result connects the lost directions (kernel) with the achieved directions (image).
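The theorem is easy to check computationally. A short sketch for the map of Example 5.3.1, assuming SymPy is available for exact arithmetic:

```python
import sympy as sp

# Matrix of T(x, y, z) = (x + y, y + z) from Example 5.3.1
A = sp.Matrix([[1, 1, 0],
               [0, 1, 1]])

kernel_basis = A.nullspace()      # basis vectors of ker(T)
rank = A.rank()                   # dim(im(T)), the column space dimension

print(kernel_basis)               # one vector spanning the line {(-t, t, -t)}
print(rank + len(kernel_basis))   # 3 = dim(R^3), as rank-nullity predicts
```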
Geometric Interpretation
The kernel describes how the transformation flattens space (e.g., projecting a 3D object onto a plane).
The image describes the target subspace reached by the transformation.
The rank–nullity theorem quantifies the tradeoff: the more dimensions collapse, the fewer remain in the image.
Why this matters
Kernel and image capture the essence of a linear map. They classify transformations, explain when systems have unique or infinite solutions, and form the backbone of important results like the Rank–Nullity Theorem, diagonalization, and spectral theory.
Exercises 5.3
Find the kernel and image of $T:\mathbb{R}^2 \to \mathbb{R}^2$, $T(x,y) = (x-y, x+y)$.
Let $A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \end{bmatrix}$. Find bases for $\ker(A)$ and $\text{im}(A)$.
For the projection map $P(x,y,z) = (x,y,0)$, describe the kernel and image.
Prove that $\ker(T)$ and $\text{im}(T)$ are always subspaces.
Verify the Rank–Nullity Theorem for the transformation in Example 5.3.1.
5.4 Change of Basis
Linear transformations can look very different depending on the coordinate system we use. The process of rewriting vectors and transformations relative to a new basis is called a change of basis. This concept lies at the heart of diagonalization, orthogonalization, and many computational techniques.
Coordinate Change
Suppose $V$ is an $n$-dimensional vector space, and let $\mathcal{B} = {\mathbf{v}_1, \dots, \mathbf{v}_n}$ be a basis. Every vector $\mathbf{x} \in V$ has a coordinate vector $[\mathbf{x}]_{\mathcal{B}} \in \mathbb{R}^n$.
If $P$ is the change-of-basis matrix from $\mathcal{B}$ to the standard basis, then
$$ \mathbf{x} = P [\mathbf{x}]_{\mathcal{B}}. $$
Equivalently,
$$ [\mathbf{x}]_{\mathcal{B}} = P^{-1} \mathbf{x}. $$
Here, $P$ has the basis vectors of $\mathcal{B}$ as its columns:
$$ P = \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \end{bmatrix}. $$
Transformation of Matrices
Let $T: V \to V$ be a linear transformation. Suppose its matrix in the standard basis is $A$ . In the basis $\mathcal{B}$ , the representing matrix becomes
$$ [T]_{\mathcal{B}} = P^{-1} A P. $$
Thus, changing basis corresponds to a similarity transformation of the matrix.
Example
Example 5.4.1. Let $T:\mathbb{R}^2 \to \mathbb{R}^2$ be given by
$$ T(x,y) = (2x + 2y,\; 2x + 2y). $$
In the standard basis, its matrix is
$$ A = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix}. $$
Now consider the basis $\mathcal{B} = { (1,1), (1,-1) }$ . The change-of-basis matrix is
$$ P = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}. $$
Then
$$ [T]_{\mathcal{B}} = P^{-1} A P. $$
Computing gives
$$ [T]_{\mathcal{B}} = \begin{bmatrix} 4 & 0 \\ 0 & 0 \end{bmatrix}. $$
In this new basis, the transformation is diagonal: one direction is scaled by 4, the other collapsed to 0.
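The similarity computation can be verified directly in NumPy; the sketch below reproduces $[T]_{\mathcal{B}} = P^{-1}AP$ for this example.

```python
import numpy as np

A = np.array([[2.0, 2.0],     # matrix of T in the standard basis
              [2.0, 2.0]])
P = np.array([[1.0, 1.0],     # columns are the basis vectors (1,1) and (1,-1)
              [1.0, -1.0]])

T_B = np.linalg.inv(P) @ A @ P
print(np.round(T_B, 10))      # [[4. 0.]
                              #  [0. 0.]]
```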
Geometric Interpretation
Change of basis is like rotating or skewing your coordinate grid. The underlying transformation does not change, but its description in numbers becomes simpler or more complicated depending on the basis. Finding a basis that simplifies a transformation (often a diagonal basis) is a key theme in linear algebra.
Why this matters
Change of basis connects the abstract notion of similarity to practical computation. It is the tool that allows us to diagonalize matrices, compute eigenvalues, and simplify complex transformations. In applications, it corresponds to choosing a more natural coordinate system-whether in geometry, physics, or machine learning.
Exercises 5.4
Let $A = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}$. Compute its representation in the basis ${(1,0),(1,1)}$.
Find the change-of-basis matrix from the standard basis of $\mathbb{R}^2$ to ${(2,1),(1,1)}$.
Prove that similar matrices (related by $P^{-1}AP$) represent the same linear transformation under different bases.
Diagonalize the matrix $A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ in the basis ${(1,1),(1,-1)}$.
In $\mathbb{R}^3$, let $\mathcal{B} = {(1,0,0),(1,1,0),(1,1,1)}$. Construct the change-of-basis matrix $P$ and compute $P^{-1}$.
Chapter 6. Determinants
6.1 Motivation and Geometric Meaning
Determinants are numerical values associated with square matrices. At first they may appear as a complicated formula, but their importance comes from what they measure: determinants encode scaling, orientation, and invertibility of linear transformations. They bridge algebra and geometry.
Determinants of $2 \times 2$ Matrices
For a $2 \times 2$ matrix
$$ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, $$
the determinant is defined as
$$ \det(A) = ad - bc. $$
Geometric meaning: If $A$ represents a linear transformation of the plane, then $|\det(A)|$ is the area scaling factor. For example, if $\det(A) = 2$ , areas of shapes are doubled. If $\det(A) = 0$ , the transformation collapses the plane to a line: all area is lost.
Determinants of $3 \times 3$ Matrices
For
$$ A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}, $$
the determinant can be computed as
$$ \det(A) = a(ei - fh) - b(di - fg) + c(dh - eg). $$
Geometric meaning: In $\mathbb{R}^3$ , $|\det(A)|$ is the volume scaling factor. If $\det(A) < 0$ , orientation is reversed (a handedness flip), such as turning a right-handed coordinate system into a left-handed one.
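These scaling and orientation statements are easy to confirm numerically. A NumPy sketch (the matrices below are arbitrary illustrative choices):

```python
import numpy as np

A = np.array([[2.0, 0.0, 0.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, 1.0]])
print(np.linalg.det(A))   # 6.0: volumes are scaled by 6, orientation preserved

R = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, -1.0]])   # reflection across the xy-plane (z -> -z)
print(np.linalg.det(R))   # -1.0: volume preserved, orientation reversed
```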
General Case
For $A \in \mathbb{R}^{n \times n}$ , the determinant is a scalar that measures how the linear transformation given by $A$ scales n-dimensional volume.
If $\det(A) = 0$: the transformation squashes space into a lower dimension, so $A$ is not invertible.
If $\det(A) > 0$: volume is scaled by $\det(A)$, orientation preserved.
If $\det(A) < 0$: volume is scaled by $|\det(A)|$, orientation reversed.
Visual Examples
Shear in $\mathbb{R}^2$: $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$. Then $\det(A) = 1$. The transformation slants the unit square into a parallelogram but preserves area.
Projection in $\mathbb{R}^2$: $A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$. Then $\det(A) = 0$. The unit square collapses into a line segment: area vanishes.
Rotation in $\mathbb{R}^2$: $R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$. Then $\det(R_\theta) = 1$. Rotations preserve area and orientation.
Why this matters
The determinant is not just a formula-it is a measure of transformation. It tells us whether a matrix is invertible, how it distorts space, and whether it flips orientation. This geometric insight makes the determinant indispensable in analysis, geometry, and applied mathematics.
Exercises 6.1
Compute the determinant of $\begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix}$. What area scaling factor does it represent?
Find the determinant of the shear matrix $\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$. What happens to the area of the unit square?
For the $3 \times 3$ matrix $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$, compute the determinant. How does it scale volume in $\mathbb{R}^3$?
Show that any rotation matrix in $\mathbb{R}^2$ has determinant $1$.
Give an example of a $2 \times 2$ matrix with determinant $-1$. What geometric action does it represent?
6.2 Properties of Determinants
Beyond their geometric meaning, determinants satisfy a collection of algebraic rules that make them powerful tools in linear algebra. These properties allow us to compute efficiently, test invertibility, and understand how determinants behave under matrix operations.
Basic Properties
Let $A, B \in \mathbb{R}^{n \times n}$ , and let $c \in \mathbb{R}$ . Then:
Identity: $\det(I_n) = 1$.
Triangular matrices: If $A$ is upper or lower triangular, then $\det(A) = a_{11} a_{22} \cdots a_{nn}$.
Row/column swap: Interchanging two rows (or columns) multiplies the determinant by $-1$.
Row/column scaling: Multiplying a row (or column) by a scalar $c$ multiplies the determinant by $c$.
Row/column addition: Adding a multiple of one row to another does not change the determinant.
Transpose: $\det(A^T) = \det(A)$.
Multiplicativity: $\det(AB) = \det(A)\det(B)$.
Invertibility: $A$ is invertible if and only if $\det(A) \neq 0$.
Example Computations
Example 6.2.1. For
$$ A = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 3 & 0 \\ -1 & 4 & 5 \end{bmatrix}, $$
$A$ is lower triangular, so
$$ \det(A) = 2 \cdot 3 \cdot 5 = 30. $$
Example 6.2.2. Let
$$ B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad C = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}. $$
Then
$$ \det(B) = 1\cdot 4 - 2\cdot 3 = -2, \quad \det(C) = -1. $$
Since $CB$ is obtained by swapping rows of $B$ ,
$$ \det(CB) = -\det(B) = 2. $$
This matches the multiplicativity rule: $\det(CB) = \det(C)\det(B) = (-1)(-2) = 2.$
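The rules above can be spot-checked numerically on random matrices. A minimal NumPy sketch (floating-point arithmetic, so equality is tested with a tolerance; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# Multiplicativity, transpose invariance, and scaling of all three rows
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))
print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))
print(np.isclose(np.linalg.det(2 * A), 2 ** 3 * np.linalg.det(A)))
```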
Geometric Insights
Row swaps: flipping orientation of space.
Scaling a row: stretching space in one direction.
Row replacement: sliding hyperplanes without altering volume.
Multiplicativity: performing two transformations multiplies their scaling factors.
These properties make determinants both computationally manageable and geometrically interpretable.
Why this matters
Determinant properties connect computation with geometry and theory. They explain why Gaussian elimination works, why invertibility is equivalent to nonzero determinant, and why determinants naturally arise in areas like volume computation, eigenvalue theory, and differential equations.
Exercises 6.2
Compute the determinant of $$ A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \\ 0 & 0 & 2 \end{bmatrix}. $$
Show that if two rows of a square matrix are identical, then its determinant is zero.
Verify $\det(A^T) = \det(A)$ for $$ A = \begin{bmatrix} 2 & -1 \\ 3 & 4 \end{bmatrix}. $$
If $A$ is invertible, prove that $$ \det(A^{-1}) = \frac{1}{\det(A)}. $$
Suppose $A$ is a $3\times 3$ matrix with $\det(A) = 5$. What is $\det(2A)$?
6.3 Cofactor Expansion
While determinants of small matrices can be computed directly from formulas, larger matrices require a systematic method. The cofactor expansion (also known as Laplace expansion) provides a recursive way to compute determinants by breaking them into smaller ones.
Minors and Cofactors
For an $n \times n$ matrix $A = [a_{ij}]$ :
The minor $M_{ij}$ is the determinant of the $(n-1) \times (n-1)$ matrix obtained by deleting the $i$-th row and $j$-th column of $A$.
The cofactor $C_{ij}$ is defined by
$$ C_{ij} = (-1)^{i+j} M_{ij}. $$
The sign factor $(-1)^{i+j}$ alternates in a checkerboard pattern:
$$ \begin{bmatrix} + & - & + & - & \cdots \\ - & + & - & + & \cdots \\ + & - & + & - & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}. $$
Cofactor Expansion Formula
The determinant of $A$ can be computed by expanding along any row or any column:
$$ \det(A) = \sum_{j=1}^n a_{ij} C_{ij} \quad \text{(expansion along row } i\text{)}, $$
$$ \det(A) = \sum_{i=1}^n a_{ij} C_{ij} \quad \text{(expansion along column } j\text{)}. $$
Example
Example 6.3.1. Compute
$$ A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 1 & 0 & 6 \end{bmatrix}. $$
Expand along the first row:
$$ \det(A) = 1 \cdot C_{11} + 2 \cdot C_{12} + 3 \cdot C_{13}. $$
For $C_{11}$: $M_{11} = \det \begin{bmatrix} 4 & 5 \\ 0 & 6 \end{bmatrix} = 24$, so $C_{11} = (+1)(24) = 24$.
For $C_{12}$: $M_{12} = \det \begin{bmatrix} 0 & 5 \\ 1 & 6 \end{bmatrix} = 0 - 5 = -5$, so $C_{12} = (-1)(-5) = 5$.
For $C_{13}$: $M_{13} = \det \begin{bmatrix} 0 & 4 \\ 1 & 0 \end{bmatrix} = 0 - 4 = -4$, so $C_{13} = (+1)(-4) = -4$.
Thus,
$$ \det(A) = 1(24) + 2(5) + 3(-4) = 24 + 10 - 12 = 22. $$
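Cofactor expansion translates directly into a short recursive function. The sketch below expands along the first row; it is exponential-time and intended only as an illustration, but it reproduces the value $22$ of Example 6.3.1.

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # Minor: delete row 0 and column j; with 0-based j the sign (-1)^(1+(j+1)) is (-1)^j
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

A = [[1, 2, 3],
     [0, 4, 5],
     [1, 0, 6]]
print(det_cofactor(A))    # 22.0
print(np.linalg.det(A))   # ~22.0, the same value from the library routine
```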
Properties of Cofactor Expansion
Expansion along any row or column yields the same result. The cofactor expansion provides a recursive definition of determinant: a determinant of size $n$ is expressed in terms of determinants of size $n-1$ . Cofactors are fundamental in constructing the adjugate matrix, which gives a formula for inverses:
$$ A^{-1} = \frac{1}{\det(A)} \, \text{adj}(A), \quad \text{where } \text{adj}(A) = [C_{ji}]. $$
Geometric Interpretation
Cofactor expansion breaks down the determinant into contributions from sub-volumes defined by fixing one row or column at a time. Each cofactor measures how that row/column influences the overall volume scaling.
Why this matters
Cofactor expansion generalizes the small-matrix formulas and provides a conceptual definition of determinants. While not the most efficient way to compute determinants for large matrices, it is essential for theory, proofs, and connections to adjugates, Cramer’s rule, and classical geometry.
Exercises 6.3
Compute the determinant of $$ \begin{bmatrix} 2 & 0 & 1 \\ 3 & -1 & 4 \\ 1 & 2 & 0 \end{bmatrix} $$ by cofactor expansion along the first column.
Verify that expanding along the second row of Example 6.3.1 gives the same determinant.
Prove that expansion along any row gives the same value.
Show that if a row of a matrix is zero, then its determinant is zero.
Use cofactor expansion to prove that $\det(A) = \det(A^T)$.
6.4 Applications (Volume, Invertibility Test)
Determinants are not merely algebraic curiosities; they have concrete geometric and computational uses. Two of the most important applications are measuring volumes and testing invertibility of matrices.
Determinants as Volume Scalers
Given vectors $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n \in \mathbb{R}^n$ , arrange them as columns of a matrix:
$$ A = \begin{bmatrix} | & | & & | \\ \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \\ | & | & & | \end{bmatrix}. $$
Then $|\det(A)|$ equals the volume of the parallelepiped spanned by these vectors.
In $\mathbb{R}^2$, $|\det(A)|$ gives the area of the parallelogram spanned by $\mathbf{v}_1, \mathbf{v}_2$.
In $\mathbb{R}^3$, $|\det(A)|$ gives the volume of the parallelepiped spanned by $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$.
In higher dimensions, it generalizes to $n$-dimensional volume (hypervolume).
Example 6.4.1. Let
$$ \mathbf{v}_1 = (1,0,0), \quad \mathbf{v}_2 = (1,1,0), \quad \mathbf{v}_3 = (1,1,1). $$
Then
$$ A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}, \quad \det(A) = 1. $$
So the parallelepiped has volume $1$ , even though the vectors are not orthogonal.
Invertibility Test
A square matrix $A$ is invertible if and only if $\det(A) \neq 0$.
If $\det(A) = 0$: the transformation collapses space into a lower dimension (area/volume is zero). No inverse exists.
If $\det(A) \neq 0$: the transformation scales volume by $|\det(A)|$, and is reversible.
Example 6.4.2. The matrix
$$ B = \begin{bmatrix} 2 & 4 \\ 1 & 2 \end{bmatrix} $$
has determinant $\det(B) = 2 \cdot 2 - 4 \cdot 1 = 0$ . Thus, $B$ is not invertible. Geometrically, the two column vectors are collinear, spanning only a line in $\mathbb{R}^2$ .
Cramer’s Rule
Determinants also provide an explicit formula for solving systems of linear equations when the matrix is invertible. For $A\mathbf{x} = \mathbf{b}$ with $A \in \mathbb{R}^{n \times n}$ :
$$ x_i = \frac{\det(A_i)}{\det(A)}, $$
where $A_i$ is obtained by replacing the $i$ -th column of $A$ with $\mathbf{b}$ . While inefficient computationally, Cramer’s rule highlights the determinant’s role in solutions and uniqueness.
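For small systems the rule is a few lines of NumPy. A sketch (the $2\times 2$ system used here is an arbitrary example, not one from the text):

```python
import numpy as np

def cramer(A, b):
    """Solve A x = b by Cramer's rule (only sensible for small invertible A)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                      # replace the i-th column by b
        x[i] = np.linalg.det(Ai) / d
    return x

# x + 2y = 5,  3x + y = 5  has solution x = 1, y = 2
print(cramer([[1, 2], [3, 1]], [5, 5]))   # [1. 2.]
```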
Orientation
The sign of $\det(A)$ indicates whether a transformation preserves or reverses orientation. For example, a reflection in the plane has determinant $-1$ , flipping handedness.
Why this matters
Determinants condense key information: they measure scaling, test invertibility, and track orientation. These insights are indispensable in geometry (areas and volumes), analysis (Jacobian determinants in calculus), and computation ( solving systems and checking singularity).
Exercises 6.4
Compute the area of the parallelogram spanned by $(2,1)$ and $(1,3)$.
Find the volume of the parallelepiped spanned by $(1,0,0), (1,1,0), (1,1,1)$.
Determine whether the matrix $\begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}$ is invertible. Justify using determinants.
Use Cramer’s rule to solve $$ \begin{cases} x + y = 3, \\ 2x - y = 0. \end{cases} $$
Explain geometrically why a determinant of zero implies no inverse exists.
Chapter 7. Inner Product Spaces
7.1 Inner Products and Norms
To extend the geometric ideas of length, distance, and angle beyond $\mathbb{R}^2$ and $\mathbb{R}^3$ , we introduce inner products. Inner products provide a way of measuring similarity between vectors, while norms derived from them measure length. These concepts are the foundation of geometry inside vector spaces.
Inner Product
An inner product on a real vector space $V$ is a function
$$ \langle \cdot, \cdot \rangle : V \times V \to \mathbb{R} $$
that assigns to each pair of vectors $(\mathbf{u}, \mathbf{v})$ a real number, subject to the following properties:
Symmetry: $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle.$
Linearity in the first argument: $\langle a\mathbf{u} + b\mathbf{w}, \mathbf{v} \rangle = a \langle \mathbf{u}, \mathbf{v} \rangle + b \langle \mathbf{w}, \mathbf{v} \rangle.$
Positive-definiteness: $\langle \mathbf{v}, \mathbf{v} \rangle \geq 0$, and equality holds if and only if $\mathbf{v} = \mathbf{0}$.
The standard inner product on $\mathbb{R}^n$ is the dot product:
$$ \langle \mathbf{u}, \mathbf{v} \rangle = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n. $$
Norms
The norm of a vector is its length, defined in terms of the inner product:
$$ |\mathbf{v}| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}. $$
For the dot product in $\mathbb{R}^n$ :
$$ |(x_1, x_2, \dots, x_n)| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}. $$
Angles Between Vectors
The inner product allows us to define the angle $\theta$ between two nonzero vectors $\mathbf{u}, \mathbf{v}$ by
$$ \cos \theta = \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{|\mathbf{u}| \, |\mathbf{v}|}. $$
Thus, two vectors are orthogonal if $\langle \mathbf{u}, \mathbf{v} \rangle = 0$ .
Examples
Example 7.1.1. In $\mathbb{R}^2$ , with $\mathbf{u} = (1,2)$ , $\mathbf{v} = (3,4)$ :
$$ \langle \mathbf{u}, \mathbf{v} \rangle = 1\cdot 3 + 2\cdot 4 = 11. $$
$$ |\mathbf{u}| = \sqrt{1^2 + 2^2} = \sqrt{5}, \quad |\mathbf{v}| = \sqrt{3^2 + 4^2} = 5. $$
So,
$$ \cos \theta = \frac{11}{\sqrt{5}\cdot 5}. $$
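The same computation is a few lines of NumPy; this sketch reproduces Example 7.1.1.

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])

inner = np.dot(u, v)                     # 11.0
norm_u = np.linalg.norm(u)               # sqrt(5)
norm_v = np.linalg.norm(v)               # 5.0

cos_theta = inner / (norm_u * norm_v)
print(inner, norm_u, norm_v)
print(np.degrees(np.arccos(cos_theta)))  # the angle between u and v, in degrees
```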
Example 7.1.2. In the function space $C[0,1]$ , the inner product
$$ \langle f, g \rangle = \int_0^1 f(x) g(x) \, dx $$
defines a length
$$ |f| = \sqrt{\int_0^1 f(x)^2 dx}. $$
This generalizes geometry to infinite-dimensional spaces.
Geometric Interpretation
Inner product: measures similarity between vectors.
Norm: length of a vector.
Angle: measure of alignment between two directions.
These concepts unify algebraic operations with geometric intuition.
Why this matters
Inner products and norms allow us to extend geometry into abstract vector spaces. They form the basis of orthogonality, projections, Fourier series, least squares approximation, and many applications in physics and machine learning.
Exercises 7.1
Compute $\langle (2,-1,3), (1,4,0) \rangle$. Then find the angle between them.
Show that $|(x,y)| = \sqrt{x^2+y^2}$ satisfies the properties of a norm.
In $\mathbb{R}^3$, verify that $(1,1,0)$ and $(1,-1,0)$ are orthogonal.
In $C[0,1]$, compute $\langle f,g \rangle$ for $f(x)=x$, $g(x)=1$.
Prove the Cauchy–Schwarz inequality: $$ |\langle \mathbf{u}, \mathbf{v} \rangle| \leq |\mathbf{u}| \, |\mathbf{v}|. $$
7.2 Orthogonal Projections
One of the most useful applications of inner products is the notion of orthogonal projection. Projection allows us to approximate a vector by another lying in a subspace, minimizing error in the sense of distance. This idea underpins geometry, statistics, and numerical analysis.
Projection onto a Line
Let $\mathbf{u} \in \mathbb{R}^n$ be a nonzero vector. The line spanned by $\mathbf{u}$ is
$$ L = { c\mathbf{u} \mid c \in \mathbb{R} }. $$
Given a vector $\mathbf{v}$ , the projection of $\mathbf{v}$ onto $\mathbf{u}$ is the vector in $L$ closest to $\mathbf{v}$ . Geometrically, it is the shadow of $\mathbf{v}$ on the line.
The formula is
$$ \text{proj}_{\mathbf{u}}(\mathbf{v}) = \frac{\langle \mathbf{v}, \mathbf{u} \rangle}{\langle \mathbf{u}, \mathbf{u} \rangle} \, \mathbf{u}. $$
The error vector $\mathbf{v} - \text{proj}_{\mathbf{u}}(\mathbf{v})$ is orthogonal to $\mathbf{u}$ .
Example 7.2.1
Let $\mathbf{u} = (1,2)$ , $\mathbf{v} = (3,1)$ .
$$ \langle \mathbf{v}, \mathbf{u} \rangle = 3\cdot 1 + 1\cdot 2 = 5, \quad \langle \mathbf{u}, \mathbf{u} \rangle = 1^2 + 2^2 = 5. $$
So
$$ \text{proj}_{\mathbf{u}}(\mathbf{v}) = \frac{5}{5}(1,2) = (1,2). $$
The error vector is $(3,1) - (1,2) = (2,-1)$ , which is orthogonal to $(1,2)$ .
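The projection formula and the orthogonality of the error are easy to verify in code. A NumPy sketch of Example 7.2.1:

```python
import numpy as np

def proj(v, u):
    # Projection of v onto the line spanned by u
    return (np.dot(v, u) / np.dot(u, u)) * u

u = np.array([1.0, 2.0])
v = np.array([3.0, 1.0])

p = proj(v, u)
print(p)                  # [1. 2.]
print(np.dot(v - p, u))   # 0.0: the error vector (2, -1) is orthogonal to u
```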
Projection onto a Subspace
Suppose $W \subseteq \mathbb{R}^n$ is a subspace with orthonormal basis ${ \mathbf{w}_1, \dots, \mathbf{w}_k }$ . The projection of a vector $\mathbf{v}$ onto $W$ is
$$ \text{proj}_{W}(\mathbf{v}) = \langle \mathbf{v}, \mathbf{w}_1 \rangle \mathbf{w}_1 + \cdots + \langle \mathbf{v}, \mathbf{w}_k \rangle \mathbf{w}_k. $$
This is the unique vector in $W$ closest to $\mathbf{v}$ . The difference $\mathbf{v} - \text{proj}_{W}(\mathbf{v})$ is orthogonal to all of $W$ .
Least Squares Approximation
Orthogonal projection explains the method of least squares. To solve an overdetermined system $A\mathbf{x} \approx \mathbf{b}$ , we seek the $\mathbf{x}$ that makes $A\mathbf{x}$ the projection of $\mathbf{b}$ onto the column space of $A$ . This gives the normal equations
$$ A^T A \mathbf{x} = A^T \mathbf{b}. $$
Thus, least squares is just projection in disguise.
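A minimal NumPy sketch of the normal equations, using an arbitrary overdetermined $3 \times 2$ system and comparing against the library least-squares solver:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

x_normal = np.linalg.solve(A.T @ A, A.T @ b)     # solve the normal equations
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)  # library least squares

print(x_normal, x_lstsq)   # the two solutions agree
print(A @ x_normal)        # the projection of b onto the column space of A
```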
Geometric Interpretation
Projection finds the closest point in a subspace to a given vector.
It minimizes distance (error) in the sense of Euclidean norm.
Orthogonality ensures the error vector points directly away from the subspace.
Why this matters
Orthogonal projection is central in both pure and applied mathematics. It underlies the geometry of subspaces, the theory of Fourier series, regression in statistics, and approximation methods in numerical linear algebra. Whenever we fit data with a simpler model, projection is at work.
Exercises 7.2
Compute the projection of $(2,3)$ onto the vector $(1,1)$.
Show that $\mathbf{v} - \text{proj}_{\mathbf{u}}(\mathbf{v})$ is orthogonal to $\mathbf{u}$.
Let $W = \text{span}{(1,0,0), (0,1,0)} \subseteq \mathbb{R}^3$. Find the projection of $(1,2,3)$ onto $W$.
Explain why least squares fitting corresponds to projection onto the column space of $A$.
Prove that projection onto a subspace $W$ is unique: there is exactly one closest vector in $W$ to a given $\mathbf{v}$.
7.3 Gram–Schmidt Process
The Gram–Schmidt process is a systematic way to turn any linearly independent set of vectors into an orthonormal basis. This is especially useful because orthonormal bases simplify computations: inner products become simple coordinate comparisons, and projections take clean forms.
The Idea
Given a linearly independent set of vectors ${\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n}$ in an inner product space, we want to construct an orthonormal set ${\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_n}$ that spans the same subspace.
We proceed step by step:
Start with $\mathbf{v}_1$ and normalize it to get $\mathbf{u}_1$.
Subtract from $\mathbf{v}_2$ its projection onto $\mathbf{u}_1$, leaving a vector orthogonal to $\mathbf{u}_1$; normalize to get $\mathbf{u}_2$.
For each $\mathbf{v}_k$, subtract projections onto all previously constructed $\mathbf{u}_1, \dots, \mathbf{u}_{k-1}$, then normalize.
The Algorithm
For $k = 1, 2, \dots, n$ :
$$ \mathbf{w}_k = \mathbf{v}_k - \sum_{j=1}^{k-1} \langle \mathbf{v}_k, \mathbf{u}_j \rangle \mathbf{u}_j, $$
$$ \mathbf{u}_k = \frac{\mathbf{w}_k}{|\mathbf{w}_k|}. $$
The result ${\mathbf{u}_1, \dots, \mathbf{u}_n}$ is an orthonormal basis of the span of the original vectors.
Example 7.3.1
Take $\mathbf{v}_1 = (1,1,0), \ \mathbf{v}_2 = (1,0,1), \ \mathbf{v}_3 = (0,1,1)$ in $\mathbb{R}^3$ .
Normalize $\mathbf{v}_1$ :
$$ \mathbf{u}_1 = \frac{1}{\sqrt{2}}(1,1,0). $$
Subtract projection of $\mathbf{v}_2$ on $\mathbf{u}_1$ :
$$ \mathbf{w}_2 = \mathbf{v}_2 - \langle \mathbf{v}_2,\mathbf{u}_1 \rangle \mathbf{u}_1. $$
$$ \langle \mathbf{v}_2,\mathbf{u}_1 \rangle = \frac{1}{\sqrt{2}}(1\cdot 1 + 0\cdot 1 + 1\cdot 0) = \tfrac{1}{\sqrt{2}}. $$
So
$$ \mathbf{w}_2 = (1,0,1) - \tfrac{1}{\sqrt{2}}\cdot \tfrac{1}{\sqrt{2}}(1,1,0) = (1,0,1) - \tfrac{1}{2}(1,1,0) = \left(\tfrac{1}{2}, -\tfrac{1}{2}, 1\right). $$
Normalize:
$$ \mathbf{u}_2 = \frac{1}{\sqrt{\tfrac{1}{4}+\tfrac{1}{4}+1}} \left(\tfrac{1}{2}, -\tfrac{1}{2}, 1\right) = \frac{1}{\sqrt{\tfrac{3}{2}}}\left(\tfrac{1}{2}, -\tfrac{1}{2}, 1\right). $$
Subtract projections from $\mathbf{v}_3$ :
$$ \mathbf{w}_3 = \mathbf{v}_3 - \langle \mathbf{v}_3,\mathbf{u}_1 \rangle \mathbf{u}_1 - \langle \mathbf{v}_3,\mathbf{u}_2 \rangle \mathbf{u}_2. $$
After computing, normalize to obtain $\mathbf{u}_3$ .
The result is an orthonormal basis of the span of ${\mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3}$ .
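The algorithm is only a few lines of code. The sketch below implements the modified variant, in which each projection is subtracted from the partially orthogonalized vector (numerically more stable, and equivalent to the formulas above in exact arithmetic), applied to the vectors of Example 7.3.1.

```python
import numpy as np

def gram_schmidt(vectors):
    """Return an orthonormal basis (as rows) spanning the same subspace."""
    basis = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for u in basis:
            w -= np.dot(w, u) * u          # remove the component along u
        basis.append(w / np.linalg.norm(w))
    return np.array(basis)

V = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]      # vectors of Example 7.3.1
U = gram_schmidt(V)
print(np.round(U @ U.T, 10))               # identity matrix: the rows are orthonormal
```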
Geometric Interpretation
Gram–Schmidt is like straightening out a set of vectors: you start with the original directions and adjust each new vector to be perpendicular to all previous ones. Then you scale to unit length. The process ensures orthogonality while preserving the span.
Why this matters
Orthonormal bases simplify inner products, projections, and computations in general. They make coordinate systems easier to work with and are crucial in numerical methods, QR decomposition, Fourier analysis, and statistics (orthogonal polynomials, principal component analysis).
Exercises 7.3
Apply Gram–Schmidt to $(1,0), (1,1)$ in $\mathbb{R}^2$.
Orthogonalize $(1,1,1), (1,0,1)$ in $\mathbb{R}^3$.
Prove that each step of Gram–Schmidt yields a vector orthogonal to all previous ones.
Show that Gram–Schmidt preserves the span of the original vectors.
Explain how Gram–Schmidt leads to the QR decomposition of a matrix.
7.4 Orthonormal Bases
An orthonormal basis is a basis of a vector space in which all vectors are both orthogonal to each other and have unit length. Such bases are the most convenient possible coordinate systems: computations involving inner products, projections, and norms become exceptionally simple.
Definition
A set of vectors ${\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_n}$ in an inner product space $V$ is called an orthonormal basis if
$\langle \mathbf{u}_i, \mathbf{u}_j \rangle = 0$ whenever $i \neq j$ (orthogonality),
$|\mathbf{u}_i| = 1$ for all $i$ (normalization),
and the set spans $V$.
Examples
Example 7.4.1. In $\mathbb{R}^2$ , the standard basis
$$ \mathbf{e}_1 = (1,0), \quad \mathbf{e}_2 = (0,1) $$
is orthonormal under the dot product.
Example 7.4.2. In $\mathbb{R}^3$ , the standard basis
$$ \mathbf{e}_1 = (1,0,0), \quad \mathbf{e}_2 = (0,1,0), \quad \mathbf{e}_3 = (0,0,1) $$
is orthonormal.
Example 7.4.3. Fourier basis on functions:
$$ {1, \cos x, \sin x, \cos 2x, \sin 2x, \dots} $$
is an orthogonal set in the space of square-integrable functions on $[-\pi,\pi]$ with inner product
$$ \langle f,g \rangle = \int_{-\pi}^{\pi} f(x) g(x) \, dx. $$
After normalization, it becomes an orthonormal basis.
Properties
Coordinate simplicity: If ${\mathbf{u}_1,\dots,\mathbf{u}_n}$ is an orthonormal basis of $V$, then any vector $\mathbf{v}\in V$ has coordinates $$ [\mathbf{v}] = \begin{bmatrix} \langle \mathbf{v}, \mathbf{u}_1 \rangle \\ \vdots \\ \langle \mathbf{v}, \mathbf{u}_n \rangle \end{bmatrix}. $$ That is, coordinates are just inner products.
Parseval’s identity: For any $\mathbf{v} \in V$, $$ |\mathbf{v}|^2 = \sum_{i=1}^n |\langle \mathbf{v}, \mathbf{u}_i \rangle|^2. $$
Projections: The orthogonal projection onto the span of ${\mathbf{u}_1,\dots,\mathbf{u}_k}$ is $$ \text{proj}(\mathbf{v}) = \sum_{i=1}^k \langle \mathbf{v}, \mathbf{u}_i \rangle \mathbf{u}_i. $$
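A quick numerical confirmation of these properties, using the orthonormal basis $(1/\sqrt{2})(1,1)$, $(1/\sqrt{2})(1,-1)$ of $\mathbb{R}^2$ and an arbitrary test vector:

```python
import numpy as np

u1 = np.array([1.0, 1.0]) / np.sqrt(2)
u2 = np.array([1.0, -1.0]) / np.sqrt(2)
v = np.array([2.0, 1.0])

coords = np.array([np.dot(v, u1), np.dot(v, u2)])        # coordinates are inner products
print(coords)
print(np.allclose(np.sum(coords ** 2), np.dot(v, v)))    # Parseval's identity
print(np.allclose(coords[0] * u1 + coords[1] * u2, v))   # reconstruction of v
```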
Constructing Orthonormal Bases
Start with any linearly independent set, then apply the Gram–Schmidt process to obtain an orthonormal set spanning the same subspace.
In practice, orthonormal bases are often chosen for numerical stability and simplicity of computation.
Geometric Interpretation
An orthonormal basis is like a perfectly aligned and equally scaled coordinate system. Distances and angles are computed directly using coordinates without correction factors. They are the ideal rulers of linear algebra.
Why this matters
Orthonormal bases simplify every aspect of linear algebra: solving systems, computing projections, expanding functions, diagonalizing symmetric matrices, and working with Fourier series. In data science, principal component analysis produces orthonormal directions capturing maximum variance.
Exercises 7.4
Verify that $(1/\sqrt{2})(1,1)$ and $(1/\sqrt{2})(1,-1)$ form an orthonormal basis of $\mathbb{R}^2$.
Express $(3,4)$ in terms of the orthonormal basis ${(1/\sqrt{2})(1,1), (1/\sqrt{2})(1,-1)}$.
Prove Parseval’s identity for $\mathbb{R}^n$ with the dot product.
Find an orthonormal basis for the plane $x+y+z=0$ in $\mathbb{R}^3$.
Explain why orthonormal bases are numerically more stable than arbitrary bases in computations.
Chapter 8. Eigenvalues and eigenvectors
8.1 Definitions and Intuition
The concepts of eigenvalues and eigenvectors reveal the most fundamental behavior of linear transformations. They identify the special directions in which a transformation acts by simple stretching or compressing, without rotation or distortion.
Definition
Let $T: V \to V$ be a linear transformation on a vector space $V$ . A nonzero vector $\mathbf{v} \in V$ is called an eigenvector of $T$ if
$$ T(\mathbf{v}) = \lambda \mathbf{v} $$
for some scalar $\lambda \in \mathbb{R}$ (or $\mathbb{C}$ ). The scalar $\lambda$ is the eigenvalue corresponding to $\mathbf{v}$ .
Equivalently, if $A$ is the matrix of $T$ , then eigenvalues and eigenvectors satisfy
$$ A\mathbf{v} = \lambda \mathbf{v}. $$
Basic Examples
Example 8.1.1. Let
$$ A = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}. $$
Then
$$ A(1,0)^T = 2(1,0)^T, \quad A(0,1)^T = 3(0,1)^T. $$
So $(1,0)$ is an eigenvector with eigenvalue $2$ , and $(0,1)$ is an eigenvector with eigenvalue $3$ .
Example 8.1.2. Rotation matrix in $\mathbb{R}^2$ :
$$ R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}. $$
If $\theta \neq 0, \pi$, $R_\theta$ has no real eigenvalues: every vector is rotated, not scaled. Over $\mathbb{C}$, however, it has eigenvalues $e^{i\theta}, e^{-i\theta}$.
Algebraic Formulation
Eigenvalues arise from solving the characteristic equation:
$$ \det(A - \lambda I) = 0. $$
This polynomial in $\lambda$ is the characteristic polynomial. Its roots are the eigenvalues.
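In practice eigenvalues are usually computed numerically rather than by factoring the characteristic polynomial. A NumPy sketch for the matrix of Example 8.1.1, checking the defining equation $A\mathbf{v} = \lambda\mathbf{v}$:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)   # eigenvectors are the columns
print(eigenvalues)                             # [2. 3.]

for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ v, lam * v))         # True for each eigenpair
```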
Geometric Intuition
Eigenvectors are directions that remain unchanged in orientation under a transformation; only their length is scaled.
Eigenvalues tell us the scaling factor along those directions.
If a matrix has many independent eigenvectors, it can often be simplified (diagonalized) by changing basis.
Applications in Geometry and Science
Stretching along principal axes of an ellipse (quadratic forms).
Stable directions of dynamical systems.
Principal components in statistics and machine learning.
Quantum mechanics, where observables correspond to operators with eigenvalues.
Why this matters
Eigenvalues and eigenvectors are a bridge between algebra and geometry. They provide a lens for understanding linear transformations in their simplest form. Nearly every application of linear algebra-differential equations, statistics, physics, computer science-relies on eigen-analysis.
Exercises 8.1
Find the eigenvalues and eigenvectors of $\begin{bmatrix} 4 & 0 \\ 0 & -1 \end{bmatrix}$.
Show that every scalar multiple of an eigenvector is again an eigenvector for the same eigenvalue.
Verify that the rotation matrix $R_\theta$ has no real eigenvalues unless $\theta = 0$ or $\pi$.
Compute the characteristic polynomial of $\begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$.
Explain geometrically what eigenvectors and eigenvalues represent for the shear matrix $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$.
8.2 Diagonalization
A central goal in linear algebra is to simplify the action of a matrix by choosing a good basis. Diagonalization is the process of rewriting a matrix so that it acts by simple scaling along independent directions. This makes computations such as powers, exponentials, and solving differential equations far easier.
Definition
A square matrix $A \in \mathbb{R}^{n \times n}$ is diagonalizable if there exists an invertible matrix $P$ such that
$$ P^{-1} A P = D, $$
where $D$ is a diagonal matrix.
The diagonal entries of $D$ are eigenvalues of $A$ , and the columns of $P$ are the corresponding eigenvectors.
When is a Matrix Diagonalizable?
A matrix is diagonalizable if and only if it has $n$ linearly independent eigenvectors.
Equivalently, the sum of the dimensions of its eigenspaces equals $n$.
Symmetric matrices (over $\mathbb{R}$) are always diagonalizable, with an orthonormal basis of eigenvectors.
Example 8.2.1
Let
$$ A = \begin{bmatrix} 4 & 1 \\ 0 & 2 \end{bmatrix}. $$
Characteristic polynomial:
$$ \det(A - \lambda I) = (4-\lambda)(2-\lambda). $$
So eigenvalues are $\lambda_1 = 4$ , $\lambda_2 = 2$ .
Eigenvectors:
For $\lambda = 4$, solve $(A-4I)\mathbf{v}=0$: $\begin{bmatrix} 0 & 1 \\ 0 & -2 \end{bmatrix}\mathbf{v} = 0$, giving $\mathbf{v}_1 = (1,0)$.
For $\lambda = 2$, solve $(A-2I)\mathbf{v}=0$, giving $\mathbf{v}_2 = (1,-2)$.
Construct $P = \begin{bmatrix} 1 & 1 \\ 0 & -2 \end{bmatrix}$. Then
$$ P^{-1} A P = \begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix}. $$
Thus, $A$ is diagonalizable.
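A NumPy check of this diagonalization, together with the power computation it makes easy (discussed just below):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [0.0, 2.0]])
P = np.array([[1.0, 1.0],      # columns are the eigenvectors (1,0) and (1,-2)
              [0.0, -2.0]])

D = np.linalg.inv(P) @ A @ P
print(np.round(D, 10))         # diag(4, 2)

# A^3 = P D^3 P^{-1}; powers of a diagonal matrix are entrywise powers
A3 = P @ np.diag(np.diag(D) ** 3) @ np.linalg.inv(P)
print(np.allclose(A3, np.linalg.matrix_power(A, 3)))   # True
```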
Why Diagonalize?
Computing powers: If $A = P D P^{-1}$ , then $$ A^k = P D^k P^{-1}. $$ Since $D$ is diagonal, $D^k$ is easy to compute.
Matrix exponentials: $e^A = P e^D P^{-1}$ , useful in solving differential equations.
Understanding geometry: Diagonalization reveals the directions along which a transformation stretches or compresses space independently.
Non-Diagonalizable Example
Not all matrices can be diagonalized.
$$ A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} $$
has only one eigenvalue $\lambda = 1$ , with eigenspace dimension 1. Since $n=2$ but we only have 1 independent eigenvector, $A$ is not diagonalizable.
Geometric Interpretation
Diagonalization means we have found a basis of eigenvectors. In this basis, the matrix acts by simple scaling along each coordinate axis. It transforms complicated motion into independent 1D motions.
Why this matters
Diagonalization is a cornerstone of linear algebra. It simplifies computation, reveals structure, and is the starting point for the spectral theorem, Jordan form, and many applications in physics, engineering, and data science.
Exercises 8.2
Diagonalize $$ A = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}. $$
Determine whether $$ A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} $$ is diagonalizable. Why or why not?
Find $A^5$ for $$ A = \begin{bmatrix} 4 & 1 \\ 0 & 2 \end{bmatrix} $$ using diagonalization.
Show that any $n \times n$ matrix with $n$ distinct eigenvalues is diagonalizable.
Explain why real symmetric matrices are always diagonalizable.
8.3 Characteristic Polynomials
The key to finding eigenvalues is the characteristic polynomial of a matrix. This polynomial encodes the values of $\lambda$ for which the matrix $A - \lambda I$ fails to be invertible.
Definition
For an $n \times n$ matrix $A$ , the characteristic polynomial is
$$ p_A(\lambda) = \det(A - \lambda I). $$
The roots of $p_A(\lambda)$ are the eigenvalues of $A$ .
Examples
Example 8.3.1. Let
$$ A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}. $$
Then
$$ p_A(\lambda) = \det\!\begin{bmatrix} 2-\lambda & 1 \\ 1 & 2-\lambda \end{bmatrix} = (2-\lambda)^2 - 1 = \lambda^2 - 4\lambda + 3. $$
Thus eigenvalues are $\lambda = 1, 3$ .
Example 8.3.2. For
$$ A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} $$
(rotation by 90°),
$$ p_A(\lambda) = \det\!\begin{bmatrix} -\lambda & -1 \\ 1 & -\lambda \end{bmatrix} = \lambda^2 + 1. $$
Eigenvalues are $\lambda = \pm i$ . No real eigenvalues exist, consistent with pure rotation.
Example 8.3.3. For a triangular matrix
$$ A = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 3 & 5 \\ 0 & 0 & 4 \end{bmatrix}, $$
the determinant of $A - \lambda I$ is simply the product of its diagonal entries:
$$ p_A(\lambda) = (2-\lambda)(3-\lambda)(4-\lambda). $$
So eigenvalues are $2, 3, 4$ .
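NumPy can produce the characteristic polynomial directly: for a square matrix, `np.poly` returns the coefficients of $\det(\lambda I - A)$. A sketch for Example 8.3.1:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

coeffs = np.poly(A)
print(coeffs)                          # [ 1. -4.  3.]  i.e. lambda^2 - 4 lambda + 3
print(np.roots(coeffs))                # [3. 1.]: the eigenvalues
print(np.trace(A), np.linalg.det(A))   # 4.0 and 3.0: their sum and product
```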
Properties
The characteristic polynomial of an $n \times n$ matrix has degree $n$.
The sum of the eigenvalues (counted with multiplicity) equals the trace of $A$: $$ \text{tr}(A) = \lambda_1 + \cdots + \lambda_n. $$
The product of the eigenvalues equals the determinant of $A$: $$ \det(A) = \lambda_1 \cdots \lambda_n. $$
Similar matrices have the same characteristic polyn