New Linear Algebra
tags: [ education , mathematics ]
New way to teach linear algebra.
\[ \begin{align*} Ax = \begin{bmatrix} 1 & 4 & 5 \\ 3 & 2 & 5 \\ 2 & 1 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix} x_1 + \begin{bmatrix} 4 \\ 2 \\ 1 \end{bmatrix} x_2 + \begin{bmatrix} 5 \\ 5 \\ 3 \end{bmatrix} x_3 \end{align*} \]
- \(Ax\) is basically a linear combination of the columns
- column space \(C(A)\) is the space spanned by the (3) columns, though since the first two columns add up to the second, it’s just two vectors (rank 2).
\(A = CR\)
\[ \begin{align*} A = CR = \begin{bmatrix} 1 & 4 \\ 3 & 2 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \end{align*} \]
- getting independent columns, which is the basis for the column space.
- clearly, you can always do this
- it’s just that sometimes it’s not interesting (i.e. if it’s already a basis)
- \(R\) is the basis for the row space, also rank 2
- number of independent columns = \(r\) = number of independent rows
- \(\operatorname{rank}(C(A)) = \operatorname{rank}(R(A))\)
- \(Ax = 0\) has one solution, and there are \(n-r\) independent solutions to \(Ax = 0\)
- \(R\) turns out to be the row reduced echolon form of \(A\)
\(Ax = 0\)
\[ \begin{align*} \begin{bmatrix} \text{row } 1 \\ \vdots \\ \text{row } m \end{bmatrix} \begin{bmatrix} \\ x \\ \, \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix} \end{align*} \]
- 0 on the right means that the rows must all be orthogonal to \(x\)
- i.e. every \(x\) in the nullspace \(N(A)\) must be orthogonal to the row space.
- if you take transposes you get the equivalent statement for column space
Figure 1: Big Picture Linear Algebra
\(A = BC\)
\[ \begin{align*} A = BC = b_1 c_1^* + b_2 c_2^* + \ldots \end{align*} \]
- here, \(b_i\) are the column vectors of \(B\), while \(c_i^*\) are the row vectors of \(C\).
- you can think of matrix also as column by row multiplication
- it’s more like building blocks there, adding up rank 1 matrices
\(A = LU\)
- you can rewrite \(A\) as a product of lower-triangular and upper-triangular matrices.
- that’s what you’re doing when you’re solving simultaneous equations by elimination
\(Q^\top Q\)
- \(Q\) with orthogonal(normal) column vectors, such that \(Q^\top Q = I\)
- not necessarily true that \(QQ^\top = I\), if not square
- \(QQ^\top\) is like projection:
- \(QQ^\top QQ^\top = QQ^\top\)
- if square, then \(QQ^\top = I\) and \(Q^\top = Q^{-1}\)
- length is preserved by \(Q\)
\(A=QR\)
- Gram-Schmidt: orthogonalize the columns of \(A\).
- assuming full rank of \(C(A)\)
- \(Q^\top A = Q^\top Q R = R\)
- Major application: Least Squares
- want to minimize \(\norm{b - Ax}^2 = \norm{e}^2\)
- normal equation for minimizing solution (best \(\hat{x}\)): \(A^\top e = 0\)
- if \(A = QR\), then you get \(R \hat{x} = Q^\top b\)
Figure 2: Least Square
\(S = Q \Lambda Q^\top\)
- symmetric matrix \(S=S^\top\) have orthogonal eigenvectors
- how to show \(x^\top y = 0\)
- \(Sx=\lambda x \implies x^\top S^\top = \lambda x^\top \implies x^\top S y = \lambda x^\top y\)
- \(Sy=\alpha y \implies x^\top S y = \alpha x^\top y\)
- hence, \(\lambda x^\top y = \alpha x^\top y\). but \(\lambda \neq \alpha\), since eigenvalues are distinct
- hence, \(x^\top y = 0\)
- things are not nice when things aren’t square
- but \(A^\top A\) is always square, symmetric, PSD
- same holds for \(AA^\top\)
- same eigenvalues
- \(x \to Ax\) to convert eigenvectors
- from definition of eigenvalues: \(AX = X \Lambda\)
- if you don’t have symmetry, then you can’t just hit it with \(X^\top\) and get things to cancel
\(A = U \Sigma V^\top\)
- \(U^\top U = I, V^\top V = I\)
- \(AV = U \Sigma\) (like the eigen-decomposition, but not quite)
- \(Av_i = \sigma_i u_i\)
- \(U, V\) are rotations, \(\Sigma\) stretch.
- how to choose orthonormal \(v_i\) in the row space of \(A\)
- \(v_i\) are eigenvectors of \(A^\top A\) (which are orthonormal, from above)
- \(A^\top A v_i = \lambda_i v_i = \sigma_i^2 v_i\)
- so then \(A^\top A V = \Sigma^2 V\)
- how to choose \(u_i\)
- \(u_i = \frac{1}{\sigma_i} A v_i\)
- check orthonormal: \(u_i^\top u_j\)
- combining, \(A^\top A v_i = \sigma_i^2 v_i \implies A^\top u_i = \sigma_i v_i\)
Figure 3: Singular Value Decomposition