
Singular Value Decomposition - Why and How?

Royi
June 24, 2011


A Presentation about Singular Value Decomposition (SVD). It covers the following:

* Singular Value Decomposition (SVD) Theorem.
* Applications:
  * Order Reduction.
  * Solving Linear Equation System (Least Squares).
  * Total Least Squares.
  * Principal Component Analysis (PCA).


Transcript

  1. Outline
    1 Definitions and Notations: Notations, Definitions, Introduction.
    2 Singular Value Decomposition Theorem: SVD Theorem, Proof of the SVD Theorem, SVD Properties, SVD Example.
    3 Applications: Order Reduction, Solving Linear Equation System, Total Least Squares, Principal Component Analysis.
  2. Outline: Notations (the outline from slide 1 is repeated here with this subsection highlighted, as on each section-divider slide below).
  3. Notations
    A capital letter denotes a matrix: $A \in \mathbb{C}^{m \times n}$, $A \in \mathbb{R}^{m \times n}$.
    A small letter denotes a column vector: $a \in \mathbb{C}^{m \times 1}$, $a \in \mathbb{R}^{m \times 1}$.
    $A_i$ refers to the $i$-th row of a matrix; $A_j$ refers to the $j$-th column of a matrix (the intended one is clear from context).
  4. Outline: Definitions.
  5. Definitions
    Unless written otherwise, the complex field is the default.
    Conjugate operator: $A^{*}$.
    Transpose operator: $A^{T}$, with $\left( A^{T} \right)_{ij} = A_{ji}$.
    Complex conjugate (Hermitian) transpose operator: $A^{H}$, with $\left( A^{H} \right)_{ij} = A_{ji}^{*}$.
    Range space and null space of an operator: let $L : X \to Y$ be an operator (linear or otherwise). The range space $\mathcal{R}(L) \subseteq Y$ is $\mathcal{R}(L) = \{ y = L x : x \in X \}$; the null space $\mathcal{N}(L) \subseteq X$ is $\mathcal{N}(L) = \{ x \in X : L x = 0 \}$.
  6. Outline: Introduction.
  7. Introduction
    Each linear operator $A : \mathbb{C}^n \to \mathbb{C}^m$ defines the fundamental subspaces, and the following properties hold:
    $\mathcal{R}(A) \perp \mathcal{N}(A^H)$, $\mathcal{R}(A^H) \perp \mathcal{N}(A)$,
    $\operatorname{rank}(A) = \dim \mathcal{R}(A) = \dim \mathcal{R}(A^H) = \operatorname{rank}(A^H)$.
  8. Introduction
    For the action of the linear operator $A \in \mathbb{C}^{m \times n}$ the following properties hold:
    $\operatorname{rank}(A) = \operatorname{rank}(A A^H) = \operatorname{rank}(A^H A) = \operatorname{rank}(A^H)$,
    $\mathcal{R}(A) = \mathcal{R}(A A^H)$, $\mathcal{R}(A^H) = \mathcal{R}(A^H A)$.
  9. Outline: SVD Theorem.
  10. SVD Theorem
    Theorem (SVD): every matrix $A \in \mathbb{C}^{m \times n}$ can be factored as $A = U \Sigma V^H$, where $U \in \mathbb{C}^{m \times m}$ and $V \in \mathbb{C}^{n \times n}$ are unitary and $\Sigma \in \mathbb{C}^{m \times n}$ has the form $\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_p)$, $p = \min(m, n)$.
    Corollary (i): the columns of $U$ are eigenvectors of $A A^H$ (the left singular vectors):
    $A A^H = U \Sigma V^H \left( U \Sigma V^H \right)^H = U \Sigma V^H V \Sigma^H U^H = U \Sigma \Sigma^H U^H$.
    The columns of $V$ are eigenvectors of $A^H A$ (the right singular vectors):
    $A^H A = \left( U \Sigma V^H \right)^H U \Sigma V^H = V \Sigma^H U^H U \Sigma V^H = V \Sigma^H \Sigma V^H$.
  11. SVD Theorem
    Corollary (ii): the nonzero singular values on the diagonal of $\Sigma$ are the square roots of the nonzero eigenvalues of both $A A^H$ and $A^H A$.
    The SVD is unique up to permutations of the triplets $(u_i, \sigma_i, v_i)$ as long as $\sigma_i = \sigma_j \Leftrightarrow i = j$. If the algebraic multiplicity of a certain eigenvalue of $A^H A$ / $A A^H$ is larger than 1, there is freedom in choosing the vectors which span the null space of $A A^H - \lambda I$ / $A^H A - \lambda I$.
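As a numerical illustration of these corollaries (my addition, not part of the original deck), the following MATLAB sketch checks that the $U$ and $V$ returned by svd diagonalize $A A^H$ and $A^H A$; the test matrix is arbitrary.

```matlab
% Check that U diagonalizes A*A' and V diagonalizes A'*A (up to round-off).
A = [1 2 3; 6 5 4];                               % arbitrary test matrix
[U, S, V] = svd(A);
err1 = norm(U' * (A * A') * U - S * S', 'fro');   % should be near machine precision
err2 = norm(V' * (A' * A) * V - S' * S, 'fro');   % should be near machine precision
fprintf('diagonalization errors: %g, %g\n', err1, err2);
```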
  12. Outline: Proof of the SVD Theorem.
  13. Proof of the SVD Theorem
    To prove the SVD theorem, two propositions are used.
    Proposition I: for every $A \in \mathbb{C}^{m \times n}$, both $A^H A$ and $A A^H$ are Hermitian matrices.
    Proof. Let $C = A^H A$ and let $A_i$ denote the $i$-th column of $A$. Then
    $C_{ij} = A_i^H A_j = \left( \left( A_i^H A_j \right)^H \right)^H = \left( A_j^H A_i \right)^H = C_{ji}^H = C_{ji}^{*}$,
    hence $C$ is Hermitian (the same argument applies to $A A^H$).
  14. Proof of the SVD Theorem
    Proposition II (Spectral Decomposition): every Hermitian matrix $A \in \mathbb{C}^{n \times n}$, i.e. $A_{ij} = A_{ji}^{*}$, can be diagonalized by a unitary matrix $U \in \mathbb{C}^{n \times n}$ s.t. $U^H A U = \Lambda$.
    Proof. The spectral decomposition is a result of a few properties of Hermitian matrices:
    For Hermitian matrices, the eigenvectors of distinct eigenvalues are orthogonal.
    Schur's Lemma: $\forall A \in \mathbb{C}^{n \times n}$ there exists a unitary $U \in \mathbb{C}^{n \times n}$ s.t. $U^H A U = T$, where $T \in \mathbb{C}^{n \times n}$ is an upper triangular matrix.
    When $A$ has $n$ distinct eigenvalues, Proposition II is immediate. Otherwise it can be shown that if $A$ is Hermitian then $T$ is Hermitian, and since $T$ is upper triangular it must be a diagonal matrix.
  15. Proof of the SVD Theorem
    Theorem (restated): $\forall A \in \mathbb{C}^{m \times n}$, $A = U \Sigma V^H$ with $U, V$ unitary and $\Sigma$ diagonal as above.
    Proof. Let $A^H A V = V \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ be the spectral decomposition of $A^H A$, where the columns of $V = [v_1, v_2, \ldots, v_n]$ are eigenvectors and $\lambda_1, \lambda_2, \ldots, \lambda_r > 0$, $\lambda_{r+1} = \lambda_{r+2} = \ldots = \lambda_n = 0$, with $r \leq p$. For $1 \leq i \leq r$, let $u_i = \frac{A v_i}{\sqrt{\lambda_i}}$.
  16. Proof of the SVD Theorem
    Proof (continued). Notice that $\langle u_i, u_j \rangle = \delta_{ij}$. The set $\{ u_i, \, i = 1, 2, \ldots, r \}$ can be extended using the Gram-Schmidt procedure to form an orthonormal basis for $\mathbb{C}^m$. Let $U = [u_1, u_2, \ldots, u_m]$; then the $u_i$ are eigenvectors of $A A^H$.
  17. Proof of the SVD Theorem
    Proof (continued). This is clear for the nonzero eigenvalues of $A A^H$. For the zero eigenvalues, the eigenvectors must come from the null space of $A A^H$: the eigenvectors associated with zero eigenvalues are, by construction, orthogonal to the eigenvectors associated with nonzero eigenvalues, which lie in the range of $A A^H$, hence they must lie in the null space of $A A^H$.
  18. Proof of the SVD Theorem
    Proof (continued). Examine the elements of $U^H A V$. For $i \leq r$, the $(i, j)$ element of $U^H A V$ is
    $u_i^H A v_j = \frac{1}{\sqrt{\lambda_i}} v_i^H A^H A v_j = \frac{\lambda_j}{\sqrt{\lambda_i}} v_i^H v_j = \frac{\lambda_j}{\sqrt{\lambda_i}} \delta_{ij} = \sqrt{\lambda_i} \, \delta_{ij}$.
    For $i > r$ we get $A A^H u_i = 0$, thus $A^H u_i \in \mathcal{N}(A)$ and also $A^H u_i \in \mathcal{R}(A^H)$ as a linear combination of the columns of $A^H$. Yet $\mathcal{R}(A^H) \perp \mathcal{N}(A)$, hence $A^H u_i = 0$.
  19. Proof of the SVD Theorem
    Proof (continued). Since $A^H u_i = 0$ we get $u_i^H A v_j = \left( v_j^H A^H u_i \right)^H = 0$. Thus $U^H A V = \Sigma$, where $\Sigma$ is diagonal (along its main diagonal), which completes the proof.
  20. Proof of the SVD Theorem
    Alternative proof. Notice that $A^H A$ and $A A^H$ share the same nonzero eigenvalues (this can be proved independently of the SVD). Let $A A^H u_i = \sigma_i^2 u_i$ for $i = 1, 2, \ldots, m$. By the spectral theorem: $U = [u_1, u_2, \ldots, u_m]$, $U \in \mathbb{C}^{m \times m}$, $U U^H = U^H U = I_m$. Thus $\left\| A^H u_i \right\| = \sigma_i$ for $i = 1, 2, \ldots, m$.
  21. Proof of the SVD Theorem
    Alternative proof (continued). Let $A^H A v_i = \hat{\sigma}_i^2 v_i$ for $i = 1, 2, \ldots, n$. By the spectral theorem: $V = [v_1, v_2, \ldots, v_n]$, $V \in \mathbb{C}^{n \times n}$, $V V^H = V^H V = I_n$. Utilizing the above for the nonzero $\hat{\sigma}_i^2$:
    $A A^H u_i = \sigma_i^2 u_i \;\Rightarrow\; A^H A \underbrace{A^H u_i}_{z_i} = \sigma_i^2 \underbrace{A^H u_i}_{z_i}$,
    meaning the $z_i$ and $\sigma_i^2$ are eigenvectors and eigenvalues of $A^H A$.
  22. Proof of the SVD Theorem
    Alternative proof (continued). Examining $z_i$ yields:
    $z_j^H z_i = u_j^H A A^H u_i = \sigma_i^2 u_j^H u_i \;\Rightarrow\; \| z_i \| = \sigma_i \;\Rightarrow\; v_i = \frac{z_i}{\sigma_i} = \frac{A^H u_i}{\sigma_i}$.
    Consider the following equations for $i = 1, 2, \ldots, m$:
    $A v_i = \frac{A A^H u_i}{\sigma_i} \; (\text{or zero}) = \sigma_i u_i \; (\text{or zero})$.
  23. Proof of the SVD Theorem
    Alternative proof (continued). These equations can be written as
    $A V = U \Sigma \;\Leftrightarrow\; A = U \Sigma V^H$,
    where $U$ and $V$ are as defined above and $\Sigma$ is an $m \times n$ matrix whose top-left $n \times n$ block is diagonal, with the $\sigma_i$ on the diagonal, and whose remaining rows are zero.
  24. Outline: SVD Properties.
  25. SVD Properties
    It is often convenient to break the matrices in the SVD into two parts, corresponding to the nonzero singular values and the zero singular values. Let
    $\Sigma = \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}$, where $\Sigma_1 = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r) \in \mathbb{R}^{r \times r}$ with $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_r$, and $\Sigma_2 = \operatorname{diag}(\sigma_{r+1}, \sigma_{r+2}, \ldots, \sigma_p) = \operatorname{diag}(0, 0, \ldots, 0) \in \mathbb{R}^{(m-r) \times (n-r)}$.
    Then the SVD can be written as
    $A = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix} \begin{bmatrix} V_1^H \\ V_2^H \end{bmatrix} = U_1 \Sigma_1 V_1^H$,
    where $U_1 \in \mathbb{C}^{m \times r}$, $U_2 \in \mathbb{C}^{m \times (m-r)}$, $V_1 \in \mathbb{C}^{n \times r}$ and $V_2 \in \mathbb{C}^{n \times (n-r)}$.
  26. SVD Properties
    The SVD can also be written as $A = \sum_{i=1}^{r} \sigma_i u_i v_i^H$.
    The SVD can also be used to compute two matrix norms:
    Hilbert-Schmidt / Frobenius norm: $\| A \|_F^2 = \sum_{i,j} \left| A_{ij} \right|^2 = \sum_{i=1}^{r} \sigma_i^2$.
    $\ell_2$ norm: $\| A \|_2 = \sup_{x \neq 0} \frac{\| A x \|}{\| x \|} = \sqrt{\lambda_{\max}\left( A^H A \right)} = \sigma_1$,
    which implies $\arg\max_{x \neq 0} \frac{\| A x \|}{\| x \|} = v_1$ and $\arg\max_{x \neq 0} \frac{\left\| x^H A \right\|}{\| x \|} = u_1$.
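A small MATLAB check of the two norm identities (my addition, using an arbitrary random matrix):

```matlab
% Frobenius norm equals sqrt(sum of squared singular values); l2 norm equals sigma_1.
A = randn(5, 3);
s = svd(A);                                        % singular values, descending
fprintf('Frobenius mismatch: %g\n', abs(norm(A, 'fro') - sqrt(sum(s.^2))));
fprintf('l2 mismatch:        %g\n', abs(norm(A, 2) - s(1)));
```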
  27. SVD Properties
    The rank of a matrix is the number of nonzero singular values along the main diagonal of $\Sigma$; using the notation above, $\operatorname{rank}(A) = r$. The SVD is a numerically stable way of computing the rank of a matrix.
    The range (column space) of a matrix is
    $\mathcal{R}(A) = \{ b \in \mathbb{C}^m : b = A x \} = \left\{ b \in \mathbb{C}^m : b = U \Sigma V^H x \right\} = \{ b \in \mathbb{C}^m : b = U \Sigma y \} = \{ b \in \mathbb{C}^m : b = U_1 \tilde{y} \} = \operatorname{span}(U_1)$.
    The range of a matrix is spanned by the orthonormal set of vectors in $U_1$, the first $r$ columns of $U$.
  28. SVD Properties
    Generally, the other fundamental spaces of a matrix $A$ can also be determined from the SVD:
    $\mathcal{R}(A) = \operatorname{span}(U_1) = \mathcal{R}(A A^H)$, $\mathcal{N}(A) = \operatorname{span}(V_2)$,
    $\mathcal{R}(A^H) = \operatorname{span}(V_1) = \mathcal{R}(A^H A)$, $\mathcal{N}(A^H) = \operatorname{span}(U_2)$.
    The SVD thus provides an explicit orthogonal basis and a computable dimensionality for each of the fundamental spaces of a matrix.
  29. SVD Properties
    Since the SVD decomposes a given matrix into two unitary matrices and a diagonal matrix, every matrix can be described as a rotation, a scaling and another rotation. This intuition follows from the properties of unitary matrices, which essentially rotate the vectors they multiply. This property is further examined when dealing with linear equations.
  30. Outline: SVD Example.
  31. SVD Example
    Finding the SVD of a matrix numerically using the MATLAB command [U, S, V] = svd(A). Let
    $A = \begin{bmatrix} 1 & 2 & 3 \\ 6 & 5 & 4 \end{bmatrix}$.
    Then $A = U \Sigma V^H$, where
    $U = \begin{bmatrix} -0.355 & -0.934 \\ -0.934 & 0.355 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 9.362 & 0 & 0 \\ 0 & 1.831 & 0 \end{bmatrix}$, $V = \begin{bmatrix} -0.637 & -0.653 & 0.408 \\ -0.575 & -0.050 & -0.8165 \\ -0.513 & -0.754 & 0.408 \end{bmatrix}$.
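The factorization above can be reproduced as follows (snippet added here, not in the original deck); MATLAB may return singular vectors whose signs differ from the slide, which is within the stated uniqueness freedom.

```matlab
% Reproduce the numerical SVD example.
A = [1 2 3; 6 5 4];
[U, S, V] = svd(A);
disp(U); disp(S); disp(V);
fprintf('reconstruction error: %g\n', norm(A - U * S * V', 'fro'));   % round-off only
```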
  32. SVD Example
    Let $A$ be a diagonal matrix:
    $A = \begin{bmatrix} 2 & 0 \\ 0 & -4 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.
    In this case the $U$ and $V$ matrices just shuffle the columns around and change signs to make the singular values positive.
    Let $A$ be a square symmetric matrix:
    $A = \begin{bmatrix} 5 & 6 & 2 \\ 6 & 1 & 4 \\ 2 & 4 & 7 \end{bmatrix} = U \Sigma V^H$, where $U = V = \begin{bmatrix} 0.592 & -0.616 & 0.518 \\ 0.526 & -0.191 & 0.828 \\ 0.610 & 0.763 & -0.211 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 12.391 & 0 & 0 \\ 0 & 4.383 & 0 \\ 0 & 0 & 3.774 \end{bmatrix}$.
    In this case the SVD coincides with the regular eigendecomposition, up to the sign changes needed to make the singular values positive.
  33. Outline: Order Reduction.
  34. Order Reduction
    The SVD of a matrix can be used to determine how near (in the sense of the $\ell_2$ norm) the matrix is to a matrix of lower rank. It can also be used to find the nearest matrix of a given lower rank.
    Theorem: let $A$ be an $m \times n$ matrix with $\operatorname{rank}(A) = r$ and let $A = U \Sigma V^H$. Let $k < r$ and let
    $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^H = U \Sigma_k V^H$, where $\Sigma_k = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k, 0, \ldots, 0)$.
    Then $\| A - A_k \|_2 = \sigma_{k+1}$, and $A_k$ is the nearest matrix of rank $k$ to $A$ (in the sense of the $\ell_2$ norm / Frobenius norm):
    $\min_{\operatorname{rank}(B) = k} \| A - B \|_2 = \| A - A_k \|_2$.
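Before the proof, a short MATLAB sketch of the theorem (my addition): build $A_k$ by truncating the SVD and compare $\| A - A_k \|_2$ with $\sigma_{k+1}$; the matrix and $k$ are arbitrary.

```matlab
% Best rank-k approximation by SVD truncation.
A = randn(8, 6);
k = 2;
[U, S, V] = svd(A);
Ak = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';        % A_k = sum_{i<=k} sigma_i u_i v_i'
s  = diag(S);
fprintf('||A - A_k||_2 = %g, sigma_(k+1) = %g\n', norm(A - Ak, 2), s(k + 1));
```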
  35. Order Reduction
    Proof. Since $A - A_k = U \operatorname{diag}(0, 0, \ldots, 0, \sigma_{k+1}, \ldots, \sigma_r, 0, \ldots, 0) V^H$, it follows that $\| A - A_k \|_2 = \sigma_{k+1}$.
    The second part of the proof is a "proof by inequality". By the definition of the matrix norm, for any unit vector $z$ the following holds:
    $\| A - B \|_2^2 \geq \| (A - B) z \|_2^2$.
    Let $B$ be a rank-$k$ matrix of size $m \times n$. Then there exist vectors $\{ x_1, x_2, \ldots, x_{n-k} \}$ that span $\mathcal{N}(B)$, where $x_i \in \mathbb{C}^n$. Consider also the vectors $\{ v_1, v_2, \ldots, v_{k+1} \}$ from the matrix $V$ of the SVD, where $v_i \in \mathbb{C}^n$.
  36. Order Reduction
    Proof (continued). The intersection $\operatorname{span}(x_1, \ldots, x_{n-k}) \cap \operatorname{span}(v_1, \ldots, v_{k+1}) \subseteq \mathbb{C}^n$ cannot be trivial, since there is a total of $n + 1$ vectors. Let $z$ be a vector from this intersection, normalized s.t. $\| z \|_2 = 1$. Then:
    $\| A - B \|_2^2 \geq \| (A - B) z \|_2^2 = \| A z \|_2^2$.
    Since $z \in \operatorname{span}(v_1, v_2, \ldots, v_{k+1})$,
    $A z = \sum_{i=1}^{k+1} \sigma_i \left( v_i^H z \right) u_i$.
    Now
    $\| A - B \|_2^2 \geq \| A z \|_2^2 = \sum_{i=1}^{k+1} \sigma_i^2 \left| v_i^H z \right|^2 \geq \sigma_{k+1}^2$.
    The lower bound is achieved by $B = \sum_{i=1}^{k} \sigma_i u_i v_i^H$, with $z = v_{k+1}$.
  37. Order Reduction
    Applications of order reduction: noise reduction. Basic assumption: the noise is mainly pronounced in the small singular values.
    [Figure panels: Noiseless Matrix; Noisy Matrix - Std 1; Noisy Matrix - Std 6; Noisy Matrix - Std 11.]
  38. Order Reduction
    [Figure panels: Ground Truth; Added Noise - Std 6; Reconstruction using 140 Singular Values.]
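The images themselves are not reproduced in this transcript. The following MATLAB sketch (my addition, using a synthetic low rank matrix rather than the original image) illustrates the kind of truncated SVD denoising shown in the figures; the kept rank is a tuning choice.

```matlab
% Truncated-SVD noise reduction on a synthetic low rank matrix.
rng(0);
X     = randn(200, 50) * randn(50, 200);          % rank-50 "ground truth"
noisy = X + 6 * randn(size(X));                   % additive noise, std 6
[U, S, V] = svd(noisy);
k    = 50;                                        % number of singular values to keep
Xhat = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';
fprintf('noisy error: %g, denoised error: %g\n', ...
        norm(noisy - X, 'fro'), norm(Xhat - X, 'fro'));
```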
  39. Outline: Solving Linear Equation System.
  40. Solving Linear Equation System
    Consider the solution of the equation $A x = b$.
    If $b \in \mathcal{R}(A)$ there is at least one solution:
    If $\dim \mathcal{N}(A) = 0$ there is exactly one solution, $x_r \in \mathcal{R}(A^H)$, s.t. $A x_r = b$.
    If $\dim \mathcal{N}(A) \geq 1$, the columns of $A$ are not independent and there are infinitely many solutions: any vector of the form $\hat{x} = x_r + x_n$, where $x_r \in \mathcal{R}(A^H)$ is the solution from the previous case and $x_n \in \mathcal{N}(A)$, satisfies $A (x_r + x_n) = b$. Which solution should be chosen? Usually the solution with the minimum norm, $x_r$.
    If $b \notin \mathcal{R}(A)$ there is no solution. Usually one then seeks the vector $\hat{x}$ for which $\| A \hat{x} - b \|_2$ is minimized.
  41. Solving Linear Equation System
    Assume $\hat{x} = \arg\min_x \| A x - b \|_2$. By definition $\hat{b} = A \hat{x} \in \mathcal{R}(A)$, meaning the search is for $\hat{b}$ s.t. $\| \hat{b} - b \|_2$ is minimized.
  42. Solving Linear Equation System
    According to the projection theorem, there is exactly one vector $\hat{b}$ for which $\| \hat{b} - b \|_2$ is minimized; it is the projection of $b$ onto $\mathcal{R}(A)$, $\hat{b} = \operatorname{Proj}_{\mathcal{R}(A)}(b)$. Moreover,
    $\hat{x} = \arg\min_x \| A x - b \|_2 \;\Leftrightarrow\; A^H A \hat{x} = A^H b$.
    Intuitively, the procedure is: project $b$ onto the column space $\mathcal{R}(A)$ to obtain $\hat{b}$; the solution $\hat{x}$ then satisfies $A \hat{x} = \hat{b}$, i.e. the residual $A \hat{x} - b$ is orthogonal to $\mathcal{R}(A)$, and applying $A^H$ to the residual gives exactly the normal equations $A^H A \hat{x} = A^H b$.
  43. Solving Linear Equation System
    The equation $A^H A \hat{x} = A^H b$ is called the normal equations. If the columns of $A$ are independent then $A^H A$ is invertible and $\hat{x}$ can be calculated as follows:
    $\hat{x} = \left( A^H A \right)^{-1} A^H b$.
    This is the least squares solution using the pseudo inverse of $A$: $A^{\dagger} = \left( A^H A \right)^{-1} A^H$.
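A minimal MATLAB sketch of this least squares solution (my addition). The matrix is an arbitrary full column rank example, and in practice the backslash operator is preferred over forming $\left( A^H A \right)^{-1}$ explicitly.

```matlab
% Least squares via the normal equations for a full column rank A.
A = [1 0; 1 1; 1 2];
b = [1; 2; 2];
xNormal    = (A' * A) \ (A' * b);                 % solves A'*A*x = A'*b
xBackslash = A \ b;                               % MATLAB's built-in least squares
fprintf('difference between the solutions: %g\n', norm(xNormal - xBackslash));
```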
  44. Solving Linear Equation System
    Yet if the columns of $A$ are linearly dependent, the pseudo inverse of $A$ in the form above can't be calculated directly: the null space of $A$ is not trivial and there is no unique solution. The problem becomes selecting one solution out of the infinite number of possible solutions. As mentioned, the commonly accepted approach is to select the solution with the smallest norm (length). This problem can be solved using the SVD and the definition of the generalized pseudo inverse of a matrix.
  45. Solving Linear Equation System
    Definition: the pseudo inverse of a matrix $A = U \Sigma V^H$, denoted $A^{\dagger}$, is given by $A^{\dagger} = V \Sigma^{\dagger} U^H$, where $\Sigma^{\dagger}$ is obtained by transposing $\Sigma$ and inverting all its nonzero entries. This definition of the pseudo inverse exists for any matrix.
    Proposition III: let $A = U \Sigma V^H$ and $x^{\dagger} = A^{\dagger} b = V \Sigma^{\dagger} U^H b$. Then $A^H A x^{\dagger} = A^H b$; namely, the solution given by the pseudo inverse calculated via the SVD satisfies the normal equations.
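A sketch (my addition) of building $A^{\dagger} = V \Sigma^{\dagger} U^H$ directly from the SVD and comparing it with MATLAB's pinv; the tolerance tol used to decide which singular values count as nonzero is my assumption, pinv applies its own default.

```matlab
% Pseudo inverse from the SVD: transpose Sigma and invert its nonzero entries.
A = [8 10 3 30; 9 6 6 18; 1 1 10 3];              % rank deficient example (Example I below)
[U, S, V] = svd(A);
tol  = max(size(A)) * eps(norm(A));               % threshold for "nonzero"
r    = sum(diag(S) > tol);
Sdag = zeros(size(A'));                           % Sigma^dagger has the transposed shape
Sdag(1:r, 1:r) = diag(1 ./ diag(S(1:r, 1:r)));
Adag = V * Sdag * U';
fprintf('||Adag - pinv(A)||_F = %g\n', norm(Adag - pinv(A), 'fro'));
```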
  46. Solving Linear Equation System
    Proof of Proposition III. It is sufficient to show that $A^H \left( A x^{\dagger} - b \right) = 0$. Now,
    $A x^{\dagger} - b = U \Sigma V^H V \Sigma^{\dagger} U^H b - b = \left( U \Sigma \Sigma^{\dagger} U^H - I \right) b = U \left( \Sigma \Sigma^{\dagger} - I \right) U^H b$.
  47. Solving Linear Equation System
    Proof (continued). Thus,
    $A^H \left( A x^{\dagger} - b \right) = V \Sigma^H U^H U \left( \Sigma \Sigma^{\dagger} - I \right) U^H b = V \Sigma^H \left( \Sigma \Sigma^{\dagger} - I \right) U^H b$.
    One should observe that
    $\Sigma^H = \begin{bmatrix} \Sigma_r^H & 0_{r \times (m-r)} \\ 0_{(n-r) \times r} & 0_{(n-r) \times (m-r)} \end{bmatrix}$,
    where $\Sigma_r$ is the $r \times r$ submatrix of nonzero diagonal entries of $\Sigma$, and
    $\Sigma \Sigma^{\dagger} - I = \begin{bmatrix} 0_{r \times r} & 0_{r \times (m-r)} \\ 0_{(m-r) \times r} & -I_{(m-r) \times (m-r)} \end{bmatrix}$.
    Hence the multiplication yields the zero matrix.
  48. Solving Linear Equation System
    Proposition IV: the vector $\hat{x} = A^{\dagger} b$ is the shortest least squares solution of $A x = b$, namely
    $\| \hat{x} \|_2 = \min \left\{ \| x \|_2 : \| A x - b \|_2 \text{ is minimal} \right\}$.
    Proof. Using the fact that both $U$ and $V$ are unitary,
    $\| A x - b \|_2 = \left\| U \Sigma V^H x - b \right\|_2 = \left\| \Sigma V^H x - U^H b \right\|_2 = \left\| \Sigma y - U^H b \right\|_2$,
    where $y = V^H x$ and $\| y \|_2 = \| x \|_2$. So among all minimizers of $\left\| \Sigma y - U^H b \right\|_2$ we seek the one with the smallest $\| y \|_2$.
  49. Solving Linear Equation System
    Proof (continued). Since $\Sigma$ is diagonal (along its main diagonal), the least squares problem in $y$ decouples entry by entry: the entries corresponding to nonzero singular values are determined uniquely, and the minimum norm is obtained by setting the remaining entries to zero, i.e. $\hat{y} = \Sigma^{\dagger} U^H b$. Thus $\hat{x} = V \hat{y} = V \Sigma^{\dagger} U^H b$ attains the minimum norm.
  50. Solving Linear Equation System
    As written previously, any solution which satisfies the normal equations is a least squares solution:
    $\hat{x} = \arg\min_x \| A x - b \|_2 \;\Leftrightarrow\; A^H A \hat{x} = A^H b$.
    Yet one should observe that $\hat{x} = A^{\dagger} b \in \mathcal{R}(A^H)$, namely, this solution lies in the row space of $A$; hence its norm is minimal among all solutions. In short, the pseudo inverse simultaneously minimizes the norm of the error as well as the norm of the solution.
  51. Solving Linear Equation System
    Example I. Examine the following linear system $A x = b$, where
    $A = \begin{bmatrix} 8 & 10 & 3 & 30 \\ 9 & 6 & 6 & 18 \\ 1 & 1 & 10 & 3 \end{bmatrix}$, $x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 6 \end{bmatrix}$, $b = \begin{bmatrix} 217 \\ 147 \\ 51 \end{bmatrix}$.
    Obviously $A^{-1}$ can't be calculated. Moreover, since $\operatorname{rank}(A) = 3$ while $A$ has 4 columns, $\left( A^H A \right)^{-1}$ does not exist either. Yet the pseudo inverse via the SVD does exist.
  52. Solving Linear Equation System
    Using the SVD approach $A = U \Sigma V^H$, hence $A^{\dagger} = V \Sigma^{\dagger} U^H$. Using MATLAB to calculate the SVD yields:
    $\Sigma = \begin{bmatrix} 39.378 & 0 & 0 & 0 \\ 0 & 10.002 & 0 & 0 \\ 0 & 0 & 3.203 & 0 \end{bmatrix} \;\rightarrow\; \Sigma^{\dagger} = \begin{bmatrix} 0.025 & 0 & 0 \\ 0 & 0.1 & 0 \\ 0 & 0 & 0.312 \\ 0 & 0 & 0 \end{bmatrix}$.
    Calculating $\hat{x}$ yields:
    $\hat{x} = V \Sigma^{\dagger} U^H b = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 6 \end{bmatrix} = x$.
    The SVD properly handled the 4th column, which is dependent on (three times) the 2nd column of $A$. Since $b \in \mathcal{R}(A)$, the exact solution could be calculated.
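The example can be reproduced with MATLAB's pinv, which is itself SVD based (snippet added here, not in the original slides):

```matlab
% Reproduce Example I: minimum norm solution of a rank deficient system.
A = [8 10 3 30; 9 6 6 18; 1 1 10 3];
b = [217; 147; 51];
xHat = pinv(A) * b;                               % = V * Sigma^dagger * U' * b
disp(xHat');                                      % expected: 1 2 3 6
fprintf('rank(A) = %d, residual = %g\n', rank(A), norm(A * xHat - b));
```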
  53. Solving Linear Equation System
    Example II. In this case
    $A = \begin{bmatrix} 5 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$, $x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 6 \end{bmatrix}$, $b = \begin{bmatrix} 5 \\ 4 \\ 3 \end{bmatrix}$.
    Obviously $b \notin \mathcal{R}(A)$, and neither $A^{-1}$ nor $\left( A^H A \right)^{-1}$ exists. Using the SVD pseudo inverse:
    $\hat{x} = V \Sigma^{\dagger} U^H b = \begin{bmatrix} 1 \\ 2 \\ 0 \\ 0 \end{bmatrix}$.
  54. Solving Linear Equation System
    Examining the solution using the SVD: first, since $\operatorname{rank}(A) = 2$, the column space of $A$ is spanned by the first 2 columns of $U$. The projection of $b$ onto the column space of $A$ is
    $\hat{b} = \operatorname{Proj}_{\mathcal{R}(A)}(b) = \sum_{i=1}^{2} \left( U_i^H b \right) U_i = \begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix}$.
    Now consider the updated linear system $A \hat{x} = \hat{b}$, which has an infinite number of solutions. One can calculate that $\mathcal{N}(A) = \operatorname{span}\left( \begin{bmatrix} 0 & 0 & 1 & 0 \end{bmatrix}^T, \begin{bmatrix} 0 & 0 & 0 & 1 \end{bmatrix}^T \right)$. Hence
    $\hat{x} = \begin{bmatrix} \hat{b}_1 / A_{1,1} \\ \hat{b}_2 / A_{2,2} \\ 0 \\ 0 \end{bmatrix} + \left( s \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \right) = \hat{x}_r + \hat{x}_n$, where $s, t \in \mathbb{R}$.
  55. Solving Linear Equation System
    The target is the solution with the minimum norm. Since $\hat{x}_r \perp \hat{x}_n$, the norm of this solution satisfies
    $\| \hat{x} \|_2^2 = \| \hat{x}_r \|_2^2 + \| \hat{x}_n \|_2^2$.
    The minimum norm solution is obtained by taking $\hat{x}_n = 0$, which results in the pseudo inverse solution above.
  56. Solving Linear Equation System
    Numerically sensitive problems: systems of equations that are poorly conditioned are sensitive to small changes in values. Since, practically speaking, there are always inaccuracies in measured data, the solution of these equations may be almost meaningless. The SVD can help with the solution of ill conditioned equations by identifying the direction of sensitivity and discarding that portion of the problem. The procedure is illustrated by the following example.
  57. Solving Linear Equation System
    Example III. Examine the following system of equations $A x = b$:
    $\begin{bmatrix} 1 + 3\epsilon & 1 - 3\epsilon \\ 3 - \epsilon & 3 + \epsilon \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$.
    The SVD of $A$ is
    $A = \frac{1}{\sqrt{20}} \begin{bmatrix} 1 & 3 \\ 3 & -1 \end{bmatrix} \begin{bmatrix} 2\sqrt{5} & 0 \\ 0 & 2\sqrt{5}\,\epsilon \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$,
    from which the exact inverse of $A$ is
    $A^{-1} = \frac{1}{\sqrt{20}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} \frac{1}{2\sqrt{5}} & 0 \\ 0 & \frac{1}{2\sqrt{5}\,\epsilon} \end{bmatrix} \begin{bmatrix} 1 & 3 \\ 3 & -1 \end{bmatrix} = \frac{1}{20} \begin{bmatrix} 1 + \frac{3}{\epsilon} & 3 - \frac{1}{\epsilon} \\ 1 - \frac{3}{\epsilon} & 3 + \frac{1}{\epsilon} \end{bmatrix}$.
    One can easily convince oneself that for small $\epsilon$ the matrix $A^{-1}$ has large entries, which makes $x = A^{-1} b$ unstable.
  58. Solving Linear Equation System
    Observe that the entry $\frac{1}{2\sqrt{5}\,\epsilon}$ multiplies the column $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$; this is the sensitive direction. As $b$ changes slightly, the solution changes in a direction mostly along the sensitive direction. If $\epsilon$ is small, $\sigma_2 = 2\sqrt{5}\,\epsilon$ may be set to zero to approximate $A$:
    $A \approx \frac{1}{\sqrt{20}} \begin{bmatrix} 1 & 3 \\ 3 & -1 \end{bmatrix} \begin{bmatrix} 2\sqrt{5} & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$.
    The pseudo inverse is then
    $A^{\dagger} = \frac{1}{\sqrt{20}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} \frac{1}{2\sqrt{5}} & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ 3 & -1 \end{bmatrix} = \frac{1}{20} \begin{bmatrix} 1 & 3 \\ 1 & 3 \end{bmatrix}$.
    In this case the multiplier of the sensitive direction is zero, so no motion in the sensitive direction occurs. The least squares solution obtained this way, $\hat{x} = A^{\dagger} b$, is of the form $\hat{x} = c \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ for some $c \in \mathbb{R}$, i.e. perpendicular to the sensitive direction.
  59. Solving Linear Equation System
    As this example illustrates, the SVD identifies the stable and unstable directions of the problem and, by zeroing small singular values, eliminates the unstable directions. The SVD can thus be used both to expose poor conditioning and to provide a cure for the ailment. For the equation $A x = b$ with solution $x = A^{-1} b$, writing the solution using the SVD gives
    $x = A^{-1} b = \left( U \Sigma V^H \right)^{-1} b = \sum_{i=1}^{r} \frac{u_i^H b}{\sigma_i} v_i$.
    If the singular value $\sigma_i$ is small, then a small change in $b$, or a small change in either $U$ or $V$, may be amplified into a large change in the solution $x$. A small singular value corresponds to a matrix which is nearly singular and thus more difficult to invert accurately.
  60. Solving Linear Equation System
    Another point of view: consider the equation $A x_0 = b_0 \Rightarrow x_0 = A^{-1} b_0$, and let $b = b_0 + \delta b$, where $\delta b$ is the error or noise. Therefore
    $A x = b_0 + \delta b \;\Rightarrow\; x = A^{-1} b_0 + A^{-1} \delta b = x_0 + \delta x$.
    We investigate how small or large the error in the answer is for a given amount of error. Note that
    $\delta x = A^{-1} \delta b \;\Rightarrow\; \| \delta x \| \leq \left\| A^{-1} \right\| \| \delta b \|$,
    or, since $\left\| A^{-1} \right\| = \sigma_{\max}\left( A^{-1} \right) = \frac{1}{\sigma_{\min}(A)}$, the following holds:
    $\| \delta x \| \leq \frac{\| \delta b \|}{\sigma_{\min}(A)}$.
  61. Solving Linear Equation System
    However, recalling that $x_0 = A^{-1} b_0$ and therefore $\| x_0 \| \geq \sigma_{\min}\left( A^{-1} \right) \| b_0 \| = \frac{\| b_0 \|}{\sigma_{\max}(A)}$, combining the equations yields
    $\frac{\| \delta x \|}{\| x_0 \|} \leq \frac{\| \delta b \|}{\| b_0 \|} \cdot \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}$.
    The last fraction, $\frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}$, is called the condition number of $A$. This number indicates the magnification of error in the linear equation of interest. In most problems, a matrix with a very large condition number is called ill conditioned and will result in severe numerical difficulties.
  62. Solving Linear Equation System
    The solution to these numerical difficulties using the SVD is basically rank reduction (see the sketch below):
    1. Compute the SVD of $A$.
    2. Examine the singular values of $A$ and zero out any that are "small" to obtain a new approximate $\Sigma$ matrix.
    3. Compute the solution by $\hat{x} = V \Sigma^{\dagger} U^H b$.
    Determining which singular values are "small" is problem dependent and requires some judgment.
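A minimal MATLAB sketch of this three step procedure (my addition); the relative threshold tol is an assumption and, as stated above, choosing it is problem dependent.

```matlab
% Regularized solve: zero out "small" singular values, then apply the pseudo inverse.
% Usage example: xHat = truncatedSvdSolve(A, b, 1e-8);
function x = truncatedSvdSolve(A, b, tol)
    [U, S, V] = svd(A, 'econ');
    s    = diag(S);
    keep = s > tol * s(1);                        % relative threshold, problem dependent
    x    = V(:, keep) * ((U(:, keep)' * b) ./ s(keep));
end
```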
  63. Outline: Total Least Squares.
  64. Total Least Squares
    In the classic least squares problem the solution minimizing $\| A x - b \|_2$ is sought. The hidden assumption is that the matrix $A$ is correct and any error in the problem is in $b$. The least squares problem finds a vector $\hat{x}$ s.t. $\| A \hat{x} - b \|_2$ is minimal, which is accomplished by finding a perturbation $r$ of the right hand side, of minimum norm, s.t. $A x = b + r$ with $(b + r) \in \mathcal{R}(A)$.
    In the total least squares (TLS) problem, both sides of the equation are assumed to have errors. The solution of the perturbed equation $(A + E) x = b + r$ is sought s.t. $(b + r) \in \mathcal{R}(A + E)$ and the norm of the perturbations is minimized.
  65. Total Least Squares
    Intuitively, the right hand side is "bent" toward the left hand side, while the left hand side is "bent" toward the right hand side.
  66. Total Least Squares
    Let $A$ be an $m \times n$ matrix. To find the solution of the TLS problem one may observe the homogeneous form
    $\begin{bmatrix} A + E \mid b + r \end{bmatrix} \begin{bmatrix} x \\ -1 \end{bmatrix} = 0 \;\rightarrow\; \left( \begin{bmatrix} A \mid b \end{bmatrix} + \begin{bmatrix} E \mid r \end{bmatrix} \right) \begin{bmatrix} x \\ -1 \end{bmatrix} = 0$.
    Let $C = \begin{bmatrix} A \mid b \end{bmatrix} \in \mathbb{C}^{m \times (n+1)}$ and let $\Delta = \begin{bmatrix} E \mid r \end{bmatrix}$ be the perturbation of the data. For the homogeneous form to have a solution, the vector $\begin{bmatrix} x \\ -1 \end{bmatrix}$ must lie in the null space of $C + \Delta$, and for the solution not to be trivial, the perturbation $\Delta$ must be such that $C + \Delta$ is rank deficient.
  67. Total Least Squares
    Analyzing the TLS problem using the SVD: bring $(A + E) x = b + r$ into the form $\begin{bmatrix} A + E \mid b + r \end{bmatrix} \begin{bmatrix} x \\ -1 \end{bmatrix} = 0$, and let $\begin{bmatrix} A \mid b \end{bmatrix} = U \Sigma V^H$ be the SVD of the data matrix. If $\sigma_{n+1} \neq 0$ then $\operatorname{rank}\left( \begin{bmatrix} A \mid b \end{bmatrix} \right) = n + 1$, which means the row space of $\begin{bmatrix} A \mid b \end{bmatrix}$ is all of $\mathbb{C}^{n+1}$; hence there is no nonzero vector in the orthogonal complement of the row space and the set of equations is incompatible. To obtain a solution, the rank of $\begin{bmatrix} A \mid b \end{bmatrix}$ must be reduced to $n$. As shown before, the best approximation of rank $n$ in both the Frobenius and $\ell_2$ norms is given by the SVD:
    $\begin{bmatrix} \hat{A} \mid \hat{b} \end{bmatrix} = U \hat{\Sigma} V^H$, $\hat{\Sigma} = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n, 0)$.
  68. Total Least Squares
    The minimal TLS correction is given by
    $\sigma_{n+1} = \min_{\operatorname{rank}\left( \begin{bmatrix} \hat{A} \mid \hat{b} \end{bmatrix} \right) = n} \left\| \begin{bmatrix} A \mid b \end{bmatrix} - \begin{bmatrix} \hat{A} \mid \hat{b} \end{bmatrix} \right\|_F$,
    attained for $\begin{bmatrix} E \mid r \end{bmatrix} = -\sigma_{n+1} u_{n+1} v_{n+1}^H$; note that the TLS correction matrix has rank one. It is clear that the approximate set $\begin{bmatrix} \hat{A} \mid \hat{b} \end{bmatrix} \begin{bmatrix} x \\ -1 \end{bmatrix} = 0$ is compatible, and the solution is given by the only vector (up to scale), $v_{n+1}$, that belongs to $\mathcal{N}\left( \begin{bmatrix} \hat{A} \mid \hat{b} \end{bmatrix} \right)$. The TLS solution is obtained by scaling $v_{n+1}$ until its last component equals $-1$:
    $\begin{bmatrix} x \\ -1 \end{bmatrix} = \frac{-1}{V_{n+1, n+1}} v_{n+1}$.
  69. Total Least Squares
    For simplicity it is assumed that $V_{n+1, n+1} \neq 0$ and $\sigma_n > \sigma_{n+1}$, hence the solution exists and is unique. Otherwise the solution might not exist, or might not be unique (any superposition of a few columns of $V$). For a complete analysis of the existence and uniqueness of the solution see [].
    A basic TLS algorithm would be (see the sketch below): given $A x \approx b$, where $A \in \mathbb{C}^{m \times n}$ and $b \in \mathbb{C}^m$, compute the SVD $\begin{bmatrix} A \mid b \end{bmatrix} = U \Sigma V^H$. If $V_{n+1, n+1} \neq 0$, the TLS solution is
    $x_{TLS} = \frac{-1}{V_{n+1, n+1}} \, v_{n+1}(1 : n)$.
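A hedged MATLAB sketch of this basic TLS algorithm (my addition), assuming the generic case $V_{n+1, n+1} \neq 0$ and $\sigma_n > \sigma_{n+1}$:

```matlab
% Basic Total Least Squares via the SVD of the augmented matrix [A b].
% Usage example: xTls = tlsSolve(A, b);
function x = tlsSolve(A, b)
    n = size(A, 2);
    [~, ~, V] = svd([A, b]);
    if V(n + 1, n + 1) == 0
        error('tlsSolve:degenerate', 'V(n+1,n+1) = 0: the generic TLS solution does not exist.');
    end
    x = -V(1:n, n + 1) / V(n + 1, n + 1);
end
```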
  70. Total Least Squares
    The geometric properties of the solution can be described as follows: the TLS solution minimizes the distance between the data points and the hyperplane defined by the solution $x_{TLS}$. Let $C = U \Sigma V^H$. From the definition of the $\ell_2$ norm of a matrix,
    $\frac{\| C v \|_2}{\| v \|_2} \geq \sigma_{n+1}$, for any $v$ with $\| v \|_2 \neq 0$.
    Equality holds if and only if $v \in S_c$, where $S_c = \operatorname{span}\{ v_i \}$ and the $v_i$ are the columns of $V$ which satisfy $u_i^H C v_i = \sigma_{n+1}$. The TLS problem amounts to finding a vector $x$ s.t.
    $\frac{\left\| \begin{bmatrix} A \mid b \end{bmatrix} \begin{bmatrix} x \\ -1 \end{bmatrix} \right\|_2}{\left\| \begin{bmatrix} x \\ -1 \end{bmatrix} \right\|_2} = \sigma_{n+1}$.
  71. Total Least Squares
    Squaring everywhere,
    $\min_x \frac{\left\| \begin{bmatrix} A \mid b \end{bmatrix} \begin{bmatrix} x \\ -1 \end{bmatrix} \right\|_2^2}{\left\| \begin{bmatrix} x \\ -1 \end{bmatrix} \right\|_2^2} = \min_x \sum_{i=1}^{m} \frac{\left| A_i x - b_i \right|^2}{x^H x + 1}$,
    where $A_i$ is the $i$-th row of $A$. The quantity $\frac{\left| A_i x - b_i \right|^2}{x^H x + 1}$ is the square of the distance from the point $\begin{bmatrix} A_i^H \\ b_i \end{bmatrix} \in \mathbb{C}^{n+1}$ to the nearest point on the hyperplane $P$ defined by
    $P = \left\{ \begin{bmatrix} a \\ b \end{bmatrix} : a \in \mathbb{C}^n, \; b \in \mathbb{C}, \; b = x^H a \right\}$.
    So the TLS problem amounts to finding the closest hyperplane to the set of points
    $\begin{bmatrix} A_1^H \\ b_1 \end{bmatrix}, \begin{bmatrix} A_2^H \\ b_2 \end{bmatrix}, \ldots, \begin{bmatrix} A_m^H \\ b_m \end{bmatrix}$.
  72. Total Least Squares
    The minimum distance property can be shown as follows. Let $P$ be the plane orthogonal to the normal vector $n \in \mathbb{C}^{n+1}$, i.e. $P = \left\{ r \in \mathbb{C}^{n+1} : r^H n = 0 \right\}$, and let $n$ have the form $n = \begin{bmatrix} x \\ -1 \end{bmatrix}$. Let $p = \begin{bmatrix} A_m^H \\ b_m \end{bmatrix}$ be a point in $\mathbb{C}^{n+1}$. Finding the point $q \in \mathbb{C}^{n+1}$ which belongs to the plane $P$ and is closest to the point $p$ is a constrained optimization problem: minimize $\| p - q \|$ subject to $n^H q = 0$. With a Lagrange multiplier, the objective is
    $J(q) = \| p - q \|^2 + 2 \lambda n^H q = p^H p - 2 p^H q + 2 \lambda n^H q + q^H q = (q - p + \lambda n)^H (q - p + \lambda n) + 2 \lambda p^H n - \lambda^2 n^H n$,
    which is clearly minimized when $q = p - \lambda n$.
  73. Total Least Squares
    Determining $\lambda$ from the constraint: $n^H q = n^H p - \lambda n^H n = 0 \;\rightarrow\; \lambda = \frac{n^H p}{n^H n}$. Inserting this back into the objective yields
    $J(q) = 2 \lambda p^H n - \lambda^2 n^H n = \frac{2 \, n^H p \, p^H n}{n^H n} - \frac{n^H p \, p^H n \, n^H n}{n^H n \, n^H n} = \frac{\left| n^H p \right|^2}{n^H n} = \frac{\left| x^H A_m^H - b_m \right|^2}{x^H x + 1}$.
    Alternative solution using the projection theorem: the distance from the point $p$ to the plane $P$ is the length of the projection of $p$ onto $n$, which yields
    $d_{\min}^2(p, P) = \frac{\left| \langle p, n \rangle \right|^2}{\| n \|^2} = \frac{\left| \begin{bmatrix} x^H & -1 \end{bmatrix} \begin{bmatrix} A_m^H \\ b_m \end{bmatrix} \right|^2}{x^H x + 1}$.
  74. Outline: Principal Component Analysis.
  75. Principal Component Analysis
    Principal Component Analysis (PCA) is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables. The transformation is defined in such a way that the first component has as high a variance as possible (that is, it accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to (uncorrelated with) the preceding components.
  76. Principal Component Analysis
    PCA is mathematically defined as an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by any projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on.
    Assume we are given a collection of data column vectors $a_1, a_2, \ldots, a_m \in \mathbb{R}^n$. The projection of the data onto a subspace $\mathcal{U} \subseteq \mathbb{R}^n$ of dimension $r$, $r \leq n$, spanned by the orthonormal basis $u_1, u_2, \ldots, u_r$, is given by
    $\hat{a}_i = f_{i1} u_1 + f_{i2} u_2 + \ldots + f_{ir} u_r$, $i = 1 : m$,
    for some coefficients $f_{ij}$. Note that $f_{ij} = a_i^H u_j$, the projection of $a_i$ along the direction of $u_j$. By the projection theorem, this projection is the closest, in the $\ell_2$ norm sense, to the data $a_i$.
  77. Principal Component Analysis
    The search is for the orthogonal basis $u_1, u_2, \ldots, u_r$. Formulating the maximization of the variance along the direction of $u_1$, with $A = [a_1, a_2, \ldots, a_m]$, yields
    $\max_{\| w \| = 1} \sum_{i=1}^{m} \left| a_i^H w \right|^2 = \left\| A^H w \right\|^2 = \left( A^H w \right)^H \left( A^H w \right) = w^H A A^H w$.
    Using the SVD $A = U \Sigma V^H$, we have $A A^H = U \Sigma \Sigma^H U^H$. Observe that
    $\frac{w^H A A^H w}{w^H w} = \frac{\left( U^H w \right)^H \Sigma \Sigma^H \left( U^H w \right)}{\left( U^H w \right)^H \left( U^H w \right)}$,
    and notice that there are only $r$ nonzero entries in $\Sigma$, by the properties of the SVD. Defining $x = U^H w$ yields
    $\frac{w^H A A^H w}{w^H w} = \frac{\sigma_1^2 x_1^2 + \sigma_2^2 x_2^2 + \ldots + \sigma_r^2 x_r^2}{x_1^2 + x_2^2 + \ldots + x_n^2}$.
  78. Principal Component Analysis
    Now we have
    $\max_{w \neq 0} \frac{w^H A A^H w}{w^H w} = \max_{x \neq 0} \frac{\sigma_1^2 x_1^2 + \sigma_2^2 x_2^2 + \ldots + \sigma_r^2 x_r^2}{x_1^2 + x_2^2 + \ldots + x_n^2}$.
    Assuming $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_r$, then
    $\max_{x \neq 0} \frac{\sigma_1^2 x_1^2 + \sigma_2^2 x_2^2 + \ldots + \sigma_r^2 x_r^2}{x_1^2 + x_2^2 + \ldots + x_n^2} = \sigma_1^2 = \lambda_1$,
    which is the largest eigenvalue of $A A^H$. The vector $x$ which attains the maximum is $x_1 = 1$, $x_i = 0$ for $i = 2 : n$, which corresponds to $w = U x = u_1$. The first principal component is indeed obtained from the first eigenvector $u_1$ of $A A^H$.
  79. Principal Component Analysis
    Calculate the second principal component under the constraint of being orthogonal to the first while maximizing the projection:
    $\max_{\| w \| = 1, \, w^H u_1 = 0} \sum_{i=1}^{m} \left| a_i^H w \right|^2 = \max_{w \neq 0, \, w^H u_1 = 0} \frac{w^H A A^H w}{w^H w}$.
    Using the definitions from above yields
    $\max_{x \neq 0, \, x^H U^H u_1 = 0} \frac{\sigma_1^2 x_1^2 + \ldots + \sigma_r^2 x_r^2}{x_1^2 + \ldots + x_n^2} = \max_{x \neq 0, \, x_1 = 0} \frac{\sigma_1^2 x_1^2 + \ldots + \sigma_r^2 x_r^2}{x_1^2 + \ldots + x_n^2} = \sigma_2^2 = \lambda_2$,
    which is the second largest eigenvalue of $A A^H$. The vector $x$ which attains the maximum is $x_2 = 1$, $x_i = 0$ for $i = 1, 3 : n$. This corresponds to $w = U x = u_2$, the second eigenvector of $A A^H$.
  80. Principal Component Analysis
    Continuing this pattern, $u_i$ yields the $i$-th principal component. The set of orthogonal vectors which spans the subspace onto which the data is projected, while maximizing the variance of the data, is given by the first $r$ columns of the orthogonal matrix $U$ from the SVD. Observing the SVD yields the result immediately:
    $A = U \Sigma V^H \;\rightarrow\; Y = U^H A = \Sigma V^H$.
    Observing the scatter matrix of $Y$:
    $C_Y = Y Y^H = \left( U^H A \right) \left( U^H A \right)^H = U^H A A^H U = U^H C_A U$.
    Since the matrix $U$ is the eigenvector matrix of $C_A = A A^H$, by the diagonalization theorem $C_Y$ is diagonal. Another look yields
    $Y Y^H = \Sigma V^H \left( \Sigma V^H \right)^H = \Sigma V^H V \Sigma^H = \Sigma \Sigma^H = \operatorname{diag}\left( \sigma_1^2, \sigma_2^2, \ldots, \sigma_r^2, 0, \ldots, 0 \right)$.
    Namely, the scatter matrix, and hence the covariance matrix, of $Y$ is diagonal. Moreover, the constraint on the variances holds.
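A short MATLAB sketch of PCA via the SVD (my addition). It assumes the data vectors are stored as the columns of A and have been centered (mean removed), which the slides leave implicit; the data here is an arbitrary toy set.

```matlab
% PCA via the SVD: columns of A are (centered) data vectors in R^n.
rng(1);
A = randn(3, 500);                                % n = 3 variables, m = 500 samples
A = A - mean(A, 2);                               % center each variable
[U, S, ~] = svd(A, 'econ');
pcs    = U;                                       % principal directions (columns of U)
scores = U' * A;                                  % Y = U^H A, uncorrelated coordinates
disp(diag(S).^2 / (size(A, 2) - 1));              % variances along the principal directions
disp(cov(scores'));                               % approximately diagonal covariance
```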
  81. The SVD is a decomposition which can be applied to any matrix. It exposes fundamental properties of a linear operator, such as the fundamental subspaces, the Frobenius norm and the $\ell_2$ norm. The SVD can be utilized in many applications, such as solving linear systems (least squares, total least squares) and order reduction (compression, noise reduction, principal component analysis).
    To be continued: regularizing linear equation systems.
  82. Appendix: For Further Reading
    A. Author. Handbook of Everything. Some Press, 1990.
    S. Someone. On this and that. Journal on This and That, 2(1):50–100, 2000.
    R. Avital. On this and that. Journal on This and That, 2(1):50–100, 2000.