
Singular Value Decomposition - Why and How?

Royi
June 24, 2011


A Presentation about Singular Value Decomposition (SVD). It covers the following:

* Singular Value Decomposition (SVD) Theorem.
* Applications:
  * Order Reduction.
  * Solving Linear Equation System (Least Squares).
  * Total Least Squares.
  * Principal Component Analysis (PCA).


Transcript

  1. Outline
    1 Definitions and Notations: Notations, Definitions, Introduction.
    2 Singular Value Decomposition Theorem: SVD Theorem, Proof of the SVD Theorem, SVD Properties, SVD Example.
    3 Applications: Order Reduction, Solving Linear Equation System, Total Least Squares, Principal Component Analysis.
  2. Outline: Notations (the outline from slide 1 is repeated here with this subsection highlighted, as on each section-divider slide below).
  3. Notations
    A capital letter denotes a matrix: $A \in \mathbb{C}^{m \times n}$, $A \in \mathbb{R}^{m \times n}$.
    A small letter denotes a column vector: $a \in \mathbb{C}^{m \times 1}$, $a \in \mathbb{R}^{m \times 1}$.
    $A_i$ refers to the $i$-th row of a matrix; $A_j$ refers to the $j$-th column of a matrix (the intended one is clear from context).
  4. Outline: Definitions.
  5. Definitions
    Unless written otherwise, the complex field is the default.
    Conjugate operator: $A^{*}$.
    Transpose operator: $A^{T}$, with $\left( A^{T} \right)_{ij} = A_{ji}$.
    Complex conjugate (Hermitian) transpose operator: $A^{H}$, with $\left( A^{H} \right)_{ij} = A_{ji}^{*}$.
    Range space and null space of an operator: let $L : X \to Y$ be an operator (linear or otherwise). The range space $\mathcal{R}(L) \subseteq Y$ is $\mathcal{R}(L) = \{ y = L x : x \in X \}$; the null space $\mathcal{N}(L) \subseteq X$ is $\mathcal{N}(L) = \{ x \in X : L x = 0 \}$.
  6. Outline: Introduction.
  7. Introduction
    Each linear operator $A : \mathbb{C}^n \to \mathbb{C}^m$ defines the fundamental subspaces, and the following properties hold:
    $\mathcal{R}(A) \perp \mathcal{N}(A^H)$, $\mathcal{R}(A^H) \perp \mathcal{N}(A)$,
    $\operatorname{rank}(A) = \dim \mathcal{R}(A) = \dim \mathcal{R}(A^H) = \operatorname{rank}(A^H)$.
  8. Introduction
    For the action of the linear operator $A \in \mathbb{C}^{m \times n}$ the following properties hold:
    $\operatorname{rank}(A) = \operatorname{rank}(A A^H) = \operatorname{rank}(A^H A) = \operatorname{rank}(A^H)$,
    $\mathcal{R}(A) = \mathcal{R}(A A^H)$, $\mathcal{R}(A^H) = \mathcal{R}(A^H A)$.
  9. Outline: SVD Theorem.
  10. SVD Theorem
    Theorem (SVD): every matrix $A \in \mathbb{C}^{m \times n}$ can be factored as $A = U \Sigma V^H$, where $U \in \mathbb{C}^{m \times m}$ and $V \in \mathbb{C}^{n \times n}$ are unitary and $\Sigma \in \mathbb{C}^{m \times n}$ has the form $\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_p)$, $p = \min(m, n)$.
    Corollary (i): the columns of $U$ are eigenvectors of $A A^H$ (the left singular vectors):
    $A A^H = U \Sigma V^H \left( U \Sigma V^H \right)^H = U \Sigma V^H V \Sigma^H U^H = U \Sigma \Sigma^H U^H$.
    The columns of $V$ are eigenvectors of $A^H A$ (the right singular vectors):
    $A^H A = \left( U \Sigma V^H \right)^H U \Sigma V^H = V \Sigma^H U^H U \Sigma V^H = V \Sigma^H \Sigma V^H$.
  11. SVD Theorem
    Corollary (ii): the nonzero singular values on the diagonal of $\Sigma$ are the square roots of the nonzero eigenvalues of both $A A^H$ and $A^H A$.
    The SVD is unique up to permutations of the triplets $(u_i, \sigma_i, v_i)$ as long as $\sigma_i = \sigma_j \Leftrightarrow i = j$. If the algebraic multiplicity of a certain eigenvalue of $A^H A$ / $A A^H$ is larger than 1, there is freedom in choosing the vectors which span the null space of $A A^H - \lambda I$ / $A^H A - \lambda I$.
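As a numerical illustration of these corollaries (my addition, not part of the original deck), the following MATLAB sketch checks that the $U$ and $V$ returned by svd diagonalize $A A^H$ and $A^H A$; the test matrix is arbitrary.

```matlab
% Check that U diagonalizes A*A' and V diagonalizes A'*A (up to round-off).
A = [1 2 3; 6 5 4];                               % arbitrary test matrix
[U, S, V] = svd(A);
err1 = norm(U' * (A * A') * U - S * S', 'fro');   % should be near machine precision
err2 = norm(V' * (A' * A) * V - S' * S, 'fro');   % should be near machine precision
fprintf('diagonalization errors: %g, %g\n', err1, err2);
```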
  12. Outline: Proof of the SVD Theorem.
  13. Proof of the SVD Theorem
    To prove the SVD theorem, two propositions are used.
    Proposition I: for every $A \in \mathbb{C}^{m \times n}$, both $A^H A$ and $A A^H$ are Hermitian matrices.
    Proof. Let $C = A^H A$ and let $A_i$ denote the $i$-th column of $A$. Then
    $C_{ij} = A_i^H A_j = \left( \left( A_i^H A_j \right)^H \right)^H = \left( A_j^H A_i \right)^H = C_{ji}^H = C_{ji}^{*}$,
    hence $C$ is Hermitian (the same argument applies to $A A^H$).
  14. Proof of the SVD Theorem
    Proposition II (Spectral Decomposition): every Hermitian matrix $A \in \mathbb{C}^{n \times n}$, i.e. $A_{ij} = A_{ji}^{*}$, can be diagonalized by a unitary matrix $U \in \mathbb{C}^{n \times n}$ s.t. $U^H A U = \Lambda$.
    Proof. The spectral decomposition is a result of a few properties of Hermitian matrices:
    For Hermitian matrices, the eigenvectors of distinct eigenvalues are orthogonal.
    Schur's Lemma: $\forall A \in \mathbb{C}^{n \times n}$ there exists a unitary $U \in \mathbb{C}^{n \times n}$ s.t. $U^H A U = T$, where $T \in \mathbb{C}^{n \times n}$ is an upper triangular matrix.
    When $A$ has $n$ distinct eigenvalues, Proposition II is immediate. Otherwise it can be shown that if $A$ is Hermitian then $T$ is Hermitian, and since $T$ is upper triangular it must be a diagonal matrix.
  15. Proof of the SVD Theorem
    Theorem (restated): $\forall A \in \mathbb{C}^{m \times n}$, $A = U \Sigma V^H$ with $U, V$ unitary and $\Sigma$ diagonal as above.
    Proof. Let $A^H A V = V \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ be the spectral decomposition of $A^H A$, where the columns of $V = [v_1, v_2, \ldots, v_n]$ are eigenvectors and $\lambda_1, \lambda_2, \ldots, \lambda_r > 0$, $\lambda_{r+1} = \lambda_{r+2} = \ldots = \lambda_n = 0$, with $r \leq p$. For $1 \leq i \leq r$, let $u_i = \frac{A v_i}{\sqrt{\lambda_i}}$.
  16. Proof of the SVD Theorem
    Proof (continued). Notice that $\langle u_i, u_j \rangle = \delta_{ij}$. The set $\{ u_i, \, i = 1, 2, \ldots, r \}$ can be extended using the Gram-Schmidt procedure to form an orthonormal basis for $\mathbb{C}^m$. Let $U = [u_1, u_2, \ldots, u_m]$; then the $u_i$ are eigenvectors of $A A^H$.
  17. Proof of the SVD Theorem
    Proof (continued). This is clear for the nonzero eigenvalues of $A A^H$. For the zero eigenvalues, the eigenvectors must come from the null space of $A A^H$: the eigenvectors associated with zero eigenvalues are, by construction, orthogonal to the eigenvectors associated with nonzero eigenvalues, which lie in the range of $A A^H$, hence they must lie in the null space of $A A^H$.
  18. Proof of the SVD Theorem
    Proof (continued). Examine the elements of $U^H A V$. For $i \leq r$, the $(i, j)$ element of $U^H A V$ is
    $u_i^H A v_j = \frac{1}{\sqrt{\lambda_i}} v_i^H A^H A v_j = \frac{\lambda_j}{\sqrt{\lambda_i}} v_i^H v_j = \frac{\lambda_j}{\sqrt{\lambda_i}} \delta_{ij} = \sqrt{\lambda_i} \, \delta_{ij}$.
    For $i > r$ we get $A A^H u_i = 0$, thus $A^H u_i \in \mathcal{N}(A)$ and also $A^H u_i \in \mathcal{R}(A^H)$ as a linear combination of the columns of $A^H$. Yet $\mathcal{R}(A^H) \perp \mathcal{N}(A)$, hence $A^H u_i = 0$.
  19. Proof of the SVD Theorem
    Proof (continued). Since $A^H u_i = 0$ we get $u_i^H A v_j = \left( v_j^H A^H u_i \right)^H = 0$. Thus $U^H A V = \Sigma$, where $\Sigma$ is diagonal (along its main diagonal), which completes the proof.
  20. Proof of the SVD Theorem
    Alternative proof. Notice that $A^H A$ and $A A^H$ share the same nonzero eigenvalues (this can be proved independently of the SVD). Let $A A^H u_i = \sigma_i^2 u_i$ for $i = 1, 2, \ldots, m$. By the spectral theorem: $U = [u_1, u_2, \ldots, u_m]$, $U \in \mathbb{C}^{m \times m}$, $U U^H = U^H U = I_m$. Thus $\left\| A^H u_i \right\| = \sigma_i$ for $i = 1, 2, \ldots, m$.
  21. Proof of the SVD Theorem
    Alternative proof (continued). Let $A^H A v_i = \hat{\sigma}_i^2 v_i$ for $i = 1, 2, \ldots, n$. By the spectral theorem: $V = [v_1, v_2, \ldots, v_n]$, $V \in \mathbb{C}^{n \times n}$, $V V^H = V^H V = I_n$. Utilizing the above for the nonzero $\hat{\sigma}_i^2$:
    $A A^H u_i = \sigma_i^2 u_i \;\Rightarrow\; A^H A \underbrace{A^H u_i}_{z_i} = \sigma_i^2 \underbrace{A^H u_i}_{z_i}$,
    meaning the $z_i$ and $\sigma_i^2$ are eigenvectors and eigenvalues of $A^H A$.
  22. Proof of the SVD Theorem
    Alternative proof (continued). Examining $z_i$ yields:
    $z_j^H z_i = u_j^H A A^H u_i = \sigma_i^2 u_j^H u_i \;\Rightarrow\; \| z_i \| = \sigma_i \;\Rightarrow\; v_i = \frac{z_i}{\sigma_i} = \frac{A^H u_i}{\sigma_i}$.
    Consider the following equations for $i = 1, 2, \ldots, m$:
    $A v_i = \frac{A A^H u_i}{\sigma_i} \; (\text{or zero}) = \sigma_i u_i \; (\text{or zero})$.
  23. Proof of the SVD Theorem
    Alternative proof (continued). These equations can be written as
    $A V = U \Sigma \;\Leftrightarrow\; A = U \Sigma V^H$,
    where $U$ and $V$ are as defined above and $\Sigma$ is an $m \times n$ matrix whose top-left $n \times n$ block is diagonal, with the $\sigma_i$ on the diagonal, and whose remaining rows are zero.
  24. Outline: SVD Properties.
  25. SVD Properties
    It is often convenient to break the matrices in the SVD into two parts, corresponding to the nonzero singular values and the zero singular values. Let
    $\Sigma = \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}$, where $\Sigma_1 = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r) \in \mathbb{R}^{r \times r}$ with $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_r$, and $\Sigma_2 = \operatorname{diag}(\sigma_{r+1}, \sigma_{r+2}, \ldots, \sigma_p) = \operatorname{diag}(0, 0, \ldots, 0) \in \mathbb{R}^{(m-r) \times (n-r)}$.
    Then the SVD can be written as
    $A = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix} \begin{bmatrix} V_1^H \\ V_2^H \end{bmatrix} = U_1 \Sigma_1 V_1^H$,
    where $U_1 \in \mathbb{C}^{m \times r}$, $U_2 \in \mathbb{C}^{m \times (m-r)}$, $V_1 \in \mathbb{C}^{n \times r}$ and $V_2 \in \mathbb{C}^{n \times (n-r)}$.
  26. SVD Properties
    The SVD can also be written as $A = \sum_{i=1}^{r} \sigma_i u_i v_i^H$.
    The SVD can also be used to compute two matrix norms:
    Hilbert-Schmidt / Frobenius norm: $\| A \|_F^2 = \sum_{i,j} \left| A_{ij} \right|^2 = \sum_{i=1}^{r} \sigma_i^2$.
    $\ell_2$ norm: $\| A \|_2 = \sup_{x \neq 0} \frac{\| A x \|}{\| x \|} = \sqrt{\lambda_{\max}\left( A^H A \right)} = \sigma_1$,
    which implies $\arg\max_{x \neq 0} \frac{\| A x \|}{\| x \|} = v_1$ and $\arg\max_{x \neq 0} \frac{\left\| x^H A \right\|}{\| x \|} = u_1$.
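A small MATLAB check of the two norm identities (my addition, using an arbitrary random matrix):

```matlab
% Frobenius norm equals sqrt(sum of squared singular values); l2 norm equals sigma_1.
A = randn(5, 3);
s = svd(A);                                        % singular values, descending
fprintf('Frobenius mismatch: %g\n', abs(norm(A, 'fro') - sqrt(sum(s.^2))));
fprintf('l2 mismatch:        %g\n', abs(norm(A, 2) - s(1)));
```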
  27. SVD Properties
    The rank of a matrix is the number of nonzero singular values along the main diagonal of $\Sigma$; using the notation above, $\operatorname{rank}(A) = r$. The SVD is a numerically stable way of computing the rank of a matrix.
    The range (column space) of a matrix is
    $\mathcal{R}(A) = \{ b \in \mathbb{C}^m : b = A x \} = \left\{ b \in \mathbb{C}^m : b = U \Sigma V^H x \right\} = \{ b \in \mathbb{C}^m : b = U \Sigma y \} = \{ b \in \mathbb{C}^m : b = U_1 \tilde{y} \} = \operatorname{span}(U_1)$.
    The range of a matrix is spanned by the orthonormal set of vectors in $U_1$, the first $r$ columns of $U$.
  28. SVD Properties
    Generally, the other fundamental spaces of a matrix $A$ can also be determined from the SVD:
    $\mathcal{R}(A) = \operatorname{span}(U_1) = \mathcal{R}(A A^H)$, $\mathcal{N}(A) = \operatorname{span}(V_2)$,
    $\mathcal{R}(A^H) = \operatorname{span}(V_1) = \mathcal{R}(A^H A)$, $\mathcal{N}(A^H) = \operatorname{span}(U_2)$.
    The SVD thus provides an explicit orthogonal basis and a computable dimensionality for each of the fundamental spaces of a matrix.
  29. SVD Properties
    Since the SVD decomposes a given matrix into two unitary matrices and a diagonal matrix, every matrix can be described as a rotation, a scaling and another rotation. This intuition follows from the properties of unitary matrices, which essentially rotate the vectors they multiply. This property is further examined when dealing with linear equations.
  30. Outline: SVD Example.
  31. SVD Example
    Finding the SVD of a matrix numerically using the MATLAB command [U, S, V] = svd(A). Let
    $A = \begin{bmatrix} 1 & 2 & 3 \\ 6 & 5 & 4 \end{bmatrix}$.
    Then $A = U \Sigma V^H$, where
    $U = \begin{bmatrix} -0.355 & -0.934 \\ -0.934 & 0.355 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 9.362 & 0 & 0 \\ 0 & 1.831 & 0 \end{bmatrix}$, $V = \begin{bmatrix} -0.637 & -0.653 & 0.408 \\ -0.575 & -0.050 & -0.8165 \\ -0.513 & -0.754 & 0.408 \end{bmatrix}$.
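The factorization above can be reproduced as follows (snippet added here, not in the original deck); MATLAB may return singular vectors whose signs differ from the slide, which is within the stated uniqueness freedom.

```matlab
% Reproduce the numerical SVD example.
A = [1 2 3; 6 5 4];
[U, S, V] = svd(A);
disp(U); disp(S); disp(V);
fprintf('reconstruction error: %g\n', norm(A - U * S * V', 'fro'));   % round-off only
```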
  32. SVD Example
    Let $A$ be a diagonal matrix:
    $A = \begin{bmatrix} 2 & 0 \\ 0 & -4 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.
    In this case the $U$ and $V$ matrices just shuffle the columns around and change signs to make the singular values positive.
    Let $A$ be a square symmetric matrix:
    $A = \begin{bmatrix} 5 & 6 & 2 \\ 6 & 1 & 4 \\ 2 & 4 & 7 \end{bmatrix} = U \Sigma V^H$, where $U = V = \begin{bmatrix} 0.592 & -0.616 & 0.518 \\ 0.526 & -0.191 & 0.828 \\ 0.610 & 0.763 & -0.211 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 12.391 & 0 & 0 \\ 0 & 4.383 & 0 \\ 0 & 0 & 3.774 \end{bmatrix}$.
    In this case the SVD coincides with the regular eigendecomposition, up to the sign changes needed to make the singular values positive.
  33. Outline: Order Reduction.
  34. Order Reduction
    The SVD of a matrix can be used to determine how near (in the sense of the $\ell_2$ norm) the matrix is to a matrix of lower rank. It can also be used to find the nearest matrix of a given lower rank.
    Theorem: let $A$ be an $m \times n$ matrix with $\operatorname{rank}(A) = r$ and let $A = U \Sigma V^H$. Let $k < r$ and let
    $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^H = U \Sigma_k V^H$, where $\Sigma_k = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k, 0, \ldots, 0)$.
    Then $\| A - A_k \|_2 = \sigma_{k+1}$, and $A_k$ is the nearest matrix of rank $k$ to $A$ (in the sense of the $\ell_2$ norm / Frobenius norm):
    $\min_{\operatorname{rank}(B) = k} \| A - B \|_2 = \| A - A_k \|_2$.
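Before the proof, a short MATLAB sketch of the theorem (my addition): build $A_k$ by truncating the SVD and compare $\| A - A_k \|_2$ with $\sigma_{k+1}$; the matrix and $k$ are arbitrary.

```matlab
% Best rank-k approximation by SVD truncation.
A = randn(8, 6);
k = 2;
[U, S, V] = svd(A);
Ak = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';        % A_k = sum_{i<=k} sigma_i u_i v_i'
s  = diag(S);
fprintf('||A - A_k||_2 = %g, sigma_(k+1) = %g\n', norm(A - Ak, 2), s(k + 1));
```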
  35. Order Reduction
    Proof. Since $A - A_k = U \operatorname{diag}(0, 0, \ldots, 0, \sigma_{k+1}, \ldots, \sigma_r, 0, \ldots, 0) V^H$, it follows that $\| A - A_k \|_2 = \sigma_{k+1}$.
    The second part of the proof is a "proof by inequality". By the definition of the matrix norm, for any unit vector $z$ the following holds:
    $\| A - B \|_2^2 \geq \| (A - B) z \|_2^2$.
    Let $B$ be a rank-$k$ matrix of size $m \times n$. Then there exist vectors $\{ x_1, x_2, \ldots, x_{n-k} \}$ that span $\mathcal{N}(B)$, where $x_i \in \mathbb{C}^n$. Consider also the vectors $\{ v_1, v_2, \ldots, v_{k+1} \}$ from the matrix $V$ of the SVD, where $v_i \in \mathbb{C}^n$.
  36. Order Reduction
    Proof (continued). The intersection $\operatorname{span}(x_1, \ldots, x_{n-k}) \cap \operatorname{span}(v_1, \ldots, v_{k+1}) \subseteq \mathbb{C}^n$ cannot be trivial, since there is a total of $n + 1$ vectors. Let $z$ be a vector from this intersection, normalized s.t. $\| z \|_2 = 1$. Then:
    $\| A - B \|_2^2 \geq \| (A - B) z \|_2^2 = \| A z \|_2^2$.
    Since $z \in \operatorname{span}(v_1, v_2, \ldots, v_{k+1})$,
    $A z = \sum_{i=1}^{k+1} \sigma_i \left( v_i^H z \right) u_i$.
    Now
    $\| A - B \|_2^2 \geq \| A z \|_2^2 = \sum_{i=1}^{k+1} \sigma_i^2 \left| v_i^H z \right|^2 \geq \sigma_{k+1}^2$.
    The lower bound is achieved by $B = \sum_{i=1}^{k} \sigma_i u_i v_i^H$, with $z = v_{k+1}$.
  37. Order Reduction
    Applications of order reduction: noise reduction. Basic assumption: the noise is mainly pronounced in the small singular values.
    [Figure panels: Noiseless Matrix; Noisy Matrix - Std 1; Noisy Matrix - Std 6; Noisy Matrix - Std 11.]
  38. Order Reduction
    [Figure panels: Ground Truth; Added Noise - Std 6; Reconstruction using 140 Singular Values.]
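The images themselves are not reproduced in this transcript. The following MATLAB sketch (my addition, using a synthetic low rank matrix rather than the original image) illustrates the kind of truncated SVD denoising shown in the figures; the kept rank is a tuning choice.

```matlab
% Truncated-SVD noise reduction on a synthetic low rank matrix.
rng(0);
X     = randn(200, 50) * randn(50, 200);          % rank-50 "ground truth"
noisy = X + 6 * randn(size(X));                   % additive noise, std 6
[U, S, V] = svd(noisy);
k    = 50;                                        % number of singular values to keep
Xhat = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';
fprintf('noisy error: %g, denoised error: %g\n', ...
        norm(noisy - X, 'fro'), norm(Xhat - X, 'fro'));
```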
  39. Outline: Solving Linear Equation System.
  40. Solving Linear Equation System
    Consider the solution of the equation $A x = b$.
    If $b \in \mathcal{R}(A)$ there is at least one solution:
    If $\dim \mathcal{N}(A) = 0$ there is exactly one solution, $x_r \in \mathcal{R}(A^H)$, s.t. $A x_r = b$.
    If $\dim \mathcal{N}(A) \geq 1$, the columns of $A$ are not independent and there are infinitely many solutions: any vector of the form $\hat{x} = x_r + x_n$, where $x_r \in \mathcal{R}(A^H)$ is the solution from the previous case and $x_n \in \mathcal{N}(A)$, satisfies $A (x_r + x_n) = b$. Which solution should be chosen? Usually the solution with the minimum norm, $x_r$.
    If $b \notin \mathcal{R}(A)$ there is no solution. Usually one then seeks the vector $\hat{x}$ for which $\| A \hat{x} - b \|_2$ is minimized.
  41. Solving Linear Equation System
    Assume $\hat{x} = \arg\min_x \| A x - b \|_2$. By definition $\hat{b} = A \hat{x} \in \mathcal{R}(A)$, meaning the search is for $\hat{b}$ s.t. $\| \hat{b} - b \|_2$ is minimized.
  42. Solving Linear Equation System
    According to the projection theorem, there is exactly one vector $\hat{b}$ for which $\| \hat{b} - b \|_2$ is minimized; it is the projection of $b$ onto $\mathcal{R}(A)$, $\hat{b} = \operatorname{Proj}_{\mathcal{R}(A)}(b)$. Moreover,
    $\hat{x} = \arg\min_x \| A x - b \|_2 \;\Leftrightarrow\; A^H A \hat{x} = A^H b$.
    Intuitively, the procedure is: project $b$ onto the column space $\mathcal{R}(A)$ to obtain $\hat{b}$; the solution $\hat{x}$ then satisfies $A \hat{x} = \hat{b}$, i.e. the residual $A \hat{x} - b$ is orthogonal to $\mathcal{R}(A)$, and applying $A^H$ to the residual gives exactly the normal equations $A^H A \hat{x} = A^H b$.
  43. Solving Linear Equation System
    The equation $A^H A \hat{x} = A^H b$ is called the normal equations. If the columns of $A$ are independent then $A^H A$ is invertible and $\hat{x}$ can be calculated as follows:
    $\hat{x} = \left( A^H A \right)^{-1} A^H b$.
    This is the least squares solution using the pseudo inverse of $A$: $A^{\dagger} = \left( A^H A \right)^{-1} A^H$.
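A minimal MATLAB sketch of this least squares solution (my addition). The matrix is an arbitrary full column rank example, and in practice the backslash operator is preferred over forming $\left( A^H A \right)^{-1}$ explicitly.

```matlab
% Least squares via the normal equations for a full column rank A.
A = [1 0; 1 1; 1 2];
b = [1; 2; 2];
xNormal    = (A' * A) \ (A' * b);                 % solves A'*A*x = A'*b
xBackslash = A \ b;                               % MATLAB's built-in least squares
fprintf('difference between the solutions: %g\n', norm(xNormal - xBackslash));
```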
  44. Solving Linear Equation System
    Yet if the columns of $A$ are linearly dependent, the pseudo inverse of $A$ in the form above can't be calculated directly: the null space of $A$ is not trivial and there is no unique solution. The problem becomes selecting one solution out of the infinite number of possible solutions. As mentioned, the commonly accepted approach is to select the solution with the smallest norm (length). This problem can be solved using the SVD and the definition of the generalized pseudo inverse of a matrix.
  45. Solving Linear Equation System
    Definition: the pseudo inverse of a matrix $A = U \Sigma V^H$, denoted $A^{\dagger}$, is given by $A^{\dagger} = V \Sigma^{\dagger} U^H$, where $\Sigma^{\dagger}$ is obtained by transposing $\Sigma$ and inverting all its nonzero entries. This definition of the pseudo inverse exists for any matrix.
    Proposition III: let $A = U \Sigma V^H$ and $x^{\dagger} = A^{\dagger} b = V \Sigma^{\dagger} U^H b$. Then $A^H A x^{\dagger} = A^H b$; namely, the solution given by the pseudo inverse calculated via the SVD satisfies the normal equations.
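A sketch (my addition) of building $A^{\dagger} = V \Sigma^{\dagger} U^H$ directly from the SVD and comparing it with MATLAB's pinv; the tolerance tol used to decide which singular values count as nonzero is my assumption, pinv applies its own default.

```matlab
% Pseudo inverse from the SVD: transpose Sigma and invert its nonzero entries.
A = [8 10 3 30; 9 6 6 18; 1 1 10 3];              % rank deficient example (Example I below)
[U, S, V] = svd(A);
tol  = max(size(A)) * eps(norm(A));               % threshold for "nonzero"
r    = sum(diag(S) > tol);
Sdag = zeros(size(A'));                           % Sigma^dagger has the transposed shape
Sdag(1:r, 1:r) = diag(1 ./ diag(S(1:r, 1:r)));
Adag = V * Sdag * U';
fprintf('||Adag - pinv(A)||_F = %g\n', norm(Adag - pinv(A), 'fro'));
```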
  46. Solving Linear Equation System
    Proof of Proposition III. It is sufficient to show that $A^H \left( A x^{\dagger} - b \right) = 0$. Now,
    $A x^{\dagger} - b = U \Sigma V^H V \Sigma^{\dagger} U^H b - b = \left( U \Sigma \Sigma^{\dagger} U^H - I \right) b = U \left( \Sigma \Sigma^{\dagger} - I \right) U^H b$.
  47. Solving Linear Equation System
    Proof (continued). Thus,
    $A^H \left( A x^{\dagger} - b \right) = V \Sigma^H U^H U \left( \Sigma \Sigma^{\dagger} - I \right) U^H b = V \Sigma^H \left( \Sigma \Sigma^{\dagger} - I \right) U^H b$.
    One should observe that
    $\Sigma^H = \begin{bmatrix} \Sigma_r^H & 0_{r \times (m-r)} \\ 0_{(n-r) \times r} & 0_{(n-r) \times (m-r)} \end{bmatrix}$,
    where $\Sigma_r$ is the $r \times r$ submatrix of nonzero diagonal entries of $\Sigma$, and
    $\Sigma \Sigma^{\dagger} - I = \begin{bmatrix} 0_{r \times r} & 0_{r \times (m-r)} \\ 0_{(m-r) \times r} & -I_{(m-r) \times (m-r)} \end{bmatrix}$.
    Hence the multiplication yields the zero matrix.
  48. Solving Linear Equation System
    Proposition IV: the vector $\hat{x} = A^{\dagger} b$ is the shortest least squares solution of $A x = b$, namely
    $\| \hat{x} \|_2 = \min \left\{ \| x \|_2 : \| A x - b \|_2 \text{ is minimal} \right\}$.
    Proof. Using the fact that both $U$ and $V$ are unitary,
    $\| A x - b \|_2 = \left\| U \Sigma V^H x - b \right\|_2 = \left\| \Sigma V^H x - U^H b \right\|_2 = \left\| \Sigma y - U^H b \right\|_2$,
    where $y = V^H x$ and $\| y \|_2 = \| x \|_2$. So among all minimizers of $\left\| \Sigma y - U^H b \right\|_2$ we seek the one with the smallest $\| y \|_2$.
  49. Solving Linear Equation System
    Proof (continued). Since $\Sigma$ is diagonal (along its main diagonal), the least squares problem in $y$ decouples entry by entry: the entries corresponding to nonzero singular values are determined uniquely, and the minimum norm is obtained by setting the remaining entries to zero, i.e. $\hat{y} = \Sigma^{\dagger} U^H b$. Thus $\hat{x} = V \hat{y} = V \Sigma^{\dagger} U^H b$ attains the minimum norm.
  50. Solving Linear Equation System
    As written previously, any solution which satisfies the normal equations is a least squares solution:
    $\hat{x} = \arg\min_x \| A x - b \|_2 \;\Leftrightarrow\; A^H A \hat{x} = A^H b$.
    Yet one should observe that $\hat{x} = A^{\dagger} b \in \mathcal{R}(A^H)$, namely, this solution lies in the row space of $A$; hence its norm is minimal among all solutions. In short, the pseudo inverse simultaneously minimizes the norm of the error as well as the norm of the solution.
  51. Solving Linear Equation System
    Example I. Examine the following linear system $A x = b$, where
    $A = \begin{bmatrix} 8 & 10 & 3 & 30 \\ 9 & 6 & 6 & 18 \\ 1 & 1 & 10 & 3 \end{bmatrix}$, $x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 6 \end{bmatrix}$, $b = \begin{bmatrix} 217 \\ 147 \\ 51 \end{bmatrix}$.
    Obviously $A^{-1}$ can't be calculated. Moreover, since $\operatorname{rank}(A) = 3$ while $A$ has 4 columns, $\left( A^H A \right)^{-1}$ does not exist either. Yet the pseudo inverse via the SVD does exist.
  52. Solving Linear Equation System
    Using the SVD approach $A = U \Sigma V^H$, hence $A^{\dagger} = V \Sigma^{\dagger} U^H$. Using MATLAB to calculate the SVD yields:
    $\Sigma = \begin{bmatrix} 39.378 & 0 & 0 & 0 \\ 0 & 10.002 & 0 & 0 \\ 0 & 0 & 3.203 & 0 \end{bmatrix} \;\rightarrow\; \Sigma^{\dagger} = \begin{bmatrix} 0.025 & 0 & 0 \\ 0 & 0.1 & 0 \\ 0 & 0 & 0.312 \\ 0 & 0 & 0 \end{bmatrix}$.
    Calculating $\hat{x}$ yields:
    $\hat{x} = V \Sigma^{\dagger} U^H b = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 6 \end{bmatrix} = x$.
    The SVD properly handled the 4th column, which is dependent on (three times) the 2nd column of $A$. Since $b \in \mathcal{R}(A)$, the exact solution could be calculated.
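The example can be reproduced with MATLAB's pinv, which is itself SVD based (snippet added here, not in the original slides):

```matlab
% Reproduce Example I: minimum norm solution of a rank deficient system.
A = [8 10 3 30; 9 6 6 18; 1 1 10 3];
b = [217; 147; 51];
xHat = pinv(A) * b;                               % = V * Sigma^dagger * U' * b
disp(xHat');                                      % expected: 1 2 3 6
fprintf('rank(A) = %d, residual = %g\n', rank(A), norm(A * xHat - b));
```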
  53. Solving Linear Equation System
    Example II. In this case
    $A = \begin{bmatrix} 5 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$, $x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 6 \end{bmatrix}$, $b = \begin{bmatrix} 5 \\ 4 \\ 3 \end{bmatrix}$.
    Obviously $b \notin \mathcal{R}(A)$, and neither $A^{-1}$ nor $\left( A^H A \right)^{-1}$ exists. Using the SVD pseudo inverse:
    $\hat{x} = V \Sigma^{\dagger} U^H b = \begin{bmatrix} 1 \\ 2 \\ 0 \\ 0 \end{bmatrix}$.
  54. Solving Linear Equation System
    Examining the solution using the SVD: first, since $\operatorname{rank}(A) = 2$, the column space of $A$ is spanned by the first 2 columns of $U$. The projection of $b$ onto the column space of $A$ is
    $\hat{b} = \operatorname{Proj}_{\mathcal{R}(A)}(b) = \sum_{i=1}^{2} \left( U_i^H b \right) U_i = \begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix}$.
    Now consider the updated linear system $A \hat{x} = \hat{b}$, which has an infinite number of solutions. One can calculate that $\mathcal{N}(A) = \operatorname{span}\left( \begin{bmatrix} 0 & 0 & 1 & 0 \end{bmatrix}^T, \begin{bmatrix} 0 & 0 & 0 & 1 \end{bmatrix}^T \right)$. Hence
    $\hat{x} = \begin{bmatrix} \hat{b}_1 / A_{1,1} \\ \hat{b}_2 / A_{2,2} \\ 0 \\ 0 \end{bmatrix} + \left( s \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \right) = \hat{x}_r + \hat{x}_n$, where $s, t \in \mathbb{R}$.
  55. Solving Linear Equation System
    The target is the solution with the minimum norm. Since $\hat{x}_r \perp \hat{x}_n$, the norm of this solution satisfies
    $\| \hat{x} \|_2^2 = \| \hat{x}_r \|_2^2 + \| \hat{x}_n \|_2^2$.
    The minimum norm solution is obtained by taking $\hat{x}_n = 0$, which results in the pseudo inverse solution above.
  56. Solving Linear Equation System
    Numerically sensitive problems: systems of equations that are poorly conditioned are sensitive to small changes in values. Since, practically speaking, there are always inaccuracies in measured data, the solution of these equations may be almost meaningless. The SVD can help with the solution of ill conditioned equations by identifying the direction of sensitivity and discarding that portion of the problem. The procedure is illustrated by the following example.
  57. Solving Linear Equation System
    Example III. Examine the following system of equations $A x = b$:
    $\begin{bmatrix} 1 + 3\epsilon & 1 - 3\epsilon \\ 3 - \epsilon & 3 + \epsilon \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$.
    The SVD of $A$ is
    $A = \frac{1}{\sqrt{20}} \begin{bmatrix} 1 & 3 \\ 3 & -1 \end{bmatrix} \begin{bmatrix} 2\sqrt{5} & 0 \\ 0 & 2\sqrt{5}\,\epsilon \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$,
    from which the exact inverse of $A$ is
    $A^{-1} = \frac{1}{\sqrt{20}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} \frac{1}{2\sqrt{5}} & 0 \\ 0 & \frac{1}{2\sqrt{5}\,\epsilon} \end{bmatrix} \begin{bmatrix} 1 & 3 \\ 3 & -1 \end{bmatrix} = \frac{1}{20} \begin{bmatrix} 1 + \frac{3}{\epsilon} & 3 - \frac{1}{\epsilon} \\ 1 - \frac{3}{\epsilon} & 3 + \frac{1}{\epsilon} \end{bmatrix}$.
    One can easily convince oneself that for small $\epsilon$ the matrix $A^{-1}$ has large entries, which makes $x = A^{-1} b$ unstable.
  58. Solving Linear Equation System
    Observe that the entry $\frac{1}{2\sqrt{5}\,\epsilon}$ multiplies the column $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$; this is the sensitive direction. As $b$ changes slightly, the solution changes in a direction mostly along the sensitive direction. If $\epsilon$ is small, $\sigma_2 = 2\sqrt{5}\,\epsilon$ may be set to zero to approximate $A$:
    $A \approx \frac{1}{\sqrt{20}} \begin{bmatrix} 1 & 3 \\ 3 & -1 \end{bmatrix} \begin{bmatrix} 2\sqrt{5} & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$.
    The pseudo inverse is then
    $A^{\dagger} = \frac{1}{\sqrt{20}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} \frac{1}{2\sqrt{5}} & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ 3 & -1 \end{bmatrix} = \frac{1}{20} \begin{bmatrix} 1 & 3 \\ 1 & 3 \end{bmatrix}$.
    In this case the multiplier of the sensitive direction is zero, so no motion in the sensitive direction occurs. The least squares solution obtained this way, $\hat{x} = A^{\dagger} b$, is of the form $\hat{x} = c \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ for some $c \in \mathbb{R}$, i.e. perpendicular to the sensitive direction.
  59. Solving Linear Equation System
    As this example illustrates, the SVD identifies the stable and unstable directions of the problem and, by zeroing small singular values, eliminates the unstable directions. The SVD can thus be used both to expose poor conditioning and to provide a cure for the ailment. For the equation $A x = b$ with solution $x = A^{-1} b$, writing the solution using the SVD gives
    $x = A^{-1} b = \left( U \Sigma V^H \right)^{-1} b = \sum_{i=1}^{r} \frac{u_i^H b}{\sigma_i} v_i$.
    If the singular value $\sigma_i$ is small, then a small change in $b$, or a small change in either $U$ or $V$, may be amplified into a large change in the solution $x$. A small singular value corresponds to a matrix which is nearly singular and thus more difficult to invert accurately.
  60. Solving Linear Equation System
    Another point of view: consider the equation $A x_0 = b_0 \Rightarrow x_0 = A^{-1} b_0$, and let $b = b_0 + \delta b$, where $\delta b$ is the error or noise. Therefore
    $A x = b_0 + \delta b \;\Rightarrow\; x = A^{-1} b_0 + A^{-1} \delta b = x_0 + \delta x$.
    We investigate how small or large the error in the answer is for a given amount of error. Note that
    $\delta x = A^{-1} \delta b \;\Rightarrow\; \| \delta x \| \leq \left\| A^{-1} \right\| \| \delta b \|$,
    or, since $\left\| A^{-1} \right\| = \sigma_{\max}\left( A^{-1} \right) = \frac{1}{\sigma_{\min}(A)}$, the following holds:
    $\| \delta x \| \leq \frac{\| \delta b \|}{\sigma_{\min}(A)}$.
  61. Solving Linear Equation System
    However, recalling that $x_0 = A^{-1} b_0$ and therefore $\| x_0 \| \geq \sigma_{\min}\left( A^{-1} \right) \| b_0 \| = \frac{\| b_0 \|}{\sigma_{\max}(A)}$, combining the equations yields
    $\frac{\| \delta x \|}{\| x_0 \|} \leq \frac{\| \delta b \|}{\| b_0 \|} \cdot \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}$.
    The last fraction, $\frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}$, is called the condition number of $A$. This number indicates the magnification of error in the linear equation of interest. In most problems, a matrix with a very large condition number is called ill conditioned and will result in severe numerical difficulties.
  62. Solving Linear Equation System
    The solution to these numerical difficulties using the SVD is basically rank reduction (see the sketch below):
    1. Compute the SVD of $A$.
    2. Examine the singular values of $A$ and zero out any that are "small" to obtain a new approximate $\Sigma$ matrix.
    3. Compute the solution by $\hat{x} = V \Sigma^{\dagger} U^H b$.
    Determining which singular values are "small" is problem dependent and requires some judgment.
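A minimal MATLAB sketch of this three step procedure (my addition); the relative threshold tol is an assumption and, as stated above, choosing it is problem dependent.

```matlab
% Regularized solve: zero out "small" singular values, then apply the pseudo inverse.
% Usage example: xHat = truncatedSvdSolve(A, b, 1e-8);
function x = truncatedSvdSolve(A, b, tol)
    [U, S, V] = svd(A, 'econ');
    s    = diag(S);
    keep = s > tol * s(1);                        % relative threshold, problem dependent
    x    = V(:, keep) * ((U(:, keep)' * b) ./ s(keep));
end
```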
  63. Outline: Total Least Squares.
  64. Total Least Squares
    In the classic least squares problem the solution minimizing $\| A x - b \|_2$ is sought. The hidden assumption is that the matrix $A$ is correct and any error in the problem is in $b$. The least squares problem finds a vector $\hat{x}$ s.t. $\| A \hat{x} - b \|_2$ is minimal, which is accomplished by finding a perturbation $r$ of the right hand side, of minimum norm, s.t. $A x = b + r$ with $(b + r) \in \mathcal{R}(A)$.
    In the total least squares (TLS) problem, both sides of the equation are assumed to have errors. The solution of the perturbed equation $(A + E) x = b + r$ is sought s.t. $(b + r) \in \mathcal{R}(A + E)$ and the norm of the perturbations is minimized.
  65. Total Least Squares
    Intuitively, the right hand side is "bent" toward the left hand side, while the left hand side is "bent" toward the right hand side.
  66. Total Least Squares
    Let $A$ be an $m \times n$ matrix. To find the solution of the TLS problem one may observe the homogeneous form
    $\begin{bmatrix} A + E \mid b + r \end{bmatrix} \begin{bmatrix} x \\ -1 \end{bmatrix} = 0 \;\rightarrow\; \left( \begin{bmatrix} A \mid b \end{bmatrix} + \begin{bmatrix} E \mid r \end{bmatrix} \right) \begin{bmatrix} x \\ -1 \end{bmatrix} = 0$.
    Let $C = \begin{bmatrix} A \mid b \end{bmatrix} \in \mathbb{C}^{m \times (n+1)}$ and let $\Delta = \begin{bmatrix} E \mid r \end{bmatrix}$ be the perturbation of the data. For the homogeneous form to have a solution, the vector $\begin{bmatrix} x \\ -1 \end{bmatrix}$ must lie in the null space of $C + \Delta$, and for the solution not to be trivial, the perturbation $\Delta$ must be such that $C + \Delta$ is rank deficient.
  67. Total Least Squares
    Analyzing the TLS problem using the SVD: bring $(A + E) x = b + r$ into the form $\begin{bmatrix} A + E \mid b + r \end{bmatrix} \begin{bmatrix} x \\ -1 \end{bmatrix} = 0$, and let $\begin{bmatrix} A \mid b \end{bmatrix} = U \Sigma V^H$ be the SVD of the data matrix. If $\sigma_{n+1} \neq 0$ then $\operatorname{rank}\left( \begin{bmatrix} A \mid b \end{bmatrix} \right) = n + 1$, which means the row space of $\begin{bmatrix} A \mid b \end{bmatrix}$ is all of $\mathbb{C}^{n+1}$; hence there is no nonzero vector in the orthogonal complement of the row space and the set of equations is incompatible. To obtain a solution, the rank of $\begin{bmatrix} A \mid b \end{bmatrix}$ must be reduced to $n$. As shown before, the best approximation of rank $n$ in both the Frobenius and $\ell_2$ norms is given by the SVD:
    $\begin{bmatrix} \hat{A} \mid \hat{b} \end{bmatrix} = U \hat{\Sigma} V^H$, $\hat{\Sigma} = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n, 0)$.
  68. Total Least Squares
    The minimal TLS correction is given by
    $\sigma_{n+1} = \min_{\operatorname{rank}\left( \begin{bmatrix} \hat{A} \mid \hat{b} \end{bmatrix} \right) = n} \left\| \begin{bmatrix} A \mid b \end{bmatrix} - \begin{bmatrix} \hat{A} \mid \hat{b} \end{bmatrix} \right\|_F$,
    attained for $\begin{bmatrix} E \mid r \end{bmatrix} = -\sigma_{n+1} u_{n+1} v_{n+1}^H$; note that the TLS correction matrix has rank one. It is clear that the approximate set $\begin{bmatrix} \hat{A} \mid \hat{b} \end{bmatrix} \begin{bmatrix} x \\ -1 \end{bmatrix} = 0$ is compatible, and the solution is given by the only vector (up to scale), $v_{n+1}$, that belongs to $\mathcal{N}\left( \begin{bmatrix} \hat{A} \mid \hat{b} \end{bmatrix} \right)$. The TLS solution is obtained by scaling $v_{n+1}$ until its last component equals $-1$:
    $\begin{bmatrix} x \\ -1 \end{bmatrix} = \frac{-1}{V_{n+1, n+1}} v_{n+1}$.
  69. Total Least Squares
    For simplicity it is assumed that $V_{n+1, n+1} \neq 0$ and $\sigma_n > \sigma_{n+1}$, hence the solution exists and is unique. Otherwise the solution might not exist, or might not be unique (any superposition of a few columns of $V$). For a complete analysis of the existence and uniqueness of the solution see [].
    A basic TLS algorithm would be (see the sketch below): given $A x \approx b$, where $A \in \mathbb{C}^{m \times n}$ and $b \in \mathbb{C}^m$, compute the SVD $\begin{bmatrix} A \mid b \end{bmatrix} = U \Sigma V^H$. If $V_{n+1, n+1} \neq 0$, the TLS solution is
    $x_{TLS} = \frac{-1}{V_{n+1, n+1}} \, v_{n+1}(1 : n)$.
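A hedged MATLAB sketch of this basic TLS algorithm (my addition), assuming the generic case $V_{n+1, n+1} \neq 0$ and $\sigma_n > \sigma_{n+1}$:

```matlab
% Basic Total Least Squares via the SVD of the augmented matrix [A b].
% Usage example: xTls = tlsSolve(A, b);
function x = tlsSolve(A, b)
    n = size(A, 2);
    [~, ~, V] = svd([A, b]);
    if V(n + 1, n + 1) == 0
        error('tlsSolve:degenerate', 'V(n+1,n+1) = 0: the generic TLS solution does not exist.');
    end
    x = -V(1:n, n + 1) / V(n + 1, n + 1);
end
```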
  70. Total Least Squares
    The geometric properties of the solution can be described as follows: the TLS solution minimizes the distance between the data points and the hyperplane defined by the solution $x_{TLS}$. Let $C = U \Sigma V^H$. From the definition of the $\ell_2$ norm of a matrix,
    $\frac{\| C v \|_2}{\| v \|_2} \geq \sigma_{n+1}$, for any $v$ with $\| v \|_2 \neq 0$.
    Equality holds if and only if $v \in S_c$, where $S_c = \operatorname{span}\{ v_i \}$ and the $v_i$ are the columns of $V$ which satisfy $u_i^H C v_i = \sigma_{n+1}$. The TLS problem amounts to finding a vector $x$ s.t.
    $\frac{\left\| \begin{bmatrix} A \mid b \end{bmatrix} \begin{bmatrix} x \\ -1 \end{bmatrix} \right\|_2}{\left\| \begin{bmatrix} x \\ -1 \end{bmatrix} \right\|_2} = \sigma_{n+1}$.
  71. Total Least Squares
    Squaring everywhere,
    $\min_x \frac{\left\| \begin{bmatrix} A \mid b \end{bmatrix} \begin{bmatrix} x \\ -1 \end{bmatrix} \right\|_2^2}{\left\| \begin{bmatrix} x \\ -1 \end{bmatrix} \right\|_2^2} = \min_x \sum_{i=1}^{m} \frac{\left| A_i x - b_i \right|^2}{x^H x + 1}$,
    where $A_i$ is the $i$-th row of $A$. The quantity $\frac{\left| A_i x - b_i \right|^2}{x^H x + 1}$ is the square of the distance from the point $\begin{bmatrix} A_i^H \\ b_i \end{bmatrix} \in \mathbb{C}^{n+1}$ to the nearest point on the hyperplane $P$ defined by
    $P = \left\{ \begin{bmatrix} a \\ b \end{bmatrix} : a \in \mathbb{C}^n, \; b \in \mathbb{C}, \; b = x^H a \right\}$.
    So the TLS problem amounts to finding the closest hyperplane to the set of points
    $\begin{bmatrix} A_1^H \\ b_1 \end{bmatrix}, \begin{bmatrix} A_2^H \\ b_2 \end{bmatrix}, \ldots, \begin{bmatrix} A_m^H \\ b_m \end{bmatrix}$.
  72. Total Least Squares
    The minimum distance property can be shown as follows. Let $P$ be the plane orthogonal to the normal vector $n \in \mathbb{C}^{n+1}$, i.e. $P = \left\{ r \in \mathbb{C}^{n+1} : r^H n = 0 \right\}$, and let $n$ have the form $n = \begin{bmatrix} x \\ -1 \end{bmatrix}$. Let $p = \begin{bmatrix} A_m^H \\ b_m \end{bmatrix}$ be a point in $\mathbb{C}^{n+1}$. Finding the point $q \in \mathbb{C}^{n+1}$ which belongs to the plane $P$ and is closest to the point $p$ is a constrained optimization problem: minimize $\| p - q \|$ subject to $n^H q = 0$. With a Lagrange multiplier, the objective is
    $J(q) = \| p - q \|^2 + 2 \lambda n^H q = p^H p - 2 p^H q + 2 \lambda n^H q + q^H q = (q - p + \lambda n)^H (q - p + \lambda n) + 2 \lambda p^H n - \lambda^2 n^H n$,
    which is clearly minimized when $q = p - \lambda n$.
  73. Total Least Squares
    Determining $\lambda$ from the constraint: $n^H q = n^H p - \lambda n^H n = 0 \;\rightarrow\; \lambda = \frac{n^H p}{n^H n}$. Inserting this back into the objective yields
    $J(q) = 2 \lambda p^H n - \lambda^2 n^H n = \frac{2 \, n^H p \, p^H n}{n^H n} - \frac{n^H p \, p^H n \, n^H n}{n^H n \, n^H n} = \frac{\left| n^H p \right|^2}{n^H n} = \frac{\left| x^H A_m^H - b_m \right|^2}{x^H x + 1}$.
    Alternative solution using the projection theorem: the distance from the point $p$ to the plane $P$ is the length of the projection of $p$ onto $n$, which yields
    $d_{\min}^2(p, P) = \frac{\left| \langle p, n \rangle \right|^2}{\| n \|^2} = \frac{\left| \begin{bmatrix} x^H & -1 \end{bmatrix} \begin{bmatrix} A_m^H \\ b_m \end{bmatrix} \right|^2}{x^H x + 1}$.
  74. Outline: Principal Component Analysis.
  75. Principal Component Analysis
    Principal Component Analysis (PCA) is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables. The transformation is defined in such a way that the first component has as high a variance as possible (that is, it accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to (uncorrelated with) the preceding components.
  76. Principal Component Analysis
    PCA is mathematically defined as an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by any projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on.
    Assume we are given a collection of data column vectors $a_1, a_2, \ldots, a_m \in \mathbb{R}^n$. The projection of the data onto a subspace $\mathcal{U} \subseteq \mathbb{R}^n$ of dimension $r$, $r \leq n$, spanned by the orthonormal basis $u_1, u_2, \ldots, u_r$, is given by
    $\hat{a}_i = f_{i1} u_1 + f_{i2} u_2 + \ldots + f_{ir} u_r$, $i = 1 : m$,
    for some coefficients $f_{ij}$. Note that $f_{ij} = a_i^H u_j$, the projection of $a_i$ along the direction of $u_j$. By the projection theorem, this projection is the closest, in the $\ell_2$ norm sense, to the data $a_i$.
  77. Principal Component Analysis
    The search is for the orthogonal basis $u_1, u_2, \ldots, u_r$. Formulating the maximization of the variance along the direction of $u_1$, with $A = [a_1, a_2, \ldots, a_m]$, yields
    $\max_{\| w \| = 1} \sum_{i=1}^{m} \left| a_i^H w \right|^2 = \left\| A^H w \right\|^2 = \left( A^H w \right)^H \left( A^H w \right) = w^H A A^H w$.
    Using the SVD $A = U \Sigma V^H$, we have $A A^H = U \Sigma \Sigma^H U^H$. Observe that
    $\frac{w^H A A^H w}{w^H w} = \frac{\left( U^H w \right)^H \Sigma \Sigma^H \left( U^H w \right)}{\left( U^H w \right)^H \left( U^H w \right)}$,
    and notice that there are only $r$ nonzero entries in $\Sigma$, by the properties of the SVD. Defining $x = U^H w$ yields
    $\frac{w^H A A^H w}{w^H w} = \frac{\sigma_1^2 x_1^2 + \sigma_2^2 x_2^2 + \ldots + \sigma_r^2 x_r^2}{x_1^2 + x_2^2 + \ldots + x_n^2}$.
  78. Principal Component Analysis
    Now we have
    $\max_{w \neq 0} \frac{w^H A A^H w}{w^H w} = \max_{x \neq 0} \frac{\sigma_1^2 x_1^2 + \sigma_2^2 x_2^2 + \ldots + \sigma_r^2 x_r^2}{x_1^2 + x_2^2 + \ldots + x_n^2}$.
    Assuming $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_r$, then
    $\max_{x \neq 0} \frac{\sigma_1^2 x_1^2 + \sigma_2^2 x_2^2 + \ldots + \sigma_r^2 x_r^2}{x_1^2 + x_2^2 + \ldots + x_n^2} = \sigma_1^2 = \lambda_1$,
    which is the largest eigenvalue of $A A^H$. The vector $x$ which attains the maximum is $x_1 = 1$, $x_i = 0$ for $i = 2 : n$, which corresponds to $w = U x = u_1$. The first principal component is indeed obtained from the first eigenvector $u_1$ of $A A^H$.
  79. Principal Component Analysis
    Calculate the second principal component under the constraint of being orthogonal to the first while maximizing the projection:
    $\max_{\| w \| = 1, \, w^H u_1 = 0} \sum_{i=1}^{m} \left| a_i^H w \right|^2 = \max_{w \neq 0, \, w^H u_1 = 0} \frac{w^H A A^H w}{w^H w}$.
    Using the definitions from above yields
    $\max_{x \neq 0, \, x^H U^H u_1 = 0} \frac{\sigma_1^2 x_1^2 + \ldots + \sigma_r^2 x_r^2}{x_1^2 + \ldots + x_n^2} = \max_{x \neq 0, \, x_1 = 0} \frac{\sigma_1^2 x_1^2 + \ldots + \sigma_r^2 x_r^2}{x_1^2 + \ldots + x_n^2} = \sigma_2^2 = \lambda_2$,
    which is the second largest eigenvalue of $A A^H$. The vector $x$ which attains the maximum is $x_2 = 1$, $x_i = 0$ for $i = 1, 3 : n$. This corresponds to $w = U x = u_2$, the second eigenvector of $A A^H$.
  80. Principal Component Analysis
    Continuing this pattern, $u_i$ yields the $i$-th principal component. The set of orthogonal vectors which spans the subspace onto which the data is projected, while maximizing the variance of the data, is given by the first $r$ columns of the orthogonal matrix $U$ from the SVD. Observing the SVD yields the result immediately:
    $A = U \Sigma V^H \;\rightarrow\; Y = U^H A = \Sigma V^H$.
    Observing the scatter matrix of $Y$:
    $C_Y = Y Y^H = \left( U^H A \right) \left( U^H A \right)^H = U^H A A^H U = U^H C_A U$.
    Since the matrix $U$ is the eigenvector matrix of $C_A = A A^H$, by the diagonalization theorem $C_Y$ is diagonal. Another look yields
    $Y Y^H = \Sigma V^H \left( \Sigma V^H \right)^H = \Sigma V^H V \Sigma^H = \Sigma \Sigma^H = \operatorname{diag}\left( \sigma_1^2, \sigma_2^2, \ldots, \sigma_r^2, 0, \ldots, 0 \right)$.
    Namely, the scatter matrix, and hence the covariance matrix, of $Y$ is diagonal. Moreover, the constraint on the variances holds.
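A short MATLAB sketch of PCA via the SVD (my addition). It assumes the data vectors are stored as the columns of A and have been centered (mean removed), which the slides leave implicit; the data here is an arbitrary toy set.

```matlab
% PCA via the SVD: columns of A are (centered) data vectors in R^n.
rng(1);
A = randn(3, 500);                                % n = 3 variables, m = 500 samples
A = A - mean(A, 2);                               % center each variable
[U, S, ~] = svd(A, 'econ');
pcs    = U;                                       % principal directions (columns of U)
scores = U' * A;                                  % Y = U^H A, uncorrelated coordinates
disp(diag(S).^2 / (size(A, 2) - 1));              % variances along the principal directions
disp(cov(scores'));                               % approximately diagonal covariance
```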
  81. The SVD is a decomposition which can be applied to any matrix. It exposes fundamental properties of a linear operator, such as the fundamental subspaces, the Frobenius norm and the $\ell_2$ norm. The SVD can be utilized in many applications, such as solving linear systems (least squares, total least squares) and order reduction (compression, noise reduction, principal component analysis).
    To be continued: regularizing linear equation systems.
  82. Appendix: For Further Reading
    A. Author. Handbook of Everything. Some Press, 1990.
    S. Someone. On this and that. Journal on This and That, 2(1):50–100, 2000.
    R. Avital. On this and that. Journal on This and That, 2(1):50–100, 2000.