
7.1 Diagonalization of symmetric matrices

This post is a set of reading notes on *Linear Algebra and Its Applications*.


Diagonalization of symmetric matrices

A symmetric matrix is a matrix $A$ such that $A^T = A$. Such a matrix is necessarily square.
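For instance (an illustrative pair, not from the text), $\begin{bmatrix}1&-2\\-2&0\end{bmatrix}$ is symmetric, while $\begin{bmatrix}1&-2\\2&0\end{bmatrix}$ is not, since its $(1,2)$ and $(2,1)$ entries differ.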

To begin the study of symmetric matrices, it is helpful to review the diagonalization process of Section 5.3.

THEOREM 1
If $A$ is symmetric, then any two eigenvectors from different eigenspaces are orthogonal.

PROOF
Let $\boldsymbol v_1$ and $\boldsymbol v_2$ be eigenvectors that correspond to distinct eigenvalues, say, $\lambda_1$ and $\lambda_2$.

To show that $\boldsymbol v_1\cdot\boldsymbol v_2=0$, compute
$$\lambda_1\boldsymbol v_1\cdot\boldsymbol v_2=(\lambda_1\boldsymbol v_1)^T\boldsymbol v_2=(A\boldsymbol v_1)^T\boldsymbol v_2=\boldsymbol v_1^TA^T\boldsymbol v_2=\boldsymbol v_1^T(A\boldsymbol v_2)=\boldsymbol v_1^T(\lambda_2\boldsymbol v_2)=\lambda_2\boldsymbol v_1\cdot\boldsymbol v_2$$
Hence $(\lambda_1-\lambda_2)\boldsymbol v_1\cdot\boldsymbol v_2=0$. Since $\lambda_1\neq\lambda_2$, $\boldsymbol v_1\cdot\boldsymbol v_2=0$.
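Theorem 1 is easy to check numerically. Below is a minimal sketch using NumPy; the matrix is an arbitrary symmetric example with three distinct eigenvalues, not one from the text.

```python
import numpy as np

# An arbitrary symmetric matrix (not from the text); eigenvalues 1, 2, 4.
A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])

# eigh is NumPy's eigensolver for symmetric (Hermitian) matrices.
eigvals, V = np.linalg.eigh(A)

# Eigenvectors from different eigenspaces are orthogonal, so V^T V
# should be (numerically) the identity matrix.
print(np.allclose(V.T @ V, np.eye(3)))   # True
```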

An $n\times n$ matrix $A$ is said to be orthogonally diagonalizable if there are an orthogonal matrix $P$ (with $P^{-1}=P^T$) and a diagonal matrix $D$ such that

$$A=PDP^T=PDP^{-1}\tag{1}$$

Such a diagonalization requires $n$ linearly independent and orthonormal eigenvectors.

If $A$ is orthogonally diagonalizable, then

$$A^T=(PDP^T)^T=P^{TT}D^TP^T=PDP^T=A$$

Thus $A$ is symmetric! Theorem 2 below shows that, conversely, every symmetric matrix is orthogonally diagonalizable. The proof is much harder and is omitted; the main idea for a proof will be given after Theorem 3.

THEOREM 2
An $n\times n$ matrix $A$ is orthogonally diagonalizable if and only if $A$ is a symmetric matrix.
This theorem is rather amazing, because the work in Chapter 5 would suggest that it is usually impossible to tell when a matrix is diagonalizable. But this is not the case for symmetric matrices.

EXAMPLE 3
Orthogonally diagonalize the matrix
$$A=\begin{bmatrix}3&-2&4\\-2&6&2\\4&2&3\end{bmatrix}$$
whose characteristic equation is
$$0=-\lambda^3+12\lambda^2-21\lambda-98=-(\lambda-7)^2(\lambda+2)$$
SOLUTION
The eigenvalues are $\lambda=7$ (with multiplicity 2) and $\lambda=-2$, and the usual calculations produce a basis for each eigenspace:

$$\lambda=7:\ \boldsymbol v_1=\begin{bmatrix}1\\0\\1\end{bmatrix},\ \boldsymbol v_2=\begin{bmatrix}-1/2\\1\\0\end{bmatrix};\qquad\lambda=-2:\ \boldsymbol v_3=\begin{bmatrix}-1\\-1/2\\1\end{bmatrix}$$

Although $\boldsymbol v_1$ and $\boldsymbol v_2$ are linearly independent, they are not orthogonal. Subtract from $\boldsymbol v_2$ its projection onto $\boldsymbol v_1$ to produce an orthogonal set.

$$\boldsymbol z_2=\boldsymbol v_2-\frac{\boldsymbol v_2\cdot\boldsymbol v_1}{\boldsymbol v_1\cdot\boldsymbol v_1}\boldsymbol v_1=\begin{bmatrix}-1/2\\1\\0\end{bmatrix}+\frac{1}{4}\begin{bmatrix}1\\0\\1\end{bmatrix}=\begin{bmatrix}-1/4\\1\\1/4\end{bmatrix}$$
Then $\{\boldsymbol v_1,\boldsymbol z_2\}$ is an orthogonal set in the eigenspace for $\lambda=7$. (Note that $\boldsymbol z_2$ is a linear combination of the eigenvectors $\boldsymbol v_1$ and $\boldsymbol v_2$, so $\boldsymbol z_2$ is in the eigenspace.)

Normalize $\boldsymbol v_1$ and $\boldsymbol z_2$ to obtain the following orthonormal basis for the eigenspace for $\lambda=7$:

$$\boldsymbol u_1=\begin{bmatrix}1/\sqrt2\\0\\1/\sqrt2\end{bmatrix},\qquad\boldsymbol u_2=\begin{bmatrix}-1/\sqrt{18}\\4/\sqrt{18}\\1/\sqrt{18}\end{bmatrix}$$
An orthonormal basis for the eigenspace for $\lambda=-2$ is

$$\boldsymbol u_3=\begin{bmatrix}-2/3\\-1/3\\2/3\end{bmatrix}$$
By Theorem 1, $\boldsymbol u_3$ is orthogonal to the other eigenvectors $\boldsymbol u_1$ and $\boldsymbol u_2$. Hence $\{\boldsymbol u_1,\boldsymbol u_2,\boldsymbol u_3\}$ is an orthonormal set. Let

$$P=\begin{bmatrix}\boldsymbol u_1&\boldsymbol u_2&\boldsymbol u_3\end{bmatrix}=\begin{bmatrix}1/\sqrt2&-1/\sqrt{18}&-2/3\\0&4/\sqrt{18}&-1/3\\1/\sqrt2&1/\sqrt{18}&2/3\end{bmatrix},\qquad D=\begin{bmatrix}7&0&0\\0&7&0\\0&0&-2\end{bmatrix}$$
Then $P$ orthogonally diagonalizes $A$, and $A=PDP^{-1}$.
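The hand computation in Example 3 can be double-checked numerically; a minimal sketch, using the exact $P$ and $D$ found above:

```python
import numpy as np

A = np.array([[ 3., -2.,  4.],
              [-2.,  6.,  2.],
              [ 4.,  2.,  3.]])

# Columns of P: the orthonormal eigenvectors u1, u2, u3 from Example 3.
u1 = np.array([1., 0., 1.]) / np.sqrt(2)
u2 = np.array([-1., 4., 1.]) / np.sqrt(18)
u3 = np.array([-2., -1., 2.]) / 3.
P = np.column_stack([u1, u2, u3])
D = np.diag([7., 7., -2.])

print(np.allclose(P.T @ P, np.eye(3)))   # True: P is orthogonal, P^{-1} = P^T
print(np.allclose(A, P @ D @ P.T))       # True: A = P D P^{-1}
```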

The Spectral Theorem

The set of eigenvalues of a matrix $A$ is sometimes called the *spectrum* of $A$, and the following description of the eigenvalues is called a *spectral theorem*.

THEOREM 3 (The Spectral Theorem for Symmetric Matrices)
An $n\times n$ symmetric matrix $A$ has the following properties:
a. $A$ has $n$ real eigenvalues, counting multiplicities.
b. The dimension of the eigenspace for each eigenvalue $\lambda$ equals the multiplicity of $\lambda$ as a root of the characteristic equation.
c. The eigenspaces are mutually orthogonal, in the sense that eigenvectors corresponding to different eigenvalues are orthogonal.
d. $A$ is orthogonally diagonalizable.

  • Part (a) follows from the Supplementary Exercises of Section 5.5.
  • Part (b) follows easily from part (d).
  • Part (c) is Theorem 1.
  • Because of (a), a proof of (d) can be found in the Appendix: proof of Theorem 3 (d).

Spectral Decomposition

Suppose $A=PDP^{-1}$, where the columns of $P$ are orthonormal eigenvectors $\boldsymbol u_1,\dots,\boldsymbol u_n$ of $A$ and the corresponding eigenvalues $\lambda_1,\dots,\lambda_n$ are in the diagonal matrix $D$. Then, since $P^{-1}=P^T$,

$$A=PDP^T=\begin{bmatrix}\boldsymbol u_1&\cdots&\boldsymbol u_n\end{bmatrix}\begin{bmatrix}\lambda_1&&0\\&\ddots&\\0&&\lambda_n\end{bmatrix}\begin{bmatrix}\boldsymbol u_1^T\\\vdots\\\boldsymbol u_n^T\end{bmatrix}=\begin{bmatrix}\lambda_1\boldsymbol u_1&\cdots&\lambda_n\boldsymbol u_n\end{bmatrix}\begin{bmatrix}\boldsymbol u_1^T\\\vdots\\\boldsymbol u_n^T\end{bmatrix}$$
$$A=\lambda_1\boldsymbol u_1\boldsymbol u_1^T+\lambda_2\boldsymbol u_2\boldsymbol u_2^T+\cdots+\lambda_n\boldsymbol u_n\boldsymbol u_n^T\tag{2}$$
This representation of $A$ is called a spectral decomposition of $A$ because it breaks up $A$ into pieces determined by the spectrum (eigenvalues) of $A$.

  • Each term in (2) is an $n\times n$ matrix of rank 1. For example, every column of $\lambda_1\boldsymbol u_1\boldsymbol u_1^T$ is a multiple of $\boldsymbol u_1$.
  • Furthermore, each matrix $\boldsymbol u_j\boldsymbol u_j^T$ is a projection matrix in the sense that for each $\boldsymbol x$ in $\mathbb R^n$, the vector $(\boldsymbol u_j\boldsymbol u_j^T)\boldsymbol x$ is the orthogonal projection of $\boldsymbol x$ onto the subspace spanned by $\boldsymbol u_j$ (see the numerical sketch after this list).
    PROOF
    $(\boldsymbol u_j\boldsymbol u_j^T)\boldsymbol x=\boldsymbol u_j(\boldsymbol u_j^T\boldsymbol x)=(\boldsymbol u_j^T\boldsymbol x)\boldsymbol u_j=(\boldsymbol u_j\cdot\boldsymbol x)\boldsymbol u_j$ ($\boldsymbol u_j^T\boldsymbol x$ is a scalar). Since $\boldsymbol u_j$ is a unit vector, this is the orthogonal projection of $\boldsymbol x$ onto $\boldsymbol u_j$.
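Here is a minimal numerical sketch of the decomposition (2) and of the projection property, reusing the matrix of Example 3 (the test vector $\boldsymbol x$ is arbitrary):

```python
import numpy as np

A = np.array([[ 3., -2.,  4.],
              [-2.,  6.,  2.],
              [ 4.,  2.,  3.]])
eigvals, P = np.linalg.eigh(A)   # columns of P: orthonormal eigenvectors

# Spectral decomposition: A = sum_j lambda_j * u_j u_j^T, rank-1 terms.
terms = [lam * np.outer(u, u) for lam, u in zip(eigvals, P.T)]
print(np.allclose(A, sum(terms)))   # True

# u_j u_j^T acts as the orthogonal projection onto span{u_j}.
u = P[:, 0]
proj = np.outer(u, u)
x = np.array([1., 2., 3.])               # an arbitrary test vector
print(np.allclose(proj @ x, (u @ x) * u))        # (u . x) u, as in the proof
print(np.allclose(proj @ (proj @ x), proj @ x))  # projecting twice = once
```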

Appendix: proof of Theorem 3 (d)

The Schur factorization of an $n\times n$ matrix $A$ is a factorization of the form $A=URU^T$, where $U$ is an orthogonal matrix and $R$ is an $n\times n$ upper triangular matrix.

THEOREM
Let $A$ be an $n\times n$ matrix with $n$ real eigenvalues, counting multiplicities, denoted by $\lambda_1,\dots,\lambda_n$. Then $A$ admits a (real) Schur factorization.
PROOF
Parts (a) and (b) show the key ideas in the proof. The rest of the proof amounts to repeating (a) and (b) for successively smaller matrices, and then piecing together the results.
a. Let $\boldsymbol u_1$ be a unit eigenvector corresponding to $\lambda_1$, let $\boldsymbol u_2,\dots,\boldsymbol u_n$ be any other vectors such that $\{\boldsymbol u_1,\dots,\boldsymbol u_n\}$ is an orthonormal basis for $\mathbb R^n$, and then let $U=\begin{bmatrix}\boldsymbol u_1&\boldsymbol u_2&\cdots&\boldsymbol u_n\end{bmatrix}$. The first column of $U^TAU$ is $\lambda_1\boldsymbol e_1$, where $\boldsymbol e_1$ is the first column of the $n\times n$ identity matrix: it equals $U^TA\boldsymbol u_1=\lambda_1U^T\boldsymbol u_1=\lambda_1\boldsymbol e_1$, since $\boldsymbol u_i^T\boldsymbol u_1=1$ for $i=1$ and $0$ otherwise.
b. Part (a) implies that $U^TAU$ has the form shown below.

$$U^TAU=\begin{bmatrix}\lambda_1&\boldsymbol x^T\\\boldsymbol 0&A_1\end{bmatrix}$$
where $A_1$ is an $(n-1)\times(n-1)$ matrix.

Since
$$\det(U^TAU-\lambda I)=\det(U^TAU-\lambda U^TU)=\det(U^T(A-\lambda I)U)=\det(U^T)\det(A-\lambda I)\det(U)=\det(U^{-1})\det(A-\lambda I)\det(U)=\det(A-\lambda I)$$
the characteristic polynomials of $U^TAU$ and $A$ are the same. Thus $U^TAU$ and $A$ have the same eigenvalues, and the block triangular form above then shows that the eigenvalues of $A_1$ are $\lambda_2,\dots,\lambda_n$.
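The invariance of the spectrum under the change of variable $A\mapsto U^TAU$ can also be seen numerically; a minimal sketch, where the orthogonal $U$ is manufactured from a QR factorization of a random matrix (an illustrative choice, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# A QR factorization of an invertible matrix gives an orthogonal Q.
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))

# U^T A U and A share the same characteristic polynomial,
# hence the same eigenvalues.
print(np.sort_complex(np.linalg.eigvals(A)))
print(np.sort_complex(np.linalg.eigvals(U.T @ A @ U)))   # same list
```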

Similar to (a), let $\boldsymbol u_2'$ be a unit eigenvector of $A_1$ corresponding to $\lambda_2$, let $\boldsymbol u_3',\dots,\boldsymbol u_n'$ be any other vectors such that $\{\boldsymbol u_2',\dots,\boldsymbol u_n'\}$ is an orthonormal basis for $\mathbb R^{n-1}$, and then let $U_1=\begin{bmatrix}\boldsymbol u_2'&\boldsymbol u_3'&\cdots&\boldsymbol u_n'\end{bmatrix}$. The first column of $U_1^TA_1U_1$ is $\lambda_2\boldsymbol e_1'$, where $\boldsymbol e_1'$ is the first column of the $(n-1)\times(n-1)$ identity matrix. So $U_1^TA_1U_1$ has a form similar to that of $U^TAU$.

Writing $U^TAU=\begin{bmatrix}\lambda_1&\boldsymbol x^T\\\boldsymbol 0&A_1\end{bmatrix}$ as above, we have
$$\begin{aligned}\begin{bmatrix}1&\boldsymbol 0\\\boldsymbol 0&U_1^T\end{bmatrix}U^TAU\begin{bmatrix}1&\boldsymbol 0\\\boldsymbol 0&U_1\end{bmatrix}&=\begin{bmatrix}1&\boldsymbol 0\\\boldsymbol 0&U_1^T\end{bmatrix}\begin{bmatrix}\lambda_1&\boldsymbol x^T\\\boldsymbol 0&A_1\end{bmatrix}\begin{bmatrix}1&\boldsymbol 0\\\boldsymbol 0&U_1\end{bmatrix}\\&=\begin{bmatrix}\lambda_1&\boldsymbol x^TU_1\\\boldsymbol 0&U_1^TA_1U_1\end{bmatrix}\end{aligned}$$

Let $U'=U\begin{bmatrix}1&\boldsymbol 0\\\boldsymbol 0&U_1\end{bmatrix}$; it can be shown that $U'$ is an orthogonal matrix, being a product of orthogonal matrices. So
$$U'^TAU'=\begin{bmatrix}\lambda_1&*&*\\0&\lambda_2&*\\\boldsymbol 0&\boldsymbol 0&A_2\end{bmatrix}$$
where $A_2$ is an $(n-2)\times(n-2)$ matrix and the entries marked $*$ need not be zero.

Continuing this process eventually yields a (real) Schur factorization of $A$.


With the theorem above, the proof of Theorem 3 (d) is quite easy.

Let $A$ be a symmetric matrix. Since $A$ has $n$ real eigenvalues, counting multiplicities, $A$ has a real Schur factorization $A=URU^T$. Since $A^T=UR^TU^T=A=URU^T$, we get $R^T=R$; an upper triangular matrix equal to its own transpose must be diagonal, so $R$ is in fact a diagonal matrix with the eigenvalues of $A$ on its main diagonal.

Thus $A=URU^{-1}$, where $U$ is an orthogonal matrix and $R$ is a diagonal matrix. So $A$ is orthogonally diagonalizable.
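SciPy computes a real Schur factorization directly, so the key fact used above, that the triangular factor $R$ comes out diagonal when $A$ is symmetric, can be observed numerically. A minimal sketch (`scipy.linalg.schur` returns the pair $(R, U)$ with $A=URU^T$), reusing the matrix of Example 3:

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[ 3., -2.,  4.],
              [-2.,  6.,  2.],
              [ 4.,  2.,  3.]])

R, U = schur(A)   # real Schur form: A = U R U^T, U orthogonal

print(np.allclose(A, U @ R @ U.T))          # True
print(np.allclose(R, np.diag(np.diag(R))))  # True: R is diagonal for symmetric A
print(np.diag(R))                           # the eigenvalues 7, 7, -2 (in some order)
```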