Matrix Derivatives (Reference Handbook)

This article collects common formulas for derivatives involving matrices and vectors, together with their proofs.

I. Basic Concepts and Properties

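For the notation conventions, see: Notation Conventions. In brief, as used in the formulas below: \(A_i^j\) is the entry of \(A\) in row \(i\) and column \(j\), \(A_i\) is its \(i\)-th row, \(A^j\) is its \(j\)-th column, and \([E_i^j]\) is the matrix whose only nonzero entry is a 1 in row \(i\), column \(j\).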

1. Trace

The trace of an \(n\times n\) square matrix \(A\) is defined as:

\[Tr(A) = \sum_{i=1}^nA_i^i \tag{1.1} \]

2. Properties of the trace

(1)

\[Tr(A) = \sum_{i=1}^n\lambda_{i} \tag{1.2.1} \]

where \(\lambda_i\) is the \(i\)-th eigenvalue of the matrix \(A\).

(2)

\[Tr(A) = Tr(A^T) \tag{1.2.2} \]

(3)

\[Tr(AB) = \sum_{i=1}^n\left(\sum_{j=1}^nA_i^jB_j^i\right) = \sum_{j=1}^n\left(\sum_{i=1}^nB_j^iA_i^j\right) = Tr(BA) \tag{1.2.3} \]

(4)

\[Tr(A + B) = Tr(A) + Tr(B) \tag{1.2.4} \]

(5)

\[Tr(\mathbf{x}\mathbf{x}^T) = \sum_{i=1}^n\mathbf{x}_i\cdot \mathbf{x}_i = \mathbf{x}^T\mathbf{x} \tag{1.2.5} \]
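The trace identities above are easy to spot-check on small matrices. The following is a minimal sketch in plain Python (lists of rows, no external libraries); the helper names `matmul`, `transpose`, and `trace` and the sample values are illustrative assumptions, not part of the original text.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def trace(A):
    """Sum of the diagonal entries, as in (1.1)."""
    return sum(A[i][i] for i in range(len(A)))


# Hypothetical sample data (integers, so the comparisons are exact).
A = [[1, 2], [3, 4]]
B = [[0, -1], [5, 2]]
x = [[2], [-3]]  # column vector stored as a 2x1 matrix

A_plus_B = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

checks = {
    "(1.2.2)": trace(A) == trace(transpose(A)),
    "(1.2.3)": trace(matmul(A, B)) == trace(matmul(B, A)),
    "(1.2.4)": trace(A_plus_B) == trace(A) + trace(B),
    "(1.2.5)": trace(matmul(x, transpose(x))) == matmul(transpose(x), x)[0][0],
}
print(checks)
```

A single example does not prove an identity, but a failed check would quickly expose a transcription error in a formula.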

3. Determinant

The determinant of an \(n\times n\) square matrix \(A\) is defined as:

\[\det (A) = \sum_{\sigma \in S_n}(-1)^{\mathrm{sgn}(\sigma)}\prod_{i=1}^n A_i^{\sigma(i)} \tag{1.3.1} \]

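where \(S_n\) is the set of all permutations of \(\{1, 2, \cdots, n\}\), i.e. all one-to-one maps (bijections) from \(\{1, 2, \cdots, n\}\) to itself.

For example, \(\{2, 3, 1\}\) is a permutation of \(\{1, 2, 3\}\), with \(\sigma(1) = 2, \sigma(2) = 3, \sigma(3) = 1\).

Here \({\rm sgn} (\sigma)\) denotes the number of inversions of the permutation \(\sigma\), i.e. the number of pairs with \(\sigma(i) > \sigma(j),\ 1 \leq i < j \leq n\).

For example, \({\rm sgn}(\{2, 3, 1\}) = 2\), since the inversions are the pairs \((2, 1)\) and \((3, 1)\).

A set with \(n\) elements has \(n!\) permutations.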
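The permutation-sum definition (1.3.1) can be turned into code almost literally. Below is a minimal sketch using only the standard library; the function names `inversions` and `det_leibniz` and the sample matrix are assumptions made for illustration, and indices are 0-based rather than 1-based as in the text.

```python
from itertools import permutations


def inversions(sigma):
    """Number of index pairs i < j with sigma[i] > sigma[j]."""
    n = len(sigma)
    return sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])


def det_leibniz(A):
    """Determinant as a signed sum over all permutations, following (1.3.1)."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = (-1) ** inversions(sigma)
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total


# Hypothetical 3x3 example.
A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
print(inversions((2, 3, 1)))  # 2, matching the example in the text
print(det_leibniz(A))         # 39
```

Because the sum has \(n!\) terms, this is only practical for very small \(n\); it is meant as a check of the definition, not as a numerical method.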

4. Computing the determinant

(1)

\[\det (A) = \prod_{i=1}^n \lambda_i \tag{1.4.1} \]

where \(\lambda_i\) is the \(i\)-th eigenvalue of the matrix \(A\).

(2)

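\[\det(A) \overset{\text{row expansion}}{=} \sum_{j=1}^n(-1)^{i + j}A_i^{j}\det\left([A]_i^{j}\right) \overset{\text{column expansion}}{=} \sum_{i=1}^n(-1)^{i + j}A_i^{j}\det\left([A]_i^{j}\right) \tag{1.3.2} \]

where \([A]_i^j\) denotes the submatrix of \(A\) obtained by deleting row \(i\) and column \(j\).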

(3)

\[\det(kA) = k^n\det(A) \tag{1.3.3} \]

(4)

\[\det(A^T) = \det(A) \tag{1.3.4} \]

(5)

\[\det(AB) = \det(A)\det(B) \tag{1.3.5} \]

(6)

\[\det(A^{-1}) = \frac{1}{\det(A)} \tag{1.3.6} \]

(7)

\[\begin{align} \det(I + \mathbf{u} \mathbf{v}^T) &= 1 + \mathbf{u}^T\mathbf{v} \tag{1.3.7} \end{align} \]

(8)

\[\mathrm{adj}(A) = \det(A)\cdot A^{-1} \tag{1.3.8} \]
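Several of the determinant rules above can be checked with a recursive cofactor expansion in the spirit of (1.3.2). This is a small sketch under the assumption of integer test matrices (so that comparisons are exact); `minor`, `det`, `matmul`, `transpose` and all sample values are hypothetical names introduced here.

```python
def minor(A, i, j):
    """Submatrix [A]_i^j: A with row i and column j removed (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Cofactor expansion along the first row, following (1.3.2)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


n, k = 3, 3
A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
B = [[1, 2, 0], [0, 1, 1], [3, 0, 2]]
u, v = [1, 2, 3], [4, -1, 2]

kA = [[k * a for a in row] for row in A]
# I + u v^T, built entry by entry
I_plus_uvT = [[(1 if r == c else 0) + u[r] * v[c] for c in range(n)] for r in range(n)]

assert det(kA) == k ** n * det(A)                               # (1.3.3)
assert det(transpose(A)) == det(A)                              # (1.3.4)
assert det(matmul(A, B)) == det(A) * det(B)                     # (1.3.5)
assert det(I_plus_uvT) == 1 + sum(a * b for a, b in zip(u, v))  # (1.3.7)
```

The expansion always uses the first row; by (1.3.2) any other row or column would give the same value.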

II. Results on Vector and Matrix Operations

1. Matrix multiplication

(1)

\[\begin{align} A\cdot B &= \left((AB)_i^j\right)_{m\times n} \\ &= \left(\sum_k A_i^kB_k^j\right)_{m\times n} \end{align} \tag{2.1.1} \]

(2)

\[\begin{align} (A\cdot B)\cdot C &= \left(\sum_k(AB)_i^kC_k^j\right)_{m\times n}\\ &= \left(\sum_k\left(\sum_tA_i^tB_t^k\right)C_k^j \right)_{m\times n} \end{align} \tag{2.1.2} \]

(3)

\[A\cdot [E_i^j] = \left(0, \cdots, \underbrace{A^i}_{\text{column } j},\cdots ,0 \right) = [A^i]^j \tag{2.1.3} \]

(4)

\[[E_i^j]\cdot A = \left(\begin{array}{cc} &0\\ &\vdots\\ \text{row } i\left\{\right. &A_j\\ &\vdots \\ &0 \end{array} \right) = [A_j]_i \tag{2.1.4} \]
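The effect of multiplying by a single-entry matrix, as in (2.1.3) and (2.1.4), is easiest to see on a concrete case. The sketch below assumes 0-based indices, so `unit_matrix(3, 0, 2)` plays the role of \([E_1^3]\) in the text's 1-based notation; the helper names are illustrative.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def unit_matrix(n, i, j):
    """n x n matrix whose only nonzero entry is a 1 in row i, column j (0-based)."""
    return [[1 if (r == i and c == j) else 0 for c in range(n)] for r in range(n)]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
E = unit_matrix(3, 0, 2)

AE = matmul(A, E)  # column 0 of A moved into column 2, zeros elsewhere
EA = matmul(E, A)  # row 2 of A moved into row 0, zeros elsewhere

print(AE)  # [[0, 0, 1], [0, 0, 4], [0, 0, 7]]
print(EA)  # [[7, 8, 9], [0, 0, 0], [0, 0, 0]]
```

Right multiplication selects a column of \(A\) and left multiplication selects a row, which is the pattern the later proofs rely on.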

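III. Vector and Matrix Derivatives

1. Derivative layouts

  • Numerator layout: the first dimension of the result follows the numerator

  • Denominator layout: the first dimension of the result follows the denominator

For example, for the derivative of an \(m\)-dimensional column vector \(\mathbf{y}\) with respect to an \(n\)-dimensional column vector \(\mathbf{x}\):

  • Numerator layout (Jacobian matrix):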
\[\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \left( \begin{matrix} \frac{\partial\mathbf{y}_1}{\partial\mathbf{x}_1} &\cdots &\frac{\partial\mathbf{y}_1}{\partial\mathbf{x}_n} \\ \vdots&\ddots &\vdots \\ \frac{\partial\mathbf{y}_m}{\partial\mathbf{x}_1} &\cdots &\frac{\partial\mathbf{y}_m}{\partial\mathbf{x}_n} \end{matrix} \right) \\ \]
  • Denominator layout (gradient matrix):
\[\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \left( \begin{matrix} \frac{\partial\mathbf{y}_1}{\partial\mathbf{x}_1} &\cdots &\frac{\partial\mathbf{y}_m}{\partial\mathbf{x}_1} \\ \vdots &\ddots &\vdots \\ \frac{\partial\mathbf{y}_1}{\partial\mathbf{x}_n} &\cdots &\frac{\partial\mathbf{y}_m}{\partial\mathbf{x}_n} \end{matrix} \right) \\ \]

Note: all results below use the numerator layout (when the numerator is a scalar, the denominator layout is used instead).
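The two layouts differ only by a transpose, which a small numerical Jacobian makes concrete. The sketch below is an illustration under assumptions: the map `f` from \(\mathbb{R}^3\) to \(\mathbb{R}^2\) is a made-up example, and `jacobian` uses central differences with a hypothetical step size `h`.

```python
def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian in numerator layout: J[i][j] ~ d f_i / d x_j."""
    m = len(f(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J


def f(x):
    """Hypothetical map from R^3 to R^2, used only for illustration."""
    return [x[0] * x[1], x[0] + x[2]]


x0 = [1.0, 2.0, 3.0]
J_num = jacobian(f, x0)                     # numerator layout: m x n = 2 x 3
J_den = [list(col) for col in zip(*J_num)]  # denominator layout: n x m = 3 x 2

print(len(J_num), len(J_num[0]))  # 2 3
print(len(J_den), len(J_den[0]))  # 3 2
```

At `x0` the exact Jacobian is `[[2, 1, 0], [1, 0, 1]]`, and the numerical result should agree with it up to rounding.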

2. Rules for differentials

\[\partial C = 0 \quad (C \text{ is a constant matrix, vector, or scalar}) \tag{3.2.1} \]
\[\partial A^T = (\partial A)^T \tag{3.2.2} \]
\[\partial (A + B) = \partial A + \partial B \tag{3.2.3} \]
\[\partial (AB) = \partial A\cdot B + A\cdot \partial B \tag{3.2.4} \]
\[\partial (A\odot B) = \partial A\odot B + A\odot \partial B \tag{3.2.5} \]
\[\partial( A\otimes B) = \partial A\otimes B +A\otimes \partial B \tag{3.2.6} \]
\[\partial ({A^{-1}}) = -A^{-1}\cdot \partial A\cdot A^{-1} \tag{3.2.7} \]
\[\partial\ Tr(A) = Tr(\partial A) \tag{3.2.8} \]
\[\partial \mathrm{det}A = Tr(\mathrm{adj}A \cdot \partial A) = \mathrm{det}A\cdot Tr(A^{-1} \partial A) \tag{3.2.9} \]

Chain rule:

\[\partial g\circ f(A) = \sum_k\sum_t \frac{\partial g\circ f(A)}{\partial f(A)_k^t}\cdot \partial f(A)_k^t = Tr\left(\left(\frac{\partial g\circ f(A)}{\partial f(A)}\right)^T\cdot \partial f(A)\right) \tag{3.2.10} \]
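As a worked illustration of how these rules combine (a sketch, assuming \(A\) and \(B\) are constant and the shapes of \(A\), \(X\), \(B\) are compatible), the differential of \(Tr(AXB)\) can be computed step by step and then matched against the trace form in (3.2.10):

\[\begin{align*} \partial\, Tr(AXB) &= Tr(\partial (AXB)) && \text{by (3.2.8)} \\ &= Tr(A\cdot \partial X\cdot B) && \text{by (3.2.1) and (3.2.4)} \\ &= Tr(BA\cdot \partial X) && \text{by (1.2.3)} \\ &= Tr\left(\left(A^TB^T\right)^T\cdot \partial X\right) \end{align*} \]

Comparing with \(Tr\left(\left(\frac{\partial f}{\partial X}\right)^T\cdot \partial X\right)\) gives \(\frac{\partial Tr(AXB)}{\partial X} = A^TB^T\), in agreement with (3.7.3).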

3. Vector derivatives

(1)

\[\frac{\partial \mathbf{x}}{\partial x} = \left( \begin{array}{cc} \frac{\mathrm{d}\mathbf{x}_1}{\mathrm{d}x} \\ \vdots\\ \frac{\mathrm{d}\mathbf{x}_m}{\mathrm{d}x} \end{array} \right) \tag{3.3.1} \]

(2)

\[\frac{\partial \mathbf{x}^T}{\partial x} = \left(\frac{\partial \mathbf{x}}{\partial x}\right)^T \tag{3.3.2} \]

(3)

\[\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \frac{\partial \mathbf{y}}{\partial \mathbf{x}^T} = \left( \begin{matrix} \frac{\partial\mathbf{y}_1}{\partial\mathbf{x}_1} &\cdots &\frac{\partial\mathbf{y}_1}{\partial\mathbf{x}_n} \\ \vdots &\ddots &\vdots \\ \frac{\partial\mathbf{y}_m}{\partial\mathbf{x}_1} &\cdots &\frac{\partial\mathbf{y}_m}{\partial\mathbf{x}_n} \end{matrix} \right) \tag{3.3.3}\]

(4)

\[\frac{\partial \mathbf{y}^T}{\partial \mathbf{x}} =\frac{\partial \mathbf{y}^T}{\partial \mathbf{x}^T} = \left( \frac{\partial \mathbf{y}}{\partial \mathbf{x}} \right)^T \tag{3.3.4} \]

(5)

\[\frac{\partial \mathbf{x}^T\mathbf{y}}{\partial\mathbf{x}} = \left(\begin{array}{cc} \mathbf{y}_1 \\ \vdots \\ \mathbf{y}_n \end{array} \right) = \mathbf{y} \tag{3.3.5} \]

(6)

\[\frac{\partial \mathbf{x}^T\mathbf{y}}{\partial \mathbf{x}^T} = \left( \frac{\partial \mathbf{x}^T\mathbf{y}}{\partial \mathbf{x}} \right)^T \tag{3.3.6} \]

(7)

\[\frac{\partial A\mathbf{x}}{\partial\mathbf{x}} = \frac{\partial A\mathbf{x}}{\partial\mathbf{x}^T} = \left( \begin{array}{cc} A_{1}^1 &\cdots &A_{1}^m \\ \vdots & \ddots &\vdots \\ A_{n}^1 &\cdots &A_{n}^m \\ \end{array} \right) = A \tag{3.3.7}\]

(8)

\[\frac{\partial \mathbf{x}^TA\mathbf{x}}{\partial \mathbf{x}} = (A + A^T)\mathbf{x}, \qquad \frac{\partial \mathbf{x}^TA\mathbf{x}}{\partial \mathbf{x}^T} = \mathbf{x}^T(A + A^T) \tag{3.3.8} \]
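The gradient of the quadratic form, \((A + A^T)\mathbf{x}\), can be compared against a finite-difference approximation. The following is a minimal sketch with made-up values for `A` and `x0`; `quad_form` and `numeric_gradient` are hypothetical helper names, and the step size `h` is an assumption.

```python
def quad_form(A, x):
    """Scalar x^T A x for A stored as a list of rows."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def numeric_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of a scalar function f."""
    grad = []
    for k in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[k] += h
        xm[k] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad


A = [[1.0, 2.0], [3.0, 4.0]]  # deliberately not symmetric
x0 = [1.0, -2.0]

numeric = numeric_gradient(lambda x: quad_form(A, x), x0)
# Closed form from (3.3.8): (A + A^T) x
analytic = [sum((A[i][j] + A[j][i]) * x0[j] for j in range(2)) for i in range(2)]

print(numeric, analytic)
```

Using a non-symmetric `A` matters here: for symmetric matrices the result collapses to \(2A\mathbf{x}\), which would hide a missing transpose.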

4. Matrix derivatives

(1)

\[\frac{\partial \mathbf{x}^TA\mathbf{y}}{\partial A} = \mathbf{x}\mathbf{y}^T \tag{3.4.1} \]

(2)

\[\frac{\partial \mathbf{x}^TA^T\mathbf{y}}{\partial A} = \mathbf{y}\mathbf{x}^T \tag{3.4.2} \]

(3)

\[\frac{\partial \mathbf{x}^TA^TA\mathbf{y}}{\partial A} = A(\mathbf{y}\mathbf{x}^T + \mathbf{x}\mathbf{y}^T ) \tag{3.4.3} \]
Proof:
$$\begin{align*} \frac{\partial \mathbf{x}^TA^TA\mathbf{y}}{\partial A} &= \left(\frac{\partial \mathbf{x}^TA^TA\mathbf{y}}{\partial A_i^j} \right) _{m\times n} \\ &= \left(\frac{\partial \sum_p(\sum_q A_p^q \mathbf{x}_q)(\sum_qA_p^q\mathbf{y}_q)}{\partial A_i^j}\right)_{m\times n}\\ &= \left(\frac{\partial \left((\sum_q A_i^q\mathbf{x}_q)(\sum_qA_i^q\mathbf{y}_q) \right)}{\partial A_i^j}\right)_{m\times n} \\ &= \left((\sum_qA_i^q\mathbf{x}_j\mathbf{y}_q) + (\sum_qA_i^q\mathbf{y}_j\mathbf{x}_q) \right)_{m\times n} \\ &= \left(\sum_q A_i^q (\mathbf{y}\mathbf{x}^T)^j_q + \sum_q A_i^q (\mathbf{x}\mathbf{y}^T)^j_q \right)_{m\times n} \\ &= A(\mathbf{y}\mathbf{x}^T + \mathbf{x}\mathbf{y}^T ) \end{align*} $$

(4)

\[\frac{\partial A^TBA}{\partial B_{i}^{j}} = A_i^TA_j \tag{3.4.4} \]
Proof:
$$\begin{align*} \frac{\partial A^TBA}{\partial B_{i}^{j}} &= \left(\frac{\partial (A^TBA)_p^q }{\partial B_i^j}\right)_{n\times n} \\ &= \left( \frac{\partial \sum_k(\sum_t A_t^pB_t^k)A_k^q}{\partial B_i^j} \right)_{n\times n} \\ &= \left( \begin{matrix} A_i^1A_j^1 &A_i^1A_j^2 &\cdots &A_i^1A_j^n\\ A_i^2 A_j^1 &A_i^2A_j^2 &\cdots &A_i^2A_j^n\\ \vdots &\vdots &\ddots &\vdots \\ A_i^n A_j^1 &A_i^nA_j^2 &\cdots &A_i^nA_j^n \end{matrix} \right) \\ &= A_i^TA_j \end{align*}$$

(5)

\[\frac{\partial A^TBA}{\partial A_{i}^j} = [E_j^i]\cdot (BA) + (A^TB)\cdot [E_i^j] \tag{3.4.5} \]
Proof:
$$\begin{align*} \frac{\partial A^TBA}{\partial A_{i}^j} &= \left( \frac{\partial(A^TBA)_p^q}{\partial A_i^j} \right)_{n\times n} \\ &=\left( \frac{\partial(\sum_k(\sum_tA_t^pB_t^k)A_k^q)}{\partial A_i^j} \right)_{n\times n} \\ &= \left(\frac{\partial(\delta(p, j)\cdot \sum_{k}A_i^j B_i^kA_k^q +\delta(q, j)\cdot \sum_t A_t^pB_t^iA_i^j)}{\partial A_i^j}\right)_{n\times n}\\ &= \left(\delta(p, j)\cdot\sum_kB_i^kA_k^q + \delta(q, j)\cdot\sum_tA_t^p B_t^i\right)_{n\times n}\\ &= \left(\delta(p, j)\cdot(BA)_i^q + \delta(q, j)\cdot(A^TB)_p^i\right)_{n\times n}\\ &= \left(\begin{array}{cc} \delta(1, j)\\ \delta(2, j)\\ \vdots \\ \delta(n, j) \end{array}\right)\cdot (BA)_i + (A^TB)^i\cdot (\delta(1, j), \delta(2, j), \cdots, \delta(n, j)) \\ &= I^j\cdot I_i\cdot (BA) + (A^TB)\cdot I^i \cdot I_j \\ &= [E_j^i]\cdot (BA) + (A^TB)\cdot [E_i^j] \end{align*}$$

This can be abbreviated as: \(\frac{\partial A^TBA}{\partial A_i^j} = \frac{\partial A^T}{\partial A_i^j}\cdot BA + A^TB\cdot \frac{\partial A}{\partial A_i^j}\)

(6)

\[\frac{\partial \mathbf{y}^TA^TBA\mathbf{z}}{\partial A} = B^TA\mathbf{y}\mathbf{z}^T + BA\mathbf{z}\mathbf{y}^T \tag{3.4.6} \]
Proof:
\begin{align*} \frac{\partial \mathbf{y}^TA^TBA\mathbf{z}}{\partial A} &= \left(\mathbf{y}^T\frac{\partial A^TBA}{\partial A_i^j}\mathbf{z}\right)_{m\times n} \\ &\overset{\text{by (3.4.5)}}{=} \left(\mathbf{y}^T\left( A^TB[E_i^j] + [E_j^i]BA\right)\mathbf{z} \right)_{m\times n} \\ &= \left( \mathbf{y}^T[(A^TB)^i]^j\mathbf{z} \right)_{m\times n} + \left( \mathbf{y}^T[(BA)_i]_j\mathbf{z} \right)_{m\times n} \\ &= \left([\mathbf{y}^T\cdot (A^TB)^i]^j\mathbf{z} \right)_{m\times n} + \left(\mathbf{y}^T[(BA)_i\cdot \mathbf{z}]_j \right)_{m\times n}\\ &= \left(\sum_k\mathbf{y}_k(A^TB)_k^i\cdot \mathbf{z}_j\right)_{m\times n} + \left(\mathbf{y}_j\cdot \sum_k(BA)_i^k\mathbf{z}_k\right)_{m\times n} \\ &= \left(\mathbf{y}^T\cdot (A^TB)\right)^T\cdot\mathbf{z}^T + \left( (BA)\cdot \mathbf{z}\cdot \mathbf{y}^T\right) \\ &= B^TA\mathbf{y}\mathbf{z}^T + BA\mathbf{z}\mathbf{y}^T \end{align*}

(7)

\[\frac{\partial }{\partial A}(A\mathbf{x} + \mathbf{y})^TD(A\mathbf{x} + \mathbf{y}) = (D + D^T)(A\mathbf{x} + \mathbf{y})\mathbf{x}^T \tag{3.4.7} \]
Proof:
$$\begin{align*} \frac{\partial }{\partial A}(A\mathbf{x} + \mathbf{y})^TD(A\mathbf{x} + \mathbf{y}) &= \left(\frac{\partial }{\partial A_i^j}(A\mathbf{x} + \mathbf{y})^TD(A\mathbf{x} + \mathbf{y}) \right)_{m\times n} \\ & \overset{\text{by the chain rule}}{=} \left(Tr\left(\left[\left.\frac{\partial\mathbf{z}^TD\mathbf{z}}{\partial \mathbf{z}}\right|_{\mathbf{z} = A\mathbf{x} + \mathbf{y}}\right]^T\frac{\partial (A\mathbf{x} + \mathbf{y})}{\partial A_i^j}\right)\right)_{m\times n}\\ &= \left(Tr\left(\mathbf{z}^T(D + D^T)\frac{\partial(A\mathbf{x} + \mathbf{y})}{\partial A_i^j}\right)\right)_{m\times n} \\ &= \left(Tr\left(\left.\frac{\partial (\mathbf{w}^TA\mathbf{x} + \mathbf{w}^T\mathbf{y})}{\partial A_i^j}\right|_{\mathbf{w} = (D+D^T)(A\mathbf{x} + \mathbf{y})}\right)\right)_{m\times n} \\ &= \left(Tr\left(\mathbf{w}^T[E_i^j]\mathbf{x}|_{\mathbf{w} = (D+D^T)(A\mathbf{x} + \mathbf{y})}\right)\right)_{m\times n} \\ &= \left(Tr\left([\mathbf{w}_i]_j\cdot \mathbf{x}|_{\mathbf{w} = (D+D^T)(A\mathbf{x} + \mathbf{y})}\right)\right)_{m\times n} \\ &= \left(Tr\left(\mathbf{w}_i\cdot\mathbf{x}_j|_{\mathbf{w} = (D+D^T)(A\mathbf{x} + \mathbf{y})}\right)\right)_{m\times n} \\ &= \mathbf{w}\cdot \mathbf{x}^T \\ &= (D + D^T)(A\mathbf{x} + \mathbf{y})\mathbf{x}^T \end{align*}$$
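Formulas for derivatives with respect to a matrix can be checked entry by entry with finite differences. The sketch below does this for (3.4.3), \(\partial\, \mathbf{x}^TA^TA\mathbf{y} / \partial A = A(\mathbf{y}\mathbf{x}^T + \mathbf{x}\mathbf{y}^T)\); all names and sample values are assumptions introduced for illustration.

```python
def matvec(A, v):
    """Matrix-vector product for A stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def numeric_matrix_gradient(f, A, h=1e-6):
    """G[i][j] approximates d f / d A_i^j by central differences (same shape as A)."""
    G = [[0.0] * len(A[0]) for _ in A]
    for i in range(len(A)):
        for j in range(len(A[0])):
            Ap = [row[:] for row in A]
            Am = [row[:] for row in A]
            Ap[i][j] += h
            Am[i][j] -= h
            G[i][j] = (f(Ap) - f(Am)) / (2 * h)
    return G


x = [1.0, 2.0]
y = [3.0, -1.0]
A0 = [[1.0, 2.0], [0.0, -1.0]]


def f(A):
    """x^T A^T A y, written as the dot product of A x and A y."""
    return sum(p * q for p, q in zip(matvec(A, x), matvec(A, y)))


numeric = numeric_matrix_gradient(f, A0)

# Closed form (3.4.3): A (y x^T + x y^T)
S = [[y[q] * x[j] + x[q] * y[j] for j in range(2)] for q in range(2)]
analytic = [[sum(A0[i][q] * S[q][j] for q in range(2)) for j in range(2)] for i in range(2)]

print(numeric, analytic)
```

The same `numeric_matrix_gradient` helper can be pointed at the other scalar-valued expressions in this section by swapping out `f`.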

5. Derivatives of the determinant

(1)

\[\frac{\partial \det(Y)}{\partial x} = \det(Y)\cdot Tr(Y^{-1}\frac{\partial Y}{\partial x}) \tag{3.5.1} \]
Proof:
$$\begin{align*} \frac{\partial \det(Y)}{\partial x} &= Tr\left(\left(\frac{\partial \det(Y)}{\partial Y}\right)^T\cdot \frac{\partial Y}{\partial x}\right) \\ &= Tr\left(\left(\frac{\partial \sum_{k=1}^n (-1)^{i+k}Y_{i}^k\det([Y]_i^k)}{\partial Y_i^j}\right)^T_{n\times n}\cdot \frac{\partial Y}{\partial x} \right) \\ &= Tr\left(\left((-1)^{i+j}\det\left([Y]_i^j\right)\right)^T_{n\times n}\cdot \frac{\partial Y}{\partial x}\right) \\ &= Tr\left(\left(\mathrm{cof}(Y)\right)^T\cdot \frac{\partial Y}{\partial x}\right) \\ &= Tr\left(\mathrm{adj}(Y)\cdot \frac{\partial Y}{\partial x}\right) \\ &= \det(Y)\cdot Tr(Y^{-1}\frac{\partial Y}{\partial x}) \end{align*}$$

(2)

\[\frac{\partial \det(A)}{\partial A} = \det(A)\cdot \left(A^{-1}\right)^T \tag{3.5.2} \]
Proof:
\begin{align*} \frac{\partial \det(A)}{\partial A} &= \left(\frac{\partial \det(A)}{\partial A_i^j} \right)_{n\times n} \\ &= \left(\det(A)\cdot Tr\left(A^{-1}\cdot \frac{\partial A}{\partial A_i^j}\right)\right)_{n\times n} \\ &= \left(\det (A)\cdot Tr\left(A^{-1}\cdot [E_i^j]\right)\right)_{n\times n} \\ &= \left(\det(A)\cdot Tr\left([(A^{-1})^i]^j\right)\right)_{n\times n}\\ &= \left(\det (A)\cdot \left(A^{-1}\right)_j^i\right)_{n\times n}\\ &= \det(A)\cdot \left(A^{-1}\right)^T \end{align*}
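Formula (3.5.2) can be sanity-checked on a \(2\times 2\) matrix, where both the determinant and the inverse have simple closed forms. This is a minimal sketch; `det2`, `inv2`, the sample matrix `A0`, and the step size `h` are assumptions made for the example.

```python
def det2(A):
    """Determinant of a 2x2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def inv2(A):
    """Inverse of a 2x2 matrix via the adjugate."""
    d = det2(A)
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]


A0 = [[2.0, 1.0], [4.0, 3.0]]
h = 1e-6

# Finite-difference derivative of det(A) with respect to each entry A_i^j.
numeric = [[0.0, 0.0], [0.0, 0.0]]
for i in range(2):
    for j in range(2):
        Ap = [row[:] for row in A0]
        Am = [row[:] for row in A0]
        Ap[i][j] += h
        Am[i][j] -= h
        numeric[i][j] = (det2(Ap) - det2(Am)) / (2 * h)

# Closed form (3.5.2): det(A) * (A^{-1})^T, so entry [i][j] uses inv[j][i].
inv = inv2(A0)
analytic = [[det2(A0) * inv[j][i] for j in range(2)] for i in range(2)]

print(numeric, analytic)
```

For this matrix the closed form is the cofactor matrix `[[3, -4], [-1, 2]]`; the transpose in (3.5.2) is what places `-4` in the top-right entry rather than `-1`.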

(3)

\[\frac{\partial \det(X^TAX)}{\partial X} = \det(X^TAX)\cdot\left(AX(X^TAX)^{-1} + A^TX(X^TA^TX)^{-1} \right) \tag{3.5.3} \]
Proof:
\begin{align*} \frac{\partial \det(X^TAX)}{\partial X} &= \left(Tr\left(\left(\frac{\partial\det(X^TAX)}{\partial X^TAX}\right)^T\cdot \frac{\partial X^TAX}{\partial X_i^j}\right)\right)_{m\times n} \\ &= \det(X^TAX)\cdot \left(Tr\left((X^TAX)^{-1}\cdot \frac{\partial X^TAX}{\partial X_i^j}\right)\right)_{m\times n} \\ &= \det(X^TAX)\cdot \left(Tr\left((X^TAX)^{-1}\cdot [E_j^i]\cdot AX + (X^TAX)^{-1}\cdot X^TA\cdot [E_i^j]\right) \right)_{m\times n}\\ &= \det(X^TAX)\cdot \left(Tr\left([((X^TAX)^{-1})^j]^i\cdot AX\right) + Tr\left((X^TAX)^{-1}\cdot [(X^TA)^i]^j\right)\right)_{m\times n} \\ &= \det(X^TAX)\cdot \left(\sum_k((X^TAX)^{-1})^j_k\cdot (AX)_i^k\right)_{m\times n} + \det(X^TAX)\left(\sum_{k}((X^TAX)^{-1})_j^k\cdot (X^TA)_k^i\right)_{m\times n} \\ &= \det(X^TAX)\left(\left((AX)\cdot(X^TAX)^{-1}\right)_{i}^j\right)_{m\times n} + \det(X^TAX)\left(\left((A^TX)\cdot(X^TAX)^{-T}\right)_{i}^j\right)_{m\times n} \\ &= \det(X^TAX)\cdot\left(AX(X^TAX)^{-1} + A^TX(X^TA^TX)^{-1} \right) \end{align*}

(4)

\[\frac{\partial \ln \det(X^TX)}{\partial X}= 2(X^{L+})^T \tag{3.5.4} \]

where \(X^{L+} = (X^TX)^{-1}X^T\) denotes the left pseudoinverse of \(X\).

Proof:
$$\begin{align*} \frac{\partial \ln \det(X^TX)}{\partial X} &= \frac{1}{\det(X^TX)}\cdot \left(Tr\left(\frac{\partial \det(X^TX)}{\partial X_{i}^j}\right)\right)_{m\times n} \\ &\overset{\text{by (3.5.3) with } A = I}{=} \frac{\det(X^TX)}{\det(X^TX)}\cdot \left(Tr\left(2\sum_k X_i^k\left((X^TX)^{-1}\right)_k^j \right)\right)_{m\times n} \\ &= 2\left(\sum_k X_i^k\left((X^TX)^{-1}\right)_k^j \right)_{m\times n} \\ &= 2X(X^TX)^{-1} \\ &= 2(X^{L+})^T \end{align*}$$

6. Derivatives of the matrix inverse

(1)

\[\frac{\partial Y^{-1}}{\partial x} = -Y^{-1}\frac{\partial Y}{\partial x}Y^{-1} \tag{3.6.1} \]
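Proof:
$$\begin{align*} \frac{\partial Y^{-1}}{\partial x} &= Y^{-1}\frac{\partial (Y\cdot Y^{-1}) - \partial(Y)\cdot Y^{-1} }{\partial x} \\ &= -Y^{-1}\frac{\partial Y}{\partial x}Y^{-1} \end{align*}$$

The first equality is the product rule (3.2.4) applied to \(Y\cdot Y^{-1}\) and solved for \(\partial Y^{-1}\); the second uses \(\partial (Y\cdot Y^{-1}) = \partial I = 0\).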
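The sign and the ordering of the factors in (3.6.1) are easy to get wrong, so a numerical comparison is a useful safeguard. The sketch below assumes a made-up \(2\times 2\) matrix function `Y(x)` with hand-written derivative `dY(x)`; the helper names and the step size `h` are likewise illustrative.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inv2(A):
    """Inverse of a 2x2 matrix via the adjugate."""
    d = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]


def Y(x):
    """Hypothetical 2x2 matrix-valued function of a scalar x."""
    return [[1.0 + x, 2.0], [x * x, 3.0]]


def dY(x):
    """Entrywise derivative of Y with respect to x."""
    return [[1.0, 0.0], [2.0 * x, 0.0]]


x0, h = 1.0, 1e-6

# Central-difference derivative of Y(x)^{-1}.
Yp, Ym = inv2(Y(x0 + h)), inv2(Y(x0 - h))
numeric = [[(Yp[i][j] - Ym[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

# Closed form (3.6.1): -Y^{-1} (dY/dx) Y^{-1}
Yinv = inv2(Y(x0))
prod = matmul(matmul(Yinv, dY(x0)), Yinv)
analytic = [[-prod[i][j] for j in range(2)] for i in range(2)]

print(numeric, analytic)
```

Dropping the minus sign or swapping the two \(Y^{-1}\) factors around \(\partial Y/\partial x\) would make the two results disagree for a generic non-commuting example like this one.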

(2)

\[\frac{\partial \mathbf{a}^TX^{-1}\mathbf{b}}{\partial X} = -X^{-T}\mathbf{a}\mathbf{b}^TX^{-T} \tag{3.6.2} \]
Proof:
$$\begin{align*} \frac{\partial \mathbf{a}^TX^{-1}\mathbf{b}}{\partial X} &= \left(\frac{\partial \mathbf{a}^{T}X^{-1}\mathbf{b}}{\partial X_i^j}\right)_{n\times n} \\ &\overset{\text{by (3.6.1)}}{=} \left(-\mathbf{a}^T X^{-1}[E_i^j]X^{-1}\mathbf{b}\right)_{n\times n}\\ &= \left(-\mathbf{a}^TX^{-1} I^i\cdot I_jX^{-1}\mathbf{b}\right)_{n\times n}\\ &= \left(-\mathbf{a}^T(X^{-1})^{i}\cdot (X^{-1})_j\mathbf{b}\right)_{n\times n}\\ &= \left(-(X^{-1})_j\mathbf{b}\cdot \mathbf{a}^T(X^{-1})^i\right)_{n\times n} \\ &= -\left(X^{-1}\mathbf{b}\mathbf{a}^TX^{-1}\right)^{T}\\ &= -X^{-T}\mathbf{a}\mathbf{b}^TX^{-T} \end{align*}$$

(3)

\[\frac{\partial \det(X^{-1})}{\partial X} = -\det(X^{-1})(X^{-1})^T \tag{3.6.3} \]
Proof:
$$\begin{align*} \frac{\partial \det(X^{-1})}{\partial X} &= \left(Tr\left(\left(\frac{\partial \det(X^{-1})}{\partial X^{-1}}\right)^T\cdot \frac{\partial X^{-1}}{\partial X_i^j}\right)\right)_{n\times n} \\ &= -\det(X^{-1})\left(Tr\left(X\cdot X^{-1}\frac{\partial X}{\partial X_i^j}X^{-1}\right)\right)_{n\times n}\\ &= -\det(X^{-1})\left(Tr\left([E_i^j]X^{-1}\right)\right)_{n\times n}\\ &= -\det(X^{-1})\left(\left(X^{-1}\right)^i_j\right)_{n\times n}\\ &= -\det(X^{-1})(X^{-1})^T \end{align*}$$

(4)

\[\frac{\partial Tr(AX^{-1}B)}{\partial X} = -\left(X^{-1}BAX^{-1}\right)^{T} \tag{3.6.4} \]
Proof:
$$\begin{align*} \frac{\partial Tr(AX^{-1}B)}{\partial X} &\overset{\left(AX^{-1}B\right)_i^i = A_iX^{-1}B^{i}}{=} \sum_i \frac{\partial A_i X^{-1}B^{i}}{\partial X} \\ &\overset{\text{by (3.6.2)}}{=} -\sum_i X^{-T}(A^T)^i (B^T)_{i}X^{-T} \\ &\overset{\sum_iA^iB_i = AB}{=} -X^{-T}A^TB^TX^{-T} \\ &= -\left(X^{-1}BAX^{-1}\right)^{T} \end{align*}$$

(5)

\[\begin{align} \frac{\partial Tr\left((X+A)^{-1}\right) }{\partial X} &\overset{\text{by (3.6.4)}}{=} -((X+A)^{-1}(X+A)^{-1})^T \end{align} \tag{3.6.5} \]

7. Derivatives of the trace

(1)

\[\frac{\partial Tr(X)}{\partial X} = I \tag{3.7.1} \]
Proof:
$$\begin{align*} \frac{\partial Tr(X)}{\partial X} &= \left(\frac{\partial \sum_k X_k^k}{\partial X_i^j}\right)_{n\times n} \\ &= \left(\delta_i^j\right)_{n\times n} \\ &= I \end{align*}$$

(2)

\[\frac{\partial Tr(XA)}{\partial X} = A^T \tag{3.7.2} \]
Proof:
$$\begin{align*} \frac{\partial Tr(XA)}{\partial X} &= \left(\frac{\partial \sum_k\sum_t X_k^tA_t^k}{\partial X_i^j}\right)_{m\times n} \\ &= \left(A_j^i\right)_{m\times n} \\ &= A^T \end{align*}$$

(3)

\[\frac{\partial Tr(AXB)}{\partial X} = A^TB^T \tag{3.7.3} \]
Proof:
$$\begin{align*} \frac{\partial Tr(AXB)}{\partial X} &= \left(\frac{\partial \sum_k A_kXB^k}{\partial X_i^j}\right)_{m\times n} \\ &= \left(\sum_k A_k^iB_j^k\right)_{m\times n} \\ &= A^TB^T \end{align*}$$

(4)

\[\frac{\partial Tr(A \otimes X)}{\partial X} = Tr(A)I \tag{3.7.4} \]
Proof:
$$\begin{align*} \frac{\partial Tr(A \otimes X)}{\partial X} &= \left(\frac{\partial \sum_k\sum_l A_k^k X_l^l}{\partial X_i^j}\right)_{n\times n} \\ &= \left(\sum_k A_k^k\delta_i^j\right)_{n\times n}\\ &= Tr(A)I \end{align*}$$
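The trace derivatives can also be checked numerically, which additionally confirms that the result has the same shape as \(X\). The sketch below tests (3.7.3), \(\partial\, Tr(AXB)/\partial X = A^TB^T\), with a non-square \(X\); the helper names and all sample matrices are assumptions chosen for illustration.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def trace(A):
    """Sum of the diagonal entries."""
    return sum(A[i][i] for i in range(len(A)))


A = [[1.0, 2.0], [3.0, 4.0]]               # 2 x 2
B = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]]   # 3 x 2
X0 = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]  # 2 x 3


def f(X):
    """Tr(A X B), a scalar function of the 2 x 3 matrix X."""
    return trace(matmul(matmul(A, X), B))


h = 1e-6
numeric = [[0.0] * 3 for _ in range(2)]
for i in range(2):
    for j in range(3):
        Xp = [row[:] for row in X0]
        Xm = [row[:] for row in X0]
        Xp[i][j] += h
        Xm[i][j] -= h
        numeric[i][j] = (f(Xp) - f(Xm)) / (2 * h)

# Closed form (3.7.3): A^T B^T, a 2 x 3 matrix like X
analytic = matmul(transpose(A), transpose(B))

print(numeric, analytic)
```

Since \(Tr(AXB)\) is linear in \(X\), the gradient does not depend on the particular `X0` chosen.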