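Summary of Matrix Derivative Formulas

阿新 • Published: 2019-01-18

While deriving a formula today I ran into a derivative with respect to a matrix and had no idea how to handle it. Fortunately someone online had already summarized the rules, so I am copying them here for safekeeping.

Basic formulas: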
Y = A * X --> DY/DX = A'
Y = X * A --> DY/DX = A
Y = A' * X * B --> DY/DX = A * B'
Y = A' * X' * B --> DY/DX = B * A'

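1. Derivative of a matrix Y with respect to a scalar x:

Differentiate each element and then transpose the result. Note that an M×N matrix becomes N×M after differentiation.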

Y = [y(ij)] --> dY/dx = [dy(ji)/dx]

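2. Derivative of a scalar y with respect to a column vector X:

Unlike the case above, the entries in parentheses are partial derivatives and nothing is transposed: differentiating with respect to an N×1 vector still gives an N×1 vector.

y = f(x1,x2,..,xn) --> dy/dX = (Dy/Dx1,Dy/Dx2,..,Dy/Dxn)'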

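3. Derivative of a row vector Y' with respect to a column vector X:

Note that differentiating a 1×M vector with respect to an N×1 vector gives an N×M matrix.

Take the partial derivatives of each column of Y' (that is, each component) with respect to X, and assemble the resulting columns into a matrix.

Important results: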

dX'/dX = I

d(AX)'/dX = A'
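
One way to sanity-check the result d(AX)'/dX = A' is to build the N×M derivative matrix numerically. The following is a minimal Python sketch using central differences; the helper jacobian_denominator_layout, the matrix A, and the evaluation point are illustrative assumptions, not part of the original summary.

```python
def jacobian_denominator_layout(f, x, h=1e-6):
    """Approximate dY'/dX: the N x M matrix whose (i, j) entry is dY_j/dX_i."""
    rows = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        # Row i holds the derivatives of every component of Y with respect to X_i.
        rows.append([(p - m) / (2 * h) for p, m in zip(f_plus, f_minus)])
    return rows


# Hypothetical example data: A is M x N = 2 x 3, so Y = AX has M = 2 components.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]


def linear_map(x):
    """Y = AX for a column vector X stored as a flat list."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


J = jacobian_denominator_layout(linear_map, [0.5, -1.0, 2.0])
A_transposed = [list(col) for col in zip(*A)]

# Largest entrywise gap between the numerical derivative and A'.
max_error = max(abs(J[i][j] - A_transposed[i][j])
                for i in range(3) for j in range(2))
print(len(J), len(J[0]))  # number of rows (N) and columns (M) of the derivative
print(max_error < 1e-6)
```

The derivative has one row per component of X and one column per component of Y, which is why the transpose of A appears rather than A itself.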

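4. Derivative of a column vector Y with respect to a row vector X':

Convert this to the derivative of the row vector Y' with respect to the column vector X, then transpose.

Note that differentiating an M×1 vector with respect to a 1×N vector gives an M×N matrix.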

dY/dX' = (dY'/dX)'

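5. Rules for differentiating vector products with respect to a column vector X:

Note that these differ slightly from the scalar product rule.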

d(UV')/dX = (dU/dX)V' + U(dV'/dX)

d(U'V)/dX = (dU'/dX)V + (dV'/dX)U

Important results:

d(X'A)/dX = (dX'/dX)A + (dA'/dX)X = IA + 0X = A

d(AX)/dX' = (d(X'A')/dX)' = (A')' = A

d(X'AX)/dX = (dX'/dX)AX + (d(AX)'/dX)X = AX + A'X
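
The quadratic-form result d(X'AX)/dX = AX + A'X can be checked numerically. The sketch below is only an illustration: the function names, the non-symmetric matrix A, and the point x are assumptions chosen for the example.

```python
def quadratic_form(A, x):
    """Return the scalar x'Ax."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def analytic_gradient(A, x):
    """Return Ax + A'x, the claimed derivative of x'Ax with respect to x."""
    n = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) for i in range(n)]


def numeric_gradient(f, x, h=1e-6):
    """Central-difference estimate of (Dy/Dx1, ..., Dy/Dxn)'."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


# Hypothetical example data; A is deliberately non-symmetric.
A = [[2.0, 1.0],
     [-3.0, 4.0]]
x = [1.0, 2.0]

grad_exact = analytic_gradient(A, x)
grad_approx = numeric_gradient(lambda v: quadratic_form(A, v), x)
print(grad_exact)
print(grad_approx)
```

Using a non-symmetric A matters here: with a symmetric matrix the check could not tell AX + A'X apart from 2AX.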

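6. Derivative of a matrix Y with respect to a column vector X:

Take the partial derivative of Y with respect to each component of X and stack the results into a hyper-vector.

Note that each element of this vector is itself a matrix.

7. Rules for differentiating matrix products with respect to a column vector: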

d(uV)/dX = (du/dX)V + u(dV/dX)

d(UV)/dX = (dU/dX)V + U(dV/dX)

Important result:

d(X'A)/dX = (dX'/dX)A + X'(dA/dX) = IA + X'0 = A

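8. Derivative of a scalar y with respect to a matrix X:

This is analogous to the derivative of a scalar y with respect to a column vector X: take the partial derivative of y with respect to each element of X, without transposing.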

dy/dX = [ Dy/Dx(ij) ]

Important results:

y = U'XV = ΣΣu(i)x(ij)v(j), so dy/dX = [u(i)v(j)] = UV'

y = U'X'XU gives dy/dX = 2XUU'

y = (XU-V)'(XU-V) gives dy/dX = d(U'X'XU - 2V'XU + V'V)/dX = 2XUU' - 2VU' + 0 = 2(XU-V)U'
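
The least-squares result dy/dX = 2(XU-V)U' can be compared against a finite-difference estimate taken element by element over X. This is a rough sketch under assumed data: the helper names and the values of X, U, and V are hypothetical.

```python
def residual(X, U, V):
    """Return the vector XU - V."""
    return [sum(x_ij * u_j for x_ij, u_j in zip(row, U)) - v_i
            for row, v_i in zip(X, V)]


def loss(X, U, V):
    """Return the scalar y = (XU - V)'(XU - V)."""
    return sum(r * r for r in residual(X, U, V))


def analytic_gradient(X, U, V):
    """Return 2(XU - V)U', a matrix with the same shape as X."""
    return [[2.0 * r_i * u_j for u_j in U] for r_i in residual(X, U, V)]


def numeric_gradient(f, X, h=1e-6):
    """Central-difference estimate of [Dy/Dx(ij)], with no transpose."""
    grad = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            X_plus = [list(r) for r in X]
            X_minus = [list(r) for r in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            row.append((f(X_plus) - f(X_minus)) / (2 * h))
        grad.append(row)
    return grad


# Hypothetical example data: X is 2 x 3, U is 3 x 1, V is 2 x 1.
X = [[1.0, 2.0, 0.0],
     [0.5, -1.0, 3.0]]
U = [1.0, -2.0, 0.5]
V = [0.5, 1.5]

G_exact = analytic_gradient(X, U, V)
G_approx = numeric_gradient(lambda M: loss(M, U, V), X)
max_error = max(abs(G_exact[i][j] - G_approx[i][j])
                for i in range(2) for j in range(3))
print(max_error < 1e-5)
```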

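9. Derivative of a matrix Y with respect to a matrix X:

Differentiate each element of Y with respect to X, then arrange the results together into a super-matrix.

10. Derivative of a product

d(f*g)/dx = (df'/dx)g + (dg/dx)f'

Result, taking f = x' and g = Ax:

d(x'Ax)/dx = (d(x'')/dx)Ax + (d(Ax)/dx)(x'') = Ax + A'x (note: '' denotes transposing twice)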
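
The same result can be reached without any product rule by writing the quadratic form out in components. The derivation below is a sketch added for illustration; the index notation a_{ij}, x_k is an assumption of this example rather than notation used in the original notes.

```latex
\begin{align*}
x^{\top} A x &= \sum_{i}\sum_{j} a_{ij}\, x_i x_j \\
\frac{\partial}{\partial x_k}\bigl(x^{\top} A x\bigr)
  &= \sum_{j} a_{kj}\, x_j + \sum_{i} a_{ik}\, x_i \\
  &= (Ax)_k + (A^{\top} x)_k
\end{align*}
```

Stacking the k-th components into a column vector gives Ax + A'x, and when A is symmetric this reduces to 2Ax.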

More detailed write-ups:

http://lzh21cen.blog.163.com/blog/static/145880136201051113615571/

http://hi.baidu.com/wangwen926/blog/item/eb189bf6b0fb702b720eec94.html

Other reference:

Notation

d/dx (y) is a vector whose (i) element is dy(i)/dx (y a vector, x a scalar)
d/dx (y) is a vector whose (i) element is dy/dx(i) (y a scalar, x a vector)
d/dx (y^T) is a matrix whose (i,j) element is dy(j)/dx(i)
d/dx (Y) is a matrix whose (i,j) element is dy(i,j)/dx
d/dX (y) is a matrix whose (i,j) element is dy/dx(i,j)

Note that the Hermitian transpose is not used because complex conjugates are not analytic.

In the expressions below, matrices and vectors A, B, C do not depend on X.

Derivatives of Linear Products

d/dx (AYB) = A * d/dx (Y) * B
- d/dx (Ay) = A * d/dx (y)
d/dx (x^TA) = A
- d/dx (x^T) = I
- d/dx (x^Ta) = d/dx (a^Tx) = a
d/dX (a^TXb) = ab^T
- d/dX (a^TXa) = d/dX (a^TX^Ta) = aa^T
d/dX (a^TX^Tb) = ba^T
d/dx (YZ) = Y * d/dx (Z) + d/dx (Y) * Z

Derivatives of Quadratic Products

d/dx (Ax+b)^TC(Dx+e) = A^TC(Dx+e) + D^TC^T(Ax+b)
- d/dx (x^TCx) = (C+C^T)x
- - [C: symmetric]: d/dx (x^TCx) = 2Cx
- - d/dx (x^Tx) = 2x
- d/dx (Ax+b)^T (Dx+e) = A^T (Dx+e) + D^T (Ax+b)
- - d/dx (Ax+b)^T (Ax+b) = 2A^T (Ax+b)
- [C: symmetric]: d/dx (Ax+b)^TC(Ax+b) = 2A^TC(Ax+b)
d/dX (a^TX^TXb) = X(ab^T + ba^T)
- d/dX (a^TX^TXa) = 2Xaa^T
d/dX (a^TX^TCXb) = C^TXab^T + CXba^T
- d/dX (a^TX^TCXa) = (C + C^T)Xaa^T
- [C: symmetric]: d/dX (a^TX^TCXa) = 2CXaa^T
d/dX ((Xa+b)^TC(Xa+b)) = (C+C^T)(Xa+b)a^T

Derivatives of Cubic Products

d/dx (x^TAxx^T) = (A+A^T)xx^T + x^TAxI

Derivatives of Inverses

d/dx (Y^-1) = -Y^-1 d/dx (Y) Y^-1
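
As a rough check of the inverse rule, the sketch below differentiates a 2×2 matrix inverse numerically and compares it with -Y^-1 (dY/dx) Y^-1. The matrix-valued function Y_of, its derivative dY_dt, and the point t0 are hypothetical choices made only for this example.

```python
def matmul(P, Q):
    """Plain matrix product of two lists of lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def inverse_2x2(Y):
    """Closed-form inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = Y
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def Y_of(t):
    """Hypothetical matrix-valued function of a scalar t."""
    return [[2.0 + t, 1.0], [t * t, 3.0]]


def dY_dt(t):
    """Elementwise derivative of Y_of with respect to t."""
    return [[1.0, 0.0], [2.0 * t, 0.0]]


t0, h = 0.5, 1e-6
Y_inv = inverse_2x2(Y_of(t0))

# Right-hand side of the rule: -Y^-1 (dY/dt) Y^-1.
formula = [[-v for v in row]
           for row in matmul(matmul(Y_inv, dY_dt(t0)), Y_inv)]

# Left-hand side: central difference of the inverse itself.
inv_plus, inv_minus = inverse_2x2(Y_of(t0 + h)), inverse_2x2(Y_of(t0 - h))
numeric = [[(inv_plus[i][j] - inv_minus[i][j]) / (2 * h) for j in range(2)]
           for i in range(2)]

max_error = max(abs(formula[i][j] - numeric[i][j])
                for i in range(2) for j in range(2))
print(max_error < 1e-6)
```

The order of the factors matters because matrices do not commute in general; the derivative of Y sits between the two copies of the inverse.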

Derivative of Trace

Note: matrix dimensions must result in an n*n argument for tr().

d/dX (tr(X)) = I
d/dX (tr(X^k)) = k(X^(k-1))^T
d/dX (tr(AX^k)) = SUM_r=0:k-1 (X^r A X^(k-r-1))^T
d/dX (tr(AX^-1B)) = -(X^-1BAX^-1)^T
- d/dX (tr(AX^-1)) = d/dX (tr(X^-1A)) = -X^-TA^TX^-T
d/dX (tr(A^TXB^T)) = d/dX (tr(BX^TA)) = AB
- d/dX (tr(XA^T)) = d/dX (tr(A^TX)) = d/dX (tr(X^TA)) = d/dX (tr(AX^T)) = A
d/dX (tr(AXBX^T)) = A^TXB^T + AXB
- d/dX (tr(XAX^T)) = X(A+A^T)
- d/dX (tr(X^TAX)) = (A+A^T)X
- d/dX (tr(AX^TX)) = X(A+A^T)
d/dX (tr(AXBX)) = A^TX^TB^T + B^TX^TA^T
[C: symmetric] d/dX (tr((X^TCX)^-1 A)) = d/dX (tr(A(X^TCX)^-1)) = -(CX(X^TCX)^-1)(A+A^T)(X^TCX)^-1
[B, C: symmetric] d/dX (tr((X^TCX)^-1 (X^TBX))) = d/dX (tr((X^TBX)(X^TCX)^-1)) = -2(CX(X^TCX)^-1)X^TBX(X^TCX)^-1 + 2BX(X^TCX)^-1
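
Trace identities are easy to get wrong by a transpose, so a numerical spot check can help. The sketch below tests d/dX (tr(AXBX^T)) = A^TXB^T + AXB on a rectangular X; all helper names and the matrices A, B, X are assumptions made for this illustration.

```python
def matmul(P, Q):
    """Plain matrix product of two lists of lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def trace(P):
    return sum(P[i][i] for i in range(len(P)))


def numeric_gradient(f, X, h=1e-6):
    """Central-difference estimate of the matrix whose (i,j) element is df/dx(i,j)."""
    grad = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            X_plus = [list(r) for r in X]
            X_minus = [list(r) for r in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            row.append((f(X_plus) - f(X_minus)) / (2 * h))
        grad.append(row)
    return grad


# Hypothetical example data: A is 2 x 2, X is 2 x 3, B is 3 x 3.
A = [[1.0, 2.0], [0.0, -1.0]]
B = [[2.0, 0.0, 1.0], [1.0, 3.0, 0.0], [0.0, -1.0, 1.0]]
X = [[1.0, 0.0, 2.0], [-1.0, 1.0, 0.5]]


def objective(M):
    """tr(A M B M^T); the argument of tr() is 2 x 2 here."""
    return trace(matmul(matmul(matmul(A, M), B), transpose(M)))


left = matmul(matmul(transpose(A), X), transpose(B))   # A^T X B^T
right = matmul(matmul(A, X), B)                        # A X B
formula = [[left[i][j] + right[i][j] for j in range(3)] for i in range(2)]

numeric = numeric_gradient(objective, X)
max_error = max(abs(formula[i][j] - numeric[i][j])
                for i in range(2) for j in range(3))
print(max_error < 1e-5)
```

A non-square X is used on purpose: a formula with a misplaced transpose often produces a result of the wrong shape, which a square test matrix would hide.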

Derivative of Determinant

Note: matrix dimensions must result in an n*n argument for det().

d/dX (det(X)) = d/dX (det(X^T)) = det(X)*X^-T
- d/dX (det(AXB)) = det(AXB)*X^-T
- d/dX (ln(det(AXB))) = X^-T
d/dX (det(X^k)) = k*det(X^k)*X^-T
- d/dX (ln(det(X^k))) = kX^-T
[Real] d/dX (det(X^TCX)) = det(X^TCX)*(C+C^T)X(X^TCX)^-1
- [C: Real, Symmetric] d/dX (det(X^TCX)) = 2det(X^TCX)*CX(X^TCX)^-1
[C: Real, Symmetric] d/dX (ln(det(X^TCX))) = 2CX(X^TCX)^-1
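
For a 2×2 matrix the first determinant rule, d/dX (det(X)) = det(X)*X^-T, and its logarithmic form, d/dX (ln(det(X))) = X^-T, can be checked directly. The following sketch is illustrative only; the helper names and the matrix X are assumptions, and the closed-form 2×2 inverse stands in for a general one.

```python
import math


def det_2x2(X):
    (a, b), (c, d) = X
    return a * d - b * c


def inverse_transpose_2x2(X):
    """Return X^-T for a 2 x 2 matrix."""
    (a, b), (c, d) = X
    det = det_2x2(X)
    # X^-1 = (1/det) [[d, -b], [-c, a]]; transposing swaps the off-diagonal entries.
    return [[d / det, -c / det], [-b / det, a / det]]


def numeric_gradient(f, X, h=1e-6):
    """Central-difference estimate of the matrix whose (i,j) element is df/dx(i,j)."""
    grad = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            X_plus = [list(r) for r in X]
            X_minus = [list(r) for r in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            row.append((f(X_plus) - f(X_minus)) / (2 * h))
        grad.append(row)
    return grad


def max_abs_diff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


# Hypothetical example matrix with a positive determinant, so ln(det(X)) is defined.
X = [[3.0, 1.0],
     [2.0, 4.0]]

X_inv_T = inverse_transpose_2x2(X)
det_formula = [[det_2x2(X) * v for v in row] for row in X_inv_T]

det_error = max_abs_diff(det_formula, numeric_gradient(det_2x2, X))
logdet_error = max_abs_diff(
    X_inv_T, numeric_gradient(lambda M: math.log(det_2x2(M)), X))
print(det_error < 1e-6, logdet_error < 1e-6)
```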

If y is a function of x, then dy^T/dx is the Jacobian matrix of y with respect to x.

Its determinant, |dy^T/dx|, is the Jacobian of y with respect to x and represents the ratio of the hyper-volumes dy and dx. The Jacobian occurs when changing variables in an integration: Integral(f(y)dy) = Integral(f(y(x))|dy^T/dx|dx).
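
A familiar instance of this change of variables is the switch from Cartesian to polar coordinates. The worked example below is an added illustration; the symbols r and \theta and the choice of map are assumptions of the example, with rows of the matrix indexed by the components of x as in the notation section.

```latex
\begin{align*}
x &= \begin{pmatrix} r \\ \theta \end{pmatrix}, \qquad
y(x) = \begin{pmatrix} r\cos\theta \\ r\sin\theta \end{pmatrix} \\
\frac{dy^{\top}}{dx} &=
\begin{pmatrix}
  \cos\theta & \sin\theta \\
  -r\sin\theta & r\cos\theta
\end{pmatrix}, \qquad
\left|\frac{dy^{\top}}{dx}\right| = r\cos^{2}\theta + r\sin^{2}\theta = r \\
\iint f(y)\, dy_1\, dy_2 &= \iint f\bigl(y(r,\theta)\bigr)\, r \, dr\, d\theta
\end{align*}
```

Since r is non-negative, the determinant already equals its absolute value, which is what the integration formula requires.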

Hessian matrix

If f is a function of x then the symmetric matrix d²f/dx² = d/dx^T (df/dx) is the Hessian matrix of f(x). A value of x for which df/dx = 0 corresponds to a minimum, maximum or saddle point according to whether the Hessian is positive definite, negative definite or indefinite.

d²/dx² (a^Tx) = 0
d²/dx² (Ax+b)^TC(Dx+e) = A^TCD + D^TC^TA
- d²/dx² (x^TCx) = C+C^T
- - d²/dx² (x^Tx) = 2I
- d²/dx² (Ax+b)^T (Dx+e) = A^TD + D^TA
- - d²/dx² (Ax+b)^T (Ax+b) = 2A^TA
- [C: symmetric]: d²/dx² (Ax+b)^TC(Ax+b) = 2A^TCA
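
The Hessian of the least-squares objective, d²/dx² (Ax+b)^T (Ax+b) = 2A^TA, can be approximated with second-order central differences. The sketch below assumes hypothetical values for A, b, and the evaluation point, and the helper numeric_hessian is written only for this illustration.

```python
def numeric_hessian(f, x, h=1e-3):
    """Central-difference estimate of the matrix of second partial derivatives."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                # Evaluate f after moving x_i by si*h and x_j by sj*h.
                v = list(x)
                v[i] += si * h
                v[j] += sj * h
                return f(v)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H


# Hypothetical example data: A is 3 x 2 and b is 3 x 1.
A = [[1.0, 2.0],
     [3.0, -1.0],
     [0.0, 1.0]]
b = [1.0, 0.0, -2.0]


def objective(x):
    """Return (Ax + b)'(Ax + b)."""
    r = [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]
    return sum(ri * ri for ri in r)


# Closed form from the table: 2 A'A.
formula = [[2.0 * sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)]
           for i in range(2)]

numeric = numeric_hessian(objective, [0.5, -1.0])
max_error = max(abs(formula[i][j] - numeric[i][j])
                for i in range(2) for j in range(2))
print(formula)
print(max_error < 1e-5)
```

Because the objective is quadratic, its Hessian is the same at every point, so the choice of evaluation point does not affect the comparison.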

http://www.psi.toronto.edu/matrix/calculus.html
