
Derivation of the Core Steps in Bayesian Non-negative Matrix Factorization

I recently read an old paper, Bayesian non-negative matrix factorization (it appears in a proceedings volume, on pages 540-547). The paper redoes non-negative matrix factorization with a Bayesian treatment, but its derivations are very terse, so here I record the core derivation steps, namely those behind equations (5) and (7) of the original paper.

Define ${\bf{X}} = {\bf{AB}} + {\bf{E}}$, where ${\bf{X}} \in {R^{I \times J}}$, ${\bf{A}} \in {R^{I \times N}}$, and ${\bf{B}} \in {R^{N \times J}}$. The likelihood of ${\bf{X}}$ is:

$p\left( {{\bf{X}}\left| {{\bf{A}},{\bf{B}},{\sigma ^2}} \right.} \right) = \prod\limits_{i,j} {{\cal N}\left( {{{\bf{X}}_{i,j}}\left| {{{\left( {{\bf{AB}}} \right)}_{i,j}},{\sigma ^2}} \right.} \right)} = {\prod\limits_{i,j} {\left( {2\pi {\sigma ^2}} \right)} ^{ - 1/2}}\exp \left\{ { - {{\left( {{{\bf{X}}_{i,j}} - {{\left( {{\bf{AB}}} \right)}_{i,j}}} \right)}^2}/\left( {2{\sigma ^2}} \right)} \right\}$

The priors on the variables ${\bf{A}}$ and ${\bf{B}}$ are:

$p\left( {\bf{A}} \right) = \prod\limits_{i,n} {\varepsilon \left( {{{\bf{A}}_{i,n}};{\alpha _{i,n}}} \right)} = \prod\limits_{i,n} {{\alpha _{i,n}}{\rm{exp}}\left( { - {\alpha _{i,n}}{{\bf{A}}_{i,n}}} \right)} u\left( {{{\bf{A}}_{i,n}}} \right)$

$p\left( {\bf{B}} \right) = \prod\limits_{n,j} {\varepsilon \left( {{{\bf{B}}_{n,j}};{\beta _{n,j}}} \right)} = \prod\limits_{n,j} {{\beta _{n,j}}{\rm{exp}}\left( { - {\beta _{n,j}}{{\bf{B}}_{n,j}}} \right)} u\left( {{{\bf{B}}_{n,j}}} \right)$

In addition, define a prior on the noise variance ${\sigma ^2}$:

$p\left( {{\sigma ^2}} \right) = {{\cal G}^{ - 1}}\left( {{\sigma ^2}{\rm{;}}k{\rm{,}}\theta } \right) = \frac{{{\theta ^k}}}{{\Gamma \left( k \right)}}{\left( {{\sigma ^2}} \right)^{ - k - 1}}{\rm{exp}}\left( { - \frac{\theta }{{{\sigma ^2}}}} \right)$
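To make the generative model concrete, here is a minimal sketch (my own illustration, not code from the paper; the sizes $I$, $J$, $N$ and the hyperparameter values are arbitrary) that draws ${\bf{A}}$ and ${\bf{B}}$ from their exponential priors, ${\sigma ^2}$ from its inverse-gamma prior, and then forms ${\bf{X}} = {\bf{AB}} + {\bf{E}}$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
I, J, N = 20, 30, 5                  # arbitrary dimensions for illustration
alpha = np.full((I, N), 1.0)         # exponential rates alpha_{i,n} for A
beta = np.full((N, J), 1.0)          # exponential rates beta_{n,j} for B
k, theta = 2.0, 1.0                  # inverse-gamma shape/scale for sigma^2

A = rng.exponential(scale=1.0 / alpha)                            # A_{i,n} ~ Exp(alpha_{i,n})
B = rng.exponential(scale=1.0 / beta)                             # B_{n,j} ~ Exp(beta_{n,j})
sigma2 = stats.invgamma.rvs(a=k, scale=theta, random_state=rng)   # sigma^2 ~ IG(k, theta)
X = A @ B + rng.normal(scale=np.sqrt(sigma2), size=(I, J))        # X = AB + E
```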

The conditional posterior densities of ${\bf{A}}$ and ${\bf{B}}$ are each a Gaussian multiplied by a truncated exponential distribution, i.e. a rectified (truncated) Gaussian. We denote this form by ${\cal R}\left( {x{\rm{;}}\mu {\rm{,}}{\sigma ^2}{\rm{,}}\lambda } \right) \propto {\cal N}\left( {x{\rm{;}}\mu {\rm{,}}{\sigma ^2}} \right)\varepsilon \left( {x{\rm{;}}\lambda } \right)$. The conditional density of ${{\bf{A}}_{i,n}}$ is therefore:

$\begin{array}{l} p\left( {{{\bf{A}}_{i,n}}\left| {{\bf{X}},{{\bf{A}}_{{\rm{\backslash (}}i,n)}},{\bf{B}}} \right.,{\sigma ^2}} \right) = {\cal R}\left( {{{\bf{A}}_{i,n}}{\rm{;}}{\mu _{{{\bf{A}}_{i,n}}}}{\rm{,}}\sigma _{{{\bf{A}}_{i,n}}}^2{\rm{,}}{\alpha _{i,n}}} \right) = {\cal N}\left( {{{\bf{A}}_{i,n}}{\rm{;}}{\mu _{{{\bf{A}}_{i,n}}}}{\rm{,}}\sigma _{{{\bf{A}}_{i,n}}}^2} \right)\varepsilon \left( {{{\bf{A}}_{i,n}}{\rm{;}}{\alpha _{i,n}}} \right)\\ \propto \varepsilon \left( {{{\bf{A}}_{i,n}}{\rm{;}}{\alpha _{i,n}}} \right)\prod\limits_j {{{\left( {2\pi {\sigma ^2}} \right)}^{ - 1/2}}} \exp \left\{ { - {{\left( {{{\bf{X}}_{i,j}} - {{\left( {{\bf{AB}}} \right)}_{i,j}}} \right)}^2}/2{\sigma ^2}} \right\} \end{array}$ (1)
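Since ${\cal N}\left( {x{\rm{;}}\mu {\rm{,}}{\sigma ^2}} \right)\exp \left( { - \lambda x} \right) \propto {\cal N}\left( {x{\rm{;}}\mu - \lambda {\sigma ^2}{\rm{,}}{\sigma ^2}} \right)$, the rectified Gaussian ${\cal R}$ is just a normal with shifted mean truncated to $[0,\infty )$, so it can be sampled with `scipy.stats.truncnorm`. A minimal sketch (the helper name `sample_rectified_gaussian` is my own):

```python
import numpy as np
from scipy import stats

def sample_rectified_gaussian(mu, sigma2, lam, rng):
    """Draw from R(x; mu, sigma2, lam) ∝ N(x; mu, sigma2) * exp(-lam*x) * u(x).

    Absorbing the exponential factor shifts the Gaussian mean to mu - lam*sigma2;
    the result is a normal truncated to [0, inf).
    """
    m = mu - lam * sigma2              # tilted mean
    s = np.sqrt(sigma2)
    a = (0.0 - m) / s                  # lower bound in standard-normal units
    return stats.truncnorm.rvs(a, np.inf, loc=m, scale=s, random_state=rng)
```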

For convenience, we first consider the exponent of the expression above:

$\begin{array}{l} - \frac{1}{{2{\sigma ^2}}}\sum\limits_j {{{\left( {{{\bf{X}}_{i,j}} - {{\left( {{\bf{AB}}} \right)}_{i,j}}} \right)}^2}} \\ = - \frac{1}{{2{\sigma ^2}}}\sum\limits_j {\left\{ {{\bf{X}}_{i,j}^2 - 2{{\bf{X}}_{i,j}}{{\left( {{\bf{AB}}} \right)}_{i,j}} + {{\left[ {{{\left( {{\bf{AB}}} \right)}_{i,j}}} \right]}^2}} \right\}} \\ = - \frac{1}{{2{\sigma ^2}}}\sum\limits_j {\left\{ {{\bf{X}}_{i,j}^2 - 2{{\bf{X}}_{i,j}}{{\bf{A}}_{i,n}}{{\bf{B}}_{n,j}} - 2{{\bf{X}}_{i,j}}\sum\limits_{n' \ne n} {{{\bf{A}}_{i,n'}}{{\bf{B}}_{n',j}}} + {{\left( {\sum\limits_n {{{\bf{A}}_{i,n}}{{\bf{B}}_{n,j}}} } \right)}^2}} \right\}} \end{array}$ (2)

where:

${\left( {\sum\limits_n {{{\bf{A}}_{i,n}}{{\bf{B}}_{n,j}}} } \right)^2} = {\left( {{{\bf{A}}_{i,n}}{{\bf{B}}_{n,j}}} \right)^2} + 2{{\bf{A}}_{i,n}}{{\bf{B}}_{n,j}}\sum\limits_{n' \ne n} {{{\bf{A}}_{i,n'}}{{\bf{B}}_{n',j}}} + {\left( {\sum\limits_{n' \ne n} {{{\bf{A}}_{i,n'}}{{\bf{B}}_{n',j}}} } \right)^2}$ (3)

Substituting (3) back into (2):

$\begin{array}{l} - \frac{1}{{2{\sigma ^2}}}\sum\limits_j {{{\left( {{{\bf{X}}_{i,j}} - {{\left( {{\bf{AB}}} \right)}_{i,j}}} \right)}^2}} \\ = - \frac{1}{{2{\sigma ^2}}}\sum\limits_j {\left\{ {{\bf{X}}_{i,j}^2 - 2{{\bf{X}}_{i,j}}{{\bf{A}}_{i,n}}{{\bf{B}}_{n,j}} - 2{{\bf{X}}_{i,j}}\sum\limits_{n' \ne n} {{{\bf{A}}_{i,n'}}{{\bf{B}}_{n',j}}} + {{\left( {{{\bf{A}}_{i,n}}{{\bf{B}}_{n,j}}} \right)}^2} + 2{{\bf{A}}_{i,n}}{{\bf{B}}_{n,j}}\sum\limits_{n' \ne n} {{{\bf{A}}_{i,n'}}{{\bf{B}}_{n',j}}} + {{\left( {\sum\limits_{n' \ne n} {{{\bf{A}}_{i,n'}}{{\bf{B}}_{n',j}}} } \right)}^2}} \right\}} \\ = - \frac{1}{{2{\sigma ^2}}}\sum\limits_j {\left\{ {{{\left( {{{\bf{X}}_{i,j}} - \sum\limits_{n' \ne n} {{{\bf{A}}_{i,n'}}{{\bf{B}}_{n',j}}} } \right)}^2} - 2{{\bf{A}}_{i,n}}{{\bf{B}}_{n,j}}\left( {{{\bf{X}}_{i,j}} - \sum\limits_{n' \ne n} {{{\bf{A}}_{i,n'}}{{\bf{B}}_{n',j}}} } \right) + {{\left( {{{\bf{A}}_{i,n}}{{\bf{B}}_{n,j}}} \right)}^2}} \right\}} \end{array}$

$\begin{array}{l} = - \frac{1}{{2{\sigma ^2}}}\left[ {{{\left( {{{\bf{A}}_{i,n}}} \right)}^2}\sum\limits_j {{{\left( {{{\bf{B}}_{n,j}}} \right)}^2} - 2{{\bf{A}}_{i,n}}\sum\limits_j {{{\bf{B}}_{n,j}}\left( {{{\bf{X}}_{i,j}} - \sum\limits_{n' \ne n} {{{\bf{A}}_{i,n'}}{{\bf{B}}_{n',j}}} } \right) + \sum\limits_j {{{\left( {{{\bf{X}}_{i,j}} - \sum\limits_{n' \ne n} {{{\bf{A}}_{i,n'}}{{\bf{B}}_{n',j}}} } \right)}^2}} } } } \right]\\ = - \frac{{\sum\limits_j {{{\left( {{{\bf{B}}_{n,j}}} \right)}^2}} }}{{2{\sigma ^2}}}\left[ {{{\left( {{{\bf{A}}_{i,n}}} \right)}^2} - \frac{{2{{\bf{A}}_{i,n}}\sum\limits_j {{{\bf{B}}_{n,j}}\left( {{{\bf{X}}_{i,j}} - \sum\limits_{n' \ne n} {{{\bf{A}}_{i,n'}}{{\bf{B}}_{n',j}}} } \right)} }}{{\sum\limits_j {{{\left( {{{\bf{B}}_{n,j}}} \right)}^2}} }} + \cdots } \right] \end{array}$

At this point it seems impossible to form a perfect square directly. Observe the second term, however: apart from ${{\bf{A}}_{i,n}}$ itself, the factor $\frac{{2\sum\limits_j {{{\bf{B}}_{n,j}}\left( {{{\bf{X}}_{i,j}} - \sum\limits_{n' \ne n} {{{\bf{A}}_{i,n'}}{{\bf{B}}_{n',j}}} } \right)} }}{{\sum\limits_j {{{\left( {{{\bf{B}}_{n,j}}} \right)}^2}} }}$ does not depend on ${{\bf{A}}_{i,n}}$, and the third term $\sum\limits_j {{{\left( {{{\bf{X}}_{i,j}} - \sum\limits_{n' \ne n} {{{\bf{A}}_{i,n'}}{{\bf{B}}_{n',j}}} } \right)}^2}}$ does not depend on ${{\bf{A}}_{i,n}}$ either. We can therefore complete the square using only the first two terms; the leftover piece, sitting in the exponent, only contributes a constant of proportionality. The expression can thus be rewritten as:

$\begin{array}{l} - \frac{1}{{2{\sigma ^2}}}\sum\limits_j {{{\left( {{{\bf{X}}_{i,j}} - {{\left( {{\bf{AB}}} \right)}_{i,j}}} \right)}^2}} \\ = - \frac{{\sum\limits_j {{{\left( {{{\bf{B}}_{n,j}}} \right)}^2}} }}{{2{\sigma ^2}}}\left[ {{{\left( {{{\bf{A}}_{i,n}}} \right)}^2} - \frac{{2{{\bf{A}}_{i,n}}\sum\limits_j {{{\bf{B}}_{n,j}}\left( {{{\bf{X}}_{i,j}} - \sum\limits_{n' \ne n} {{{\bf{A}}_{i,n'}}{{\bf{B}}_{n',j}}} } \right)} }}{{\sum\limits_j {{{\left( {{{\bf{B}}_{n,j}}} \right)}^2}} }} + {{\left\{ {\frac{{\sum\limits_j {{{\bf{B}}_{n,j}}\left( {{{\bf{X}}_{i,j}} - \sum\limits_{n' \ne n} {{{\bf{A}}_{i,n'}}{{\bf{B}}_{n',j}}} } \right)} }}{{\sum\limits_j {{{\left( {{{\bf{B}}_{n,j}}} \right)}^2}} }}} \right\}}^2}} \right] \end{array}$

$ = - \frac{{\sum\limits_j {{{\left( {{{\bf{B}}_{n,j}}} \right)}^2}} }}{{2{\sigma ^2}}}{\left\{ {{{\bf{A}}_{i,n}} - \frac{{\sum\limits_j {{{\bf{B}}_{n,j}}\left( {{{\bf{X}}_{i,j}} - \sum\limits_{n' \ne n} {{{\bf{A}}_{i,n'}}{{\bf{B}}_{n',j}}} } \right)} }}{{\sum\limits_j {{{\left( {{{\bf{B}}_{n,j}}} \right)}^2}} }}} \right\}^2} + C$ (4)
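As a quick numerical sanity check on (4), the sketch below (random matrices, arbitrary indices $i$, $n$; entirely my own) evaluates the original exponent and the completed-square term at two different values of ${{\bf{A}}_{i,n}}$ and confirms that their difference, the constant $C$, is the same in both cases:

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, N = 4, 6, 3
A, B = rng.random((I, N)), rng.random((N, J))
X = A @ B + rng.normal(scale=0.1, size=(I, J))
sigma2, i, n = 0.01, 2, 1

def lhs(a_in):
    """Original exponent -1/(2 sigma^2) sum_j (X_{i,j} - (AB)_{i,j})^2 as a function of A_{i,n}."""
    A2 = A.copy()
    A2[i, n] = a_in
    return -np.sum((X[i] - A2[i] @ B) ** 2) / (2 * sigma2)

def rhs(a_in):
    """Completed-square part of (4), without the constant C."""
    resid = X[i] - A[i] @ B + A[i, n] * B[n]   # X_{i,j} - sum_{n'!=n} A_{i,n'} B_{n',j}
    mu = B[n] @ resid / np.sum(B[n] ** 2)
    return -np.sum(B[n] ** 2) / (2 * sigma2) * (a_in - mu) ** 2

# lhs - rhs must be the same constant C for any value of A_{i,n}:
print(lhs(0.3) - rhs(0.3), lhs(1.7) - rhs(1.7))
```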

Substituting (4) back into (1):

$p\left( {{{\bf{A}}_{i,n}}\left| {{\bf{X}},{{\bf{A}}_{{\rm{\backslash (}}i,n)}},{\bf{B}}} \right.,{\sigma ^2}} \right) = {\cal R}\left( {{{\bf{A}}_{i,n}}{\rm{;}}{\mu _{{{\bf{A}}_{i,n}}}}{\rm{,}}\sigma _{{{\bf{A}}_{i,n}}}^2{\rm{,}}{\alpha _{i,n}}} \right) = {\cal N}\left( {{{\bf{A}}_{i,n}}{\rm{;}}{\mu _{{{\bf{A}}_{i,n}}}}{\rm{,}}\sigma _{{{\bf{A}}_{i,n}}}^2} \right)\varepsilon \left( {{{\bf{A}}_{i,n}}{\rm{;}}{\alpha _{i,n}}} \right)$

$ \propto \varepsilon \left( {{{\bf{A}}_{i,n}}{\rm{;}}{\alpha _{i,n}}} \right){\left( {2\pi {\sigma ^2}/\sum\limits_j {{{\left( {{{\bf{B}}_{n,j}}} \right)}^2}} } \right)^{ - 1/2}}\exp \left\{ { - \frac{{\sum\limits_j {{{\left( {{{\bf{B}}_{n,j}}} \right)}^2}} }}{{2{\sigma ^2}}}{{\left[ {{{\bf{A}}_{i,n}} - \frac{{\sum\limits_j {{{\bf{B}}_{n,j}}\left( {{{\bf{X}}_{i,j}} - \sum\limits_{n' \ne n} {{{\bf{A}}_{i,n'}}{{\bf{B}}_{n',j}}} } \right)} }}{{\sum\limits_j {{{\left( {{{\bf{B}}_{n,j}}} \right)}^2}} }}} \right]}^2}} \right\}$

Therefore, in ${\cal N}\left( {{{\bf{A}}_{i,n}}{\rm{;}}{\mu _{{{\bf{A}}_{i,n}}}}{\rm{,}}\sigma _{{{\bf{A}}_{i,n}}}^2} \right)$:

${\mu _{{{\bf{A}}_{i,n}}}} = \frac{{\sum\limits_j {{{\bf{B}}_{n,j}}\left( {{{\bf{X}}_{i,j}} - \sum\limits_{n' \ne n} {{{\bf{A}}_{i,n'}}{{\bf{B}}_{n',j}}} } \right)} }}{{\sum\limits_j {{{\left( {{{\bf{B}}_{n,j}}} \right)}^2}} }}$

$\sigma _{{{\bf{A}}_{i,n}}}^2 = \frac{{{\sigma ^2}}}{{\sum\limits_j {{{\left( {{{\bf{B}}_{n,j}}} \right)}^2}} }}$, obtained by matching the exponent above with $ - {\left( {{{\bf{A}}_{i,n}} - {\mu _{{{\bf{A}}_{i,n}}}}} \right)^2}/\left( {2\sigma _{{{\bf{A}}_{i,n}}}^2} \right)$.
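Putting the two formulas together, one Gibbs update of ${{\bf{A}}_{i,n}}$ could look like the following sketch (my own illustration; it reuses the `sample_rectified_gaussian` helper defined above):

```python
import numpy as np

def gibbs_update_A(A, B, X, sigma2, alpha, i, n, rng):
    """One Gibbs draw of A[i, n] from its rectified-Gaussian conditional."""
    # X_{i,j} - sum_{n' != n} A_{i,n'} B_{n',j}, for all j
    resid = X[i] - A[i] @ B + A[i, n] * B[n]
    ssq = np.sum(B[n] ** 2)                    # sum_j B_{n,j}^2
    mu = B[n] @ resid / ssq                    # mu_{A_{i,n}}
    var = sigma2 / ssq                         # sigma^2_{A_{i,n}}
    A[i, n] = sample_rectified_gaussian(mu, var, alpha[i, n], rng)
    return A
```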

For the noise variance:

$\begin{array}{l} p\left( {{\sigma ^2}\left| {{\bf{A}},{\bf{B}},{\bf{X}}} \right.} \right) = {{\cal G}^{ - 1}}\left( {{\sigma ^2};{k_{{\sigma ^2}}},{\theta _{{\sigma ^2}}}} \right) \propto p\left( {{\sigma ^2}} \right)p\left( {{\bf{X}}\left| {{\bf{A}},{\bf{B}},{\sigma ^2}} \right.} \right)\\ = \frac{{{\theta ^k}}}{{\Gamma \left( k \right)}}{\left( {{\sigma ^2}} \right)^{ - k - 1}}{\rm{exp}}\left( { - \frac{\theta }{{{\sigma ^2}}}} \right){\prod\limits_{i,j} {\left( {2\pi {\sigma ^2}} \right)} ^{ - 1/2}}\exp \left\{ { - {{\left( {{{\bf{X}}_{i,j}} - {{\left( {{\bf{AB}}} \right)}_{i,j}}} \right)}^2}/\left( {2{\sigma ^2}} \right)} \right\} \end{array}$

$\begin{array}{l} = \frac{{{\theta ^k}}}{{\Gamma \left( k \right)}}{\left( {{\sigma ^2}} \right)^{ - k - 1}}{\rm{exp}}\left( { - \frac{\theta }{{{\sigma ^2}}}} \right){\left( {2\pi {\sigma ^2}} \right)^{ - IJ/2}}\exp \left\{ { - \sum\limits_{i,j} {{{\left( {{{\bf{X}}_{i,j}} - {{\left( {{\bf{AB}}} \right)}_{i,j}}} \right)}^2}/\left( {2{\sigma ^2}} \right)} } \right\}\\ \propto \frac{{{{\left[ {\theta + \frac{1}{2}\sum\limits_{i,j} {{{\left( {{{\bf{X}}_{i,j}} - {{\left( {{\bf{AB}}} \right)}_{i,j}}} \right)}^2}} } \right]}^{k + IJ/2}}}}{{\Gamma \left( {k + IJ/2} \right)}}{\left( {{\sigma ^2}} \right)^{ - k - 1 - IJ/2}}\exp \left( { - \frac{{\theta + \frac{1}{2}\sum\limits_{i,j} {{{\left( {{{\bf{X}}_{i,j}} - {{\left( {{\bf{AB}}} \right)}_{i,j}}} \right)}^2}} }}{{{\sigma ^2}}}} \right) \end{array}$

Therefore, the updated parameters of the inverse-gamma distribution followed by the noise variance are:

${k_{{\sigma ^2}}} = k + IJ/2$ (I believe the original paper has a mistake here: it adds an extra 1),

${\theta _{{\sigma ^2}}} = \theta + \frac{1}{2}\sum\limits_{i,j} {{{\left( {{{\bf{X}}_{i,j}} - {{\left( {{\bf{AB}}} \right)}_{i,j}}} \right)}^2}}$.
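The corresponding Gibbs draw of the noise variance, using the updated parameters above, could be sketched as follows (again my own illustration, not code from the paper):

```python
import numpy as np
from scipy import stats

def gibbs_update_sigma2(A, B, X, k, theta, rng):
    """One Gibbs draw of sigma^2 from its inverse-gamma conditional."""
    I, J = X.shape
    resid_sq = np.sum((X - A @ B) ** 2)        # sum_{i,j} (X_{i,j} - (AB)_{i,j})^2
    k_post = k + I * J / 2                     # k_{sigma^2}
    theta_post = theta + 0.5 * resid_sq        # theta_{sigma^2}
    return stats.invgamma.rvs(a=k_post, scale=theta_post, random_state=rng)
```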