Paper Notes: Memory Fusion Network for Multi-view Sequential Learning (AAAI 2018)

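This paper, from Carnegie Mellon University and Nanyang Technological University in Singapore, was published at AAAI and uses a memory network for sequence modeling.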

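The term "multi-view" in the paper is quite broad; the same setting is often called "multi-modal" elsewhere. In multimodal sequence learning, the modalities typically interact in two ways: (1) within a modality (view-specific interactions) and (2) across modalities (cross-view interactions). The paper proposes the Memory Fusion Network (MFN) to model such multimodal sequences.

Because the two kinds of interaction are handled differently, the method can be divided into three parts: (1) a system of LSTMs that models each modality separately, (2) the Delta-memory Attention Network (DMAN), and (3) the Multi-view Gated Memory. The latter two handle the cross-view interactions.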

Input:

Suppose we model language, video, and audio sequences, so the set of views is $N = \left\{l,v,a\right\}$. The input data of the $n$th view is denoted as $x_n = \left\{x_n^t : t \le T, x_n^t \in R^{d_{x_n}}\right\}$, where $d_{x_n}$ is the input dimensionality of the $n$th view input $x_n$.
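To make the notation concrete, here is a minimal sketch of how the three views might be stored as arrays. The sizes `T` and `d_x` are hypothetical values chosen only for illustration; they do not come from the paper.

```python
import numpy as np

# Hypothetical sizes, for illustration only.
T = 5                              # number of time steps
d_x = {"l": 8, "v": 6, "a": 4}     # assumed input dimensionality d_{x_n} of each view

rng = np.random.default_rng(0)

# x[n] stacks x_n^1 ... x_n^T as rows, so x[n][t] is a vector in R^{d_{x_n}}.
x = {n: rng.standard_normal((T, d)) for n, d in d_x.items()}

shapes = {n: x[n].shape for n in x}
```

All views share the same length $T$, but each view keeps its own feature dimensionality.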

System of LSTMs:

A standard LSTM is used for each view. For each input $x_n$, the memory at each step is denoted $c_n= \left\{c_n^t : t \le T, c_n^t \in R^{d_{c_n}}\right\}$ and the output at each step is denoted $h_n= \left\{h_n^t : t \le T, h_n^t \in R^{d_{c_n}}\right\}$, where $d_{c_n}$ denotes the dimensionality of the $n$th LSTM memory $c_n$.
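The system of LSTMs can be sketched in plain NumPy as one independent LSTM per view. This is only an illustration under stated assumptions: the weights are random rather than trained, and the helper names (`lstm_step`, `run_lstm`) and all sizes are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One standard LSTM step; gates are stacked as [input, forget, output, candidate]."""
    d = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b           # shape (4 * d,)
    i = sigmoid(z[0:d])                    # input gate
    f = sigmoid(z[d:2 * d])                # forget gate
    o = sigmoid(z[2 * d:3 * d])            # output gate
    g = np.tanh(z[3 * d:4 * d])            # candidate memory
    c_t = f * c_prev + i * g               # new memory c_n^t
    h_t = o * np.tanh(c_t)                 # new output h_n^t
    return h_t, c_t

def run_lstm(x_n, d_c, rng):
    """Run one view's LSTM over x_n of shape (T, d_x); return h_n and c_n, each (T, d_c)."""
    T, d_x = x_n.shape
    W = 0.1 * rng.standard_normal((4 * d_c, d_x))   # random, untrained weights
    U = 0.1 * rng.standard_normal((4 * d_c, d_c))
    b = np.zeros(4 * d_c)
    h, c = np.zeros(d_c), np.zeros(d_c)
    hs, cs = [], []
    for t in range(T):
        h, c = lstm_step(x_n[t], h, c, W, U, b)
        hs.append(h)
        cs.append(c)
    return np.stack(hs), np.stack(cs)

rng = np.random.default_rng(0)
T = 5
d_x = {"l": 8, "v": 6, "a": 4}   # hypothetical input sizes
d_c = {"l": 5, "v": 3, "a": 2}   # hypothetical memory sizes d_{c_n}

x = {n: rng.standard_normal((T, d_x[n])) for n in d_x}
h, c = {}, {}
for n in x:
    h[n], c[n] = run_lstm(x[n], d_c[n], rng)   # each view is modeled separately
```

Each view has its own parameters and its own memory size, so no information crosses between views at this stage.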

Delta-memory Attention Network

$a^{[t-1,t]} = D_\alpha(c^{[t-1,t]})$

As the name "delta" suggests, the DMAN looks at two consecutive LSTM steps: its input is the concatenation of the memories at $t-1$ and $t$. Here $D_\alpha: R^{2d_c}\mapsto R^{2d_c}$ with $d_c= \sum_{n}d_{c_n}$. The equation above yields the attention coefficients: $a^{[t-1,t]}$ holds the softmax scores for the memories at times $t-1$ and $t$.
The output of the DMAN is defined as

$\hat c^{[t-1,t]}= c^{[t-1,t]} \odot a^{[t-1,t]}$

where $\hat c^{[t-1,t]}$ denotes the memories after the attention weights have been applied and $\odot$ is the element-wise product.
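A minimal NumPy sketch of the DMAN step is given here. The notes only fix the input and output sizes of $D_\alpha$ and say that its output is a softmax score, so the two-layer ReLU network, the hidden size `d_hidden`, the random weights, and the function name `dman` are all assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)            # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def dman(c_prev, c_curr, params):
    """c_prev, c_curr: memories of all views concatenated at t-1 and t, each of shape (d_c,)."""
    c_pair = np.concatenate([c_prev, c_curr])                        # c^{[t-1,t]}, shape (2 * d_c,)
    hidden = np.maximum(0.0, params["W1"] @ c_pair + params["b1"])   # assumed ReLU hidden layer
    a = softmax(params["W2"] @ hidden + params["b2"])                # a^{[t-1,t]}, shape (2 * d_c,)
    c_hat = c_pair * a                                               # element-wise product
    return a, c_hat

rng = np.random.default_rng(0)
d_c = 5 + 3 + 2          # hypothetical sum of the per-view memory sizes
d_hidden = 16            # hypothetical hidden width of D_alpha

params = {
    "W1": 0.1 * rng.standard_normal((d_hidden, 2 * d_c)),
    "b1": np.zeros(d_hidden),
    "W2": 0.1 * rng.standard_normal((2 * d_c, d_hidden)),
    "b2": np.zeros(2 * d_c),
}

c_prev = rng.standard_normal(d_c)   # stands in for c^{t-1}
c_curr = rng.standard_normal(d_c)   # stands in for c^{t}
a, c_hat = dman(c_prev, c_curr, params)
```

Because the softmax runs over all $2d_c$ entries, every memory dimension of every view at both time steps competes for weight, which is how the attention can pick up cross-view interactions.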

Multi-view Gated Memory

(1) First, $\hat c^{[t-1,t]}$ from above is taken as input to generate the update proposal $\hat u^t$.
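The update proposal step can be sketched as a small feedforward map from $\hat c^{[t-1,t]}$ to $\hat u^t$. The notes only state the input and the output, so everything else here is an assumption: the function name `update_proposal`, the two-layer network with a tanh output, the memory size `d_mem`, and the random weights.

```python
import numpy as np

def update_proposal(c_hat, params):
    """Map the attended memories c_hat (shape (2 * d_c,)) to a proposal u_hat (shape (d_mem,))."""
    hidden = np.maximum(0.0, params["W1"] @ c_hat + params["b1"])   # assumed ReLU hidden layer
    return np.tanh(params["W2"] @ hidden + params["b2"])            # assumed tanh output

rng = np.random.default_rng(0)
two_d_c = 20      # hypothetical size of c_hat, i.e. 2 * d_c
d_hidden = 16     # hypothetical hidden width
d_mem = 8         # hypothetical size of the multi-view memory

params = {
    "W1": 0.1 * rng.standard_normal((d_hidden, two_d_c)),
    "b1": np.zeros(d_hidden),
    "W2": 0.1 * rng.standard_normal((d_mem, d_hidden)),
    "b2": np.zeros(d_mem),
}

c_hat = rng.standard_normal(two_d_c)        # stands in for the DMAN output
u_hat = update_proposal(c_hat, params)      # proposal for the memory update at time t
```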
