Barquero-2022-BeLFusion Latent Diffusion for Behavior-Driven Human Motion Prediction

# BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction #paper


1. paper-info

1.1 Metadata

  • Author:: [[German Barquero]], [[Sergio Escalera]], [[Cristina Palmero]]
  • Affiliation:: Universitat de Barcelona
  • Keywords:: #HMP #Diffusion
  • Journal:: Preprint
  • Date:: [[2022-11-25]]
  • Status:: #Done
  • Link:: http://arxiv.org/abs/2211.14304
  • Last modified:: 2022.12.07

1.2. Abstract

Stochastic human motion prediction (HMP) has generally been tackled with generative adversarial networks and variational autoencoders. Most prior works aim at predicting highly diverse movements in terms of the skeleton joints' dispersion. This has led to methods predicting fast and motion-divergent movements, which are often unrealistic and incoherent with past motion. Such methods also neglect contexts that need to anticipate diverse low-range behaviors, or actions, with subtle joint displacements. To address these issues, we present BeLFusion, a model that, for the first time, leverages latent diffusion models in HMP to sample from a latent space where behavior is disentangled from pose and motion. As a result, diversity is encouraged from a behavioral perspective. Thanks to our behavior coupler's ability to transfer sampled behavior to ongoing motion, BeLFusion's predictions display a variety of behaviors that are significantly more realistic than the state of the art. To support it, we introduce two metrics, the Area of the Cumulative Motion Distribution, and the Average Pairwise Distance Error, which are correlated to our definition of realism according to a qualitative study with 126 participants. Finally, we prove BeLFusion's generalization power in a new cross-dataset scenario for stochastic HMP.


2. Introduction

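  • Field: Stochastic human motion prediction
  • Problem addressed:
    • Motion sequences generated by traditional methods contain unrealistic poses; see Fig. 1.
  • Authors' approach:
    • To keep the predicted sequence coherent with the observed sequence in speed and direction, this information is disentangled from the behavior information, and the behavior is modeled in a latent space with a diffusion model. Because of this disentanglement, motion generated from the latent variable is more realistic.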
  • Contributions
    • BeLFusion model
    • diverse motion prediction
    • cross-dataset evaluation
    • new metrics

Fig. 1. Traditional approaches and BeLFusion

3. Methodology

Fig. 2. BeLFusion architecture

3.1. Problem definition

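Given an observed sequence of \(B\) past poses \(X=\{p_{t-B},...,p_{t-2},p_{t-1}\}\), the goal is to predict future sequences of \(T\) poses \(Y^i=\{p_t^i,p_{t+1}^i,...,p_{t+T-1}^i\}\), where \(i\) indexes the sampled predictions.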
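To make the indexing concrete, here is a minimal sketch of how a pose sequence could be cut into an observation window and a prediction target. The function name `split_sequence` and the list-based pose representation are hypothetical, and the sketch assumes the future window holds exactly T poses (frames t to t+T-1).

```python
def split_sequence(poses, t, B, T):
    """Return (X, Y): the B poses before frame t and the T poses starting at frame t.

    `poses` is any indexable sequence of poses (hypothetical representation).
    """
    if t - B < 0 or t + T > len(poses):
        raise ValueError("window does not fit inside the sequence")
    X = poses[t - B:t]   # p_{t-B}, ..., p_{t-1}
    Y = poses[t:t + T]   # p_t, ..., p_{t+T-1}
    return X, Y


# Usage: frame indices stand in for poses.
frames = list(range(10))
X, Y = split_sequence(frames, t=4, B=3, T=2)
```

In stochastic HMP the model would produce several candidate sequences for the same X, each of the same length as Y.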

3.2. Motion latent diffusion

A latent diffusion model (LDM) is used to sample the latent variable \(z=\mathcal{E}(Y)\in V^3\) obtained by encoding the motion sequence. With \(z\) introduced, the original problem can be written as (1):

\[P(Y|X) = P(Y,z|X) = P(Y|z,X)P(z|X) \tag{1} \]

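As in DDPM [1], a denoising network \(f_\Phi (z_t,t,X)\) conditioned on the observation is trained over the diffusion steps. Note that in the loss (2) its output is compared with the clean latent \(z=\mathcal{E}(\mathbf{Y})\), so the network regresses the latent itself rather than the noise \(\epsilon_t\). The LDM loss is: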

\[\mathcal{L}_{\text {lat }}(\mathbf{X}, \mathbf{Y})=\sum_{t=1}^{T} \underset{q\left(z_{t} \mid z_{0}\right)}{\mathbb{E}}\|f_{\Phi}\left(z_{t}, t, \mathbf{X}\right)-\underbrace{\mathcal{E}(\mathbf{Y})}_{z}\|_{1} \tag{2} \]
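The structure of Eq. (2) can be illustrated with a small standard-library sketch. It is not the authors' implementation: the linear beta schedule is borrowed from DDPM [1] as an assumption, the expectation is approximated with a single noise draw per step, and `alpha_bars`, `q_sample`, `latent_loss`, and the `denoiser` callable are hypothetical names.

```python
import math
import random


def alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        bars.append(prod)
    return bars


def q_sample(z0, alpha_bar, rng):
    """Draw z_t ~ q(z_t | z_0) = N(sqrt(alpha_bar) * z_0, (1 - alpha_bar) * I)."""
    scale, sigma = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in z0]


def latent_loss(denoiser, z0, x, num_steps, rng):
    """One-sample estimate of Eq. (2): sum over steps of the L1 error to z_0."""
    total = 0.0
    for t, a_bar in enumerate(alpha_bars(num_steps), start=1):
        z_t = q_sample(z0, a_bar, rng)
        pred = denoiser(z_t, t, x)  # stands in for f_Phi(z_t, t, X)
        total += sum(abs(p - v) for p, v in zip(pred, z0))
    return total


# Usage with a placeholder denoiser that simply returns its noisy input.
rng = random.Random(0)
z0 = [0.5, -1.0, 2.0]  # toy latent standing in for E(Y)
loss = latent_loss(lambda z_t, t, x: z_t, z0, x=None, num_steps=10, rng=rng)
```

A denoiser that recovered the clean latent exactly would drive this estimate to zero, which is the behavior the loss rewards.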

3.3. Behavioral latent diffusion


Fig. 3. Overall framework

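Fig. 3 shows the overall model. The upper half of the figure corresponds to Section 3.2 and produces the behavior latent variable \(z\). The lower half is the actual generative part, with a structure similar to an encoder-decoder model.
\(\mathcal{B}_{\phi}\): behavior coupler
\(r_{\omega}\): auxiliary decoder, used to help train \(z\)
\(\mathcal{B}_{\phi}\) and \(r_{\omega}\) are trained alternately (similar to adversarial training). The loss for \(r_{\omega}\) is (3) and the loss for \(\mathcal{B}_{\phi}\) is (4).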

\[\max _{\omega} \mathcal{L}_{\text {aux }}=\max _{\omega} \mathbb{E}_{p_{\theta}\left(z \mid \mathbf{Y}_{e}\right)}\left(\log r_{\omega}\left(\mathbf{Y}_{e} \mid z\right)\right)\tag{3} \]

\[\begin{aligned} \max _{\alpha, \theta, \phi} \mathcal{L}_{\text {main }}=\max _{\alpha, \theta, \phi} \; & \mathbb{E}_{p_{\theta}\left(z \mid \mathbf{Y}_{e}\right)}\left[\log \mathcal{B}_{\phi}\left(\mathbf{Y}_{e} \mid z, g_{\alpha}\left(\mathbf{x}_{m}\right)\right)\right] \\ & -D_{\mathrm{KL}}\left(p_{\theta}\left(z \mid \mathbf{Y}_{e}\right) \| p(z)\right)-\mathcal{L}_{\mathrm{aux}} \end{aligned} \tag{4} \]
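As a rough illustration of how the three terms of Eq. (4) combine, the sketch below evaluates the objective for given scalar values. It assumes, as in a standard VAE, that the posterior over z is a diagonal Gaussian and that the prior p(z) is a standard normal; the notes do not state this, so treat it as an assumption. The function names are hypothetical.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def main_objective(coupler_log_lik, mu, log_var, aux_objective):
    """Eq. (4): coupler log-likelihood, minus the KL term, minus L_aux."""
    return coupler_log_lik - kl_to_standard_normal(mu, log_var) - aux_objective


# Usage: a posterior equal to the prior contributes zero KL.
value = main_objective(coupler_log_lik=-1.0, mu=[0.0, 0.0],
                       log_var=[0.0, 0.0], aux_objective=-2.0)
```

Because \(\mathcal{L}_{\text{aux}}\) enters with a minus sign, the main parameters are pushed to lower the auxiliary decoder's log-likelihood while the auxiliary decoder itself maximizes it in Eq. (3); that opposition is what gives the alternating scheme its adversarial flavor.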

4. Experiments

  • Datasets
    • Human3.6M
    • AMASS
  • Evaluation metrics
    • Average Displacement Error and Final Displacement Error (ADE, FDE)
    • MMADE
    • MMFDE
    • Average Pairwise Distance (APD): measures diversity
    • Fréchet Inception Distance (FID)
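The three simplest metrics in this list can be sketched in a few lines. The version below follows the convention commonly used in stochastic HMP benchmarks (ADE and FDE take the best of the N sampled predictions; APD averages the L2 distance over all pairs of samples), which may differ in detail from the paper's evaluation code. Each pose is assumed to be a flat list of coordinates, and the function names are hypothetical.

```python
import math
from itertools import combinations


def l2(a, b):
    """Euclidean distance between two equally long coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ade(samples, target):
    """Best-of-N average per-frame L2 distance to the ground truth."""
    return min(sum(l2(p, q) for p, q in zip(s, target)) / len(target) for s in samples)


def fde(samples, target):
    """Best-of-N L2 distance at the final frame."""
    return min(l2(s[-1], target[-1]) for s in samples)


def apd(samples):
    """Mean L2 distance over all pairs of flattened samples (needs at least 2)."""
    flat = [[v for pose in s for v in pose] for s in samples]
    pairs = list(combinations(flat, 2))
    return sum(l2(a, b) for a, b in pairs) / len(pairs)


# Usage: two sampled futures of two frames each, one matching the target.
target = [[0.0, 0.0], [0.0, 0.0]]
samples = [[[0.0, 0.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
print(ade(samples, target), fde(samples, target), apd(samples))  # 0.0 0.0 5.0
```

ADE and FDE reward having at least one accurate sample, while APD only looks at how spread out the samples are, which is why high APD alone does not imply realistic predictions.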

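Summary

The paper models behavior labels with a diffusion model, which is close to the structure I had expected. With behavior labels added, the resulting diversity shows up among the motion sequences that share the same behavior label.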

Reference

[1] Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 2020, 33: 6840-6851.