Learning Latent Dynamics for Planning from Pixels

阿新 • Published: 2021-11-28

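Publication: 2019 (ICML 2019)
Key points: The paper proposes a model called the Deep Planning Network (PlaNet) that learns the environment dynamics and then picks actions by online planning in the latent space the model builds. The crux is that the model has to predict rewards accurately over multiple steps (the dynamics model must accurately predict the rewards ahead for multiple time steps). The authors' approach accounts for both deterministic and stochastic transitions in the dynamics and trains with a multi-step variational inference objective, and it is reported to work well.
Concretely, PlaNet consists of four models: a transition model, an observation model, a reward model, and an encoder.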
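The figure listing the four components did not survive here. In the notation I recall from the paper (state s_t, observation o_t, action a_t, reward r_t), they can be written roughly as below; treat the exact conditioning as my reading rather than a quotation.

```latex
\begin{align*}
\text{Transition model:} \quad & s_t \sim p(s_t \mid s_{t-1}, a_{t-1}) \\
\text{Observation model:} \quad & o_t \sim p(o_t \mid s_t) \\
\text{Reward model:} \quad & r_t \sim p(r_t \mid s_t) \\
\text{Encoder:} \quad & s_t \sim q(s_t \mid o_{\le t}, a_{<t})
\end{align*}
```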

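With this model in hand, the policy is obtained directly by search through online planning, so no policy network has to be learned. The authors use model-predictive control (MPC); specifically, the planner is the cross-entropy method (CEM).
As for the training data, it is collected by adding noise to the actions that planning returns. For the model itself, the authors use what they call a recurrent state-space model (RSSM). All four components above (transition model, observation model, reward model, encoder) are assumed to be Gaussian. The encoder infers the state from all past observations and actions; note that the encoder conditions on observations, whereas the transition model conditions on states. The training objective is a variational bound on the log-likelihood of the observations.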
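To remind myself how CEM-based MPC works, here is a minimal sketch, not the authors' implementation. `rollout_return` is a hypothetical stand-in for "roll the learned latent model forward under this action sequence and sum the predicted rewards"; below it is replaced by a toy quadratic so the block runs on its own, and the hyperparameter values are only illustrative.

```python
import numpy as np

def cem_plan(rollout_return, horizon, action_dim,
             iters=10, candidates=1000, top_k=100, seed=0):
    """Cross-entropy method over action sequences; returns the first action."""
    rng = np.random.default_rng(seed)
    # Belief over action sequences: an independent Gaussian per step and dimension.
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iters):
        # Sample candidate action sequences from the current belief.
        noise = rng.standard_normal((candidates, horizon, action_dim))
        actions = mean + std * noise
        # Score every candidate (hypothetical: predicted return under the model).
        returns = np.array([rollout_return(a) for a in actions])
        # Refit the belief to the top_k best candidates.
        elite = actions[np.argsort(returns)[-top_k:]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0)
    # MPC: execute only the first action, then replan at the next step.
    return mean[0]

def toy_return(actions):
    # Toy objective (an assumption for illustration): best action is 0.5 everywhere.
    return -np.sum((actions - 0.5) ** 2)

first_action = cem_plan(toy_return, horizon=12, action_dim=1)
```

In an MPC loop this function would be called again after every environment step, starting from the newly inferred state, so the later actions of the plan are never executed as is.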
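The formula image is missing here. As far as I recall, the one-step bound in the paper has the following form, written for observations only (the reward term follows by analogy); this is my reconstruction from memory, so check it against the paper.

```latex
\ln p(o_{1:T} \mid a_{1:T}) \;\ge\; \sum_{t=1}^{T} \Big(
  \mathbb{E}_{q(s_t \mid o_{\le t}, a_{<t})}\big[\ln p(o_t \mid s_t)\big]
  - \mathbb{E}_{q(s_{t-1} \mid o_{\le t-1}, a_{<t-1})}\big[
      \mathrm{KL}\big[\, q(s_t \mid o_{\le t}, a_{<t}) \,\big\|\, p(s_t \mid s_{t-1}, a_{t-1}) \,\big]
    \big]
\Big)
```

The first term is a reconstruction term, and the second keeps the encoder's belief close to what the transition model predicts from the previous state.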

To separate the deterministic and stochastic parts of the transition, the authors also split the transition model into two components.

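Here h is the deterministic part of the transition, and s is the state generated from h through the stochastic transition.
So far the objective only fits one-step predictions. The authors go further and fit multi-step predictions, that is, predicting the state d steps ahead of an earlier state by applying the transition model d times.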
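A minimal NumPy sketch of one such split transition step, under my own assumptions: the deterministic update f is a single tanh layer standing in for the recurrent cell (as far as I recall the paper uses a GRU there), the weights are random, and `init_params` and `rssm_prior_step` are hypothetical names. It only illustrates h_t = f(h_{t-1}, s_{t-1}, a_{t-1}) followed by s_t ~ p(s_t | h_t).

```python
import numpy as np

def init_params(h_dim, s_dim, a_dim, rng):
    """Random weights for the toy transition (illustrative only)."""
    in_dim = h_dim + s_dim + a_dim
    return {
        "W_h": 0.1 * rng.standard_normal((h_dim, in_dim)),
        "b_h": np.zeros(h_dim),
        "W_mu": 0.1 * rng.standard_normal((s_dim, h_dim)),
        "b_mu": np.zeros(s_dim),
        "W_sig": 0.1 * rng.standard_normal((s_dim, h_dim)),
        "b_sig": np.zeros(s_dim),
    }

def rssm_prior_step(h_prev, s_prev, a_prev, params, rng):
    # Deterministic part: h_t = f(h_{t-1}, s_{t-1}, a_{t-1}).
    x = np.concatenate([h_prev, s_prev, a_prev])
    h = np.tanh(params["W_h"] @ x + params["b_h"])
    # Stochastic part: s_t ~ N(mean(h_t), std(h_t)^2), a diagonal Gaussian.
    mean = params["W_mu"] @ h + params["b_mu"]
    # Softplus keeps the standard deviation positive; the small offset avoids zero.
    std = np.logaddexp(0.0, params["W_sig"] @ h + params["b_sig"]) + 1e-3
    s = mean + std * rng.standard_normal(mean.shape)
    return h, s, mean, std

rng = np.random.default_rng(0)
params = init_params(h_dim=8, s_dim=4, a_dim=2, rng=rng)
h, s = np.zeros(8), np.zeros(4)
for _ in range(5):
    a = rng.uniform(-1.0, 1.0, size=2)   # random actions, just to drive the rollout
    h, s, mean, std = rssm_prior_step(h, s, a, params, rng)
```

The point of the split is visible in the code: h carries information forward without any sampling noise, while s is resampled at every step.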

The objective changes accordingly.
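The formula image is missing here as well. From memory, the multi-step prediction and the corresponding bound for a fixed distance d look like this, with actions dropped from the notation for brevity; treat it as my reconstruction, not a quotation.

```latex
\begin{align*}
p(s_t \mid s_{t-d}) &= \mathbb{E}_{p(s_{t-1} \mid s_{t-d})}\big[\, p(s_t \mid s_{t-1}) \,\big] \\
\ln p_d(o_{1:T}) &\ge \sum_{t=1}^{T} \Big(
  \mathbb{E}_{q(s_t \mid o_{\le t})}\big[\ln p(o_t \mid s_t)\big]
  - \mathbb{E}_{p(s_{t-1} \mid s_{t-d})\, q(s_{t-d} \mid o_{\le t-d})}\big[
      \mathrm{KL}\big[\, q(s_t \mid o_{\le t}) \,\big\|\, p(s_t \mid s_{t-1}) \,\big]
    \big]
\Big)
\end{align*}
```

For d = 1 this reduces to the one-step bound: the KL term is then evaluated at a state drawn directly from the encoder at t-1.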

Then latent overshooting is added. My impression is that the training so far considers only a fixed distance d, whereas latent overshooting considers any number of steps.
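To make the "any number of steps" idea concrete for myself, here is a toy sketch of the KL part of such an objective. The simplifications are mine: actions are ignored, `toy_transition` is a made-up Gaussian transition, each expectation uses a single sample, and all (t, d) pairs are averaged uniformly instead of using the paper's exact weighting.

```python
import numpy as np

def kl_diag_gauss(mu_q, std_q, mu_p, std_p):
    """KL[N(mu_q, std_q^2) || N(mu_p, std_p^2)] for diagonal Gaussians."""
    return float(np.sum(
        np.log(std_p / std_q)
        + (std_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * std_p ** 2)
        - 0.5
    ))

def overshooting_kl(post_mean, post_std, transition, max_dist, rng):
    """Average KL between the posterior at t and priors predicted from t-d, d = 1..max_dist."""
    T = len(post_mean)
    total, count = 0.0, 0
    for t in range(1, T):
        for d in range(1, min(max_dist, t) + 1):
            # Start from a sample of the posterior d steps in the past.
            s = post_mean[t - d] + post_std[t - d] * rng.standard_normal(post_mean[t - d].shape)
            # Roll the prior forward d-1 steps without looking at observations.
            for _ in range(d - 1):
                mu, std = transition(s)
                s = mu + std * rng.standard_normal(mu.shape)
            # The last step gives the d-step prior over the state at time t.
            mu_p, std_p = transition(s)
            total += kl_diag_gauss(post_mean[t], post_std[t], mu_p, std_p)
            count += 1
    return total / count

def toy_transition(s):
    # Hypothetical transition: shrink the state, fixed noise level.
    return 0.9 * s, np.full_like(s, 0.5)

rng = np.random.default_rng(0)
T, dim = 6, 3
post_mean = rng.standard_normal((T, dim))   # stand-in for encoder outputs
post_std = np.full((T, dim), 0.3)
loss = overshooting_kl(post_mean, post_std, toy_transition, max_dist=3, rng=rng)
```

With `max_dist=1` this is just the usual one-step KL term; larger values also train the model on its own multi-step predictions in latent space.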

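Summary:

I found this paper rather opaque: it says a lot, yet somehow feels like it says nothing, and the equations do not fully show how each component is trained. Maybe I am just not good enough yet, sigh.
Questions: The authors call this training objective latent overshooting. How should I understand "overshooting"? Does it just mean considering multiple steps?
What is contact dynamics?
I have read about the cross-entropy method (CEM) before and forgot it again; I need to go over it once more.
What is a filtering posterior? I don't get it. I probably need to look at the Kalman filter.
The objective looks fine on paper, but what it actually looks like once written as a network, its inputs, outputs and so on, is still unclear to me. I will have to read the code.
I did not read the proofs.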

