Model-Based Reinforcement Learning via Latent-Space Collocation

阿新 • Published: 2022-03-10

Publication year: 2021 (ICML 2021)
Key points: This paper proposes the latent collocation method (LatCo), an algorithm that plans sequences of states rather than sequences of actions in order to tackle long-horizon planning (it is easier to solve long-horizon tasks by planning sequences of states rather than just actions). The main idea is to first find states with high reward and then find the action sequence that reaches them (we turn to the technique of collocation, which optimizes a sequence of states to maximize the reward, while also eventually ensuring dynamics feasibility by recovering the corresponding actions).
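Specifically, the optimization objective is:

A latent state-space model is learned first to map observations to latent states, and the state-transition and reward functions are then learned on top of those latent states. The optimization objective then becomes:

Adding the dynamics-model constraint and the action constraint, the final expression becomes: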
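
The three objectives in this passage can be sketched in the usual collocation notation. This is a reconstruction from the description in these notes, not the paper's exact equations: the symbols $f$ (state dynamics), $\mu$ (latent dynamics), $z_t$ (latent state), $\lambda$ (multipliers), $\epsilon$ (tolerances), and $a_{\max}$ (action limit) are assumed notation, and the exact form of the penalty terms may differ in the paper.

```latex
% (1) Collocation in state space: optimize states and actions jointly.
\max_{s_{2:T},\, a_{1:T-1}} \; \sum_{t=2}^{T} r(s_t)
\quad \text{s.t.} \quad s_{t+1} = f(s_t, a_t)

% (2) The same problem posed on latent states z_t of a learned model.
\max_{z_{2:T},\, a_{1:T-1}} \; \sum_{t=2}^{T} r(z_t)
\quad \text{s.t.} \quad z_{t+1} = \mu(z_t, a_t), \qquad |a_t| \le a_{\max}

% (3) Constraints moved into the objective with multipliers (Lagrangian form).
\min_{\lambda \ge 0} \; \max_{z_{2:T},\, a_{1:T-1}} \;
\sum_{t=2}^{T} r(z_t)
- \sum_{t=1}^{T-1} \lambda^{\mathrm{dyn}}_t
  \left( \lVert z_{t+1} - \mu(z_t, a_t) \rVert^2 - \epsilon^{\mathrm{dyn}} \right)
- \sum_{t=1}^{T-1} \lambda^{\mathrm{act}}_t
  \left( \max\!\left(0,\, |a_t| - a_{\max}\right)^2 - \epsilon^{\mathrm{act}} \right)
```

Read this way, (3) rewards high-reward latent states while penalizing any gap between the planned next state and what the model predicts from the chosen action, which is what ties the state plan back to executable actions.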

To solve this optimization problem, the paper also uses Levenberg-Marquardt optimization to speed up the optimization (This efficient optimizer converges 10-100 times faster than gradient descent in wall clock in our experiments.).
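
For readers unfamiliar with Levenberg-Marquardt, here is a minimal generic sketch in pure Python. It is not the paper's planner: it fits a hypothetical two-parameter curve `y = a * exp(b * x)`, and the function names, the damping schedule (divide or multiply by 10), and the data are all assumptions made for illustration. The core idea is a Gauss-Newton step with a damping term added to the normal equations, `(J^T J + damping * I) delta = -J^T r`, where small damping behaves like Gauss-Newton and large damping like a short gradient step.

```python
import math


def residuals(params, xs, ys):
    """Residual vector r_i = a * exp(b * x_i) - y_i for the toy model."""
    a, b = params
    return [a * math.exp(b * x) - y for x, y in zip(xs, ys)]


def jacobian(params, xs):
    """Rows (dr_i/da, dr_i/db) of the Jacobian of the residuals."""
    a, b = params
    return [(math.exp(b * x), a * x * math.exp(b * x)) for x in xs]


def levenberg_marquardt(params, xs, ys, damping=1e-3, iters=50):
    """Minimize the sum of squared residuals with damped Gauss-Newton steps."""
    params = list(params)
    cost = sum(r * r for r in residuals(params, xs, ys))
    for _ in range(iters):
        if cost < 1e-20:  # already fits the data to numerical precision
            break
        r = residuals(params, xs, ys)
        J = jacobian(params, xs)
        # Build the damped normal equations (J^T J + damping * I) delta = -J^T r.
        h11 = sum(j[0] * j[0] for j in J) + damping
        h12 = sum(j[0] * j[1] for j in J)
        h22 = sum(j[1] * j[1] for j in J) + damping
        g1 = sum(j[0] * ri for j, ri in zip(J, r))
        g2 = sum(j[1] * ri for j, ri in zip(J, r))
        # Solve the 2x2 system in closed form.
        det = h11 * h22 - h12 * h12
        d1 = (-g1 * h22 + g2 * h12) / det
        d2 = (g1 * h12 - g2 * h11) / det
        trial = [params[0] + d1, params[1] + d2]
        trial_cost = sum(ri * ri for ri in residuals(trial, xs, ys))
        if trial_cost < cost:
            # Step helped: accept it and trust the quadratic model more.
            params, cost = trial, trial_cost
            damping /= 10.0
        else:
            # Step hurt: reject it and fall back toward gradient descent.
            damping *= 10.0
    return params


# Noise-free data generated from a = 2, b = -1 (hypothetical example values).
xs = [i / 19 for i in range(20)]
ys = [2.0 * math.exp(-1.0 * x) for x in xs]
a_fit, b_fit = levenberg_marquardt([1.0, 0.0], xs, ys)
print(a_fit, b_fit)
```

Because each step uses curvature information from `J^T J`, this kind of optimizer typically needs far fewer iterations than plain gradient descent on least-squares-shaped problems, which is consistent with the speedup the paper reports.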

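Summary:

This is a paper with a lot of ideas. It plans by optimizing the state trajectory: it sets actions aside at first, finds states with high reward, and only then goes back to find the actions, which is genuinely interesting. The approach is a bit like first picking states as subgoals and then finding the action sequence that reaches them. However, the whole process runs in latent space, so it presumably depends heavily on how accurate the model is. Also, I did not see an RL algorithm in it; it mainly learns a model and then plans. Alternatively, the planning itself could be viewed as RL, since the planning here is also carried out by gradient updates from an optimization method.
Questions: I do not understand Levenberg-Marquardt optimization.
Many of the formulas are transformed back and forth, for example the conversions between the several optimization objectives, and I do not fully follow them.
I do not really understand how a feasible action sequence that reaches the target state is eventually formed. Is adding a constraint term really enough? Could there be failure cases?
This paper feels quite hard, and I have not yet worked out exactly how it works.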
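
The question of how a penalty term can yield a feasible action sequence can be illustrated on a toy problem. The sketch below is not LatCo: it assumes a known 1-D dynamics `f(s, a) = s + a` instead of a learned latent model, a reward `-(s - goal)^2`, no action limits, a fixed penalty weight, and plain gradient ascent. All names and values are hypothetical. States and actions are optimized jointly, and the penalty on `s[t+1] - f(s[t], a[t])` is what pulls the actions toward ones that actually produce the planned states.

```python
def plan_by_collocation(goal=1.0, horizon=10, penalty=1.0, steps=2000):
    """Jointly optimize states s[1:] and actions a by gradient ascent on
    sum_t -(s[t] - goal)^2  -  penalty * sum_t (s[t+1] - s[t] - a[t])^2."""
    T = horizon
    s = [0.0] * T        # s[0] is the fixed start state; s[1:] are free variables
    a = [0.0] * (T - 1)  # one action per transition
    # Step size 1 / L, where L = 2 + 12 * penalty bounds the curvature
    # of this particular quadratic objective.
    lr = 1.0 / (2.0 + 12.0 * penalty)
    for _ in range(steps):
        # Dynamics violation of each planned transition.
        d = [s[t + 1] - (s[t] + a[t]) for t in range(T - 1)]
        grad_s = [0.0] * T
        grad_a = [0.0] * (T - 1)
        for t in range(1, T):
            grad_s[t] += -2.0 * (s[t] - goal)       # gradient of the reward
        for t in range(T - 1):
            grad_s[t + 1] += -2.0 * penalty * d[t]  # penalty w.r.t. next state
            grad_s[t] += 2.0 * penalty * d[t]       # penalty w.r.t. current state
            grad_a[t] += 2.0 * penalty * d[t]       # penalty w.r.t. action
        for t in range(1, T):                       # s[0] stays fixed
            s[t] += lr * grad_s[t]
        for t in range(T - 1):
            a[t] += lr * grad_a[t]
    return s, a


def rollout(start, actions):
    """Execute the actions in the true toy dynamics f(s, a) = s + a."""
    states = [start]
    for act in actions:
        states.append(states[-1] + act)
    return states


planned, actions = plan_by_collocation()
executed = rollout(0.0, actions)
print("planned final state: ", planned[-1])
print("executed final state:", executed[-1])
```

In this toy case an exactly feasible, reward-optimal plan exists, so the violation is driven to essentially zero and executing the recovered actions reproduces the planned states. With action limits, a learned model, or too few optimization steps, some violation can remain, and then the executed trajectory can differ from the planned one, so failure cases are possible in principle.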
