LEARNING INVARIANT REPRESENTATIONS FOR REINFORCEMENT LEARNING WITHOUT RECONSTRUCTION

阿新 • Posted: 2021-11-01

Published: 2021 (ICLR 2021)
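Key points: The paper argues that a state contains a lot of task-irrelevant information. Methods based on reconstruction, or similar objectives, still end up encoding that information. The authors instead learn the representation with the bisimulation metric: the encoder is trained so that the distance between two states in the latent space equals their bisimulation distance.

The idea behind the bisimulation metric is that the distance between two states should depend only on two things: the difference in their rewards and the difference in their transition distributions. Nothing else enters the metric, so the representation naturally excludes task-irrelevant features. Concretely, the (on-policy) bisimulation metric is defined as

$$d(s_i, s_j) = \left| r^{\pi}_{s_i} - r^{\pi}_{s_j} \right| + \gamma \, W_1(d)\left( \mathcal{P}^{\pi}_{s_i}, \mathcal{P}^{\pi}_{s_j} \right)$$

In this formula:

- $r^{\pi}_{s}$ is the expected reward from state $s$ under policy $\pi$.
- $\mathcal{P}^{\pi}_{s}$ is the next-state distribution from $s$ under $\pi$.
- $\gamma$ is the discount factor.
- $W_1(d)$ is the 1-Wasserstein distance computed with $d$ itself as the ground metric.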

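The definition is recursive, so it helps to see it computed on a toy problem. The minimal sketch below is an illustration under simplifying assumptions, not code from the paper. It computes the metric by fixed-point iteration on a tiny MDP with deterministic transitions under a fixed policy.

In that special case each next-state distribution is a point mass. The Wasserstein term between two point masses is simply the distance between the two successor states. The function name and the toy MDP are hypothetical.

```python
# Toy sketch; the function name and the MDP below are hypothetical.
def bisim_metric_deterministic(rewards, next_state, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Fixed-point iteration for the on-policy bisimulation metric of a
    deterministic MDP under a fixed policy.

    rewards[s]    : reward r^pi_s obtained in state s
    next_state[s] : index of the single successor of s under pi
    With point-mass transitions, W_1(d) between the two next-state
    distributions equals d(next_i, next_j), so the update is
        d(i, j) = |r_i - r_j| + gamma * d(next_i, next_j).
    The update is a gamma-contraction, so the iteration converges.
    """
    n = len(rewards)
    d = [[0.0] * n for _ in range(n)]
    for _ in range(max_iter):
        new_d = [[abs(rewards[i] - rewards[j]) + gamma * d[next_state[i]][next_state[j]]
                  for j in range(n)]
                 for i in range(n)]
        delta = max(abs(new_d[i][j] - d[i][j]) for i in range(n) for j in range(n))
        d = new_d
        if delta < tol:
            break
    return d


# States 0 and 1 give the same reward and loop back to themselves; state 2 gives
# a different reward. Even if states 0 and 1 "look" different (e.g. different
# background pixels), their bisimulation distance is 0.
d = bisim_metric_deterministic(rewards=[1.0, 1.0, 0.0], next_state=[0, 1, 2])
print(d[0][1], d[0][2])  # 0.0 and approximately 10.0 (= 1 / (1 - gamma))
```

The example shows the property the paper relies on. Anything that changes neither the reward nor the transitions contributes nothing to the distance.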
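As for the algorithm: an encoder $\phi$ first maps each state into a latent space, and a reinforcement-learning algorithm such as SAC is trained on the latent states. The bisimulation metric is used to train the encoder, and the encoder loss becomes

$$J(\phi) = \left( \lVert z_i - z_j \rVert_1 - \left| r_i - r_j \right| - \gamma \, W_2\left( \hat{\mathcal{P}}(\cdot \mid \bar{z}_i, a_i), \hat{\mathcal{P}}(\cdot \mid \bar{z}_j, a_j) \right) \right)^2$$

In this loss:

- $z = \phi(s)$ is the latent state.
- $\bar{z}$ denotes $\phi(s)$ with gradients stopped.
- $\hat{\mathcal{P}}$ is the learned latent dynamics model.
- $W_2$ is the 2-Wasserstein distance.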

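The loss can be sketched in plain Python as follows. This is a hedged illustration, not the authors' code, and it rests on three assumptions:

- The latent dynamics model outputs a diagonal Gaussian (a mean and a per-dimension standard deviation).
- Each sample is paired with a partner by shuffling the batch, which is one simple way to draw two states from the buffer.
- The transition term uses the closed-form 2-Wasserstein distance between diagonal Gaussians.

All names are hypothetical, and there is no autograd here. In a real implementation these would be framework tensors, with gradients stopped through the latents fed to the dynamics model.

```python
import math
import random

# Toy sketch; all function and parameter names here are hypothetical.

def l1_distance(a, b):
    # ||a - b||_1 between two latent vectors
    return sum(abs(x - y) for x, y in zip(a, b))


def w2_diag_gaussian(mu_i, sigma_i, mu_j, sigma_j):
    # Closed-form 2-Wasserstein distance between two diagonal Gaussians:
    # sqrt(||mu_i - mu_j||^2 + ||sigma_i - sigma_j||^2)
    sq = sum((a - b) ** 2 for a, b in zip(mu_i, mu_j))
    sq += sum((a - b) ** 2 for a, b in zip(sigma_i, sigma_j))
    return math.sqrt(sq)


def bisim_loss(z, r, mu_next, sigma_next, gamma=0.99, perm=None, seed=0):
    """Batch mean of (||z_i - z_j||_1 - |r_i - r_j| - gamma * W2)^2.

    z          : latent states phi(s_i) for a batch sampled from the buffer
    r          : rewards r_i
    mu_next    : mean of the predicted next-latent Gaussian for (z_i, a_i)
    sigma_next : its per-dimension standard deviation
    perm       : partner index j for each i; a random permutation if None
    """
    n = len(z)
    if perm is None:
        perm = list(range(n))
        random.Random(seed).shuffle(perm)
    total = 0.0
    for i, j in enumerate(perm):
        # Target distance: reward difference plus discounted transition distance
        target = abs(r[i] - r[j]) + gamma * w2_diag_gaussian(
            mu_next[i], sigma_next[i], mu_next[j], sigma_next[j])
        total += (l1_distance(z[i], z[j]) - target) ** 2
    return total / n


loss = bisim_loss(
    z=[[0.0, 0.0], [1.0, 2.0]],
    r=[0.0, 0.5],
    mu_next=[[0.0, 0.0], [0.0, 0.0]],
    sigma_next=[[1.0, 1.0], [1.0, 1.0]],
    perm=[1, 0],
)
print(loss)  # 6.25
```

In the usage example, the two predicted next-state distributions are identical, so the target distance is just the reward gap of 0.5. The latent L1 distance is 3, which gives a squared error of 2.5² = 6.25 for each ordered pair.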
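Here z is the state after the encoder, i.e. in the latent space, r is the reward, and P is the state-transition model. For stochastic transitions, P is modeled as a Gaussian distribution, which lets the distance between two predicted transition distributions be computed in closed form. In words: sample two states from the replay buffer, then train the encoder so that the distance between their latent states matches the combined difference in their rewards and transitions.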
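The Gaussian choice is convenient because the 2-Wasserstein distance between two diagonal Gaussians has a closed form. It is the square root of the squared distance between the means plus the squared distance between the standard deviations. The sketch below checks this in 1-D against a Monte Carlo estimate. The estimate couples the two Gaussians through shared noise, which is the optimal (monotone) coupling in one dimension. Helper names are hypothetical.

```python
import math
import random

# Toy sketch; helper names are hypothetical.

def w2_diag_gaussian(mu1, sigma1, mu2, sigma2):
    # Closed-form W2 between N(mu1, diag(sigma1^2)) and N(mu2, diag(sigma2^2))
    sq = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    sq += sum((a - b) ** 2 for a, b in zip(sigma1, sigma2))
    return math.sqrt(sq)


def w2_1d_monte_carlo(m1, s1, m2, s2, n=200_000, seed=0):
    """Estimate W2 in 1-D with the monotone coupling:
    x = m1 + s1 * eps and y = m2 + s2 * eps share the same noise eps."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        total += ((m1 + s1 * eps) - (m2 + s2 * eps)) ** 2
    return math.sqrt(total / n)


closed = w2_diag_gaussian([0.0], [1.0], [3.0], [2.0])  # sqrt(3^2 + 1^2) = sqrt(10)
estimate = w2_1d_monte_carlo(0.0, 1.0, 3.0, 2.0)
print(closed, estimate)
```

The closed form has two useful special cases. With equal standard deviations it reduces to the Euclidean distance between the means. With zero standard deviations it reduces to the deterministic case.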
The dynamics model P also has to be learned, so the method is effectively model-based. Only with a learned P can the transition term in J be computed.
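Learning a Gaussian P means minimizing a Gaussian negative log-likelihood of the next latent state. In practice the model is presumably a neural network trained by gradient descent. As a toy stand-in, the sketch below uses a 1-D linear-Gaussian model with a single shared standard deviation, which is an assumption made purely for illustration. It fits the model in closed form by maximum likelihood, which minimizes the same NLL: least squares for the mean, then the root-mean-square residual for sigma.

The data, model form, and function names are all hypothetical. The last lines show how the learned model supplies the transition term used in the loss.

```python
import math
import random

# Toy sketch; the data, model form, and names are hypothetical.

def gaussian_nll(y, mu, sigma):
    # Negative log-likelihood of a scalar y under N(mu, sigma^2)
    return 0.5 * ((y - mu) / sigma) ** 2 + math.log(sigma) + 0.5 * math.log(2 * math.pi)


def fit_linear_gaussian(zs, acts, next_zs):
    """Maximum-likelihood fit of next_z ~ N(w*z + u*a, sigma^2).

    The NLL minimizer is ordinary least squares for (w, u),
    followed by sigma = root-mean-square residual.
    """
    szz = sum(z * z for z in zs)
    saa = sum(a * a for a in acts)
    sza = sum(z * a for z, a in zip(zs, acts))
    szy = sum(z * y for z, y in zip(zs, next_zs))
    say = sum(a * y for a, y in zip(acts, next_zs))
    det = szz * saa - sza * sza  # determinant of the 2x2 normal equations
    w = (szy * saa - say * sza) / det
    u = (say * szz - szy * sza) / det
    resid = [y - (w * z + u * a) for z, a, y in zip(zs, acts, next_zs)]
    sigma = math.sqrt(sum(e * e for e in resid) / len(resid))
    return w, u, sigma


def mean_nll(params, zs, acts, next_zs):
    w, u, sigma = params
    return sum(gaussian_nll(y, w * z + u * a, sigma)
               for z, a, y in zip(zs, acts, next_zs)) / len(zs)


# Hypothetical 1-D latent transitions: z' = 0.8 z + 0.5 a + N(0, 0.1^2)
rng = random.Random(0)
zs = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
acts = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
next_zs = [0.8 * z + 0.5 * a + rng.gauss(0.0, 0.1) for z, a in zip(zs, acts)]

w, u, sigma = fit_linear_gaussian(zs, acts, next_zs)

# With a shared sigma, W2 between the two predicted Gaussians is |mu_i - mu_j|.
mu_i, mu_j = w * 0.2 + u * 0.1, w * -0.3 + u * 0.4
print(w, u, sigma, abs(mu_i - mu_j))
```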
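Summary:

Overall, the paper proposes a representation that ignores task-irrelevant information, which improves stability and generalization. The idea makes sense, but three components have to be trained jointly, which is probably not easy. The paper also includes several theorems that seem to have little connection to the experiments.
Questions: Would it be more direct to use techniques such as object detection, semantic segmentation, or instance segmentation to filter out irrelevant objects explicitly? Or is it hard, for some objects, to tell whether they are irrelevant, which makes that approach impractical?
Suppose an object is judged irrelevant when the encoder is trained but turns out to be relevant in the test environment. Does the encoder's generalization then simply collapse?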