MBMF: Model-Based Priors for Model-Free Reinforcement Learning

阿新 • Published: 2022-03-13

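Publication year: 2017
Key points: This paper proposes a Model-Based Model-Free (MBMF) algorithm that learns a dynamics model and then uses it as a prior for model-free optimization. Here, model-free optimization means Bayesian Optimization (BO) based on a Gaussian Process (GP).
Specifically, if the dynamics model is unknown, one is learned first.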
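As a rough illustration of "learning a dynamics model" from observed transitions, here is a minimal sketch. The note does not say which model class the paper uses, so the linear least-squares model below is a swapped-in stand-in, and the names `fit_linear_dynamics` and `predict_next_state` and the toy data are all hypothetical.

```python
import numpy as np

def fit_linear_dynamics(states, actions, next_states):
    """Least-squares fit of x_next ≈ A x + B u + c (an assumed model class)."""
    states = np.asarray(states, dtype=float)            # shape (N, n_x)
    actions = np.asarray(actions, dtype=float)          # shape (N, n_u)
    next_states = np.asarray(next_states, dtype=float)  # shape (N, n_x)
    # One row [x, u, 1] per observed transition.
    features = np.hstack([states, actions, np.ones((len(states), 1))])
    weights, *_ = np.linalg.lstsq(features, next_states, rcond=None)
    n_x, n_u = states.shape[1], actions.shape[1]
    A = weights[:n_x].T
    B = weights[n_x:n_x + n_u].T
    c = weights[-1]
    return A, B, c

def predict_next_state(model, state, action):
    """One-step prediction with the fitted model."""
    A, B, c = model
    return A @ np.asarray(state, dtype=float) + B @ np.asarray(action, dtype=float) + c

# Toy transitions generated from a known linear system (illustrative only).
rng = np.random.default_rng(0)
true_A = np.array([[0.9, 0.1], [0.0, 0.8]])
true_B = np.array([[0.0], [0.5]])
X = rng.normal(size=(50, 2))
U = rng.normal(size=(50, 1))
X_next = X @ true_A.T + U @ true_B.T

model = fit_linear_dynamics(X, U, X_next)
prediction = predict_next_state(model, X[0], U[0])
```

With noise-free data the fit recovers the generating matrices; with real rollouts it would only approximate the true dynamics.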

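With the model in hand, finding the policy is treated as a parameter search problem, and a GP is used to search over policies. Each time the search proposes a policy, the objective function can be evaluated. The objective defined here is to minimize cost, where \(l\) denotes the cost of each step.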
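Written out under the assumption of a finite horizon \(T\), with state \(x_t\), control \(u_t\) and a policy \(\pi_\theta\) (symbols introduced here for illustration, not taken from the note), an objective of the kind described above would look roughly like this:

\[
J(\theta) = \sum_{t=0}^{T} l(x_t, u_t), \qquad u_t = \pi_\theta(x_t), \qquad \theta^{*} = \arg\min_{\theta} J(\theta)
\]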

The value of \(J\) measured in the real environment is used to fit an estimate \(\hat{J}\) of \(J\), and this estimate is in turn used to update the BO acquisition function.

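Here \(\theta\) denotes a set of policy parameters and \(J\) is the estimated cost function (the \(\hat{J}\) above), so the acquisition function is computed from the policy parameters and the cost, and is used to select the next set of policy parameters \(\theta\).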
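To make the role of \(\hat{J}\) and the acquisition function concrete, here is a minimal NumPy sketch. The note does not say which kernel or acquisition function the paper uses, so the RBF kernel and the lower-confidence-bound rule (posterior mean minus `kappa` times posterior standard deviation, minimized because the objective is a cost) are assumptions, as are all function names.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Unit-variance RBF kernel between rows of a (n, d) and b (m, d)."""
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gp_posterior(thetas, costs, candidates, noise=1e-6):
    """Posterior mean and std of J_hat at candidates (GP on mean-centered costs)."""
    costs = np.asarray(costs, dtype=float)
    offset = costs.mean()
    K = rbf_kernel(thetas, thetas) + noise * np.eye(len(thetas))
    K_s = rbf_kernel(thetas, candidates)
    mean = offset + K_s.T @ np.linalg.solve(K, costs - offset)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

def propose_next_theta(thetas, costs, candidates, kappa=2.0):
    """Pick the candidate minimizing the lower confidence bound (assumed rule)."""
    mean, std = gp_posterior(thetas, costs, candidates)
    acquisition = mean - kappa * std
    return candidates[np.argmin(acquisition)]

# Three evaluated policies with a one-dimensional parameter (toy data).
thetas = np.array([[-2.0], [0.0], [3.0]])
costs = (thetas[:, 0] - 1.0) ** 2
candidates = np.linspace(-3.0, 3.0, 61)[:, None]

next_theta = propose_next_theta(thetas, costs, candidates)
mean_at_data, std_at_data = gp_posterior(thetas, costs, thetas)
```

The `kappa` parameter trades off exploiting low predicted cost against exploring parameters where \(\hat{J}\) is still uncertain.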
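Once new policy parameters are available, they are tested in the environment, which yields state transitions and a cost. The state transitions are used to update the model, and the cost is used to update \(\hat{J}\). The loop then repeats.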
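The loop can be sketched end to end on a toy problem. Everything concrete here is an assumption made for illustration: the scalar "real" system with a small cubic term, the linear feedback policy \(u = -\theta x\), the quadratic per-step cost, the linear least-squares model, and the lower-confidence-bound acquisition. Using the model-predicted cost as the GP prior mean is one possible reading of "model as prior" and is not confirmed by this note.

```python
import numpy as np

HORIZON = 20  # steps per rollout (assumed)

def rollout(theta, a, b, cubic=0.0, x0=1.0):
    """Apply u = -theta * x to x' = a*x + b*u - cubic*x**3; return (J, transitions)."""
    x, total_cost, transitions = x0, 0.0, []
    for _ in range(HORIZON):
        u = -theta * x
        x_next = a * x + b * u - cubic * x**3
        total_cost += x**2 + 0.1 * u**2  # per-step cost l (assumed quadratic)
        transitions.append((x, u, x_next))
        x = x_next
    return total_cost, transitions

def fit_dynamics(transitions):
    """Least-squares fit of a linear model x' ≈ a_hat*x + b_hat*u."""
    data = np.array(transitions)
    coef, *_ = np.linalg.lstsq(data[:, :2], data[:, 2], rcond=None)
    return float(coef[0]), float(coef[1])

def rbf(p, q, length_scale=0.5):
    return np.exp(-0.5 * (p[:, None] - q[None, :]) ** 2 / length_scale**2)

def gp_posterior(thetas, targets, candidates, noise=1e-4):
    """Zero-mean GP posterior (mean, std) at the candidate parameters."""
    K = rbf(thetas, thetas) + noise * np.eye(len(thetas))
    K_s = rbf(thetas, candidates)
    mean = K_s.T @ np.linalg.solve(K, targets)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

def run_loop(n_iters=8, kappa=2.0):
    real = dict(a=0.9, b=0.5, cubic=0.1)  # "real" system, unknown to the learner
    candidates = np.linspace(0.0, 2.0, 41)
    thetas, costs, transitions = [], [], []

    def evaluate(theta):
        # Test the policy in the real environment; keep its cost and transitions.
        cost, new_transitions = rollout(theta, **real)
        thetas.append(float(theta))
        costs.append(cost)
        transitions.extend(new_transitions)

    for theta in (0.0, 2.0):  # two initial policies
        evaluate(theta)

    for _ in range(n_iters):
        a_hat, b_hat = fit_dynamics(transitions)           # update the model
        model_cost = lambda t: rollout(t, a_hat, b_hat)[0]
        prior = np.array([model_cost(t) for t in candidates])
        prior_at_data = np.array([model_cost(t) for t in thetas])
        # The GP models the gap between measured cost and model-predicted cost.
        mean, std = gp_posterior(np.array(thetas),
                                 np.array(costs) - prior_at_data, candidates)
        acquisition = prior + mean - kappa * std            # update J_hat, then LCB
        evaluate(candidates[np.argmin(acquisition)])

    best = int(np.argmin(costs))
    return thetas[best], costs[best], len(thetas)

best_theta, best_cost, n_evals = run_loop()
```

In this sketch the learned model enters only through `prior`, so a poor model shifts where the search looks first while the GP on the residual corrects it as real evaluations accumulate.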
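Summary: Combining a model-based approach with BO is quite an interesting idea.
Open question: The learned dynamics model does not seem to be used anywhere in the algorithm's pseudocode; I may be misunderstanding something. Or is the dynamics model what is used to compute J, with BO then run on top of the dynamics model?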
