The Effect of Planning Shape on Dyna-style Planning in High-dimensional State Spaces

阿新 • Published: 2021-12-18

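Publication year: 2019
Key points: The paper analyzes Dyna, a model-based method, and compares using the model to generate one-step transitions against using it to generate n-step transitions. The main conclusion is that one-step transitions provide essentially no benefit: the agent does no better than it would by simply running more network updates on the existing replay buffer ("similar sample efficiency gains could be obtained simply by performing more model-free updates on the data the agent had already gathered"). For planning to help, the model has to generate unfamiliar samples. The authors demonstrate this by having the model plan longer trajectories instead of single steps ("planning with longer rollouts yields dramatic improvement when using a perfect model"). On the other hand, the authors also note that learning the model and the value function at the same time, in the usual Dyna fashion, works poorly and yields almost no improvement ("With a learned model, performance will likely be worse due to model errors"). The paper's figures show the same thing: whether the model is pretrained or learned online alongside RL, the results are poor.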

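Experimental design: the experiments compare rollout lengths. The total number of planning steps is held fixed while the trajectory length varies; for example, 100x1 means 100 rollouts of length 1, and 10x10 means 10 rollouts of length 10. The model is one of three kinds: a perfect model provided directly, a fixed model learned beforehand from a dataset, or a model learned online.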
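
The fixed-budget setup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's code: `planning_shapes`, `chain_model`, and `generate_rollouts` are hypothetical names, and the 1-D chain environment is a toy stand-in for the high-dimensional domains the paper studies.

```python
import random
from typing import Callable, List, Tuple

# (state, action, reward, next_state)
Transition = Tuple[int, int, float, int]


def planning_shapes(budget: int) -> List[Tuple[int, int]]:
    """All (num_rollouts, rollout_length) pairs whose product equals the budget."""
    return [(budget // length, length)
            for length in range(1, budget + 1)
            if budget % length == 0]


def chain_model(state: int, action: int) -> Tuple[float, int]:
    """Hypothetical perfect model of a 100-state chain; action is -1 or +1."""
    next_state = max(0, min(99, state + action))
    reward = 1.0 if next_state == 99 else 0.0
    return reward, next_state


def generate_rollouts(model: Callable[[int, int], Tuple[float, int]],
                      policy: Callable[[int], int],
                      start_states: List[int],
                      num_rollouts: int,
                      length: int,
                      rng: random.Random) -> List[Transition]:
    """Generate num_rollouts simulated trajectories of the given length."""
    transitions: List[Transition] = []
    for _ in range(num_rollouts):
        # Each rollout starts from a state the agent has already visited.
        state = rng.choice(start_states)
        for _ in range(length):
            action = policy(state)
            reward, next_state = model(state, action)
            transitions.append((state, action, reward, next_state))
            state = next_state
    return transitions


if __name__ == "__main__":
    rng = random.Random(0)
    for num_rollouts, length in [(100, 1), (10, 10)]:
        data = generate_rollouts(chain_model, lambda s: 1, [50],
                                 num_rollouts, length, rng)
        reached = sorted({t[3] for t in data})
        print(f"{num_rollouts}x{length}: {len(data)} transitions, "
              f"next states {reached[0]}..{reached[-1]}")
```

Both shapes spend the same 100 model steps, but in this toy setting the 100x1 shape only ever produces successors of already-visited states, while 10x10 moves further away from them. That is one way to read the note's point about unfamiliar samples.
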
Summary:

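The conclusion is one you could have anticipated, but it still offers a lesson: to get good results, the agent needs to explore and obtain diverse samples. It also points to a direction, namely doing more of the exploration inside the model, which can also make RL safer. In addition, because model error degrades performance, planning should assess how accurate the model is and set the planning length accordingly.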
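
One simple way to tie the planning length to model accuracy is sketched below. This is a heuristic made up for illustration, not something proposed in the paper: it assumes scalar states, assumes one-step errors add up roughly linearly over a rollout, and uses the hypothetical helpers `one_step_error` and `choose_rollout_length`.

```python
from typing import Callable, List, Tuple

# (state, action, observed_next_state) collected from the real environment
RealTransition = Tuple[float, float, float]


def one_step_error(model: Callable[[float, float], float],
                   transitions: List[RealTransition]) -> float:
    """Mean absolute one-step prediction error on real transitions."""
    errors = [abs(model(s, a) - s_next) for s, a, s_next in transitions]
    return sum(errors) / len(errors)


def choose_rollout_length(error: float, tolerance: float,
                          max_length: int) -> int:
    """Longest rollout whose accumulated error stays within the tolerance.

    Assumption: the error after n steps is roughly n * error.
    """
    if error <= 0.0:
        return max_length
    return max(1, min(max_length, int(tolerance / error)))


if __name__ == "__main__":
    # Hypothetical learned model that is off by 0.5 on every step.
    learned_model = lambda s, a: s + a
    held_out = [(0.0, 1.0, 1.5), (1.0, 1.0, 2.5)]
    err = one_step_error(learned_model, held_out)
    print(choose_rollout_length(err, tolerance=2.0, max_length=10))
```

With an accurate model the rule falls back to the maximum length, and with a very inaccurate one it degrades to one-step planning.
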
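Also, a planning length of 100 is not actually long for a game whose episodes easily run to thousands of steps, and interacting with the environment a few more times also yields diverse samples. This explains why simply running (interaction steps + planning steps) steps directly in the environment yields samples that are no less diverse than those obtained from planning.
Question: only long rollouts help, but if the model is learned, longer rollouts also accumulate larger errors, so the inaccurate model brings no benefit either. Seen this way, isn't planning of rather marginal value?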
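
The trade-off behind this question can be seen in a toy example. The dynamics below are invented for illustration and come from neither the paper nor the original note: the true system adds 1 to the state at each step, while the hypothetical learned model also scales the state by 1.02.

```python
from typing import Callable, List


def rollout(step: Callable[[float], float], start: float,
            length: int) -> List[float]:
    """Apply a one-step function repeatedly, returning all visited states."""
    states = [start]
    for _ in range(length):
        states.append(step(states[-1]))
    return states


def true_step(state: float) -> float:
    """Toy ground-truth dynamics (an assumption for this sketch)."""
    return state + 1.0


def learned_step(state: float) -> float:
    """Toy learned model with a small multiplicative error."""
    return 1.02 * state + 1.0


def rollout_errors(start: float, length: int) -> List[float]:
    """Absolute gap between the model rollout and the true rollout per step."""
    real = rollout(true_step, start, length)
    imagined = rollout(learned_step, start, length)
    return [abs(a - b) for a, b in zip(imagined, real)]


if __name__ == "__main__":
    errors = rollout_errors(start=1.0, length=20)
    for n in (1, 5, 10, 20):
        print(f"error after {n:2d} steps: {errors[n]:.3f}")
```

In this example the gap grows faster than linearly with the rollout length, so the long rollouts that help most under a perfect model are also the ones a learned model gets most wrong.
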
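Moreover, even very long planning with a perfect model still cannot match learning directly in the environment for (interaction steps + planning steps) steps. In that case, if the limit on environment interactions is ignored, the model and planning look nearly useless.
On the comparison with DQN Extra Updates: Rollout-Dyna-DQN outperforms DQN Extra Updates, and the authors attribute this to Rollout-Dyna-DQN's more diverse samples. My guess is that this is because the authors ran only 100K steps of environment interaction in total; if every method interacted with the environment for 1e6 or 2e6 steps, this diversity gap would probably be evened out.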
