Visualizing and Understanding Atari Agents

阿新 • Published: 2021-10-17

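Publication: 2018 (ICML 2018)
Key points: This paper uses saliency maps to visualize RL agents. The goal is to see which part of the input the agent attends to when it picks an action, which gives the agent some degree of interpretability. The paper illustrates this with saliency overlays on game frames.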

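In the saliency figure, blue marks where the policy is attending and red marks where the value function is attending. The highlighted regions look sensible, so the method works fairly well. The idea is to perturb the pixels and see which region changes the output the most. Take Breakout: if the perturbation covers the ball and the policy changes as a result, then the ball matters and the policy is attending to that location. Concretely, the perturbation is applied as

\[\Phi(I_t, i, j) = I_t \odot (1 - M(i,j)) + A(I_t, \sigma_A) \odot M(i,j)\]

Here \(i,j\) is the location where the perturbation is applied and \(I_t\) is the frame at time \(t\). \(M(i,j)\) is a mask shaped like a 2D Gaussian centered at \((i,j)\) with \(\sigma^2=25\). \(A\) is the perturbation being blended in, with standard deviation \(\sigma_A\); in the paper it is a Gaussian-blurred copy of the frame rather than random noise. \(\odot\) is the elementwise product of two matrices. So the first term keeps the original image weighted by \(1 - M(i,j)\): the closer a pixel is to \((i,j)\), the less of the original survives. The second term blends in \(A\) weighted by \(M(i,j)\): the closer a pixel is to \((i,j)\), the stronger the perturbation.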
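To make the blending concrete, here is a minimal NumPy sketch of the perturbation described above. It rests on a few assumptions of mine: the mask is scaled so its peak is 1, the perturbation is a Gaussian blur of the frame, and the blur width `blur_sigma=3.0` is a placeholder value that this post does not state. The function names are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def gaussian_mask(center, shape, sigma=5.0):
    """2D Gaussian bump centered at `center`, scaled so its peak is 1 (assumption)."""
    i, j = center
    rows = np.arange(shape[0])[:, None]
    cols = np.arange(shape[1])[None, :]
    sq_dist = (rows - i) ** 2 + (cols - j) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def gaussian_blur(image, sigma=3.0):
    """Separable Gaussian blur in plain NumPy, standing in for A(I_t, sigma_A)."""
    radius = int(3 * sigma)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-offsets ** 2 / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    # Edge padding keeps the output the same size as the input.
    padded = np.pad(image, radius, mode="edge")
    smooth = lambda v: np.convolve(v, kernel, mode="valid")
    blurred = np.apply_along_axis(smooth, 1, padded)   # along each row
    blurred = np.apply_along_axis(smooth, 0, blurred)  # along each column
    return blurred

def perturb(image, center, mask_sigma=5.0, blur_sigma=3.0):
    """Phi(I_t, i, j) = I_t * (1 - M) + A * M, all products elementwise."""
    mask = gaussian_mask(center, image.shape, mask_sigma)
    return image * (1.0 - mask) + gaussian_blur(image, blur_sigma) * mask
```

With `mask_sigma=5.0` the mask matches \(\sigma^2=25\). Far from \((i,j)\) the mask is close to 0 and the frame is left almost untouched; at \((i,j)\) the mask is 1 and the pixel is fully replaced by the blurred value.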
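Next is how to measure the effect on the policy and the value. For the policy, the authors take the logits (the layer just before the policy output) and square the difference between the outputs on the original and the perturbed input. For the value, they square the difference of the value estimates directly. The formulas are

\[S_\pi(t,i,j) = \frac{1}{2}\left\|\pi_u(I_{1:t}) - \pi_u(I'_{1:t})\right\|^2, \qquad S_V(t,i,j) = \frac{1}{2}\left\|V_\pi(I_{1:t}) - V_\pi(I'_{1:t})\right\|^2\]

where \(\pi_u\) denotes the policy logits, \(V_\pi\) the value estimate, and \(I'_{1:t}\) the frame sequence with \(I_t\) replaced by its perturbed version at \((i,j)\).
With these scores, the saliency map shown earlier can be drawn.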
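The scoring loop can be sketched as follows. This is only an illustration under stated assumptions: `output_fn` and `perturb_fn` are hypothetical callables (a real agent would supply its logits or value head), the `stride` parameter is a knob I added to reduce the number of model calls (`stride=1` corresponds to scoring every pixel), and the toy policy and toy perturbation at the bottom are made up so the example runs without a trained agent.

```python
import numpy as np

def saliency_score(output_fn, frame, perturbed_frame):
    """0.5 * squared L2 distance between outputs on the clean and perturbed frame."""
    diff = np.asarray(output_fn(frame)) - np.asarray(output_fn(perturbed_frame))
    return 0.5 * float(np.sum(diff ** 2))

def saliency_map(frame, output_fn, perturb_fn, stride=5):
    """Score a grid of perturbation centers; costs one extra model call per center."""
    rows = range(0, frame.shape[0], stride)
    cols = range(0, frame.shape[1], stride)
    scores = np.zeros((len(rows), len(cols)))
    for a, i in enumerate(rows):
        for b, j in enumerate(cols):
            scores[a, b] = saliency_score(output_fn, frame, perturb_fn(frame, (i, j)))
    return scores

# --- Toy stand-ins (hypothetical, only to make the sketch runnable) ---
def toy_perturb(frame, center, sigma=5.0):
    """Blend toward the frame mean under a Gaussian mask (simplified perturbation)."""
    rows = np.arange(frame.shape[0])[:, None]
    cols = np.arange(frame.shape[1])[None, :]
    sq_dist = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    mask = np.exp(-sq_dist / (2.0 * sigma ** 2))
    return frame * (1.0 - mask) + frame.mean() * mask

def toy_logits(frame):
    """A fake two-action policy that only reads the patch at rows/cols 20..29."""
    patch_sum = frame[20:30, 20:30].sum()
    return np.array([patch_sum, -patch_sum])

frame = np.zeros((40, 40))
frame[24:26, 24:26] = 1.0  # a bright "ball" inside the patch the toy policy reads
scores = saliency_map(frame, toy_logits, toy_perturb, stride=5)
peak = np.unravel_index(np.argmax(scores), scores.shape)
```

In this toy setup the highest score falls on the grid cell whose center is nearest the ball, which is the behavior the Breakout example describes. The same `saliency_map` call would be run once with the policy logits and once with the value estimate to get the two overlays.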
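Summary:

An interesting piece of work; judging by the results, the maps really do show what the agent attends to. That said, each saliency map requires perturbing every pixel location separately, so the computational cost is substantial.
Open question: I am not sure how general this perturbation scheme is, or whether a different environment would need retuning, for example the choice of the two standard deviations.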
