LEARNING WITH AMIGO: ADVERSARIALLY MOTIVATED INTRINSIC GOALS

阿新 • Published: 2021-11-01

Publication date: 2021 (ICLR 2021)
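Key points: The paper proposes AMIGo, a method for dealing with sparse extrinsic rewards. The idea is to have a goal-generating teacher propose goals of moderate difficulty (a constructively adversarial objective) and supply a goal-dependent intrinsic reward, which a goal-conditioned student policy then learns from. Concretely, the student policy is trained with ordinary reinforcement learning (the paper uses IMPALA); the only change is that the reward becomes

\[ r_t = r_t^e + r_t^g \]

where \(r_t^e\) is the extrinsic reward given by the environment and \(r_t^g\) is the goal reward given by the teacher:

\[ r_t^g = \begin{cases} 1 & \text{if the state } s_t \text{ satisfies the goal } g \\ 0 & \text{otherwise} \end{cases} \]

In other words, the student gets 1 when it reaches the goal and 0 otherwise.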
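The student reward described above (environment reward plus a 0/1 goal reward) can be sketched as follows. This is only an illustration: representing the goal as a `(row, col)` cell and testing it by comparing positions is an assumption made here for simplicity, and the names `goal_reward` and `student_reward` are hypothetical, not taken from the paper's code.

```python
from typing import Tuple

# Assumption: a goal is a (row, col) grid cell, and it counts as
# "reached" when the agent stands on that cell.
Cell = Tuple[int, int]


def goal_reward(agent_pos: Cell, goal: Cell) -> float:
    """Teacher-defined goal reward r_t^g: 1 if the goal is reached, else 0."""
    return 1.0 if agent_pos == goal else 0.0


def student_reward(extrinsic_reward: float, agent_pos: Cell, goal: Cell) -> float:
    """Total student reward r_t = r_t^e + r_t^g."""
    return extrinsic_reward + goal_reward(agent_pos, goal)
```

For instance, `student_reward(0.0, (2, 3), (2, 3))` gives the student a reward even though the environment itself returned nothing, which is the point of the scheme in a sparse-reward setting.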
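The goal-generating teacher is also trained with reinforcement learning. Its policy outputs goals that are neither too easy nor too hard (propose goals that are not too easy for the student to achieve, but not impossible either). The implementation is simple: set a threshold \(t^*\). If the student policy reaches the goal and takes more than \(t^*\) steps to do so, the teacher receives a positive reward; if the student fails to reach the goal, or reaches it in fewer than \(t^*\) steps, the teacher receives a negative reward.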
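A minimal sketch of this teacher reward rule, under stated assumptions: the reward magnitudes `alpha` and `beta` are hypothetical parameters, a failed episode is encoded as `None`, and a student that needs exactly `threshold` steps is treated as "too easy" (the note does not say how ties are handled).

```python
from typing import Optional


def teacher_reward(
    steps_to_goal: Optional[int],
    threshold: int,
    alpha: float = 1.0,  # hypothetical positive reward magnitude
    beta: float = 1.0,   # hypothetical penalty magnitude
) -> float:
    """Reward for the teacher after one student episode.

    steps_to_goal is the number of steps the student needed to reach the
    proposed goal, or None if the goal was never reached.
    """
    # Goal reached, but only after more than `threshold` steps:
    # the goal was hard enough, so the teacher is rewarded.
    if steps_to_goal is not None and steps_to_goal > threshold:
        return alpha
    # Goal not reached (too hard) or reached too quickly (too easy).
    return -beta
```

With `threshold=10`, a student that needs 12 steps earns the teacher `+alpha`, while one that needs 5 steps, or never arrives, earns it `-beta`.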

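Over the course of training this threshold is gradually increased, which amounts to the goals gradually getting harder (Specifically, the threshold is increased by 1 whenever the student successfully reaches an intrinsic goal in more than \(t^*\) steps for ten times in a row.). And that is the whole method.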
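The threshold schedule quoted above can be sketched like this. The class name `ThresholdSchedule`, the initial threshold of 1, and resetting the streak counter after each increase are assumptions made for illustration; only the "ten times in a row, then add 1" rule comes from the quoted sentence.

```python
from typing import Optional


class ThresholdSchedule:
    """Raise t* by 1 after `streak_needed` consecutive qualifying episodes."""

    def __init__(self, initial_threshold: int = 1, streak_needed: int = 10):
        self.threshold = initial_threshold  # assumption: starting value of t*
        self.streak_needed = streak_needed
        self.streak = 0

    def update(self, steps_to_goal: Optional[int]) -> int:
        """Record one episode and return the (possibly increased) threshold.

        steps_to_goal is None when the student did not reach the goal.
        """
        if steps_to_goal is not None and steps_to_goal > self.threshold:
            self.streak += 1
        else:
            # Any failure or too-quick success breaks the run.
            self.streak = 0
        if self.streak >= self.streak_needed:
            self.threshold += 1
            self.streak = 0  # assumption: counting restarts after an increase
        return self.threshold
```

Calling `update` once per student episode keeps the teacher's notion of "hard enough" moving upward only when the student has consistently met the current bar.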
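At this point the picture is fairly clear. First, the goal the teacher outputs is a coordinate position together with whatever object sits at that position, because the experiments are run on maze tasks. This setup means the method does not generalize as is; it has to be redesigned for each specific problem. Second, the design of the teacher reward and of the threshold \(t^*\) shows how many tricks and how much hyperparameter tuning are involved.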
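Summary: Overall, not very interesting. Although it is an ICLR paper, and from MIT at that, it still feels somewhat thin. There are too many tricks, the method is not general enough, and it needs heavy hyperparameter tuning.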
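Question: The paper stresses twice that it ran a total of 114 experiments across 6 tasks. Is the number of experiments really something to boast about?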
