REPAINT: Knowledge Transfer in Deep Reinforcement Learning

阿新 • Published: 2021-10-30

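Publication date: 2021 (ICML 2021)
Key points: This paper proposes an algorithm called REPresentation And INstance Transfer (REPAINT) for knowledge transfer in RL. It has two main components: representation transfer and instance transfer. Representation transfer uses a cross-entropy term to constrain the distance between the teacher policy and \(\pi_\theta\), pulling the two closer together.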

During training, this term is simply added to the PPO loss.
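As a rough illustration of the representation-transfer idea, here is a minimal NumPy sketch for a discrete action space. The function names, the weight `beta`, and the toy numbers are all my own assumptions and not taken from the paper; it only shows a PPO clipped loss with a teacher-student cross-entropy term added on top.

```python
import numpy as np

def cross_entropy(teacher_probs, student_probs, eps=1e-8):
    """Mean over states of -sum_a pi_teacher(a|s) * log pi_theta(a|s)."""
    per_state = -np.sum(teacher_probs * np.log(student_probs + eps), axis=-1)
    return float(np.mean(per_state))

def ppo_clip_loss(ratio, advantages, clip_eps=0.2):
    """Negated PPO clipped surrogate, so that lower is better."""
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    surrogate = np.minimum(ratio * advantages, clipped * advantages)
    return float(-np.mean(surrogate))

def representation_transfer_loss(ratio, advantages, teacher_probs,
                                 student_probs, beta=0.5):
    """PPO loss plus a weighted cross-entropy term (beta is a hypothetical weight)."""
    return (ppo_clip_loss(ratio, advantages)
            + beta * cross_entropy(teacher_probs, student_probs))

# Toy batch: two states, three actions.
teacher = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
close_student = np.array([[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]])
far_student = np.array([[0.1, 0.2, 0.7], [0.8, 0.1, 0.1]])
ratio = np.array([1.1, 0.9])   # pi_theta / pi_theta_old on the sampled actions
adv = np.array([0.5, -0.3])

loss_close = representation_transfer_loss(ratio, adv, teacher, close_student)
loss_far = representation_transfer_loss(ratio, adv, teacher, far_student)
print(loss_close, loss_far)
```

With the PPO part held fixed, the student that is closer to the teacher gets the lower total loss, which is the "pull them together" effect described above.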

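Instance transfer rests on the observation that the trajectories collected with the teacher policy will include some that also perform well on the current task, so a threshold is set and the good ones are picked out to train \(\pi_\theta\). The authors call this advantage-based experience selection. When these samples are used in PPO, the ratio is no longer divided by \(\pi_{\theta_{old}}\) but by the teacher policy instead.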
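The selection step and the modified ratio can be sketched roughly as follows. This is only an illustration under my own assumptions: the names `select_by_advantage` and `instance_transfer_loss`, the threshold value, and the reuse of PPO's clipping are hypothetical choices, not the authors' code.

```python
import numpy as np

def select_by_advantage(advantages, threshold):
    """Boolean mask of teacher samples whose advantage exceeds the threshold."""
    return advantages > threshold

def instance_transfer_loss(student_logp, teacher_logp, advantages,
                           threshold=0.0, clip_eps=0.2):
    """Clipped surrogate on the selected samples, with pi_theta / pi_teacher as the ratio."""
    keep = select_by_advantage(advantages, threshold)
    if not np.any(keep):
        return 0.0, keep  # nothing passed the filter, so there is nothing to train on
    ratio = np.exp(student_logp[keep] - teacher_logp[keep])
    adv = advantages[keep]
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    loss = -np.mean(np.minimum(ratio * adv, clipped * adv))
    return float(loss), keep

# Toy samples collected with the teacher policy, advantages computed on the current task.
advantages = np.array([1.2, -0.4, 0.3, 0.05])
teacher_logp = np.log(np.array([0.5, 0.4, 0.25, 0.1]))
student_logp = np.log(np.array([0.5, 0.2, 0.25, 0.3]))

loss, keep = instance_transfer_loss(student_logp, teacher_logp,
                                    advantages, threshold=0.1)
print(keep, loss)
```

Only the first and third samples pass the 0.1 threshold here; the other two never contribute to the gradient, however the ratio is defined.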

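The final update simply combines these two losses; see the algorithm pseudocode in the paper.
Summary: The experiments are extensive, but judging from the many hyperparameters, there are probably plenty of tricks involved. Also, the theoretical part seems to rely on rather too many assumptions, and it is not even included in the main text, so the paper is probably mainly experiment-driven.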
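Questions: Dividing by the teacher policy corrects the weights, but the sampled trajectories are the ones that remain after thresholding, so their distribution is no longer that of \(\pi_{teacher}\). Is that really fine?
The paper also states explicitly, "In addition, our method performs policy update without importance sampling." But isn't this ratio \(\rho_\theta\) exactly importance sampling?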
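To make the first question concrete, here is a tiny single-state toy example of my own (not from the paper). It assumes the advantage depends only on the action, and the probabilities and advantages are made-up numbers. It computes the action distribution that is actually left after the threshold filter and compares it with the teacher's.

```python
import numpy as np

def filtered_distribution(teacher_probs, advantages, threshold):
    """Action distribution of teacher samples that survive the advantage filter."""
    keep = advantages > threshold
    mass = teacher_probs * keep
    total = mass.sum()
    if total == 0:
        raise ValueError("no action passes the threshold")
    return mass / total

teacher_probs = np.array([0.5, 0.3, 0.2])   # pi_teacher(a|s) for three actions
advantages = np.array([1.0, -1.0, 0.5])     # hypothetical per-action advantages
filtered = filtered_distribution(teacher_probs, advantages, threshold=0.0)
print(filtered)
```

The second action disappears completely and the remaining mass is renormalized, so the retained samples no longer follow \(\pi_{teacher}\), which is exactly why dividing by the teacher policy does not look like a clean correction.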
