
Deep RL Bootcamp Lecture 4A: Policy Gradients

In policy gradient notation, the action is usually written as "u" instead of "a".
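
For reference, the likelihood-ratio policy gradient this part of the lecture builds up, written with u for the action (a standard form, reconstructed here rather than copied from a slide):

```latex
\nabla_\theta \, \mathbb{E}_{\tau \sim \pi_\theta}\!\left[ R(\tau) \right]
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[ \sum_{t} \nabla_\theta \log \pi_\theta(u_t \mid s_t)\, R(\tau) \right]
```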

Use this new form to estimate, from sampled trajectories, how good an update is.
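
A minimal sketch of that estimator in code, assuming a tabular softmax policy and already-collected trajectories (all names here are illustrative, not from the lecture):

```python
import numpy as np

def softmax_probs(theta, s):
    """pi_theta(. | s) for a tabular softmax policy; theta has shape [n_states, n_actions]."""
    z = theta[s] - theta[s].max()
    e = np.exp(z)
    return e / e.sum()

def grad_log_pi(theta, s, u):
    """grad_theta log pi_theta(u | s) for the tabular softmax policy."""
    g = np.zeros_like(theta)
    g[s] = -softmax_probs(theta, s)
    g[s, u] += 1.0
    return g

def policy_gradient_estimate(theta, trajectories):
    """Likelihood-ratio gradient estimate averaged over sampled trajectories.

    trajectories: list of (states, actions, rewards) tuples collected with pi_theta.
    """
    grad = np.zeros_like(theta)
    for states, actions, rewards in trajectories:
        R = sum(rewards)                      # total return of this trajectory
        for s, u in zip(states, actions):
            grad += grad_log_pi(theta, s, u) * R
    return grad / len(trajectories)
```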

If all three sampled paths show positive reward, should the policy increase the probability of all of the samples?
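
In expectation the raw estimator still points the right way, but subtracting a baseline (for example the average return) keeps the gradient unbiased while only paths that are better than average get pushed up. A tiny numerical sketch with made-up returns:

```python
import numpy as np

returns = np.array([10.0, 12.0, 30.0])   # hypothetical returns of three sampled paths
baseline = returns.mean()                 # average return as a simple baseline
advantages = returns - baseline           # -> [-7.33, -5.33, +12.67]
# Although every return is positive, only the best path gets a positive weight
# in the gradient; the other two are made relatively less likely.
```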

Monte Carlo estimate
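
A sketch of a Monte Carlo estimate of the return-to-go, computed from the actual sampled rewards of one rollout (unbiased, but high variance; names are illustrative):

```python
def monte_carlo_returns(rewards, gamma=0.99):
    """Discounted return-to-go for each timestep of a finished rollout."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```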

TD estimate
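
And a sketch of the one-step TD estimate, which bootstraps from a learned value function instead of waiting for the whole rollout (lower variance at the cost of some bias; `values` is an assumed value-function estimate):

```python
def td_targets(rewards, values, gamma=0.99):
    """One-step TD estimate r_t + gamma * V(s_{t+1}) for each timestep.

    values has length len(rewards) + 1; the last entry is V of the state after
    the final reward (use 0.0 if the episode terminated there).
    """
    return [rewards[t] + gamma * values[t + 1] for t in range(len(rewards))]
```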

It takes about 2 weeks to train with respect to the real-world time scale,

but it could be faster in a simulator (MuJoCo).

We don't know whether a set of hyperparameters is going to work until enough iterations have passed, so tuning is tricky; using a simulator can alleviate this problem.

Question: how do we transfer the knowledge a robot learned in simulation to real life if we are not sure how well the simulator matches the real world?

Randomly initialize many simulators and check the robustness of the algorithm across them.
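
A rough sketch of that idea (domain randomization): evaluate the learned policy across many simulators with perturbed physics. `make_simulator` and `run_episode` are hypothetical helpers, and the randomized parameters are only examples:

```python
import numpy as np

def evaluate_robustness(policy, make_simulator, run_episode, n_sims=20, seed=0):
    """Return mean and std of episode returns across randomly perturbed simulators."""
    rng = np.random.RandomState(seed)
    returns = []
    for _ in range(n_sims):
        params = {
            "friction":    rng.uniform(0.5, 1.5),   # example physics parameters
            "mass_scale":  rng.uniform(0.8, 1.2),
            "motor_noise": rng.uniform(0.0, 0.1),
        }
        sim = make_simulator(params)                 # hypothetical simulator factory
        returns.append(run_episode(policy, sim))     # hypothetical rollout helper
    return float(np.mean(returns)), float(np.std(returns))
```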

This video shows that even a robot representing two years of effort by a group of experts still isn't good at locomotion.

Hindsight Experience Replay

Marcin Andrychowicz from OpenAI

The agent is programmed to find the best way to get pizza, but when it finds an ice cream, it realizes that the ice cream, which corresponds to a higher reward, is exactly what it wants to get.
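
A minimal sketch of the hindsight relabeling idea mentioned above, assuming goal-conditioned transitions and a sparse goal-reaching reward (all names are illustrative, not the paper's API):

```python
def hindsight_relabel(episode, achieved_goal):
    """Relabel a failed episode with a goal the agent actually reached.

    episode: list of (state, action, reward, next_state, goal) tuples.
    """
    relabeled = []
    for state, action, _, next_state, _ in episode:
        reward = 1.0 if next_state == achieved_goal else 0.0  # sparse goal reward
        relabeled.append((state, action, reward, next_state, achieved_goal))
    return relabeled
```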
