CS294-112 Deep Reinforcement Learning, Fall Semester (Berkeley), No. 9: Learning policies by imitating optimal controllers
阿新 • Published: 2018-05-28
Make a compromise between staying close to the learned policy and achieving minimal cost!
π hat (the optimal controller) uses states.
π theta (the learned policy) uses observations.
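
The compromise described above can be illustrated with a toy sketch. This is not the lecture's algorithm: it assumes a scalar system, a quadratic cost, a linear learned policy and a fixed trade-off weight `lam`. The names `teacher_action` and `fit_student` are hypothetical. The controller (π hat) sees the true state and picks the action that minimizes cost plus a penalty for deviating from the learned policy (π theta), which only sees a noisy observation and is then refit to imitate the controller.

```python
import numpy as np

def teacher_action(x, student_action, lam, r=0.1):
    """Closed-form minimizer over u of (x + u)^2 + r*u^2 + lam*(u - student_action)^2.

    lam = 0 gives the pure minimal-cost action; a large lam pulls the
    controller toward whatever the learned policy would do.
    """
    return (lam * student_action - x) / (1.0 + r + lam)

def fit_student(observations, actions):
    """Least-squares fit of a linear policy u = theta * o (assumed policy class)."""
    o = np.asarray(observations, dtype=float)
    u = np.asarray(actions, dtype=float)
    return float(o @ u / (o @ o))

rng = np.random.default_rng(0)
states = rng.normal(size=200)                        # visible to the controller only
observations = states + 0.1 * rng.normal(size=200)   # what the learned policy sees

theta = 0.0
for _ in range(5):
    student_actions = theta * observations
    # The controller uses the state and compromises with the learned policy.
    teacher_actions = teacher_action(states, student_actions, lam=1.0)
    # The learned policy imitates the controller from observations alone.
    theta = fit_student(observations, teacher_actions)
```

Raising `lam` makes the controller easier to imitate at the price of higher cost; lowering it does the opposite.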