DREAM TO CONTROL: LEARNING BEHAVIORS BY LATENT IMAGINATION

阿新 • Published: 2021-11-28

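Publication: 2020 (ICLR 2020)
Key points: The paper proposes an algorithm called Dreamer. It learns a world model and then runs reinforcement learning in that model's compact state space. In effect, behaviors are learned inside the world model rather than through direct interaction with the real environment, hence the name Dreamer: learning in a dream, where everything you need is available.
The model has three components: a representation model, a transition model, and a reward model.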
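For reference, the three components can be written out as below. The notation follows the Dreamer paper as I understand it (s_t is the latent state, a_t the action, o_t the observation, r_t the reward); treat it as a sketch of the structure rather than a verbatim copy of the original figure.

```latex
\begin{aligned}
\text{Representation model:}\quad & p(s_t \mid s_{t-1}, a_{t-1}, o_t) \\
\text{Transition model:}\quad & q(s_t \mid s_{t-1}, a_{t-1}) \\
\text{Reward model:}\quad & q(r_t \mid s_t)
\end{aligned}
```

Only the representation model sees the observation o_t; the transition model predicts the next latent state without it, which is what makes imagining ahead in latent space possible.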

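Here s is not the true environment state but a state in the compact latent space. For learning these models, the authors discuss three common approaches. The first is reward prediction: the whole model is trained end to end with predicting the reward as the only objective.
The second is reconstruction, where the objective is to reconstruct the image observation.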

This approach usually derives a bound via the variational lower bound (ELBO) or the variational information bottleneck (VIB), and then optimizes that bound.
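As a rough guide to what such a bound looks like, the reconstruction objective can be written in the following form. This is my reading of the paper's objective, with \beta as a scale on the KL term; the symbols match the three model components (p is the representation model, q the transition, observation, and reward models).

```latex
\begin{aligned}
\mathcal{J}_{\mathrm{REC}} &= \mathbb{E}_{p}\Big[\sum_t \big(\mathcal{J}_O^t + \mathcal{J}_R^t + \mathcal{J}_D^t\big)\Big] + \text{const} \\
\mathcal{J}_O^t &= \ln q(o_t \mid s_t) \\
\mathcal{J}_R^t &= \ln q(r_t \mid s_t) \\
\mathcal{J}_D^t &= -\beta\, \mathrm{KL}\big(p(s_t \mid s_{t-1}, a_{t-1}, o_t)\,\big\|\, q(s_t \mid s_{t-1}, a_{t-1})\big)
\end{aligned}
```

The first two terms reward accurate image and reward prediction, while the KL term keeps the observation-informed latent state close to what the transition model would have predicted on its own.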

This part should be the same as in "Learning Latent Dynamics for Planning from Pixels".
The third is contrastive estimation, which uses a state model to predict the state from the observation.

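In effect, the model learns by contrasting observations against states, so that it can tell which observation goes with which state. It can be trained, for example, with noise contrastive estimation (NCE).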
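To make the contrastive idea concrete, here is a minimal sketch of an InfoNCE-style objective, which is one common form of contrastive estimation and not necessarily the exact loss used in the paper. The function names and the `scores` matrix are hypothetical: `scores[i][j]` is assumed to be a log-score for how well state i matches observation j in a batch, with matching pairs on the diagonal.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def contrastive_objective(scores):
    """InfoNCE-style objective (to be maximized).

    scores[i][j] is a hypothetical log-score that state i matches
    observation j. For each state, the matching observation (the diagonal
    entry) is contrasted against all observations in the batch.
    """
    n = len(scores)
    total = 0.0
    for i in range(n):
        total += scores[i][i] - log_sum_exp(scores[i])
    return total / n


# Uninformative scores: the model cannot tell observations apart.
print(contrastive_objective([[0.0, 0.0], [0.0, 0.0]]))
# Strongly diagonal scores: matching pairs clearly stand out.
print(contrastive_objective([[10.0, -10.0], [-10.0, 10.0]]))
```

With uninformative scores the objective equals -log(batch size); it approaches 0 from below as the matching pairs are scored well above the mismatched ones.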
Reinforcement learning is then done on top of this model. The paper uses an actor-critic method, so it learns both a policy and a value function.
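The value function in an actor-critic setup like this is typically trained toward a multi-step target computed along an imagined trajectory. The sketch below computes λ-returns in a common recursive form; the paper uses a λ-weighted value estimate of this kind, but the function name, argument layout, and default `gamma` and `lam` values here are my own assumptions for illustration.

```python
def lambda_returns(rewards, values, gamma=0.99, lam=0.95):
    """Compute lambda-returns along one imagined trajectory.

    rewards[t] is the predicted reward for step t (t = 0..H-1).
    values[t] is the critic's estimate for step t (t = 0..H);
    values[H] bootstraps the tail beyond the imagination horizon.
    gamma and lam are illustrative defaults, not taken from the note.
    """
    horizon = len(rewards)
    assert len(values) == horizon + 1
    returns = [0.0] * horizon
    next_return = values[horizon]
    # Work backwards: each target mixes the one-step bootstrap with the
    # longer return that follows it.
    for t in reversed(range(horizon)):
        next_return = rewards[t] + gamma * (
            (1 - lam) * values[t + 1] + lam * next_return
        )
        returns[t] = next_return
    return returns


print(lambda_returns([1.0, 1.0], [0.0, 0.0, 10.0], gamma=0.5, lam=1.0))
print(lambda_returns([1.0, 1.0], [0.0, 0.0, 10.0], gamma=0.5, lam=0.0))
```

Setting `lam=0` reduces this to the one-step target r_t + gamma * v(s_{t+1}), and `lam=1` gives the full discounted return bootstrapped only at the horizon. The critic regresses toward these targets while the actor is updated to increase them.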

The paper then gives pseudocode for the full algorithm.
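As I read it, the overall loop alternates between three phases: fitting the world model on stored real experience, learning the policy and value function on imagined trajectories, and collecting new real experience with the current policy. The sketch below illustrates only the imagination phase, using toy scalar stand-ins for the learned models; `imagine_rollout` and the three lambda functions are hypothetical and are not the paper's networks.

```python
def imagine_rollout(start_state, policy, transition, reward, horizon):
    """Roll a policy forward inside a learned model.

    No real environment is touched here: every next state and reward
    comes from the model functions passed in.
    """
    states, actions, rewards = [start_state], [], []
    state = start_state
    for _ in range(horizon):
        action = policy(state)
        state = transition(state, action)
        actions.append(action)
        states.append(state)
        rewards.append(reward(state))
    return states, actions, rewards


# Toy stand-ins for the learned components (assumptions for illustration).
policy = lambda s: -0.5 * s          # move halfway toward zero
transition = lambda s, a: s + a      # latent dynamics
reward = lambda s: -abs(s)           # closer to zero is better

states, actions, rewards = imagine_rollout(4.0, policy, transition, reward, horizon=3)
print(states)   # [4.0, 2.0, 1.0, 0.5]
print(rewards)  # [-2.0, -1.0, -0.5]
```

In the real algorithm the start states would come from latent states inferred on replayed real sequences, and the imagined rewards and values would feed the actor and critic updates.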

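Summary:

My impression is that the main point of the paper is to move the whole behavior-learning process into the world model and minimize interaction with the real environment. The paper describes many ways to learn the model, which feels fairly complicated. Judging by the final results, at least on continuous control the number of environment interactions is on the order of 1e6, whereas model-free methods use 1e8 to 1e9, so sample efficiency improves by roughly two to three orders of magnitude. On discrete control such as Atari, however, there is still no clear advantage.
Questions: What exactly is contact dynamics? It comes up again here.
I have not yet read up on noise contrastive estimation (NCE).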
