Risk-Aware and Multi-Objective Decision Making with Distributional Monte Carlo Tree Search

阿新 • Posted: 2021-11-06

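Published: 2021 (AAMAS 2021 extended abstract)
Key points: The paper starts from the observation that RL usually maximises the cumulative return, which is a scalar, and a scalar necessarily carries less information than a distribution. It therefore applies MCTS in a distributional way to the risk-aware and multi-objective setting (Distributional Monte Carlo Tree Search, DMCTS). Concretely, the reward becomes a vector, and a utility function converts it into a final scalar. There are two optimisation criteria. Writing R for the vector-valued return and u for the utility function:

Scalarised expected returns (SER): take the expectation first, then apply the utility to obtain a scalar, i.e. u(E[R]).

Expected scalarised returns (ESR): apply the utility first to obtain a scalar, then take the expectation, i.e. E[u(R)].

The authors argue that DMCTS works under both criteria, and that it also handles nonlinear utility functions.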
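The difference between the two criteria can be seen in a small numeric sketch. The outcomes, probabilities and both utility functions below are made up for illustration and are not taken from the paper. With a nonlinear utility the order of "expectation" and "utility" matters; with a linear utility the two criteria give the same value.

```python
import numpy as np

# Hypothetical nonlinear utility: the product of the two objectives.
def utility(returns):
    returns = np.asarray(returns, dtype=float)
    return returns[..., 0] * returns[..., 1]

# Hypothetical linear utility: a weighted sum of the two objectives.
def linear_utility(returns, weights=(0.3, 0.7)):
    return np.asarray(returns, dtype=float) @ np.asarray(weights)

# Two equally likely vector-valued returns of one made-up policy.
outcomes = np.array([[4.0, 0.0], [0.0, 4.0]])
probs = np.array([0.5, 0.5])

expected_return = probs @ outcomes           # E[R] = [2, 2]

# Nonlinear utility: SER and ESR disagree.
ser = utility(expected_return)               # u(E[R]) = 2 * 2
esr = probs @ utility(outcomes)              # E[u(R)] = 0.5 * 0 + 0.5 * 0

# Linear utility: SER and ESR coincide (up to rounding).
ser_linear = linear_utility(expected_return)
esr_linear = probs @ linear_utility(outcomes)

print(ser, esr, ser_linear, esr_linear)
```

In this toy case every single outcome is worthless under the product utility, yet the average outcome looks good, so SER and ESR rank the same policy very differently.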
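Method: MCTS itself is unchanged and keeps the usual Selection, Expansion, Simulation and Backpropagation steps. The differences are that the reward maintained in the tree is now a vector and that, unlike in Go, the tree contains chance nodes. Selection does not use UCT but Bootstrap Thompson Sampling, which plays much the same role. The idea is to use the visit data collected so far to update the parameters α and β, and hence the posterior distribution, and then to pick the action that maximises ESR or SER under that distribution. Exploration comes mainly from the bootstrap: different bootstrap replicates yield different α and β, which balances exploitation and exploration.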
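The selection idea can be sketched for a single decision rather than a full tree. This follows one common formulation of bootstrap Thompson sampling under the ESR criterion; the paper's exact update may differ. The names `BootstrapArm`, `select_action` and `utility`, the 0.5 update probability, the α/β estimate, the initial values of 1 and the toy returns are all assumptions made for illustration.

```python
import random

class BootstrapArm:
    """Bootstrap replicates of (alpha, beta) for one action (illustrative)."""

    def __init__(self, n_replicates=10, rng=None):
        self.rng = rng or random.Random()
        # alpha accumulates utility, beta counts updates; both start at 1 (assumed).
        self.alpha = [1.0] * n_replicates
        self.beta = [1.0] * n_replicates

    def sample_estimate(self):
        # Pick one replicate uniformly at random and return its mean estimate.
        j = self.rng.randrange(len(self.alpha))
        return self.alpha[j] / self.beta[j]

    def update(self, utility_value):
        # Each replicate sees the new observation with probability 0.5,
        # so the replicates drift apart and provide the exploration.
        for j in range(len(self.alpha)):
            if self.rng.random() < 0.5:
                self.alpha[j] += utility_value
                self.beta[j] += 1.0

def select_action(arms):
    # Act greedily with respect to one sampled replicate per action.
    estimates = [arm.sample_estimate() for arm in arms]
    return max(range(len(arms)), key=lambda a: estimates[a])

def utility(vector_return):
    # Hypothetical nonlinear utility: the worse of the two objectives.
    return min(vector_return)

rng = random.Random(0)
arms = [BootstrapArm(rng=rng), BootstrapArm(rng=rng)]
counts = [0, 0]
for _ in range(500):
    a = select_action(arms)
    if a == 0:
        # Risky action: one objective is always zero, so its utility is 0.
        vector_return = (3.0, 0.0) if rng.random() < 0.5 else (0.0, 3.0)
    else:
        # Safe action: balanced return with utility 1.
        vector_return = (1.0, 1.0)
    arms[a].update(utility(vector_return))  # ESR: utility applied per sample
    counts[a] += 1

print(counts)
```

Because the utility is applied to each sampled return before averaging, this sketch ends up preferring the safe action, even though the risky action has the higher expected vector return (1.5, 1.5).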
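Summary:

A successful application. Judging from the results, it does considerably better than the Q-learning-based RL algorithms. However, the experiments are fairly simple, and it is unclear how well the method works on more complex problems, particularly in terms of computational cost.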
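Open questions: What exactly does risk-aware mean?
What is the practical difference between the two criteria, ESR and SER?
The paper keeps referring to past returns and future returns. I would expect every RL algorithm to take both into account, so it is not clear what is being emphasised here.
The paper keeps stressing the difference between linear and nonlinear utility functions. What effect does this have at the algorithm level?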
