"Learning to Incentivize Other Learning Agents" (NeurIPS 2020)

阿新 • Published: 2022-12-10

Learning to incentivize other learning agents

Summary:

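To promote cooperation among multiple agents in general-sum Markov games, each agent is equipped with an incentive (reward) function that it uses to give rewards directly to other agents, and it explicitly accounts for how those rewards change the recipients' behavior. The incentive function is learned continually from two effects: the effect of the given rewards on the other agents, and the effect of the other agents' subsequent behavior on the environment reward that the giver itself collects. This mechanism is used to stimulate cooperation and reach a high collective return.
Novelty: training stays decentralized, so the method scales to larger problems, while still achieving high cooperative returns; the incentive function is learned adaptively rather than specified in advance.
Each agent learns two things:
- a policy that maximizes the total of the extrinsic reward and the incentives it receives (reinforcement learning)
- an incentive function that changes the other agents' behavior so that its own extrinsic objective is maximized (gradient ascent on the extrinsic objective)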

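Environments:

Idealized setting: each agent has access to the other agents' parameters and gradients
Escape Room game (N, M): there are N players, and at least M of them must cooperate to pull the lever before the door opens and the players can leave the room. A player who pulls the lever receives a reward of -1; if fewer than M players pull, all players receive -1; if the door is opened, the players who did not pull the lever receive +10.
Iterated Prisoner's Dilemma
Cleanup environment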
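
The Escape Room reward rule described above can be written as a small function. This is a minimal sketch of the rule exactly as these notes state it, not the paper's environment code; the function name `escape_room_rewards` and the boolean encoding of the actions are assumptions made for illustration.

```python
from typing import List


def escape_room_rewards(pulled: List[bool], m: int) -> List[float]:
    """Per-player rewards for one round, following the rule in these notes.

    pulled[i] is True if player i pulled the lever (hypothetical encoding).
    m is the minimum number of lever pullers needed to open the door.
    """
    door_open = sum(pulled) >= m
    if not door_open:
        # Fewer than M players pulled: every player receives -1.
        return [-1.0] * len(pulled)
    # Door opened: lever pullers get -1, everyone else leaves and gets +10.
    return [-1.0 if p else 10.0 for p in pulled]
```

With N = 2 and M = 1, `escape_room_rewards([True, False], 1)` gives the puller -1 and the other player +10, so a purely self-interested agent has no reason to be the one who pulls. That is the gap the learned incentive is meant to close.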

Implementation details:

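Reward function

Consists of the environment reward plus the incentive rewards given by the other agents; -i denotes all indices other than i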
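
As a minimal sketch of this sum, assume the incentives handed out at one time step are stored in a matrix where `incentives[i][j]` is what agent i gives to agent j. That layout and the name `total_reward` are assumptions for illustration, not taken from the paper.

```python
from typing import Sequence


def total_reward(j: int, env_reward: float,
                 incentives: Sequence[Sequence[float]]) -> float:
    """Total reward of agent j at one time step.

    env_reward is agent j's environment (extrinsic) reward.
    incentives[i][j] is the incentive agent i gives to agent j
    (hypothetical layout). Agent j's own row is skipped, which is
    the "-j" index set: every agent except j.
    """
    received = sum(row[j] for i, row in enumerate(incentives) if i != j)
    return env_reward + received
```

For example, if agent 0 pulls the lever (environment reward -1) and agent 1 pays it an incentive of 2, agent 0's total reward becomes 1.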
Value function

The final goal is to maximize the value function
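
A minimal sketch, assuming the value function is the usual expected discounted return over the total rewards. The discount factor `gamma` is an assumption, since these notes do not state it. For a single sampled trajectory the estimate is:

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Discounted sum r_0 + gamma * r_1 + gamma^2 * r_2 + ...

    rewards are the per-step total rewards (environment reward plus
    received incentives); gamma is an assumed discount factor.
    """
    g = 0.0
    # Accumulate from the last step backwards: g_t = r_t + gamma * g_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

Averaging this quantity over many trajectories gives a Monte Carlo estimate of the value the policy is trained to maximize.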
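Parameter updates

A trajectory is collected with the current policy, which is trained on the value function above, and is used to update the parameters of the policy network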
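
The notes do not say which policy-gradient estimator is used for this step. The sketch below assumes plain REINFORCE on a stateless softmax policy, only to show how one trajectory turns into one parameter update; `reinforce_step`, `lr` and `gamma` are hypothetical names and values.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_step(theta: List[float], trajectory: List[Tuple[int, float]],
                   lr: float = 0.1, gamma: float = 0.99) -> List[float]:
    """One REINFORCE update of the action logits `theta`.

    trajectory is a list of (action, total_reward) pairs, where
    total_reward already includes incentives from other agents.
    """
    probs = softmax(theta)
    grad = [0.0] * len(theta)
    ret = 0.0
    # Walk backwards so `ret` is the discounted return from step t onward.
    for t in range(len(trajectory) - 1, -1, -1):
        action, reward = trajectory[t]
        ret = reward + gamma * ret
        for a in range(len(theta)):
            # d log pi(action) / d theta[a] for a softmax policy
            dlog = (1.0 if a == action else 0.0) - probs[a]
            grad[a] += (gamma ** t) * ret * dlog
    # Gradient ascent on the expected return.
    return [th + lr * g for th, g in zip(theta, grad)]
```

Because the rewards in the trajectory include the incentives, whoever pays those incentives indirectly shapes this update. The incentive function is trained to exploit exactly that dependence.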

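Once the updated policy network is obtained, a new trajectory is collected with it and used to update the parameters of the incentive function

The last term of the incentive objective is the cost the agent pays for rewarding others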
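
A minimal sketch of such an objective for one agent, assuming it is the discounted environment return on the new trajectory minus a weighted cost, with the cost taken as the discounted sum of absolute incentives the agent paid. The weight `alpha`, the discount `gamma`, and the use of absolute values are assumptions for illustration.

```python
from typing import Sequence


def incentive_objective(new_env_rewards: Sequence[float],
                        incentives_given: Sequence[Sequence[float]],
                        alpha: float, gamma: float) -> float:
    """Objective the incentive function is trained to maximize.

    new_env_rewards[t]: the giver's environment reward at step t of the
        trajectory collected AFTER the other agents updated their policies.
    incentives_given[t]: incentives the giver paid to each other agent
        at step t.
    alpha: assumed weight of the cost term.
    """
    env_return = sum((gamma ** t) * r for t, r in enumerate(new_env_rewards))
    # Cost of rewarding others: discounted sum of absolute incentives.
    cost = sum((gamma ** t) * sum(abs(x) for x in given)
               for t, given in enumerate(incentives_given))
    return env_return - alpha * cost
```

Without the cost term, an agent could hand out arbitrarily large incentives for free. With it, the agent pays only as much as the induced behavior is worth to its own environment return.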

These two steps are alternated until convergence
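
The alternation can be illustrated with a toy two-player version of the Escape Room rule (N = 2, M = 1). This is a sketch under strong simplifying assumptions, not the paper's algorithm: the recipient has a single logit `theta` for pulling the lever, the giver has a single scalar incentive `eta` paid when the recipient pulls, expected rewards are computed in closed form instead of from sampled trajectories, and the giver's gradient through the recipient's update is taken by central finite differences rather than by the chain rule. All names and hyperparameters are hypothetical.

```python
import math
from typing import Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def recipient_update(theta: float, eta: float, beta: float) -> float:
    """Recipient's policy-gradient step on its expected total reward.

    Its environment reward is -1 whether or not it pulls, so the expected
    total reward is -1 + p * eta with p = sigmoid(theta), and the gradient
    with respect to theta is eta * p * (1 - p).
    """
    p = sigmoid(theta)
    return theta + beta * eta * p * (1.0 - p)


def giver_objective(theta: float, eta: float, beta: float, alpha: float) -> float:
    """Giver's environment return under the recipient's UPDATED policy,
    minus the cost of the incentive paid under the current policy."""
    p_old = sigmoid(theta)
    p_new = sigmoid(recipient_update(theta, eta, beta))
    env_return = 10.0 * p_new - 1.0 * (1.0 - p_new)  # +10 if door opens, else -1
    cost = p_old * eta  # eta is kept non-negative by the training loop
    return env_return - alpha * cost


def train(steps: int = 50, beta: float = 1.0, lr_eta: float = 1.0,
          alpha: float = 0.1, eps: float = 1e-5) -> Tuple[float, float]:
    theta, eta = 0.0, 0.0
    for _ in range(steps):
        # Giver: gradient ascent on its objective, differentiating through
        # the recipient's learning step (finite differences in this sketch).
        grad = (giver_objective(theta, eta + eps, beta, alpha)
                - giver_objective(theta, eta - eps, beta, alpha)) / (2.0 * eps)
        new_eta = max(0.0, eta + lr_eta * grad)
        # Recipient: policy step using the incentive currently on offer.
        theta = recipient_update(theta, eta, beta)
        eta = new_eta
    return sigmoid(theta), eta
```

Calling `train()` returns the recipient's final probability of pulling the lever and the final incentive. The giver starts with `eta = 0`, learns to pay because a recipient that pulls more often raises its own environment return, and the recipient's policy then shifts toward pulling.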

Algorithm pseudocode
