Reinforcement Learning, Spike-Time-Dependent Plasticity, and the BCM Rule
Neural Computation, (2007): 2245-2279
Abstract
Learning agents, whether natural or artificial, must update their internal parameters in order to improve their behavior over time. In reinforcement learning, this plasticity is influenced by an environmental signal, termed a reward, that directs the changes in appropriate directions. We apply a recently introduced policy-learning algorithm from machine learning to networks of spiking neurons and derive a spike-time-dependent plasticity rule that ensures convergence to a local optimum of the expected average reward. The approach is applicable to a broad class of neuronal models, including the Hodgkin-Huxley model. We demonstrate the effectiveness of the derived rule on several toy problems. Finally, through statistical analysis, we show that the synaptic plasticity rule established is closely related to the widely used BCM rule, for which there is considerable biological evidence.
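The kind of rule the abstract describes can be illustrated in miniature: each synapse accumulates an STDP-like eligibility trace from pre/post spike timing, and a global reward signal gates whether that trace is actually written into the weight. The sketch below is a generic reward-modulated STDP update, not the paper's exact derivation; all function and parameter names (`stdp_eligibility`, `tau`, `a_plus`, `a_minus`, `baseline`) are illustrative assumptions.

```python
import numpy as np

def stdp_eligibility(pre_spikes, post_spikes, tau=20.0,
                     a_plus=0.01, a_minus=0.012):
    """Pair-based STDP eligibility trace (illustrative, not the paper's rule).

    pre_spikes, post_spikes: arrays of spike times (ms).
    Pre-before-post pairs contribute positively (potentiation),
    post-before-pre pairs negatively (depression), with exponential
    decay of the contribution in the time difference.
    """
    e = 0.0
    for t_post in post_spikes:
        for t_pre in pre_spikes:
            dt = t_post - t_pre
            if dt > 0:
                e += a_plus * np.exp(-dt / tau)
            elif dt < 0:
                e -= a_minus * np.exp(dt / tau)
    return e

def reward_modulated_update(w, eligibility, reward, baseline=0.0, lr=0.1):
    """Policy-gradient-style step: the reward (minus a baseline)
    multiplies the eligibility trace to give the weight change."""
    return w + lr * (reward - baseline) * eligibility

# A presynaptic spike 5 ms before a postsynaptic spike yields a
# positive trace; a positive reward then potentiates the synapse.
e = stdp_eligibility(np.array([10.0]), np.array([15.0]))
w_new = reward_modulated_update(0.5, e, reward=1.0)
```

With a negative reward the same trace would depress the synapse, which is the sense in which the reward "directs the changes in appropriate directions."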
1 Policy Learning and Neuronal Dynamics
2 Derivation of the Weight Update
2.1 Two Explicit Choices for α.
3 Extensions to General Neuronal Models
Algorithm 1: Synaptic Update Rule for a Generalized Neuronal Model
3.1 Explicit Calculation of the Update Rules for Different α Functions.
3.1.1 Demonstration for α(s) = qδ(s).
3.1.2 Demonstration for a Decaying Exponential α(s).
3.2 Depressing Synapses.
4 Simulation Results
5 Relation to the BCM Rule
6 Discussion
Appendix A: Computing Expectations
A.1 Expectation with Respect to the Postsynaptic Spike Train.
A.2 Expectation with Respect to the Presynaptic Spike Train.
Appendix B: Simulation Details
Appendix C: Technical Derivations
C.1 Decaying Exponential α Function.
C.2 Depressing Synapses.
Case 1: No Presynaptic Spike Occurred Since the Last Postsynaptic Spike.
Case 2: At Least One Presynaptic Spike Occurred Since the Last Postsynaptic Spike.
C.3 MDPs and POMDPs
C.3.1 MDPs.