Reinforcement Learning: Related Resources
阿新 • Published: 2017-05-06
Recently, for reasons I won't go into, I needed to put together a small reinforcement-learning demo quickly, with no prior background in reinforcement learning. Although I ended up using someone else's code, the search turned up quite a few good reinforcement-learning resources, so I am recording them here for future study.
【1】How to explain the Q-learning process with a simple example?
https://www.zhihu.com/question/26408259
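The core of what the answers above explain is the tabular Q-learning update rule, Q(s, a) ← Q(s, a) + α·[r + γ·max_a′ Q(s′, a′) − Q(s, a)]. A minimal sketch of one such update (the table shape, learning rate, and sample transition below are illustrative assumptions, not taken from the linked answers):

```python
import numpy as np

# Illustrative problem size and hyperparameters (assumptions).
n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.5, 0.9

def q_update(Q, s, a, r, s_next):
    """One temporal-difference update of the Q-table."""
    td_target = r + gamma * Q[s_next].max()   # bootstrap off the best next action
    Q[s, a] += alpha * (td_target - Q[s, a])  # move Q(s,a) toward the target
    return Q

# One hypothetical transition: take action 1 in state 0, get reward 1.0, land in state 2.
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)
print(Q[0, 1])  # 0.5, since the target is 1.0 and alpha is 0.5
```

Note that the tutorial code further below uses a simplified variant with α = 1, which overwrites Q(s, a) with the target directly.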
【2】The simplest worked example of the Q-learning process
http://mnemstudio.org/path-finding-q-learning-tutorial.htm
Note: the site also comes with code, but it is all written in C++ and Java, which I can't read. Still, it looks like a good resource site.
A blog post provides the corresponding Chinese translation of this tutorial (最簡單的講解Q-Learning過程的例子).
Someone has also reproduced the tutorial in Python:
https://github.com/JasonQSY/ML-Weekly/blob/master/P5-Reinforcement-Learning/Q-learning/Q-Learning-Get-Started.ipynb
The full code is as follows:
import numpy as np
import random

# initial: 6 states (0-5), state 5 is the goal; r is the reward matrix
q = np.zeros([6, 6])
r = np.array([[-1, -1, -1, -1,  0,  -1],
              [-1, -1, -1,  0, -1, 100],
              [-1, -1, -1,  0, -1,  -1],
              [-1,  0,  0, -1,  0,  -1],
              [ 0, -1, -1,  0, -1, 100],
              [-1,  0, -1, -1,  0, 100]])
gamma = 0.8

# training
for i in range(100):
    # one episode
    state = random.randint(0, 5)
    while state != 5:
        # choose a non-negative-reward action at random
        r_pos_action = []
        for action in range(6):
            if r[state, action] >= 0:
                r_pos_action.append(action)
        next_state = r_pos_action[random.randint(0, len(r_pos_action) - 1)]
        q[state, next_state] = r[state, next_state] + gamma * q[next_state].max()
        state = next_state

# verify
for i in range(10):
    # one episode
    print("episode: " + str(i + 1))
    # random initial state
    state = random.randint(0, 5)
    print("the robot starts in " + str(state) + ".")
    count = 0
    while state != 5:
        # prevent an endless loop
        if count > 20:
            print('fails')
            break
        # choose a maximal q-value action at random
        q_max = -100
        for action in range(6):
            if q[state, action] > q_max:
                q_max = q[state, action]
        q_max_action = []
        for action in range(6):
            if q[state, action] == q_max:
                q_max_action.append(action)
        next_state = q_max_action[random.randint(0, len(q_max_action) - 1)]
        print("the robot goes to " + str(next_state) + ".")
        state = next_state
        count = count + 1
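Once training has converged, the Q-table can be turned into a greedy policy directly, without the tie-breaking loops in the verification step above. A self-contained sketch (the reward matrix and discount are the ones from the tutorial code; the `greedy_path` helper and the 500-episode count are my own assumptions):

```python
import numpy as np
import random

# Same 6-state maze as the tutorial above; state 5 is the goal.
r = np.array([[-1, -1, -1, -1,  0,  -1],
              [-1, -1, -1,  0, -1, 100],
              [-1, -1, -1,  0, -1,  -1],
              [-1,  0,  0, -1,  0,  -1],
              [ 0, -1, -1,  0, -1, 100],
              [-1,  0, -1, -1,  0, 100]])
q = np.zeros((6, 6))
gamma = 0.8

# Compact version of the training loop (500 episodes to be safe).
for _ in range(500):
    state = random.randint(0, 5)
    while state != 5:
        actions = [a for a in range(6) if r[state, a] >= 0]
        nxt = random.choice(actions)
        q[state, nxt] = r[state, nxt] + gamma * q[nxt].max()
        state = nxt

def greedy_path(q, start, goal=5, limit=20):
    """Follow the highest-Q action from each state until the goal is reached."""
    path = [start]
    while path[-1] != goal and len(path) <= limit:
        path.append(int(q[path[-1]].argmax()))
    return path

print(greedy_path(q, 2))  # reaches the goal state 5, e.g. [2, 3, 1, 5]
```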
【3】This author's blog has a reinforcement-learning series
http://www.algorithmdog.com/ml/rl-series
【4】http://blog.csdn.net/u012192662/article/category/6394979
At a first glance, it seems reasonably well written.