Python Game and Q Learning

By 阿新 • Published: 2018-11-16

import numpy as np
import random
from tkinter import *
import time

tk = Tk()
tk.title('Q-Learning')
tk.wm_attributes('-topmost',1)

canvas = Canvas(tk,width=400,height=400,bd=0,highlightthickness=0)
for i in range(4):
    canvas.create_line(i*100,0,i*100,400)
    canvas.create_line(0,i*100,400,i*100)

trap1 = canvas.create_rectangle(200,0,300,100,fill='khaki')
trap2 = canvas.create_rectangle(100,100,200,200,fill='khaki')
trap3 = canvas.create_rectangle(200,100,300,200,fill='khaki')
trap4 = canvas.create_rectangle(100,200,200,300,fill='khaki')
canvas.pack()
tk.update()

agent = canvas.create_rectangle(0,0,100,100,fill = 'orchid')

gamma = 0.8
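# Reward table: one row per state (0-15, row-major), one column per action
# (0 = up, 1 = down, 2 = left, 3 = right). Entering a trap costs -10,
# entering the goal (state 15) pays 10, any other legal move pays 1,
# and moves off the grid are 0.
R = np.array([[0,1,0,1],
              [0,-10,1,-10],
              [0,-10,1,1],
              [0,1,-10,0],
              [1,1,0,-10],
              [1,-10,1,-10],
              [-10,1,-10,1],
              [1,1,-10,0],
              [1,1,0,-10],
              [-10,1,1,1],
              [-10,1,-10,1],
              [1,10,1,0],
              [1,0,0,1],
              [-10,0,1,1],
              [1,0,1,10],
              [1,0,1,0]])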
Q = np.zeros((16, 4))
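# Legal actions per state. The rows have different lengths, so this is a plain
# list of lists: recent NumPy versions refuse to build an array from ragged rows.
valid_action = [[1, 3],
                [1, 2, 3],
                [1, 2, 3],
                [1, 2],
                [0, 1, 3],
                [0, 1, 2, 3],
                [0, 1, 2, 3],
                [0, 1, 2],
                [0, 1, 3],
                [0, 1, 2, 3],
                [0, 1, 2, 3],
                [0, 1, 2],
                [0, 3],
                [0, 2, 3],
                [0, 2, 3],
                [0, 2]]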
# Next state for each (state, action) pair; -1 marks a move off the grid.
transition_matrix = np.array([[-1,4,-1,1],
                              [-1,5,0,2],
                              [-1,6,1,3],
                              [-1,7,2,-1],
                              [0,8,-1,5],
                              [1,9,4,6],
                              [2,10,5,7],
                              [3,11,6,-1],
                              [4,12,-1,9],
                              [5,13,8,10],
                              [6,14,9,11],
                              [7,15,10,-1],
                              [8,-1,-1,13],
                              [9,-1,12,14],
                              [10,-1,13,15],
                              [11,-1,14,-1]])



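def start(s):
    """Place the agent on the cell for state s (row-major index, 0-15)."""
    row = s // 4
    column = s % 4
    canvas.coords(agent, column*100, row*100, (column+1)*100, (row+1)*100)
    tk.update()
    time.sleep(0.05)


def moves(a):
    """Move the agent one cell: 0 = up, 1 = down, 2 = left, 3 = right."""
    if a == 0:
        canvas.move(agent, 0, -100)
    elif a == 1:
        canvas.move(agent, 0, 100)
    elif a == 2:
        canvas.move(agent, -100, 0)
    else:
        canvas.move(agent, 100, 0)

    tk.update()
    time.sleep(0.01)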
    
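def QLearning():
    """Run one training episode from a random start state until the goal (state 15)."""
    s = random.randint(0, 15)
    start(s)
    while s != 15:
        a = random.choice(valid_action[s])   # explore: pick a random legal action
        s1 = transition_matrix[s][a]
        moves(a)
        # Q-learning update with learning rate 1 (the environment is deterministic)
        Q[s, a] = R[s, a] + gamma * Q[s1].max()
        s = s1


# Train for 100 episodes
for i in range(100):
    QLearning()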
    
label = Label(tk, text='Training finished. Starting test.', bg='green', compound='center')
label.pack()
tk.update()
time.sleep(3)
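
def test(s):
    """Follow the greedy policy from state s to the goal, printing the path."""
    print(s, end="")
    start(s)
    while s != 15:
        # Choose the best action among the legal ones only: Q entries for
        # illegal moves stay at 0 and could otherwise win a plain argmax.
        a = max(valid_action[s], key=lambda act: Q[s, act])
        s = transition_matrix[s][a]
        moves(a)
        time.sleep(1)
        print("-> %d" % s, end="")
    print()


test(5)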
tk.mainloop()
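
The listing above ties training to the tkinter animation, which makes it slow to check. Below is a minimal headless sketch of the same update rule, given here as an illustration and not as the author's code. It assumes the 4x4 layout, trap cells and reward scheme that can be read off the tables above. The names build_tables, train and greedy_path, the 500-episode count and the fixed seed are hypothetical choices made for this sketch.

```python
import random
import numpy as np

N, GOAL, GAMMA = 4, 15, 0.8
TRAPS = {2, 5, 6, 9}                          # khaki cells in the listing
DELTAS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right


def build_tables():
    """Derive the next-state and reward tables from the grid layout."""
    nxt = np.full((N * N, 4), -1, dtype=int)  # -1 marks a move off the grid
    rew = np.zeros((N * N, 4))
    for s in range(N * N):
        row, col = divmod(s, N)
        for a, (dr, dc) in enumerate(DELTAS):
            r, c = row + dr, col + dc
            if 0 <= r < N and 0 <= c < N:
                s1 = r * N + c
                nxt[s, a] = s1
                rew[s, a] = 10 if s1 == GOAL else (-10 if s1 in TRAPS else 1)
    return nxt, rew


def train(nxt, rew, episodes=500, seed=0):
    """Random-exploration Q-learning with learning rate 1, as in the listing."""
    rng = random.Random(seed)
    q = np.zeros((N * N, 4))
    for _ in range(episodes):
        s = rng.randrange(N * N)
        while s != GOAL:
            a = rng.choice([act for act in range(4) if nxt[s, act] >= 0])
            s1 = int(nxt[s, a])
            q[s, a] = rew[s, a] + GAMMA * q[s1].max()
            s = s1
    return q


def greedy_path(q, nxt, s, max_steps=32):
    """Follow the best legal action from s; stop at the goal or after max_steps."""
    path = [s]
    while s != GOAL and len(path) <= max_steps:
        valid = [act for act in range(4) if nxt[s, act] >= 0]
        a = max(valid, key=lambda act: q[s, act])
        s = int(nxt[s, a])
        path.append(s)
    return path


nxt, rew = build_tables()
q = train(nxt, rew)
print(greedy_path(q, nxt, 5))
```

Building the tables from the grid removes the need to type 16 rows by hand, and the derived tables can be compared against the hand-written R and transition_matrix as a consistency check. Once the Q table has settled, the printed path should end at state 15 without stepping on another trap after the start cell.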

