
The Markov Decision Process (MDP)

Learn about the Markov Chain and the Markov Decision Process in this guest post by Sudarshan Ravichandran, a data scientist and AI enthusiast, and the author of Hands-On Reinforcement Learning with Python.

A mathematical framework for solving reinforcement learning (RL) problems, the Markov Decision Process (MDP) is widely used to solve various optimization problems, and almost all RL problems can be modeled as an MDP. This tutorial will take you through the nuances of MDPs and their applications.

Before going into MDP, you must first understand the Markov chain and Markov process, which form the foundation of MDP.

The Markov property states that the future depends only on the present and not on the past. A Markov chain is a probabilistic model that depends solely on the current state to predict the next state, not on the previous states. This means that the future is conditionally independent of the past, given the present. The Markov chain strictly follows the Markov property.

For example, if you know that the current state is cloudy, you can predict that the next state could be rainy. You came to the conclusion that the next state could be rainy only by considering the current state (cloudy) and not the past states, which might be sunny or windy.

However, the Markov property does not hold for all processes. For example, when rolling a die, the next outcome has no dependency whatsoever on the previous one.

Moving from one state to another is called a transition, and its probability is called a transition probability. The transition probabilities can be arranged in a table, called a Markov table, as shown next. It gives, for a given current state, the probability of moving to a next state:

| Current state | Next state | Transition probability |
|---------------|------------|------------------------|
| Cloudy        | Rainy      | 0.6                    |
| Rainy         | Rainy      | 0.2                    |
| Sunny         | Cloudy     | 0.1                    |
| Rainy         | Sunny      | 0.1                    |
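The table above can be sketched in code as a transition-probability mapping. A minimal sketch in Python: the probabilities listed in the table are kept, but the table is incomplete (the rows do not sum to 1), so the remaining values below are illustrative assumptions added only to make each row a valid probability distribution.

```python
import random

# Transition probabilities between weather states.
# Values from the table above are kept; the others are
# assumed fillers so that each row sums to 1.
transitions = {
    "Cloudy": {"Rainy": 0.6, "Cloudy": 0.3, "Sunny": 0.1},
    "Rainy":  {"Rainy": 0.2, "Sunny": 0.1, "Cloudy": 0.7},
    "Sunny":  {"Cloudy": 0.1, "Sunny": 0.8, "Rainy": 0.1},
}

def next_state(current, rng=random):
    """Sample the next state using only the current state (Markov property)."""
    states = list(transitions[current])
    weights = [transitions[current][s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]

# Simulate a short weather chain starting from Cloudy.
state = "Cloudy"
chain = [state]
for _ in range(5):
    state = next_state(state)
    chain.append(state)
print(chain)
```

Note that `next_state` never looks at `chain`'s history; the current state alone determines the distribution of the next state, which is exactly the Markov property.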

You can also represent the Markov chain in the form of a state diagram that shows the transition probabilities:

[State diagram: transition probabilities between the Sunny, Cloudy, and Rainy states]

The preceding state diagram shows the probability of moving from one state to another. Still don't understand the Markov chain? Okay, let’s talk.

Me: "What are you doing?"

You: "I'm reading about the Markov chain."

Me: "What is your plan after reading?"

You: "I'm going to sleep."

Me: "Are you sure you're going to sleep?"

You: "Probably. I'll watch TV if I'm not sleepy."

Me: "Cool; this is also a Markov chain."

You: "Eh?"

The above conversation can be formulated as a Markov chain. The state diagram will be as follows:

[State diagram: Reading → Sleeping or Watching TV]
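The conversation can also be sketched as a tiny Markov chain over the states Reading, Sleeping, and Watching TV. The dialogue only says "probably", so the 0.7/0.3 split and the remaining transitions below are illustrative assumptions:

```python
import random

# States from the conversation. After Reading you probably Sleep,
# otherwise you Watch TV; the exact numbers are assumptions.
conversation_chain = {
    "Reading":     {"Sleeping": 0.7, "Watching TV": 0.3},
    "Watching TV": {"Sleeping": 1.0},  # assumed: eventually you sleep
    "Sleeping":    {"Sleeping": 1.0},  # assumed absorbing state
}

def step(state, rng=random):
    """Move one step in the chain, conditioning only on the current state."""
    nxt = list(conversation_chain[state])
    weights = [conversation_chain[state][s] for s in nxt]
    return rng.choices(nxt, weights=weights, k=1)[0]

print(step("Reading"))  # either "Sleeping" or "Watching TV"
```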
