
The Markov Decision Process (MDP)

Learn about the Markov Chain and the Markov Decision Process in this guest post by Sudarshan Ravichandran, a data scientist and AI enthusiast, and the author of Hands-On Reinforcement Learning with Python.

A mathematical framework for solving reinforcement learning (RL) problems, the Markov Decision Process (MDP) is widely used to solve various optimization problems, and almost all RL problems can be modeled as an MDP. This tutorial will take you through the nuances of MDPs and their applications.

Before going into MDP, you must first understand the Markov chain and Markov process, which form the foundation of MDP.

The Markov property states that the future depends only on the present and not on the past. A Markov chain is a probabilistic model that depends solely on the current state to predict the next state, not on the previous states. This means that the future is conditionally independent of the past, given the present. The Markov chain strictly follows the Markov property.

For example, if you know that the current state is cloudy, you can predict that the next state could be rainy. You came to the conclusion that the next state could be rainy only by considering the current state (cloudy) and not the past states, which might be sunny or windy.

However, the Markov property does not hold for all processes. For example, when rolling a die, the next outcome has no dependency whatsoever on the previous one.

Moving from one state to another is called a transition, and its probability is called a transition probability. The transition probabilities can be arranged in a table, called a Markov table, as shown next. It gives, for a given current state, the probability of moving to a next state:

| Current state | Next state | Transition probability |
|---------------|------------|------------------------|
| Cloudy        | Rainy      | 0.6                    |
| Rainy         | Rainy      | 0.2                    |
| Sunny         | Cloudy     | 0.1                    |
| Rainy         | Sunny      | 0.1                    |
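The table above can be sketched in code as a transition-probability mapping. A minimal sketch in Python: the probabilities listed in the table are kept, but the table is incomplete (the rows do not sum to 1), so the remaining values below are illustrative assumptions added only to make each row a valid probability distribution.

```python
import random

# Transition probabilities between weather states.
# Values from the table above are kept; the others are
# assumed fillers so that each row sums to 1.
transitions = {
    "Cloudy": {"Rainy": 0.6, "Cloudy": 0.3, "Sunny": 0.1},
    "Rainy":  {"Rainy": 0.2, "Sunny": 0.1, "Cloudy": 0.7},
    "Sunny":  {"Cloudy": 0.1, "Sunny": 0.8, "Rainy": 0.1},
}

def next_state(current, rng=random):
    """Sample the next state using only the current state (Markov property)."""
    states = list(transitions[current])
    weights = [transitions[current][s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]

# Simulate a short weather chain starting from Cloudy.
state = "Cloudy"
chain = [state]
for _ in range(5):
    state = next_state(state)
    chain.append(state)
print(chain)
```

Note that `next_state` never looks at `chain`'s history; the current state alone determines the distribution of the next state, which is exactly the Markov property.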

You can also represent the Markov chain in the form of a state diagram that shows the transition probabilities:

[State diagram: transition probabilities between the Sunny, Cloudy, and Rainy states]

The preceding state diagram shows the probability of moving from one state to another. Still don't understand the Markov chain? Okay, let’s talk.

Me: "What are you doing?"

You: "I'm reading about the Markov chain."

Me: "What is your plan after reading?"

You: "I'm going to sleep."

Me: "Are you sure you're going to sleep?"

You: "Probably. I'll watch TV if I'm not sleepy."

Me: "Cool; this is also a Markov chain."

You: "Eh?"

The above conversation can be formulated as a Markov chain. The state diagram will be as follows:

[State diagram: Reading → Sleeping or Watching TV]
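The conversation can also be sketched as a tiny Markov chain over the states Reading, Sleeping, and Watching TV. The dialogue only says "probably", so the 0.7/0.3 split and the remaining transitions below are illustrative assumptions:

```python
import random

# States from the conversation. After Reading you probably Sleep,
# otherwise you Watch TV; the exact numbers are assumptions.
conversation_chain = {
    "Reading":     {"Sleeping": 0.7, "Watching TV": 0.3},
    "Watching TV": {"Sleeping": 1.0},  # assumed: eventually you sleep
    "Sleeping":    {"Sleeping": 1.0},  # assumed absorbing state
}

def step(state, rng=random):
    """Move one step in the chain, conditioning only on the current state."""
    nxt = list(conversation_chain[state])
    weights = [conversation_chain[state][s] for s in nxt]
    return rng.choices(nxt, weights=weights, k=1)[0]

print(step("Reading"))  # either "Sleeping" or "Watching TV"
```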
