Markov Decision Processes

A Markov Decision Process (MDP) is a framework named after the mathematician Andrey Markov. It follows the Markov assumption: the next state of the world depends only on the present state and the action taken in that state, not on the full history that led there.

A common misconception, based on the paragraph above, is that an MDP oversimplifies the world. After all, our past actions shape our future, so how can the present state alone be sufficient?

Let us take an example. In cricket, one would assume that computing the total runs scored by a batsman in an over requires the individual runs scored on each of the six balls. However, if we frame the problem so that it satisfies the Markov assumption, we find that after each ball the state of the batsman changes based on the action performed in that state. Suppose we are on the sixth ball, having played the first five. The present state has been reached through the culmination of actions performed from the first ball to the fifth, so we can say that the present state encapsulates, through its attributes, all the runs scored in the previous states. The action taken on the sixth ball then takes you to a state where all six balls are accounted for. This way of thinking can be scaled up to a complete innings, with the overs as different states, and then to both innings, thereby describing the complete match.
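The idea that the present state carries everything needed from the past can be sketched in a few lines of Python. This is only an illustration: the `OverState` class, the `step` function, and the runs per ball are made-up names and values, not part of any cricket or MDP library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OverState:
    """Hypothetical state: what we know after some balls of the over."""
    balls_bowled: int  # how many balls of the over have been played
    runs_so_far: int   # total runs accumulated in the over so far


def step(state: OverState, runs_on_ball: int) -> OverState:
    """The next state depends only on the current state and this ball's outcome."""
    return OverState(state.balls_bowled + 1, state.runs_so_far + runs_on_ball)


state = OverState(balls_bowled=0, runs_so_far=0)
for runs in [1, 4, 0, 6, 2, 1]:  # made-up outcomes for the six balls
    state = step(state, runs)

print(state.balls_bowled, state.runs_so_far)
```

Note that `step` never looks at the list of earlier balls. The running total stored in the state is enough, which is the point the cricket example makes.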

Markov Decision Processes are difficult to define for a given context, but once defined, they elegantly facilitate optimal decision making and management.


Well, what do you do if you want to get rich quick? You could rob a bank or cheat at a casino, but you understand that these actions have consequences, and if caught, you might not be around to enjoy your money.
So, you want to plan in a way that not only maximises your present returns but also accounts for future consequences.

The tuple defining a formal Markov Decision Process

A Markov Decision Process is formally a tuple consisting of (states, actions, transition function, reward). The states are all the situations that are possible in our experiment or game; there can be a great many of them, depending for instance on the number of players in the game. The transition function captures every state and the actions that can be taken in it: taking an action in a state moves the agent to another state, but only with some probability. So each (state, action) pair has a probability of transitioning to each possible next state. The actions need to be tied to the states, so that every state has a predefined list of actions that can be performed in it.
Example: Suppose two states are defined: (1) the living room and (2) the kitchen. The actions of the agent (in this case a person) can be walking straight, walking backward, or turning right or left. Walking from the living room towards the kitchen has a high probability of getting our agent there. But now consider a different scenario: the agent is standing against the living room wall, and the action chosen is to walk straight ahead to get to the kitchen. Although the agent wishes to transition from the present state to the next, we know that this is highly unlikely to happen.
Finally, we come to the reward. It is the positive or negative value associated with every (state, action) pair: every action taken in a particular state has a reward attached to it, which ensures that we move in the general direction of our desired target. Eating (action) when hungry (state) can have a positive reward attached to it, whereas eating when full might have a negative reward, to discourage the agent from doing so.
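To make the tuple concrete, here is a minimal Python sketch of the living room and kitchen example. Everything in it is an assumption made for illustration: the state and action names, the extra `against_wall` state, and all probabilities and reward values are invented, not taken from a real system.

```python
import random

# Hypothetical state and action names, loosely following the example above.
states = ["living_room", "against_wall", "kitchen"]
actions = ["walk_straight", "turn_around"]

# transition[(state, action)] maps each possible next state to its probability.
# All probabilities are made-up illustrative values.
transition = {
    ("living_room", "walk_straight"): {"kitchen": 0.9, "living_room": 0.1},
    ("living_room", "turn_around"): {"living_room": 1.0},
    ("against_wall", "walk_straight"): {"against_wall": 0.99, "kitchen": 0.01},
    ("against_wall", "turn_around"): {"living_room": 1.0},
    ("kitchen", "walk_straight"): {"kitchen": 1.0},
    ("kitchen", "turn_around"): {"living_room": 1.0},
}

# reward[(state, action)] is the value attached to taking that action in that
# state. These numbers are also invented for the sketch.
reward = {
    ("living_room", "walk_straight"): 1.0,
    ("living_room", "turn_around"): 0.0,
    ("against_wall", "walk_straight"): -1.0,
    ("against_wall", "turn_around"): 0.5,
    ("kitchen", "walk_straight"): 0.0,
    ("kitchen", "turn_around"): -0.5,
}


def sample_next_state(state, action, rng):
    """Draw the next state according to the transition probabilities."""
    outcomes = transition[(state, action)]
    return rng.choices(list(outcomes), weights=list(outcomes.values()), k=1)[0]


rng = random.Random(0)  # seeded so that repeated runs behave the same way
print(sample_next_state("against_wall", "walk_straight", rng))
```

Walking straight while against the wall almost always leaves the agent where it is, which mirrors the low transition probability described above. Plain dictionaries are used here only because they keep the four parts of the tuple easy to read.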


Defining the rewards properly is essential in defining an MDP, as the reward is the driving force behind how you come up with your optimal plan, or policy. Based on the rewards, you create a plan that tells you exactly what action to take in any given state to maximize the end result, goal, or output.

Sometimes the project itself defines some of the rewards: on a construction project, for example, laying down bricks on time self-evidently carries a positive reward.

optimal plan for player X

Markov Decision Processes are able to come up with optimal plans because they take into account both the reward one gets from the current action and the rewards one might accrue in the future.
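One standard way to weigh present against future rewards is value iteration, sketched below on a toy version of the get-rich-quick dilemma. The article does not name a solution method, so this is a swapped-in illustration under stated assumptions: the states `free` and `jail`, the actions `work` and `rob`, every probability and reward, and the discount factor `gamma` (how much future rewards count relative to present ones) are all invented for the example.

```python
# Toy MDP: all names and numbers are illustrative assumptions.
states = ["free", "jail"]
actions = ["work", "rob"]

# transition[(state, action)] maps next states to probabilities.
transition = {
    ("free", "work"): {"free": 1.0},
    ("free", "rob"): {"free": 0.2, "jail": 0.8},  # robbing usually ends in jail
    ("jail", "work"): {"jail": 1.0},              # jail is a dead end here
    ("jail", "rob"): {"jail": 1.0},
}

# reward[(state, action)]: robbing pays more right now than working.
reward = {
    ("free", "work"): 1.0,
    ("free", "rob"): 5.0,
    ("jail", "work"): 0.0,
    ("jail", "rob"): 0.0,
}


def value_iteration(gamma, tol=1e-9):
    """Return (values, policy) for the toy MDP with discount factor gamma."""
    values = {s: 0.0 for s in states}

    def q(s, a, vals):
        # Immediate reward plus the discounted expected value of the next state.
        future = sum(p * vals[s2] for s2, p in transition[(s, a)].items())
        return reward[(s, a)] + gamma * future

    while True:
        new_values = {s: max(q(s, a, values) for a in actions) for s in states}
        delta = max(abs(new_values[s] - values[s]) for s in states)
        values = new_values
        if delta < tol:
            break

    # The policy picks, in every state, the action with the best long-run value.
    policy = {s: max(actions, key=lambda a: q(s, a, values)) for s in states}
    return values, policy


values_greedy, policy_greedy = value_iteration(gamma=0.0)  # ignores the future
values_far, policy_far = value_iteration(gamma=0.9)        # values the future

print("ignoring the future:", policy_greedy["free"])
print("valuing the future: ", policy_far["free"])
```

With `gamma=0.0` only the immediate reward counts, so the plan in the `free` state is `rob`. With `gamma=0.9` the lost future income outweighs the quick payoff, and the plan becomes `work`.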


Suppose you are coaching a football team, and they have a tough match coming up next week. But you have a secret strategy: an optimal policy based on a Markov Decision Process. The policy takes into account all the states that the players can be in (with the ball at their feet, without the ball, etc.), all the actions that each one can take in their present state (kick the ball, dribble), the probability that an action leads to each possible next state, and the reward for taking that action in the given state, and it picks the actions that maximise their chances of winning. Anyone can venture a guess that moving from one state to another to save a goal carries a pretty high reward. If we can somehow solve the assembled Markov problem, then we have solved a puzzle that guides us through the maze and lets us emerge victorious.


Project management, at its core, is a series of decisions taken throughout the course of the project. In long and overly complex projects, not all the information needed to make optimal decisions is available at any given moment. Since at any given moment a project manager probably needs to make only one decision, the process could be solved elegantly if we were able to define a Markov Decision Process for it.

AI project manager is still far off

As of now, the biggest hurdle is that defining an MDP for a real-world problem is extremely difficult, and solving it is almost impossible computationally. But as Artificial Intelligence matures, we might come across really smart digital project managers in the future. These “beings” will have access to all the necessary information related to the project and, by solving MDPs, will be able to generate optimal policies. Until then, Markov Decision Processes are great for small problems, and AI project managers are still some way off.

About the Author: Piyush Daga is a data science fanatic, and a firm believer that predicting the future with certainty is a fluke, but the closest you can get to doing so is with the right questions on the correct data.

Post Author: Ruchi
