A Markov Decision Process (MDP) is a Markov process that is partly influenced by decisions and partly by stochasticity.
An MDP has the following components (see the sketch after this list):
- $\mathcal{S}$: set of states; the state space
- $\mathcal{A}$: set of actions; the action space
- $P(s' \mid s, a)$: probability that a transition happens from state $s$ to state $s'$ due to action $a$
- $R(s, a, s')$: immediate reward/payoff/value due to the state transition
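To make the tuple concrete, here is a minimal sketch of a tabular MDP as a plain data structure in Python. The two-state "rested/tired" example, the action names, and the reward values are hypothetical illustrations, not taken from the text above.

```python
from dataclasses import dataclass

@dataclass
class MDP:
    states: list[str]
    actions: list[str]
    # transition[(s, a)] maps next state s' -> P(s' | s, a)
    transition: dict[tuple[str, str], dict[str, float]]
    # reward[(s, a, s_next)] is the immediate reward R(s, a, s') for that transition
    reward: dict[tuple[str, str, str], float]

    def step_distribution(self, s: str, a: str) -> dict[str, float]:
        """Return P(s' | s, a) as a dict of next-state probabilities."""
        return self.transition[(s, a)]

# Illustrative two-state MDP (hypothetical states, actions, and numbers)
mdp = MDP(
    states=["rested", "tired"],
    actions=["work", "rest"],
    transition={
        ("rested", "work"): {"rested": 0.3, "tired": 0.7},
        ("rested", "rest"): {"rested": 1.0},
        ("tired", "work"): {"tired": 1.0},
        ("tired", "rest"): {"rested": 0.8, "tired": 0.2},
    },
    reward={
        ("rested", "work", "rested"): 5.0,
        ("rested", "work", "tired"): 5.0,
        ("rested", "rest", "rested"): 1.0,
        ("tired", "work", "tired"): 2.0,
        ("tired", "rest", "rested"): 0.0,
        ("tired", "rest", "tired"): 0.0,
    },
)

print(mdp.step_distribution("rested", "work"))  # {'rested': 0.3, 'tired': 0.7}
```

The decision part of the process is the choice of action $a$ in each state; the stochastic part is the distribution $P(s' \mid s, a)$ over next states, as returned by `step_distribution` above.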