Q-learning is the process of finding the best possible Q-function

To build an optimal action policy, we would need to know the complete state transition matrix and the reward associated with each state transition. Without this information, we can't compute a value function that tells us how valuable a given action is.
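For contrast, here is what planning looks like when the full model is known. This is a minimal sketch, assuming a hypothetical 4-state chain world (states 0 to 3, action 0 moves left, action 1 moves right, reward 1 for reaching state 3, discount 0.9). `build_model` and `value_iteration` are illustrative names, not from any library. The sketch runs value iteration, which reads the transition matrix `P` and the rewards `R` directly:

```python
# Hypothetical chain world: states 0..3, state 3 is an absorbing goal.
N_STATES, N_ACTIONS = 4, 2  # actions: 0 = left, 1 = right
GOAL = N_STATES - 1


def build_model():
    """Full model: P[s][a][s2] = Pr(s2 | s, a), R[s][a] = expected reward."""
    P = [[[0.0] * N_STATES for _ in range(N_ACTIONS)] for _ in range(N_STATES)]
    R = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for s in range(N_STATES):
        for a in range(N_ACTIONS):
            if s == GOAL:
                s_next = GOAL  # absorbing: stay put, no further reward
            else:
                s_next = max(s - 1, 0) if a == 0 else s + 1
                if s_next == GOAL:
                    R[s][a] = 1.0
            P[s][a][s_next] = 1.0  # deterministic transitions
    return P, R


def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Optimal state values and greedy policy, computed from a known model."""
    n_states, n_actions = len(R), len(R[0])
    V = [0.0] * n_states
    while True:
        # Q(s, a) = R(s, a) + gamma * sum over s2 of P(s2 | s, a) * V(s2)
        Q = [[R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
              for a in range(n_actions)]
             for s in range(n_states)]
        V_new = [max(q_s) for q_s in Q]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new, [q_s.index(max(q_s)) for q_s in Q]
        V = V_new
```

Usage: `V, policy = value_iteration(*build_model())`. Both `P` and `R` are consulted on every sweep, which is exactly the knowledge that is missing in the setting Q-learning addresses.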

Since we're dealing with incomplete information, we instead approximate the value function from experience, estimating the value of taking each action in each state. This approximation is called the Q-function.
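The standard tabular Q-learning update makes this concrete. After observing one transition from state $s$, taking action $a$, receiving reward $r$ and landing in state $s'$, the estimate is nudged toward a one-step target. The learning rate $\alpha$ and discount factor $\gamma$ are the usual hyperparameters and are not defined elsewhere in this note:

$$
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
$$

No transition probabilities appear on the right-hand side: the sampled $r$ and $s'$ stand in for them.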

Q-learning solves the problem of having no:

state transition matrix
action policy
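A minimal sketch of tabular Q-learning in Python, assuming a hypothetical 4-state chain world (states 0 to 3, action 0 moves left, action 1 moves right, reward 1 for reaching the goal state 3) and epsilon-greedy exploration. `step` and `q_learning` are illustrative names. The learner only ever calls `step`, so it never sees a transition matrix, and it starts with no policy, only a table of zeros:

```python
import random

# Hypothetical chain world: states 0..3, state 3 is the goal.
N_STATES, N_ACTIONS = 4, 2  # actions: 0 = left, 1 = right
GOAL = N_STATES - 1


def step(s, a):
    """Environment: returns a sampled outcome; the agent never sees P or R."""
    s_next = max(s - 1, 0) if a == 0 else s + 1
    reward = 1.0 if s_next == GOAL else 0.0
    return s_next, reward, s_next == GOAL


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table purely from interaction, starting from all zeros."""
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy: explore at random, else act greedily (ties broken randomly).
            if rng.random() < epsilon:
                a = rng.randrange(N_ACTIONS)
            else:
                best = max(Q[s])
                a = rng.choice([i for i, q in enumerate(Q[s]) if q == best])
            s_next, r, done = step(s, a)
            # The goal has no future value, so the target there is just r.
            target = r if done else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q
```

Reading the greedy action off each row, for example `[row.index(max(row)) for row in Q[:GOAL]]` with `Q = q_learning()`, gives the learned policy. Epsilon-greedy is one common exploration strategy among several; it is an assumption here, not something this note specifies.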

Q-learning is a type of Reinforcement Learning.


Gustavo's webpages
