Action Policy is a function that returns an action given a state

Given a state s, the action policy returns an action a:

π(s) = a

An optimal policy returns the action that maximizes the reward/value achievable from that state. In other words, an optimal policy chooses the highest-value action, where the value of each action is given by the Bellman equation.
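One common way to write this down, as a sketch rather than the only formulation, is shown here. The symbols R (immediate reward), P (transition probability), γ (discount factor between 0 and 1), and V* (optimal value function) are assumptions introduced for illustration and do not appear in the note above.

```latex
\[
V^*(s) = \max_{a} \Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \Big]
\]
\[
\pi^*(s) = \arg\max_{a} \Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \Big]
\]
```

The first line is the Bellman optimality equation for the value of a state; the second says the optimal policy picks whichever action attains that maximum.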

Reward or value may arrive in the future, so an optimal policy maximizes the reward/value over a whole sequence of states and actions, not just the immediate step. Another way to think about this is that the next action should unlock the maximum potential rewards in the future.
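The point about future reward can be illustrated with a minimal sketch in Python. Everything in it is hypothetical: the three states, the action names, the rewards, and the discount factor GAMMA are made up for the example, and value iteration is just one standard way to compute the values. From "start", the "quick" action pays 1 immediately, while "patient" pays nothing now but leads to a state where 10 can be collected.

```python
# Hypothetical toy problem: states, actions, rewards and GAMMA are illustrative.
GAMMA = 0.9  # discount factor applied to future value

# TRANSITIONS[state][action] = (next_state, immediate_reward); "end" is terminal.
TRANSITIONS = {
    "start": {"quick": ("end", 1.0), "patient": ("mid", 0.0)},
    "mid": {"finish": ("end", 10.0)},
    "end": {},
}


def action_value(state, action, values):
    """Immediate reward plus discounted value of the state the action leads to."""
    next_state, reward = TRANSITIONS[state][action]
    return reward + GAMMA * values[next_state]


def value_iteration(sweeps=50):
    """Repeatedly apply the Bellman update until the values settle."""
    values = {state: 0.0 for state in TRANSITIONS}
    for _ in range(sweeps):
        for state, actions in TRANSITIONS.items():
            if actions:  # terminal states keep value 0
                values[state] = max(
                    action_value(state, a, values) for a in actions
                )
    return values


def greedy_policy(values):
    """π(s) = the action with the highest value in each non-terminal state."""
    return {
        state: max(actions, key=lambda a: action_value(state, a, values))
        for state, actions in TRANSITIONS.items()
        if actions
    }


values = value_iteration()
policy = greedy_policy(values)
print(policy)
```

In this sketch the policy chooses "patient" at "start": its value is 0 + 0.9 × 10 = 9, which beats the immediate reward of 1 from "quick".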

Resources:

See “Policy function” https://scholar.harvard.edu/files/basilico/files/laibson_notes_2013_0.pdf

cs econ

Gustavo's webpages
