Usually, when training a neural network, we compare the network's output with known expected outputs (labels).

In DQN, there is no ground truth: we don't know the correct output in advance.

Instead, DQN compares the neural network's output with a benchmark value.

To compute this benchmark there is a "target network": an older copy of the network that is being trained. Periodically, after a fixed number of training steps, the target network's weights are replaced with those of the current (online) network.
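A minimal sketch of this periodic update, assuming a PyTorch-style model (the architecture and names like `online_net`, `target_net`, and `TARGET_UPDATE_EVERY` are illustrative, not from the source):

```python
import copy

import torch.nn as nn

# Hypothetical Q-network; the architecture is illustrative.
online_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = copy.deepcopy(online_net)  # start as an exact copy
target_net.eval()

TARGET_UPDATE_EVERY = 1000  # steps between target-network refreshes

def maybe_update_target(step: int) -> None:
    # Copy the online network's weights into the target network
    # every TARGET_UPDATE_EVERY training steps.
    if step % TARGET_UPDATE_EVERY == 0:
        target_net.load_state_dict(online_net.state_dict())
```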

The benchmark value is generated by taking a transition from the replay buffer and adding its reward signal to the discounted target-network estimate of the Q-value for the next state.
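Concretely, for a sampled transition $(s, a, r, s')$, the benchmark (often called the TD target) is, assuming a discount factor $\gamma$ and $y = r$ for terminal transitions:

$$y = r + \gamma \max_{a'} Q_{\text{target}}(s', a')$$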

The difference between the current network's estimate $Q(s, a)$ and the benchmark value $y$ gives the loss function used in DQN (typically a squared or Huber loss).
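A hedged sketch of the loss computation on a sampled batch, continuing the PyTorch assumptions above (`online_net` and `target_net` from the earlier sketch; tensor names and the `GAMMA` value are illustrative):

```python
import torch
import torch.nn.functional as F

GAMMA = 0.99  # discount factor (illustrative value)

def dqn_loss(states, actions, rewards, next_states, dones):
    # Q-values the online network currently assigns to the taken actions.
    q_values = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Benchmark (TD target): reward + discounted target-network estimate
    # of the best next-state Q-value. No gradient flows through the target.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + GAMMA * next_q * (1.0 - dones)

    # Huber (smooth L1) loss between current estimates and targets.
    return F.smooth_l1_loss(q_values, targets)
```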
