Usually, when training a neural network, we compare the network's output with the expected (ground-truth) outputs.
In DQN, the correct output is not known in advance. Instead, DQN compares the network's output with a benchmark value.
To compute this benchmark, DQN keeps a "target network": an older copy of the network being trained. Periodically, after a fixed number of training iterations, the target network's weights are replaced with the current network's weights.
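As a minimal sketch of that periodic update (assuming PyTorch; the network shapes, names, and update interval below are illustrative, not part of the original text):

```python
import torch.nn as nn

# Illustrative networks; in practice both share the same architecture.
q_net = nn.Linear(4, 2)        # network currently being trained
target_net = nn.Linear(4, 2)   # older copy used to compute benchmark values

TARGET_UPDATE_INTERVAL = 1000  # hypothetical number of training steps

def maybe_sync_target(step: int) -> None:
    # Every TARGET_UPDATE_INTERVAL steps, copy the trained weights
    # into the target network, "freezing" them until the next sync.
    if step % TARGET_UPDATE_INTERVAL == 0:
        target_net.load_state_dict(q_net.state_dict())
```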
The benchmark value is built by adding the reward stored in the replay buffer to the discounted Q-value that the target network estimates for the next state.
The difference between the current network's Q-value estimate and this benchmark value is what the DQN loss function measures.
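A minimal sketch of how that benchmark value and loss could be computed (assuming PyTorch; the batch keys, tensor shapes, and mean-squared-error loss are illustrative assumptions, not prescribed by the text above):

```python
import torch
import torch.nn as nn

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """Compute a DQN loss for a batch sampled from the replay buffer.

    `batch` is assumed to be a dict of tensors with keys
    'state', 'action', 'reward', 'next_state', 'done'.
    """
    states, actions = batch["state"], batch["action"]
    rewards, next_states, dones = batch["reward"], batch["next_state"], batch["done"]

    # Q-values predicted by the current network for the actions actually taken.
    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Benchmark (target) value: reward plus the discounted Q-value of the
    # next state, estimated by the frozen target network. No gradients flow
    # through the target network.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        q_target = rewards + gamma * (1.0 - dones) * next_q

    # The loss is the difference between the current network's estimate
    # and the benchmark value (mean squared error here).
    return nn.functional.mse_loss(q_pred, q_target)
```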