SARSA
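The SARSA algorithm can be applied to model-free control problems: it lets us improve the policy of an MDP whose transition and reward dynamics are unknown by optimizing the action-value function Q(s, a) from sampled experience. SARSA is an on-policy control algorithm based on temporal-difference (TD) learning; its name comes from the quintuple (S, A, R, S′, A′) that each update uses. The SARSA algorithm can be summarized as follows: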
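As a rough illustration (a minimal sketch, not the source's own implementation), the core of tabular SARSA is the TD update applied after every transition: Q(S, A) ← Q(S, A) + α [R + γ Q(S′, A′) − Q(S, A)]. It is on-policy because A′ is drawn from the same ε-greedy policy that is being improved, and A′ is then the action actually taken next. It is model-free because the agent only sees sampled (S′, R) outcomes and never reads the transition probabilities. The corridor environment, the helpers `step`, `epsilon_greedy` and `sarsa`, and the hyperparameters (α = 0.1, γ = 1.0, ε = 0.1, 2000 episodes) are all hypothetical choices made for this example.

```python
import random
from collections import defaultdict

# Hypothetical toy MDP (assumption for illustration): a corridor of states
# 0..N_STATES-1. The agent starts in state 0; state N_STATES-1 is terminal.
# Actions: 0 = move left, 1 = move right. Each step costs -1, except the
# step that reaches the goal, which gives reward 0.
N_STATES = 6
ACTIONS = (0, 1)


def step(state, action):
    """Sample one transition. The agent only observes the result, never the model."""
    if action == 0:
        next_state = max(0, state - 1)  # a wall at the left end
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 0.0 if done else -1.0
    return next_state, reward, done


def epsilon_greedy(Q, state, epsilon, rng):
    """With probability epsilon act randomly, otherwise greedily (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    values = [Q[(state, a)] for a in ACTIONS]
    best = max(values)
    return rng.choice([a for a, v in zip(ACTIONS, values) if v == best])


def sarsa(episodes=2000, alpha=0.1, gamma=1.0, epsilon=0.1, seed=0):
    """Tabular SARSA: learn Q(s, a) for the epsilon-greedy policy it follows."""
    rng = random.Random(seed)
    Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(Q, state, epsilon, rng)  # A chosen by the behaviour policy
        done = False
        while not done:
            next_state, reward, done = step(state, action)  # observe R, S'
            if done:
                target = reward  # the terminal state has value 0
            else:
                # On-policy: A' comes from the same epsilon-greedy policy...
                next_action = epsilon_greedy(Q, next_state, epsilon, rng)
                target = reward + gamma * Q[(next_state, next_action)]
            # TD update toward R + gamma * Q(S', A')
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            if not done:
                # ...and A' is the action actually executed on the next step.
                state, action = next_state, next_action
    return Q


if __name__ == "__main__":
    Q = sarsa()
    greedy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
    print("greedy action per non-terminal state:", greedy)
```

On this toy corridor the learned greedy policy should move right in every non-terminal state. Because SARSA evaluates the ε-greedy policy it actually follows, the learned Q-values reflect the cost of occasional exploratory steps; in practice ε is often decayed over time so that the policy becomes greedy in the limit.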