Convergence Behavior of Temporal Difference Learning.
WRIGHT LAB WRIGHT-PATTERSON AFB OH AVIONICS DIRECTORATE
Temporal difference learning is an important class of incremental learning procedures that learn to predict the outcomes of sequential processes through experience. Although these algorithms have been used in a variety of well-known intelligent systems, such as Samuel's checkers player and Tesauro's backgammon program, their convergence properties remain poorly understood. This paper provides a brief summary of the theoretical basis for these algorithms and documents observed convergence performance in a variety of experiments. The implications of these results are also briefly discussed.
- Operations Research
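
As an illustration of the kind of incremental prediction procedure the abstract refers to, the following is a minimal sketch of tabular TD(0) on a small random walk. It is not taken from the report: the five-state task, the function name `td0_random_walk`, the step size `alpha`, the episode count, and the seed are all assumptions chosen for illustration. The walk starts in the middle state, steps left or right with equal probability, and ends with outcome 1 off the right edge and 0 off the left edge, so the true prediction for state i (counting from 0) is (i + 1) / 6.

```python
import random


def td0_random_walk(num_states=5, episodes=5000, alpha=0.02, seed=0):
    """Estimate outcome predictions with tabular TD(0) (illustrative sketch).

    All parameters are hypothetical defaults, not values from the report.
    """
    rng = random.Random(seed)
    values = [0.5] * num_states  # initial prediction for every state
    for _ in range(episodes):
        state = num_states // 2  # each episode starts in the middle state
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                # Fell off the left edge: outcome 0, episode ends.
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= num_states:
                # Fell off the right edge: outcome 1, episode ends.
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False
            # TD(0) update: move the current prediction toward the
            # one-step target (reward plus the successor's prediction).
            values[state] += alpha * (reward + next_value - values[state])
            if done:
                break
            state = next_state
    return values


estimates = td0_random_walk()
true_values = [(i + 1) / 6 for i in range(5)]
for est, true in zip(estimates, true_values):
    print(f"estimate={est:.3f}  true={true:.3f}")
```

With a constant step size the estimates do not settle exactly; they fluctuate around the true values, which is one simple instance of the convergence behavior the report examines.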