Accession Number:

ADA276517

Title:

On the Convergence of Stochastic Iterative Dynamic Programming Algorithms

Descriptive Note:

Memorandum rept.

Corporate Author:

MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB

Report Date:

1993-08-06

Pagination or Media Count:

16

Abstract:

Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(λ) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(λ) and Q-learning belong.

Keywords:

Reinforcement learning, stochastic approximation, convergence, dynamic programming.
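To make the abstract concrete, here is a minimal sketch of tabular Q-learning, the Watkins (1989) algorithm the abstract refers to, run on a small made-up two-state MDP (the MDP and all constants are illustrative assumptions, not from the paper). The step sizes decay per state-action pair so that they satisfy the standard stochastic-approximation conditions (the step sizes sum to infinity while their squares sum to a finite value), which is the kind of condition under which the paper's convergence theorem applies.

```python
import random

# Hypothetical two-state, two-action MDP (illustration only):
# transitions[s][a] = (next_state, reward)
transitions = {
    0: {0: (0, 0.0), 1: (1, 1.0)},
    1: {0: (0, 0.0), 1: (1, 0.5)},
}
gamma = 0.9  # discount factor

Q = {s: {a: 0.0 for a in (0, 1)} for s in (0, 1)}
visits = {(s, a): 0 for s in (0, 1) for a in (0, 1)}

random.seed(0)
s = 0
for _ in range(20000):
    a = random.choice((0, 1))  # exploratory (uniform random) behavior policy
    s2, r = transitions[s][a]
    visits[(s, a)] += 1
    # Per-pair step sizes alpha_n = n^(-0.6) satisfy the stochastic-
    # approximation conditions: sum(alpha_n) = inf, sum(alpha_n^2) < inf.
    alpha = 1.0 / visits[(s, a)] ** 0.6
    # Q-learning update: move Q(s, a) toward the sampled Bellman target.
    Q[s][a] += alpha * (r + gamma * max(Q[s2].values()) - Q[s][a])
    s = s2

print({s: {a: round(q, 2) for a, q in acts.items()} for s, acts in Q.items()})
```

Under these conditions the estimates approach the optimal action values; for this toy MDP they can be checked against the fixed point of the Bellman optimality equation (for example, Q*(0, 1) = 1 + 0.9 · 5 = 5.5).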

Subject Categories:

  • Operations Research

Distribution Statement:

APPROVED FOR PUBLIC RELEASE