Accession Number:

ADA456806

Title:

Probabilistic Reuse of Past Policies

Descriptive Note:

Research paper

Corporate Author:

CARNEGIE-MELLON UNIV PITTSBURGH PA SCHOOL OF COMPUTER SCIENCE

Personal Author(s):

Report Date:

2005-07-01

Pagination or Media Count:

15

Abstract:

A past policy provides a bias that guides exploration of the environment and speeds up the learning of a new action policy. The success of this bias depends on whether the past policy is similar to the actual policy. In this report, the authors describe a new algorithm, PRQ-Learning, that reuses a set of past policies to bias the learning of a new one. The past policies are ranked according to a similarity metric that estimates how useful it is to reuse each of them. This ranking provides a probabilistic bias for exploration in the new learning process. Several experiments demonstrate that PRQ-Learning finds a balance between exploitation of the policy currently being learned, exploration of random actions, and exploration toward the past policies.
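
The abstract names three behaviors that the learner balances but does not give the algorithm's details. The following is a minimal Python sketch of how such a balance might be structured, not the report's implementation. All names and parameters here (softmax_choice, choose_action, gains, temperature, psi, epsilon) are hypothetical, and the softmax ranking and epsilon-greedy fallback are assumed for illustration.

```python
import math
import random


def softmax_choice(gains, temperature, rng):
    """Pick the index of a past policy, favoring higher estimated usefulness.

    `gains` is an assumed list of usefulness estimates, one per past policy.
    """
    peak = max(gains)  # subtract the maximum for numerical stability
    weights = [math.exp((g - peak) / temperature) for g in gains]
    return rng.choices(range(len(gains)), weights=weights, k=1)[0]


def choose_action(q_row, past_action, psi, epsilon, rng):
    """Blend the three behaviors named in the abstract.

    `q_row` holds the current value estimates for each action in one state.
    `past_action` is what the selected past policy would do in that state.
    `psi` and `epsilon` are assumed probabilities in [0, 1].
    """
    if past_action is not None and rng.random() < psi:
        return past_action  # exploration toward a past policy
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))  # exploration of random actions
    # exploitation of the policy currently being learned
    return max(range(len(q_row)), key=q_row.__getitem__)
```

In this sketch, a past policy would be drawn once per episode with softmax_choice, and choose_action would then be called at every step. Lowering psi over time would shift the learner from reuse toward its own estimates; that schedule is likewise an assumption.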

Subject Categories:

  • Statistics and Probability
  • Computer Programming and Software
  • Cybernetics

Distribution Statement:

APPROVED FOR PUBLIC RELEASE