Accession Number:

ADA456807

Title:

Exploration and Policy Reuse

Descriptive Note:

Research paper

Corporate Author:

CARNEGIE-MELLON UNIV PITTSBURGH PA SCHOOL OF COMPUTER SCIENCE

Personal Author(s):

Report Date:

2005-07-01

Pagination or Media Count:

16

Abstract:

The authors define Policy Reuse as a learning technique that is guided by past policies and that poses the challenge of balancing three choices: exploitation of the ongoing learned policy, exploration of random actions, and exploration towards the past policies. In this work, they introduce a new exploration strategy, pi-reuse, as an intelligent bias to reuse a past policy when learning a new one. This strategy also provides a similarity metric between each of a set of past policies and the new one. The authors therefore define a pi-reuse-based similarity metric between policies and introduce a new algorithm that uses this metric to combine the selection and reuse of past policies. They then show empirical results that demonstrate the usefulness of pi-reuse as an exploration bias for reusing past policies and its effectiveness in defining the similarity between policies.
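
The three-way balance named in the abstract can be illustrated with a minimal sketch. It is not taken from the report: the function name choose_action, the probabilities psi and epsilon, and the dictionary-based Q-table are all assumptions made for illustration. The sketch follows the past policy with probability psi and otherwise falls back to an epsilon-greedy choice over the values learned so far.

```python
import random


def choose_action(state, q_values, past_policy, actions, psi, epsilon, rng=random):
    """Pick one action by balancing the three choices named in the abstract.

    psi and epsilon are hypothetical parameters, not values from the report.
    q_values maps (state, action) pairs to the current value estimates.
    """
    if rng.random() < psi:
        # Exploration towards the past policy.
        return past_policy(state)
    if rng.random() < epsilon:
        # Exploration of random actions.
        return rng.choice(actions)
    # Exploitation of the ongoing learned policy (greedy on current estimates).
    return max(actions, key=lambda a: q_values.get((state, a), 0.0))


# Usage: a past policy that always moves left, reused half of the time.
q = {("s", "left"): 0.2, ("s", "right"): 0.9}
action = choose_action("s", q, lambda s: "left", ["left", "right"], psi=0.5, epsilon=0.1)
```

Setting psi to 0 reduces the sketch to plain epsilon-greedy learning, while setting it to 1 replays the past policy unchanged; how psi should change over the course of an episode is left open here.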

Subject Categories:

  • Statistics and Probability
  • Computer Programming and Software
  • Cybernetics

Distribution Statement:

APPROVED FOR PUBLIC RELEASE