Exploration and Policy Reuse
CARNEGIE-MELLON UNIV PITTSBURGH PA SCHOOL OF COMPUTER SCIENCE
The authors define Policy Reuse as a learning technique guided by past policies. It poses the challenge of balancing three choices: exploiting the policy currently being learned, exploring random actions, and exploring toward the past policies. In this work, they introduce a new exploration strategy, pi-reuse, which acts as an intelligent bias toward reusing a past policy when learning a new one. This strategy also yields a measure of similarity between each of a set of past policies and the new one, and the authors accordingly define a pi-reuse-based similarity metric between policies. They introduce a new algorithm that uses this metric to combine the selection and reuse of past policies. Empirical results demonstrate the usefulness of pi-reuse as a bias for reusing past policies and its effectiveness in defining the similarity between policies.
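The three-way balance described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the choices are weighted, so everything here is an assumption made for illustration: the function name `pi_reuse_action`, the mixing probabilities `psi` (follow the past policy) and `epsilon` (take a random action), and the dictionary-based Q-table are hypothetical and should not be read as the authors' implementation.

```python
import random


def pi_reuse_action(state, q_values, past_policy, actions, psi, epsilon, rng=random):
    """Choose an action by mixing three choices (illustrative sketch only).

    psi and epsilon are assumed parameters, not values taken from the report:
      - with probability psi, follow the past policy (exploration toward it);
      - otherwise, with probability epsilon, take a random action;
      - otherwise, act greedily on the policy being learned.
    """
    if rng.random() < psi:
        # Bias toward the past policy.
        return past_policy(state)
    if rng.random() < epsilon:
        # Plain random exploration.
        return rng.choice(actions)
    # Exploit the ongoing learned policy; unseen (state, action) pairs default to 0.0.
    return max(actions, key=lambda a: q_values.get((state, a), 0.0))


# Hypothetical usage: a two-action problem with one known past policy.
actions = ["left", "right"]
q_values = {("s0", "left"): 0.2, ("s0", "right"): 0.9}
past_policy = lambda state: "left"

always_reuse = pi_reuse_action("s0", q_values, past_policy, actions, psi=1.0, epsilon=0.0)
always_greedy = pi_reuse_action("s0", q_values, past_policy, actions, psi=0.0, epsilon=0.0)
```

With `psi=1.0` the sketch always defers to the past policy, and with `psi=0.0` and `epsilon=0.0` it reduces to greedy action selection on the new policy. Intermediate values trade off the three choices.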
- Statistics and Probability
- Computer Programming and Software