Hierarchical Average Reward Reinforcement Learning
MASSACHUSETTS UNIV AMHERST DEPT OF COMPUTER SCIENCE
Pagination or Media Count:
Hierarchical reinforcement learning HRL is the study of mechanisms for exploiting the structure of tasks in order to learn more quickly. By decomposing tasks into subtasks, fully or partially specified subtask solutions can be reused in solving tasks at higher levels of abstraction. The theory of semi-Markov decision processes provides a theoretical basis for HRL. Several variant representational schemes based on SMDP models have been studied in previous work, all of which are based on the discrete-time discounted SMDP model. In this approach, policies are learned that maximize the long-term discounted sum of rewards. In this paper we investigate two formulations of HRL based on the average-reward SMDP model, both for discrete time and continuous time. In the average-reward model, policies are sought that maximize the expected reward per step. The two formulations correspond to two different notions of optimality that have been explored in previous work on HRL hierarchical optimality, which corresponds to the set of optimal policies in the space defined by a task hierarchy, and a weaker local model called recursive optimality.
- Numerical Mathematics