Accession Number : ADA587838


Title : Learning Representation and Control in Markov Decision Processes


Descriptive Note : Final rept. 1 Aug 2010-31 Jul 2013


Corporate Author : MASSACHUSETTS UNIV AMHERST DEPT OF COMPUTER SCIENCE


Personal Author(s) : Mahadevan, Sridhar


Full Text : https://apps.dtic.mil/dtic/tr/fulltext/u2/a587838.pdf


Report Date : 21 Oct 2013


Pagination or Media Count : 34


Abstract : This research investigated algorithms for approximately solving Markov decision processes (MDPs), a widely used model of sequential decision making. Much past work on solving MDPs in adaptive dynamic programming and reinforcement learning has assumed that representations, such as basis functions, are provided by a human expert. The research investigated a variety of approaches to automatic basis construction, including reward-sensitive and reward-invariant methods, diagonalization and dilation methods, as well as orthogonal and over-complete representations. A unifying perspective on the various basis construction methods emerges from showing that they result from different power series expansions of value functions, including the Neumann series expansion, the Laurent series expansion, and the Schultz expansion. The research also developed new computational algorithms for learning sparse solutions to MDPs using convex optimization methods.
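For context (not drawn from the report itself), the Neumann series cited in the abstract is the standard expansion of the value function of a fixed policy: with transition matrix P, reward vector R, and discount factor gamma < 1, V = (I - gamma P)^{-1} R = sum over t >= 0 of (gamma P)^t R. The following is a minimal Python sketch of that identity on a small, made-up 3-state MDP; the function name value_neumann and the numerical values are illustrative assumptions, not the report's algorithms.

import numpy as np

def value_neumann(P, R, gamma, n_terms=200):
    """Approximate V by truncating the Neumann series sum_t (gamma P)^t R."""
    V = np.zeros_like(R)
    term = R.copy()
    for _ in range(n_terms):
        V += term            # accumulate (gamma P)^t R
        term = gamma * P @ term
    return V

# Tiny illustrative 3-state chain with an arbitrary transition matrix.
P = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.3, 0.7]])
R = np.array([1.0, 0.0, 2.0])
gamma = 0.9

V_series = value_neumann(P, R, gamma)
V_exact = np.linalg.solve(np.eye(3) - gamma * P, R)
print(np.allclose(V_series, V_exact, atol=1e-6))  # True: the truncated series matches the exact solve

The report's contribution concerns how different expansions of this kind (Neumann, Laurent, Schultz) give rise to different automatically constructed bases; the sketch above only illustrates the underlying series, not those constructions.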


Descriptors : *ALGORITHMS , *DECISION MAKING , *MARKOV PROCESSES , *PROBLEM SOLVING , AIR FORCE RESEARCH , CONVEX BODIES , DYNAMIC PROGRAMMING , LEARNING , METHODOLOGY , OPTIMIZATION , SERIES(MATHEMATICS)


Subject Categories : Administration and Management ; Statistics and Probability


Distribution Statement : APPROVED FOR PUBLIC RELEASE