The Shift-Function Approach for Markov Decision Processes with Unbounded Returns.
STANFORD UNIV CA DEPT OF OPERATIONS RESEARCH
We study a discrete-time Markov decision process with general state and action spaces. The objective is to maximize the expected total return over a finite or infinite horizon. The transition probability measure is allowed to be defective, so that the model includes discounting, state- and action-dependent transition times (semi-Markov decision processes), and stopping problems. Motivated by applications to the control of queues and inventory systems, we develop a set of conditions on the one-period return function, the transition probabilities, and the terminal value function that guarantee uniform convergence, with respect to the sup norm, of the finite-horizon optimal value functions to the infinite-horizon optimal value function (successive approximations). These conditions are substantially weaker, and more realistic for the applications we have in mind, than those of the classical discounted, bounded model. (Author)
- Statistics and Probability