DISCRETE DYNAMIC PROGRAMMING WITH SENSITIVE DISCOUNT OPTIMALITY CRITERIA.
STANFORD UNIV CALIF DEPT OF OPERATIONS RESEARCH
Stationary, discrete- and continuous-time-parameter, finite state and action Markovian decision processes with substochastic transition matrices are studied in which future rewards are discounted. The methods of successive approximations and policy improvement are shown to converge to the maximal expected discounted reward when the interest rate is negative, provided the expected discounted time until the process stops is finite under every stationary policy. A family of discount optimality criteria of varying degrees of sensitivity is introduced. The family includes and unifies the discount criteria studied by Blackwell. The existence of stationary optimal policies is established constructively, and the class of such policies is characterized. The algorithms are essentially specializations and refinements of a policy improvement method recently devised by Miller and the author for finding stationary policies having maximal expected discounted reward for all small enough positive interest rates. The results exploit the fact that the Laurent expansion of the discounted return under a stationary policy, in the interest rate ρ about the origin, has a simple form. (Author)
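As an illustrative sketch only (not the report's algorithm), the method of successive approximations for a discounted finite-state, finite-action Markovian decision process can be written as below. The two-state, two-action transition matrices, rewards, and interest rate are hypothetical, and the discount factor is taken as beta = 1/(1 + rho) for a positive interest rate rho:

```python
import numpy as np

def value_iteration(P, r, beta, tol=1e-10, max_iter=10000):
    """Successive approximations for a discounted MDP:
    v_{k+1}(s) = max_a [ r(a, s) + beta * sum_s' P(a, s, s') v_k(s') ].
    P has shape (actions, states, states); r has shape (actions, states).
    Returns the (approximate) maximal expected discounted reward vector
    and a maximizing stationary policy (one action per state)."""
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)
    for _ in range(max_iter):
        # q[a, s] = immediate reward plus discounted expected continuation value
        q = r + beta * (P @ v)
        v_new = q.max(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    return v_new, q.argmax(axis=0)

# Hypothetical data: P[a, s, s'] substochastic/stochastic transitions, r[a, s] rewards.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.7, 0.3]]])
r = np.array([[1.0, 0.0],
              [2.0, 0.5]])
rho = 0.1                   # hypothetical positive interest rate
beta = 1.0 / (1.0 + rho)    # corresponding discount factor
v, policy = value_iteration(P, r, beta)
print("values:", v, "policy:", policy)
```

The returned vector satisfies the optimality (fixed-point) equation v(s) = max_a [r(a, s) + beta * E(v | s, a)] up to the stated tolerance; a stationary policy attaining the maximum in each state is optimal for this discounted criterion.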
- Administration and Management
- Operations Research