CALIFORNIA UNIV BERKELEY OPERATIONS RESEARCH CENTER
A special structure in dynamic programming is the problem of programming over a Markov chain. This paper extends the solution algorithms to programming over a Markov-renewal process, in which the times between transitions of the system from state i to state j are independent samples from an inter-transition distribution that may depend on both i and j. For these processes, a general reward structure and a decision mechanism are postulated; the problem is to make decisions at each transition so as to maximize the total expected reward at the end of the planning horizon. For finite-horizon problems, or infinite-horizon problems with discounting, there is no difficulty: the results are similar to previous work, except for a new dependence on the transition-time distributions. When the horizon extends towards infinity, or when discounting vanishes, however, a fundamental dichotomy in the optimal solutions may occur. It then becomes important to specify whether the limiting experiment is (i) undiscounted, with the number of transitions n approaching infinity; (ii) undiscounted, with the time horizon t approaching infinity; or (iii) infinite n or t, with the discount factor a approaching 0. In each case, a limiting form for the total expected reward is shown, and an algorithm is developed to maximize the rate of return. The problem of finding the optimal or near-optimal policies in the case of ties in the rate of return is still computationally unresolved. Extensions to non-ergodic processes are indicated, and special results for the two-state process are presented. Finally, an example of machine maintenance and repair is used to illustrate the generality of the approach and the special problems that may arise.
- Operations Research
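The distinction the abstract draws between limiting experiments (i) and (ii) can be illustrated with a small numerical sketch. It is not the paper's algorithm. It assumes a fixed policy on an ergodic, aperiodic chain and uses the standard renewal-reward ratio: reward per transition is the stationary average of the one-step rewards, and reward per unit time divides that by the stationary average holding time. The names `P`, `q`, `tau`, and both example policies are hypothetical.

```python
def stationary_distribution(P, iterations=10_000):
    """Approximate the stationary distribution of an ergodic, aperiodic
    chain by power iteration (assumption: the chain mixes)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def rates_of_return(P, q, tau):
    """Return (reward per transition, reward per unit time) for a fixed policy.

    P   : transition matrix of the embedded Markov chain
    q   : expected reward earned per visit to each state
    tau : mean time spent in each state before the next transition
    """
    pi = stationary_distribution(P)
    per_transition = sum(p * r for p, r in zip(pi, q))
    mean_time = sum(p * t for p, t in zip(pi, tau))
    return per_transition, per_transition / mean_time


# Two hypothetical policies on the same two-state embedded chain.
P = [[0.5, 0.5], [0.5, 0.5]]
policy_a = rates_of_return(P, q=[4.0, 2.0], tau=[4.0, 2.0])  # slow, high reward
policy_b = rates_of_return(P, q=[2.0, 2.0], tau=[1.0, 1.0])  # fast, low reward

print("A (per transition, per unit time):", policy_a)
print("B (per transition, per unit time):", policy_b)
```

In this made-up example, policy A earns more per transition while policy B earns more per unit time, so the ranking of policies depends on whether the horizon is counted in transitions n or in time t.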