ALTERNATIVE SCHEMES FOR INVESTIGATING MARKOV DECISION PROCESSES.
MASSACHUSETTS INST OF TECH CAMBRIDGE OPERATIONS RESEARCH CENTER
This report addresses the problem of finding optimal policies for infinite-horizon Markovian decision models. It describes several algorithms applicable to discrete-time, discrete-state, single-chain Markov processes with undiscounted rewards. One of these, the iterative method with limits, is proved correct and its properties are discussed. The method computes upper and lower limits on the optimal gain per transition of the infinite-horizon process; these limits tighten at each iteration and converge to the optimal gain. The algorithm simultaneously produces the relative values of the states as well as the optimal policy. The computational efficiency of these algorithms was compared by solving several test problems. For problems in which the number of alternatives per state is considerably smaller than the number of states, the iterative method with limits holds a substantial speed advantage over the well-known policy-iteration algorithm. It also produces highly accurate results with very little programming effort.
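The bounding scheme described above can be sketched in code. This is a minimal illustration, not the report's implementation: it runs undiscounted value iteration on a small, made-up two-state example, and at each sweep takes the minimum and maximum of the one-step value differences as lower and upper limits on the optimal gain per transition. For a single-chain (unichain), aperiodic process these limits converge, and the same sweep yields relative values and a maximizing policy. All problem data and names below are illustrative assumptions.

```python
def solve_average_reward_mdp(P, r, eps=1e-8, max_iter=10_000):
    """Value iteration with gain limits for an undiscounted, single-chain MDP.

    P[a][s][j] : probability of moving from state s to state j under action a
    r[a][s]    : expected one-step reward in state s under action a
    Returns (gain_lo, gain_hi, relative_values, policy).
    """
    n = len(r[0])
    v = [0.0] * n
    lo = hi = 0.0
    for _ in range(max_iter):
        # One value-iteration sweep: best one-step backup in each state.
        w, pol = [], []
        for s in range(n):
            best, best_a = None, None
            for a in range(len(r)):
                q = r[a][s] + sum(P[a][s][j] * v[j] for j in range(n))
                if best is None or q > best:
                    best, best_a = q, a
            w.append(best)
            pol.append(best_a)
        # Limits on the optimal gain per transition; they tighten each sweep.
        diffs = [w[s] - v[s] for s in range(n)]
        lo, hi = min(diffs), max(diffs)
        v = w
        if hi - lo < eps:
            break
    ref = v[0]  # pin state 0's value at zero to get relative values
    return lo, hi, [x - ref for x in v], pol


# Illustrative two-state, two-action problem (made-up numbers).
P = [
    [[0.5, 0.5], [0.4, 0.6]],  # transitions under action 0
    [[0.9, 0.1], [0.2, 0.8]],  # transitions under action 1
]
r = [[1.0, 3.0], [0.5, 2.0]]   # rewards: r[a][s]

lo, hi, rel_v, policy = solve_average_reward_mdp(P, r)
```

For this toy problem the optimal policy is action 0 in both states, and the limits close in on the optimal gain of 19/9 per transition; the stopping test on `hi - lo` makes the achieved accuracy explicit, which is the practical appeal of the method.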
- Statistics and Probability
- Operations Research