Initially Stationary epsilon-Optimal Policies in Continuous Time Markov Decision Chains.
STANFORD UNIV CALIF DEPT OF OPERATIONS RESEARCH
The asymptotic behavior of continuous-time-parameter Markov decision chains is studied. It is shown that the maximal total expected t-period reward, less t times the maximal long-run average return rate, converges as t approaches infinity for every initial state. This result is used to establish the existence of policies that are simultaneously epsilon-optimal for all process durations and that are stationary except possibly for a final, finite segment. The length of that final segment depends on epsilon but not on t for t large enough, while the initial stationary part of the policy is independent of both epsilon and t. The decision rules comprising the initially stationary part of these policies, called preferred rules, are characterized. Finite algorithms for finding preferred decision rules are given under varying hypotheses on the underlying structure of the system, though the general case remains unsolved. (Author)
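The convergence claim above can be illustrated numerically. The sketch below, a hypothetical two-state continuous-time Markov decision chain not taken from the report, integrates the finite-horizon optimality equation dV/dt = max_a [r_a + Q_a V] by forward Euler and checks that the "bias" h(t) = V(t) - t g stops changing as the horizon t grows, where g is the maximal long-run average return rate. All rates, rewards, and the value of g here are assumptions chosen for the example.

```python
import numpy as np

# Hypothetical 2-state continuous-time Markov decision chain.
# State 0 has two actions; state 1 has one.
# Each action is (reward rate r, generator row q).
ACTIONS = {
    0: [(1.0, np.array([0.0, 0.0])),      # a0: earn 1 per unit time, stay put
        (0.0, np.array([-2.0, 2.0]))],    # a1: earn nothing, jump to state 1 at rate 2
    1: [(2.0, np.array([1.0, -1.0]))],    # earn 2 per unit time, jump to 0 at rate 1
}

def bellman_rate(V):
    """Right-hand side of the optimality equation: max over actions of r + q.V."""
    return np.array([max(r + q @ V for r, q in ACTIONS[s]) for s in (0, 1)])

def value(T, dt=1e-3):
    """Maximal total expected reward over horizon T, by forward Euler from V(0)=0."""
    V = np.zeros(2)
    for _ in range(int(T / dt)):
        V = V + dt * bellman_rate(V)
    return V

# Maximal long-run average return rate g: the policy that cycles 0 -> 1 -> 0
# has stationary distribution (1/3, 2/3), giving 2 * 2/3 = 4/3 per unit time,
# which beats staying in state 0 for rate 1.  So g = 4/3 from every state.
g = 4.0 / 3.0

# The bias h(t) = V(t) - t*g should converge componentwise as t grows.
h40 = value(40.0) - 40.0 * g
h50 = value(50.0) - 50.0 * g
print(h40, h50)
```

In this example the chain mixes quickly, so h(40) and h(50) already agree to high precision; the maximizing action settles on the cycling policy once the horizon is long enough, which mirrors the report's initially stationary structure (a fixed stationary rule early on, with only a final segment depending on the remaining time).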
- Operations Research