PERTURBATION THEORY AND MARKOVIAN DECISION PROCESSES.
MASSACHUSETTS INST OF TECH CAMBRIDGE OPERATIONS RESEARCH CENTER
The Howard-Jewell algorithm for programming over a Markov-renewal process is analyzed in terms of a perturbation theory formalism which describes how the stationary distribution changes when the transition probabilities change. The policy improvement technique is derived from this new viewpoint. The relative values may be interpreted as partial derivatives of the gain rate with respect to policy. The value equations are shown to be solvable, with the relative values unique up to one additive constant, if and only if the underlying Markov chain is irreducible. The policy iteration algorithm is shown not to cycle, thus guaranteeing convergence. A discussion of the existence, uniqueness, and characterization of the solution to the functional equation of dynamic programming is given. Emphasis is placed upon the value maximization of transient states. The fundamental matrix is developed as a useful tool for doing perturbation theory, for describing first-passage properties of semi-Markov processes, and for dealing with semi-Markov processes with rewards. (Author)
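The scheme the abstract describes can be sketched in the discrete-time special case (an ordinary Markov chain with rewards, rather than a full Markov-renewal process). This is a minimal illustration, not the report's own formulation: the value equations g + v_i = r_i + Σ_j P_ij v_j are solved with one relative value pinned to zero (they are only unique up to an additive constant), and the improvement step keeps the current action unless another is strictly better, which is what rules out cycling. The names `P_a`, `r_a`, `evaluate_policy`, and `policy_iteration` are illustrative, not from the source.

```python
import numpy as np

def evaluate_policy(P, r):
    """Solve the value equations g + v_i = r_i + sum_j P_ij v_j.

    For an irreducible chain the solution exists with the relative
    values unique up to one additive constant; we pin v[-1] = 0.
    Returns the gain rate g and the relative values v."""
    n = len(r)
    A = np.zeros((n, n))
    A[:, 0] = 1.0                       # coefficient of the gain g
    A[:, 1:] = (np.eye(n) - P)[:, :-1]  # (I - P) v, pinned column dropped
    x = np.linalg.solve(A, np.asarray(r, dtype=float))
    return x[0], np.append(x[1:], 0.0)

def policy_iteration(P_a, r_a, tol=1e-9):
    """Policy iteration: evaluate, then improve.

    P_a[s][a] is the transition row and r_a[s][a] the one-step reward
    for action a in state s (hypothetical data layout). The current
    action is kept unless another is strictly better by more than tol,
    so the iteration cannot cycle and must terminate."""
    n = len(P_a)
    policy = [0] * n
    while True:
        P = np.array([P_a[s][policy[s]] for s in range(n)], dtype=float)
        r = np.array([r_a[s][policy[s]] for s in range(n)], dtype=float)
        g, v = evaluate_policy(P, r)
        new = []
        for s in range(n):
            best = policy[s]
            best_val = r_a[s][best] + np.dot(P_a[s][best], v)
            for a in range(len(P_a[s])):
                val = r_a[s][a] + np.dot(P_a[s][a], v)
                if val > best_val + tol:   # strict improvement only
                    best, best_val = a, val
            new.append(best)
        if new == policy:
            return policy, g, v
        policy = new
```

For example, a two-state chain where state 0 offers a safe action (reward 1, transitions [0.5, 0.5]) and a risky one (reward 2, transitions [0.1, 0.9]) converges in two iterations to the risky action, with the gain and relative values satisfying the value equations exactly.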