Using Strong Convergence to Accelerate Value Iteration.
CARNEGIE-MELLON UNIV PITTSBURGH PA MANAGEMENT SCIENCES RESEARCH GROUP
Convergence of the relative value function (the total value function less the total value of a base state) and of the optimal policy, in the length of the horizon, for value-iteration Markov decision programming has recently been shown to be geometric with factor alpha*beta, where alpha is the discount factor and beta <= 1.0. The case beta < 1.0 is termed strong convergence. This paper suggests that bounds on the convergence rate be estimated computationally during the value-iteration process, giving bounds directly on the extrapolated infinite-horizon relative value function. Such an extrapolation serves two purposes. First, large numbers of value-iteration steps can be skipped by continuing computation directly with the estimated infinite-horizon relative value function; this is directly analogous to quadratic acceleration procedures in nonlinear programming. Second, existing procedures for the elimination of non-optimal actions are greatly strengthened, since actions can be eliminated permanently once bounds on the infinite-horizon relative value function become sufficiently tight.
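The extrapolation idea can be sketched as follows. This is a minimal illustration, not the paper's actual bound-based procedure: the two-state, two-action MDP data, the discount factor of 0.9, and all function names are made up for the example. Once the increments V_n - V_{n-1} decay geometrically with some factor rho (playing the role of alpha*beta), the remaining geometric tail can be summed in closed form, jumping directly to an estimate of the infinite-horizon value function.

```python
# Illustrative sketch on a hypothetical two-state, two-action MDP
# (all numbers below are invented for the example).
ALPHA = 0.9  # discount factor

# rewards[s][a] and trans[s][a][s'] (row-stochastic transition probabilities)
rewards = [[1.0, 0.0],
           [0.0, 2.0]]
trans = [[[0.8, 0.2], [0.1, 0.9]],
         [[0.5, 0.5], [0.3, 0.7]]]

def bellman(V):
    """One value-iteration sweep: V'(s) = max_a [r(s,a) + alpha * sum_s' P(s'|s,a) V(s')]."""
    return [max(rewards[s][a] + ALPHA * sum(p * v for p, v in zip(trans[s][a], V))
                for a in range(2))
            for s in range(2)]

def value_iteration(n_iter):
    """Plain value iteration starting from V = 0."""
    V = [0.0] * len(rewards)
    for _ in range(n_iter):
        V = bellman(V)
    return V

def extrapolate(n_iter):
    """Run n_iter sweeps, estimate the geometric factor rho from the last two
    increments, and sum the remaining geometric tail in closed form:
        V_inf ~= V_n + (V_n - V_{n-1}) * rho / (1 - rho)."""
    V0 = value_iteration(n_iter - 2)
    V1 = bellman(V0)
    V2 = bellman(V1)
    d1 = max(abs(a - b) for a, b in zip(V1, V0))
    d2 = max(abs(a - b) for a, b in zip(V2, V1))
    rho = d2 / d1  # estimated convergence factor; assumes d1 > 0 and rho < 1
    return [v2 + (v2 - v1) * rho / (1 - rho) for v1, v2 in zip(V1, V2)]
```

On this toy problem, 30 sweeps followed by one extrapolation step match the fixed point of several hundred plain sweeps, which is the sense in which "large numbers of iterations can be skipped." The paper's computed bounds on the convergence rate, and the permanent elimination of non-optimal actions, are not reproduced here.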
- Operations Research