Accession Number:

ADA025077

Title:

Partially Observable Markov Decision Processes over an Infinite Planning Horizon with Discounting

Descriptive Note:

Technical rept. 1 Jan-31 Mar 1976

Corporate Author:

UNIVERSITY OF SOUTHERN CALIFORNIA LOS ANGELES BEHAVIORAL TECHNOLOGY LABS

Personal Author(s):

Report Date:

1976-03-01

Pagination or Media Count:

35

Abstract:

This is the last in a series of technical reports on mathematical approaches to optimizing instructional sequences in instructional systems. This paper deals with Markov decision processes in which the true state of the system is not known with certainty; hence, the state of the system is characterized by a probability vector. Each action yields an expected reward, transforms the system to a new state, and yields an observable outcome. One wishes to determine an action for each probability state vector so as to maximize the total expected reward. This report treats the infinite time horizon with a discount factor, using a partial N-dimensional Maclaurin series to approximate the total optimal reward as a function of the probability state vector.
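
The setup described in the abstract can be illustrated with a minimal sketch of how a probability state vector might be revised after one action and one observed outcome, using the standard Bayes update. This is not the report's Maclaurin-series method, which the abstract does not detail. The names belief_update and expected_reward, the matrix layouts, and the example numbers are all assumptions made for illustration.

```python
def belief_update(belief, transition, observation, obs_index):
    """Bayes update of a probability state vector (illustrative sketch).

    Assumed layout, for one fixed action:
      belief[i]         probability that the hidden state is i
      transition[i][j]  P(next state j | current state i)
      observation[j][k] P(observable outcome k | next state j)
      obs_index         index k of the outcome actually observed
    """
    n = len(belief)
    # Prediction step: push the belief through the transition matrix.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Correction step: weight each state by the likelihood of the outcome.
    unnormalized = [predicted[j] * observation[j][obs_index] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observed outcome has zero probability under this belief")
    return [u / total for u in unnormalized]


def expected_reward(belief, rewards):
    """Expected immediate reward of an action, given per-state rewards."""
    return sum(b * r for b, r in zip(belief, rewards))


# Hypothetical two-state example: the action leaves the state unchanged
# and the observable outcome is correct 80% of the time.
prior = [0.5, 0.5]
stay = [[1.0, 0.0], [0.0, 1.0]]
noisy = [[0.8, 0.2], [0.2, 0.8]]
posterior = belief_update(prior, stay, noisy, obs_index=0)
```

In an infinite-horizon discounted problem, an update of this kind would be applied after every action, with each period's expected reward weighted by a successive power of the discount factor.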

Subject Categories:

  • Statistics and Probability

Distribution Statement:

APPROVED FOR PUBLIC RELEASE