ON THE N-ARMED BANDIT AND OTHER PROBLEMS INVOLVING SEQUENTIAL DECISIONS.
MASSACHUSETTS INST OF TECH CAMBRIDGE OPERATIONS RESEARCH CENTER
The N-armed bandit problem can be defined as the problem faced by a decision maker who is allowed a given number of opportunities to sample from N populations generating random variables whose distributions are not known with certainty. If he wants to maximize his expected payoff, what should his strategy be? The paper gives a general formulation which is later specialized to the case where the N random variables have binary distributions. A numerical solution is given, and some properties of the optimal strategy are discussed for both finite and infinite processes. Finally, some near-optimal rules are proposed and evaluated. In a second part, the so-called action-timing problem and some variants of it are studied. Models with known probability distributions are considered first; this assumption is then relaxed to allow for imperfectly known distributions, with adaptive learning built into the policy. (Author)
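The abstract does not spell out the paper's near-optimal rules, so the following is only a minimal sketch of the binary (Bernoulli) bandit setting it describes, using a simple epsilon-greedy sampling rule as a stand-in; the function name and parameters are hypothetical, not taken from the report.

```python
import random

def simulate_bandit(true_probs, horizon, epsilon=0.1, seed=0):
    """Simulate an N-armed Bernoulli bandit under an epsilon-greedy rule.

    true_probs: the arms' success probabilities, unknown to the decision maker.
    horizon: the given number of sampling opportunities.
    Returns the total payoff collected over the horizon.
    """
    rng = random.Random(seed)
    n = len(true_probs)
    successes = [0] * n  # observed successes per arm
    pulls = [0] * n      # observed trials per arm
    total = 0
    for _ in range(horizon):
        if rng.random() < epsilon or 0 in pulls:
            arm = rng.randrange(n)  # explore: sample an arm at random
        else:
            # exploit: pick the arm with the best empirical success rate
            arm = max(range(n), key=lambda i: successes[i] / pulls[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        pulls[arm] += 1
        total += reward
    return total
```

Under this rule the empirical success rates adapt as samples accumulate, which is the flavor of learning the abstract attributes to both the bandit rules and the adaptive action-timing policies.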
- Operations Research