Accession Number:



A Multi-Armed Bandit Approach to Following a Markov Chain

Descriptive Note:

Technical Report, 30 Jun 2015 to 16 Jun 2017

Corporate Author:

Naval Postgraduate School, Monterey, United States

Personal Author(s):

Report Date:


Pagination or Media Count:



Abstract:

Across the defense, homeland security, and law enforcement communities, leaders face a tension between making quick and well-informed decisions regarding time-dependent entities of interest. For example, consider a law enforcement organization (the searcher) with a sizable list of potential terrorist targets but far fewer observational assets (sensors). The searcher's goal is to follow the target, but resource constraints make continuous coverage impossible, resulting in intermittent observation attempts. We model target behaviour as a discrete-time Markov chain whose state space is the target's set of possible locations, activities, or attributes. In this setting, we define following the target as the searcher, at any given time step, correctly identifying, and then allocating the sensor to, the state with the highest probability of containing the target. In other words, in each time period the searcher's objective is to decide where to send the sensor in an attempt to observe the target; each attempt results in a hit or a miss, from which the searcher learns the target's true transition behaviour. We develop a Multi-Armed Bandit approach for efficiently following the target, in which each state takes the place of an arm. Our search policy performs five to ten times better than existing approaches.
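To make the abstract's setup concrete, the following is a minimal illustrative sketch, not the report's actual algorithm: each state of the target's Markov chain is treated as a bandit arm, and a simple Thompson-sampling policy (an assumed stand-in for the paper's policy) maintains a Beta posterior on each state's hit rate, sending the sensor to the state whose sampled hit rate is highest. The transition matrix, state count, and horizon below are invented for the demonstration.

```python
import random

def simulate(n_states=4, horizon=2000, seed=0):
    """Follow a hidden Markov-chain target with a Thompson-sampling sensor policy."""
    rng = random.Random(seed)
    # Hypothetical target transition matrix, unknown to the searcher:
    # from any state, the target moves to state 0 with probability 0.6,
    # otherwise uniformly among the remaining states (rows sum to 1).
    P = [[0.6 if j == 0 else 0.4 / (n_states - 1) for j in range(n_states)]
         for i in range(n_states)]
    target = 0
    # Beta(1, 1) prior on each state's hit rate: alpha = hits + 1, beta = misses + 1.
    alpha = [1.0] * n_states
    beta = [1.0] * n_states
    hits = 0
    for _ in range(horizon):
        # The target moves according to its (hidden) Markov chain.
        target = rng.choices(range(n_states), weights=P[target])[0]
        # Thompson sampling: draw a plausible hit rate for each state,
        # then send the single sensor to the state with the largest draw.
        samples = [rng.betavariate(alpha[s], beta[s]) for s in range(n_states)]
        chosen = max(range(n_states), key=lambda s: samples[s])
        if chosen == target:   # hit: the sensor observed the target
            alpha[chosen] += 1
            hits += 1
        else:                  # miss: the searcher still learns from it
            beta[chosen] += 1
    return hits / horizon

print(simulate())
```

Under this invented transition matrix, state 0 carries most of the stationary probability, so the policy learns to concentrate the sensor there; the fraction of hits approaches that state's long-run occupancy. The report's actual policy and its performance claims are not reproduced by this sketch.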

Subject Categories:

  • Statistics and Probability

Distribution Statement: