Accession Number : AD1046284


Title :   A Multi-Armed Bandit Approach to Following a Markov Chain


Descriptive Note : Technical Report, 30 Jun 2015 - 16 Jun 2017


Corporate Author : Naval Postgraduate School, Monterey, United States


Personal Author(s) : Akin, Ezra W


Full Text : https://apps.dtic.mil/dtic/tr/fulltext/u2/1046284.pdf


Report Date : 01 Jun 2017


Pagination or Media Count : 75


Abstract : Across the defense, homeland security, and law enforcement communities, leaders face a tension between making quick decisions and making well-informed ones regarding time-dependent entities of interest. For example, consider a law enforcement organization (searcher) with a sizable list of potential terrorists (targets) but far fewer observational assets (sensors). The searcher's goal is to follow the target, but resource constraints make continuous coverage impossible, resulting in intermittent observation attempts. We model target behavior as a discrete-time Markov chain whose state space is the target's set of possible locations, activities, or attributes. In this setting, following the target means that the searcher, at any given time step, correctly identifies and allocates the sensor to the state with the highest probability of containing the target. In other words, in each time period the searcher's objective is to decide where to send the sensor in an attempt to observe the target in that period; the resulting hit or miss teaches the searcher about the target's true transition behavior. We develop a Multi-Armed Bandit approach for efficiently following the target, where each state takes the place of an arm. Our search policy performs five to ten times better than existing approaches.
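The abstract's setup (an unknown Markov target, a single sensor, hit-or-miss feedback, states as arms) can be illustrated with a minimal sketch. This is not the thesis's actual policy; it assumes a Thompson-sampling bandit with a Dirichlet posterior over each row of the unknown transition matrix, and a perfect (no false negative) sensor. The function name `follow_target` and all parameters are hypothetical.

```python
import numpy as np

def follow_target(P_true, T=2000, seed=0):
    """Illustrative sketch: follow a Markov target via Thompson sampling.

    Each state of the chain plays the role of a bandit arm. The searcher
    keeps a Dirichlet posterior over every row of the unknown transition
    matrix, samples a plausible matrix each period, propagates its belief
    about the target's location one step, and points the sensor at the
    most probable state. Returns the fraction of periods with a hit.
    """
    rng = np.random.default_rng(seed)
    n = P_true.shape[0]
    counts = np.ones((n, n))          # Dirichlet pseudo-counts per row
    belief = np.full(n, 1.0 / n)      # searcher's belief over locations
    target = rng.integers(n)          # true (hidden) target state
    last_seen = None                  # state observed on a hit last period
    hits = 0
    for _ in range(T):
        target = rng.choice(n, p=P_true[target])      # target moves
        # Thompson step: sample a transition matrix from the posterior
        # and advance the belief one period under it.
        P_hat = np.array([rng.dirichlet(counts[i]) for i in range(n)])
        belief = belief @ P_hat
        look = int(np.argmax(belief))                 # allocate the sensor
        if look == target:                            # hit
            hits += 1
            if last_seen is not None:                 # two hits in a row:
                counts[last_seen, target] += 1        # an observed transition
            belief = np.zeros(n)
            belief[target] = 1.0                      # location now known
            last_seen = target
        else:                                         # miss
            belief[look] = 0.0                        # rule out that state
            belief /= belief.sum()
            last_seen = None                          # observation chain broken
    return hits / T

# Usage: a 4-state "sticky" chain where the target stays put 90% of the time.
n = 4
P = np.full((n, n), 0.1 / (n - 1))
np.fill_diagonal(P, 0.9)
rate = follow_target(P, T=2000, seed=1)
```

On this sticky chain the learned policy should find the target well above the 1/4 uniform-guessing baseline, since consecutive hits reveal transitions and concentrate the posterior on the diagonal.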


Descriptors : probability, machine learning, Markov chains, stochastic processes


Subject Categories : Statistics and Probability


Distribution Statement : APPROVED FOR PUBLIC RELEASE