Multi-Armed Bandits with Delayed and Aggregated Rewards
Technical Report, 01 Nov 2018 – 28 Feb 2019
CCDC ARL, Adelphi, United States
We study the canonical multi-armed bandit problem under delayed feedback. Recently proposed algorithms achieve desirable regret bounds in the delayed-feedback setting but require strict prior knowledge of the expected delays. In this work, we analyze the regret of such delay-resilient algorithms under milder assumptions on the delay distributions. We experimentally investigate known theoretical performance bounds and attempt to improve on a recently proposed algorithm by relaxing its assumptions on prior delay knowledge. Further, we investigate how assumptions on the delays affect the decision to mark an arm as suboptimal.
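To make the setting concrete, the following is a minimal simulation sketch of a stochastic bandit where each reward arrives only after a delay. It uses a standard UCB1 index computed over the rewards that have arrived so far; this is an illustrative baseline, not the report's algorithm, and for simplicity it assumes a fixed per-arm delay rather than the delay distributions studied in the report.

```python
import math
import random
from collections import defaultdict

def delayed_ucb(means, delays, horizon, seed=0):
    """Run UCB1 on Bernoulli arms whose rewards arrive `delays[arm]`
    steps after the pull. Returns the cumulative pseudo-regret.
    Hypothetical illustration: fixed per-arm delays, not the
    stochastic delays considered in the report."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k              # pulls whose reward has arrived
    sums = [0.0] * k              # sum of arrived rewards per arm
    pending = defaultdict(list)   # arrival time -> [(arm, reward)]
    best = max(means)
    regret = 0.0
    for t in range(horizon):
        # Deliver rewards whose delay has elapsed.
        for arm, r in pending.pop(t, []):
            counts[arm] += 1
            sums[arm] += r
        # Play each arm once first, then pick by UCB1 on observed feedback.
        if t < k:
            arm = t
        else:
            total = max(1, sum(counts))
            def index(a):
                if counts[a] == 0:
                    return float("inf")  # no feedback yet: explore
                mean = sums[a] / counts[a]
                return mean + math.sqrt(2 * math.log(total) / counts[a])
            arm = max(range(k), key=index)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pending[t + delays[arm]].append((arm, reward))
        regret += best - means[arm]
    return regret
```

Increasing the delays slows the rate at which the suboptimal arm can be ruled out, which is the effect the report probes: how much prior knowledge of the delays an algorithm needs before it can safely mark an arm as suboptimal.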
- Statistics and Probability