Accession Number:

AD1078688

Title:

Multi-Armed Bandits with Delayed and Aggregated Rewards

Descriptive Note:

Technical Report, 01 Nov 2018 - 28 Feb 2019

Corporate Author:

CCDC ARL, Adelphi, United States

Report Date:

2019-08-01

Pagination or Media Count:

20

Abstract:

We study the canonical multi-armed bandit problem under delayed feedback. Recently proposed algorithms have desirable regret bounds in the delayed-feedback setting but require strict prior knowledge of expected delays. In this work, we study the regret of such delay-resilient algorithms under milder assumptions on delay distributions. We experimentally investigate known theoretical performance bounds and attempt to improve on a recently proposed algorithm by making looser assumptions on prior delay knowledge. Further, we investigate the relationship between delay assumptions and marking an arm as suboptimal.
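To make the delayed-feedback setting concrete, the following is a minimal simulation sketch: a standard UCB1 learner on Bernoulli arms whose rewards are revealed only after a random (exponentially distributed) delay, so the index at each round is computed from arrived feedback only. This is an illustrative assumption, not the algorithm or delay model studied in the report.

```python
import random
from math import log, sqrt

def delayed_ucb(means, mean_delay, horizon, seed=0):
    """Simulate UCB1 on Bernoulli arms with delayed reward delivery.

    Illustrative sketch only: rewards for each pull arrive after an
    exponential delay with the given mean, and the learner updates its
    estimates only when feedback actually arrives.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # rewards that have arrived, per arm
    sums = [0.0] * k      # sum of arrived rewards, per arm
    pulls = [0] * k       # total pulls, per arm (feedback may be pending)
    pending = []          # (arrival_round, arm, reward)

    for t in range(1, horizon + 1):
        # Deliver any rewards whose delay has elapsed.
        still_pending = []
        for arrive, arm, r in pending:
            if arrive <= t:
                counts[arm] += 1
                sums[arm] += r
            else:
                still_pending.append((arrive, arm, r))
        pending = still_pending

        # Arms with no arrived feedback yet get priority (fewest pulls
        # first); otherwise use the UCB1 index on observed feedback.
        unseen = [i for i in range(k) if counts[i] == 0]
        if unseen:
            a = min(unseen, key=lambda i: pulls[i])
        else:
            a = max(range(k), key=lambda i:
                    sums[i] / counts[i] + sqrt(2 * log(t) / counts[i]))

        pulls[a] += 1
        reward = 1.0 if rng.random() < means[a] else 0.0
        delay = rng.expovariate(1.0 / mean_delay) if mean_delay > 0 else 0.0
        pending.append((t + int(delay), a, reward))

    return pulls
```

Under this toy model, longer mean delays slow the concentration of pulls on the best arm, since the learner keeps exploring on stale estimates while feedback is in flight; that is the tension between delay assumptions and declaring an arm suboptimal that the abstract describes.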

Subject Categories:

  • Statistics and Probability

Distribution Statement:

APPROVED FOR PUBLIC RELEASE