A peek into Stochastic Multi-Armed Bandits with Heavy Tails.

ACO Student Seminar
Friday, April 15, 2022 - 1:00pm for 1 hour (actually 50 minutes)
Shubhada Agrawal – Tata Institute of Fundamental Research, Mumbai – shubhada.agrawal@tifr.res.inhttps://sites.google.com/view/shubhada-agrawal/home
Abhishek Dhawan

Please Note: Link: https://gatech.zoom.us/j/91232113113?pwd=MDhteEdtcENuME9kdXJmcUY0eWlSUT09

In this talk, we will look into the two most widely studied settings of the stochastic multi-armed bandit problems - regret minimization and pure exploration. The algorithm is presented with a finite set of unknown distributions from which it can generate samples. In the regret-minimization setting, its aim is to sample sequentially so as to maximize the total average reward accumulated. In the pure exploration setting, we are interested in algorithms that identify the arm with the maximum mean in a small number of samples on an average while keeping the probability of false selection to at most a pre-specified and small value. Both of these problems are well studied in literature and tight lower bounds and optimal algorithms exist when the arm distributions are known to belong to simple classes of distributions such as single-parameter exponential family, distributions that have bounded support, etc. However, in practice, the distributions may not satisfy these assumptions and may even be heavy-tailed. In this talk, we will look at techniques and algorithms for optimally solving these two problems with minimal assumptions on the arm distributions. These ideas can be extended to a more general objective of identifying the distribution with the minimum linear combination of risk and reward, which captures the risk-reward trade-off that is popular in many practical settings, including in finance.