# The multi-armed bandit problem with covariates

@article{Perchet2011TheMB, title={The multi-armed bandit problem with covariates}, author={Vianney Perchet and Philippe Rigollet}, journal={ArXiv}, year={2011}, volume={abs/1110.6084} }

We consider a multi-armed bandit problem in a setting where each arm produces a noisy reward realization that depends on an observable random covariate. In contrast to the traditional static multi-armed bandit problem, this setting allows for dynamically changing rewards that better describe applications where side information is available. We adopt a nonparametric model in which the expected rewards are smooth functions of the covariate and the hardness of the problem is captured by a…
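One common way to exploit covariate structure in this literature is to partition the covariate space into bins and run an independent arm-selection rule within each bin. The following Python sketch illustrates the idea with a per-bin UCB index; the function `binned_ucb`, its parameters, and the noise model are illustrative assumptions, not the paper's exact policy.

```python
import math
import random

def binned_ucb(arms, horizon, n_bins=10, seed=0):
    """Run an independent UCB index policy in each covariate bin.

    `arms` is a list of functions mapping a covariate x in [0, 1] to an
    expected reward; observed rewards get additive uniform noise.
    Illustrative discretization sketch, not the cited paper's policy.
    """
    rng = random.Random(seed)
    K = len(arms)
    counts = [[0] * K for _ in range(n_bins)]
    sums = [[0.0] * K for _ in range(n_bins)]
    total = 0.0
    for t in range(1, horizon + 1):
        x = rng.random()                      # observe the covariate
        b = min(int(x * n_bins), n_bins - 1)  # locate its bin
        untried = [k for k in range(K) if counts[b][k] == 0]
        if untried:
            k = untried[0]                    # try each arm once per bin
        else:
            k = max(range(K), key=lambda a: sums[b][a] / counts[b][a]
                    + math.sqrt(2 * math.log(t) / counts[b][a]))
        r = arms[k](x) + rng.uniform(-0.1, 0.1)
        counts[b][k] += 1
        sums[b][k] += r
        total += r
    return total / horizon
```

With two arms whose mean functions cross (e.g. `x` and `1 - x`), the policy learns a different best arm in different regions of the covariate space, which a static bandit cannot do.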

#### 138 Citations

Asymptotic Allocation Rules for a Class of Dynamic Multi-armed Bandit Problems

- Computer Science
- ArXiv
- 2017

This paper presents a class of Dynamic Multi-Armed Bandit problems where the reward can be modeled as the noisy output of a time-varying linear stochastic dynamic system that satisfies some…

A non-parametric solution to the multi-armed bandit problem with covariates

- Mathematics
- 2021

In recent years, the multi-armed bandit problem has regained popularity, especially in the case with covariates, as it has found new applications in customized services such as personalized medicine.…

Stochastic Multi-Armed Bandits with Control Variates

- Computer Science, Mathematics
- ArXiv
- 2021

This paper studies a new variant of the stochastic multi-armed bandit problem in which the learner has access to auxiliary information about the arms. It develops an algorithm named UCB-CV that uses improved estimates and characterizes the regret bounds in terms of the correlation between the rewards and the control variates.
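The control-variate idea behind such improved estimates can be shown with a variance-reduced mean estimator: given side observations correlated with the reward and with a known mean, subtract the scaled deviation of the side observations. The function below is a generic sketch of this classical technique, not the UCB-CV algorithm itself.

```python
def cv_mean(rewards, controls, control_mean):
    """Control-variate estimate of E[reward].

    `controls` are correlated side observations with known mean
    `control_mean`; the coefficient beta = Cov(R, Z) / Var(Z) is
    estimated from the same sample.  Illustrative sketch only.
    """
    n = len(rewards)
    rbar = sum(rewards) / n
    zbar = sum(controls) / n
    var_z = sum((z - zbar) ** 2 for z in controls) / n
    cov = sum((r - rbar) * (z - zbar)
              for r, z in zip(rewards, controls)) / n
    beta = cov / var_z if var_z > 0 else 0.0
    return rbar - beta * (zbar - control_mean)
```

When the reward and the control variate are strongly correlated, the correction term cancels much of the sampling noise, which is what tightens the confidence intervals used by the bandit index.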

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

- Computer Science, Mathematics
- Found. Trends Mach. Learn.
- 2012

The focus is on two extreme cases in which the analysis of regret is particularly simple and elegant: independent and identically distributed payoffs and adversarial payoffs.

Be Greedy in Multi-Armed Bandits

- Computer Science
- ArXiv
- 2021

It is subversively claimed that, for many interesting problems and associated horizons, the best compromise between theoretical guarantees, practical performance, and computational burden is to follow the greedy heuristic.
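The greedy heuristic in question is simple to state: after an initial pass over the arms, always pull the arm with the highest empirical mean, with no explicit exploration bonus. A minimal Bernoulli-arm sketch (the function name and return value are illustrative assumptions):

```python
import random

def greedy_bandit(means, horizon, seed=0):
    """Pure greedy play after a single round-robin pass over the arms.

    `means` are Bernoulli arm parameters; returns the fraction of pulls
    that went to the truly best arm.  Illustrative sketch only.
    """
    rng = random.Random(seed)
    K = len(means)
    counts = [0] * K
    sums = [0.0] * K
    best = max(range(K), key=lambda k: means[k])
    best_pulls = 0
    for t in range(horizon):
        if t < K:
            k = t                    # initial pass: try each arm once
        else:
            k = max(range(K), key=lambda a: sums[a] / counts[a])
        counts[k] += 1
        sums[k] += 1.0 if rng.random() < means[k] else 0.0
        best_pulls += (k == best)
    return best_pulls / horizon
```

The known failure mode is also visible here: if the best arm's first pulls happen to return low rewards, greedy can lock onto a suboptimal arm forever, which is why its guarantees require extra conditions.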

Smoothness-Adaptive Stochastic Bandits

- Computer Science
- ArXiv
- 2019

This work develops a procedure that infers a global smoothness parameter of the payoff functions based on collected observations, and establishes that this procedure achieves rate-optimal performance up to logarithmic factors. Expand

Kernel Estimation and Model Combination in A Bandit Problem with Covariates

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2016

This work considers a setting where the rewards of bandit machines are associated with covariates, and the accurate estimation of the corresponding mean reward functions plays an important role in the performance of allocation rules. Expand

Nonparametric Stochastic Contextual Bandits

- Computer Science, Mathematics
- AAAI
- 2018

The algorithm provides an extension to infinite-armed contextual bandits and recovers topological structures within the context space based on expected bandit performance and gives global intrinsic dimension dependent and ambient dimension independent regret bounds. Expand

Batched Multi-armed Bandits Problem

- Computer Science, Mathematics
- NeurIPS
- 2019

The BaSE (batched successive elimination) policy is proposed to achieve rate-optimal regret (within logarithmic factors) for batched multi-armed bandits, with matching lower bounds even when the batch sizes are chosen adaptively.
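The mechanics of batched successive elimination can be sketched as follows: within each batch all surviving arms are pulled equally, and arms are eliminated only at batch boundaries using confidence bounds. The function below is a simplified illustration under assumed Bernoulli rewards and a fixed batch grid, not the paper's exact BaSE policy or grid.

```python
import math
import random

def batched_elimination(means, batch_sizes, seed=0):
    """BaSE-style sketch: eliminate arms between batches only.

    All surviving arms are pulled equally within a batch; between
    batches, arms whose upper confidence bound falls below the best
    lower bound are dropped.  `means` are Bernoulli parameters.
    """
    rng = random.Random(seed)
    active = list(range(len(means)))
    counts = {k: 0 for k in active}
    sums = {k: 0.0 for k in active}
    for batch in batch_sizes:
        pulls_each = max(1, batch // len(active))
        for k in active:
            for _ in range(pulls_each):
                counts[k] += 1
                sums[k] += 1.0 if rng.random() < means[k] else 0.0
        # eliminate arms that are confidently suboptimal
        rad = {k: math.sqrt(math.log(2 * sum(counts.values())) / counts[k])
               for k in active}
        best_lcb = max(sums[k] / counts[k] - rad[k] for k in active)
        active = [k for k in active
                  if sums[k] / counts[k] + rad[k] >= best_lcb]
    return active
```

The point of the batched setting is that the policy commits to a schedule of decision points in advance (or adaptively, per the lower bounds above), rather than adapting after every pull.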

Regret Minimization in Isotonic, Heavy-Tailed Contextual Bandits via Adaptive Confidence Bands

- Mathematics
- 2021

In this paper we initiate a study of nonparametric contextual bandits under shape constraints on the mean reward function. Specifically, we study a setting where the context is one-dimensional, and…

#### References

Showing 1–10 of 34 references

Linear Response Two-Armed Bandits

- 2010

We consider a two-armed bandit problem which involves sequential sampling from two non-homogeneous populations. The response in each is determined by a random covariate vector and a vector of…

Nonparametric Bandits with Covariates

- Computer Science, Mathematics
- COLT
- 2010

This work derives general lower bounds on the performance of any admissible policy, and develops an algorithm whose performance achieves the order of said lower bound up to logarithmic terms.

Randomized Allocation with Nonparametric Estimation for a Multi-armed Bandit Problem with Covariates

- Mathematics
- 2002

We study a multi-armed bandit problem in a setting where covariates are available. We take a nonparametric approach to estimate the functional relationship between the response (reward) and the…

A Note on Performance Limitations in Bandit Problems With Side Information

- Mathematics, Computer Science
- IEEE Transactions on Information Theory
- 2011

A minimax lower bound is derived that proves that in the absence of sufficient statistical "diversity" in the distribution of the covariate X, no policy can improve on the best achievable performance in the traditional bandit problem without side information.

Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2006

A framework based on learning a confidence interval around the value function or the Q-function and eliminating actions that are not optimal (with high probability) is described, and model-based and model-free variants of the elimination method are provided.
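The action-elimination idea also yields a natural stopping condition: keep sampling all surviving arms until the confidence intervals separate and only one arm remains. The sketch below illustrates this for best-arm identification with Bernoulli arms; the confidence radius and function name are illustrative assumptions, not the paper's exact bounds.

```python
import math
import random

def eliminate_until_one(means, delta=0.05, seed=0):
    """Best-arm identification by action elimination (illustrative).

    Pull all surviving arms in rounds; drop any arm whose upper
    confidence bound falls to the best lower bound, and stop when one
    arm remains.  Returns (best_arm_index, total_pulls).
    """
    rng = random.Random(seed)
    active = list(range(len(means)))
    counts = [0] * len(means)
    sums = [0.0] * len(means)
    t = 0
    while len(active) > 1:
        t += 1
        for k in active:
            counts[k] += 1
            sums[k] += 1.0 if rng.random() < means[k] else 0.0
        rad = math.sqrt(math.log(4 * len(means) * t * t / delta) / (2 * t))
        best_lcb = max(sums[k] / counts[k] - rad for k in active)
        active = [k for k in active if sums[k] / counts[k] + rad > best_lcb]
    return active[0], sum(counts)
```

Because the radius shrinks like `sqrt(log(t)/t)`, the sample complexity of eliminating an arm scales with the inverse squared gap to the best arm, which is the quantity the cited analysis controls.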

Finite-time Analysis of the Multiarmed Bandit Problem

- Computer Science
- Machine Learning
- 2004

This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
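The simple policy analyzed in this line of work is the UCB index rule: pull the arm maximizing the empirical mean plus an exploration bonus of order `sqrt(ln t / n_k)`. A self-contained sketch for Bernoulli arms (the regret-tracking wrapper is an illustrative assumption; the index itself follows the standard UCB1 form):

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """UCB1 for Bernoulli arms.

    Pulls the arm maximizing  mean_estimate + sqrt(2 ln t / n_k)
    and returns the cumulative regret against the best arm.
    """
    rng = random.Random(seed)
    K = len(means)
    counts = [0] * K
    sums = [0.0] * K
    regret = 0.0
    best = max(means)
    for t in range(1, horizon + 1):
        if t <= K:
            k = t - 1                # play each arm once to initialize
        else:
            k = max(range(K), key=lambda a: sums[a] / counts[a]
                    + math.sqrt(2 * math.log(t) / counts[a]))
        counts[k] += 1
        sums[k] += 1.0 if rng.random() < means[k] else 0.0
        regret += best - means[k]
    return regret
```

The finite-time guarantee is that each suboptimal arm is pulled only O(log T / gap²) times, so cumulative regret grows logarithmically in the horizon.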

UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem

- Mathematics, Computer Science
- Period. Math. Hung.
- 2010

For this modified UCB algorithm, an improved bound on the regret is given with respect to the optimal reward for K-armed bandits after T trials.

Bandit problems with side observations

- Mathematics, Computer Science
- IEEE Transactions on Automatic Control
- 2005

An extension of the traditional two-armed bandit problem is considered in which the decision maker has access to side information before deciding which arm to pull, and the value of this additional information is quantified.

Gap-free Bounds for Stochastic Multi-Armed Bandit

- Mathematics
- 2008

We consider the stochastic multi-armed bandit problem with unknown horizon. We present a randomized decision strategy which is based on updating a probability distribution through a stochastic mirror…

Woodroofe's One-Armed Bandit Problem Revisited

- Mathematics
- 2009

We consider the one-armed bandit problem of Woodroofe [J. Amer. Statist. Assoc. 74 (1979) 799-806], which involves sequential sampling from two populations: one whose characteristics are known, and…