Learning Posted Prices in Bilateral Trade: Regret Guarantees Under Full and Bandit Feedback
Publisher
Université d'Ottawa | University of Ottawa
Abstract
In this thesis we study an economically motivated sequential decision problem in which a learner repeatedly chooses an action (e.g., a posted price) and observes structured feedback. We ask how the information revealed after each decision determines whether learning is possible and which regret rates are achievable. We cast the problem in the online-learning framework and analyze two feedback models. Under full feedback, the learner can evaluate the outcome of alternative actions after each round; we give an efficient algorithm with sublinear regret together with matching lower bounds, yielding sharp minimax rates. Under bandit feedback, we show that sublinear regret is impossible without additional regularity. We then identify natural smoothness conditions on the instance under which bandit learning becomes feasible again, and we derive regret guarantees under these conditions. Overall, our results cleanly separate learnable from non-learnable regimes and quantify how mild structure can bridge the gap between full-feedback and bandit learning.
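To make the full-feedback versus bandit-feedback distinction concrete, here is a minimal sketch under assumptions that the abstract does not state. We assume seller and buyer valuations in [0, 1], a reward equal to the gain from trade (buyer value minus seller value) when the posted price lies between the two valuations, and full feedback that reveals both valuations after each round. The learner is the standard Hedge (exponential weights) algorithm over a uniform price grid. This is a generic baseline, not the algorithm developed in the thesis, and the names `gain_from_trade` and `hedge_full_feedback` are hypothetical.

```python
import math
import random


def gain_from_trade(price, seller_value, buyer_value):
    # Assumed reward model: trade happens iff seller_value <= price <= buyer_value,
    # and the reward is the gain from trade (buyer_value - seller_value).
    if seller_value <= price <= buyer_value:
        return buyer_value - seller_value
    return 0.0


def hedge_full_feedback(valuations, num_prices=21, eta=0.5, seed=0):
    """Hypothetical Hedge baseline over a uniform price grid on [0, 1].

    valuations: list of (seller_value, buyer_value) pairs in [0, 1].
    Returns (learner_reward, best_fixed_reward); regret is their difference.
    """
    rng = random.Random(seed)
    prices = [i / (num_prices - 1) for i in range(num_prices)]
    log_weights = [0.0] * num_prices
    cumulative = [0.0] * num_prices  # reward each fixed grid price would have earned
    learner_reward = 0.0
    for seller_value, buyer_value in valuations:
        # Sample a price index with probability proportional to exp(log_weight);
        # subtracting the max keeps the exponentials numerically stable.
        top = max(log_weights)
        weights = [math.exp(w - top) for w in log_weights]
        idx = rng.choices(range(num_prices), weights=weights)[0]
        learner_reward += gain_from_trade(prices[idx], seller_value, buyer_value)
        # Full feedback: both valuations are revealed, so the reward of every
        # grid price can be computed and every weight updated this round.
        for i, p in enumerate(prices):
            r = gain_from_trade(p, seller_value, buyer_value)
            cumulative[i] += r
            log_weights[i] += eta * r
    return learner_reward, max(cumulative)


# Usage on synthetic data: sellers in [0, 0.5], buyers in [0.5, 1].
data_rng = random.Random(1)
valuations = [(data_rng.uniform(0, 0.5), data_rng.uniform(0.5, 1)) for _ in range(1000)]
learner, best = hedge_full_feedback(valuations)
print(f"regret against best grid price: {best - learner:.2f}")
```

The inner loop over all grid prices is what full feedback makes possible. If, as we assume here for illustration, bandit feedback revealed only the outcome at the posted price, that per-price update would no longer be available. This is the kind of information gap the abstract's separation result concerns.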
Keywords
Learning, Online, Price, Machine, Bound, Rate

