Learning Posted Prices in Bilateral Trade: Regret Guarantees Under Full and Bandit Feedback
Publisher
Université d'Ottawa | University of Ottawa
Abstract
In this thesis we study an economically motivated sequential decision problem in which a learner repeatedly chooses an action (e.g., a posted price) and observes structured feedback. We ask how the information revealed after each decision determines whether learning is possible and which regret rates are achievable. We cast the problem in the online-learning framework and analyze two feedback models. Under full feedback, the learner can effectively evaluate alternative actions; we give an efficient algorithm with sublinear regret and prove matching lower bounds, yielding sharp minimax rates. Under bandit feedback, we show that sublinear regret is impossible without additional regularity. We then identify natural smoothness conditions on the instance under which bandit learning becomes feasible again, and we derive regret guarantees under those conditions. Overall, our results cleanly separate learnable from non-learnable regimes and quantify how mild structure can bridge the gap between full-feedback and bandit learning.
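To make the two feedback models concrete, the following is a minimal Python sketch and not the thesis's algorithm. The abstract does not spell out the reward or the exact feedback signals, so everything here is an assumption based on the title: the reward is taken to be the gain from trade at a posted price, full feedback is taken to reveal both valuations, and bandit feedback is taken to reveal only whether each party accepts the price. The function names and the price grid are hypothetical.

```python
def gain_from_trade(price, seller_value, buyer_value):
    """Assumed reward: a trade happens only if seller_value <= price <= buyer_value,
    and its value is the difference between the two valuations."""
    if seller_value <= price <= buyer_value:
        return buyer_value - seller_value
    return 0.0


def full_feedback(price, seller_value, buyer_value):
    """Assumed full feedback: both valuations are revealed after the round,
    so the learner can compute the reward of any alternative price."""
    return seller_value, buyer_value


def bandit_feedback(price, seller_value, buyer_value):
    """Assumed bandit feedback: only the two accept/reject bits for the
    posted price are revealed, not the valuations themselves."""
    return seller_value <= price, price <= buyer_value


def regret(posted_prices, valuations, grid):
    """Regret against the best fixed price in `grid`, chosen in hindsight."""
    earned = sum(
        gain_from_trade(p, s, b) for p, (s, b) in zip(posted_prices, valuations)
    )
    best_fixed = max(
        sum(gain_from_trade(q, s, b) for s, b in valuations) for q in grid
    )
    return best_fixed - earned


# Hypothetical three-round instance: (seller_value, buyer_value) per round.
valuations = [(0.2, 0.8), (0.4, 0.6), (0.1, 0.3)]
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
posted_prices = [0.75, 0.75, 0.75]

print(regret(posted_prices, valuations, grid))
print(bandit_feedback(0.1, 0.2, 0.8))
```

Under these assumptions, a learner with full feedback can replay every candidate price against the revealed valuations, whereas a learner with bandit feedback sees only two bits per round and cannot directly compute the reward of prices it did not post.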
Keywords
Learning, Online, Price, Machine, Bound, Rate
