Repository logo

Learning Posted Prices in Bilateral Trade: Regret Guarantees Under Full and Bandit Feedback

Loading...
Thumbnail ImageThumbnail Image

Journal Title

Journal ISSN

Volume Title

Publisher

Université d'Ottawa | University of Ottawa

Creative Commons

Attribution-NonCommercial-NoDerivatives 4.0 International

Abstract

In this thesis we study an economically motivated sequential decision problem in which a learner repeatedly chooses an action (e.g., a posted price) and observes structured feedback. We ask how the information revealed after each decision determines whether learning is possible and what regret rates are achievable. We cast the problem in the online-learning framework and analyze two feedback models. Under full-feedback, the learner can effectively evaluate alternative actions; we give an efficient algorithm with sublinear regret and matching lower bounds, yielding sharp minimax rates. Under bandit- feedback, we show that without additional regularity, sublinear regret is impossible. We then identify natural smoothness conditions on the instance under which bandit learning becomes feasible again and derive regret guarantees. Overall, our results cleanly separate learnable from non-learnable regimes and quantify how mild structure can bridge the gap between full-feedback and bandit learning.

Description

Keywords

Learning, Online, Price, Machine, Bound, Rate

Citation

Related Materials

Alternate Version