Learning Posted Prices in Bilateral Trade: Regret Guarantees Under Full and Bandit Feedback

Publisher

Université d'Ottawa | University of Ottawa

Creative Commons License

Attribution-NonCommercial-NoDerivatives 4.0 International

Abstract

In this thesis we study an economically motivated sequential decision problem in which a learner repeatedly chooses an action (e.g., a posted price) and then observes structured feedback. We ask how the information revealed after each decision determines whether learning is possible and which regret rates are achievable. We cast the problem in the online-learning framework and analyze two feedback models. Under full feedback, the learner can effectively evaluate alternative actions; we give an efficient algorithm with sublinear regret and matching lower bounds, yielding sharp minimax rates. Under bandit feedback, we show that sublinear regret is impossible without additional regularity. We then identify natural smoothness conditions on the instance under which bandit learning becomes feasible again, and we derive regret guarantees under those conditions. Overall, our results separate learnable from non-learnable regimes and quantify how mild structure can bridge the gap between full-feedback and bandit learning.

Keywords

Learning, Online, Price, Machine, Bound, Rate
