Slide 43
References
• Honda and Nakamura, 「バンディット問題の理論とアルゴリズム」 (Theory and Algorithms for Bandit Problems), 2014.
• Bubeck, S., Munos, R., and Stoltz, G. (2009), “Pure Exploration in Multi-armed Bandits Problems,” in Algorithmic
Learning Theory, Springer Berlin Heidelberg, pp. 23–37.
• Bubeck, S., Munos, R., and Stoltz, G. (2011), “Pure exploration in finitely-armed and continuous-armed bandits,” Theoretical Computer Science.
• Gabillon, Ghavamzadeh, and Lazaric, Best arm identification: a unified approach to fixed budget and fixed confidence,
NeurIPS 2012.
• Kaufmann, Cappé, and Garivier, On the complexity of best-arm identification in multi-armed bandit models, JMLR 2016.
• Garivier and Kaufmann, Optimal best arm identification with fixed confidence, COLT 2016.
• Glynn, P. and Juneja, S. (2004), “A large deviations perspective on ordinal optimization,” in Proceedings of the 2004 Winter
Simulation Conference, IEEE, vol. 1.
• Kasy, M. and Sautmann, A. (2021), “Adaptive Treatment Assignment in Experiments for Policy Choice,” Econometrica, 89,
113–132.