Slide 87
References (1/2)
[Precup+, 00] Doina Precup, Richard S. Sutton, and Satinder P. Singh. “Eligibility Traces for Off-Policy
Policy Evaluation.” ICML, 2000.
https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1079&context=cs_faculty_pubs
[Strehl+, 10] Alex Strehl, John Langford, Sham Kakade, and Lihong Li. “Learning from Logged Implicit
Exploration Data.” NeurIPS, 2010. https://arxiv.org/abs/1003.0120
[Li+, 18] Shuai Li, Yasin Abbasi-Yadkori, Branislav Kveton, S. Muthukrishnan, Vishwa Vinay, and Zheng
Wen. “Offline Evaluation of Ranking Policies with Click Models.” KDD, 2018.
https://arxiv.org/abs/1804.10488
[McInerney+, 20] James McInerney, Brian Brost, Praveen Chandar, Rishabh Mehrotra, and Ben
Carterette. “Counterfactual Evaluation of Slate Recommendations with Sequential Reward
Interactions.” KDD, 2020. https://arxiv.org/abs/2007.12986
[Dudík+, 14] Miroslav Dudík, Dumitru Erhan, John Langford, and Lihong Li. “Doubly Robust Policy
Evaluation and Optimization.” Statistical Science, 2014. https://arxiv.org/abs/1503.02834
[Jiang&Li, 16] Nan Jiang and Lihong Li. “Doubly Robust Off-policy Value Evaluation for Reinforcement
Learning.” ICML, 2016. https://arxiv.org/abs/1511.03722
July 2022 Cascade Doubly Robust Off-Policy Evaluation @ CFML Study Group (CFML勉強会) 87