Slide 56
References (1/4)
[Beygelzimer&Langford,09] Alina Beygelzimer and John Langford. “The Offset Tree
for Learning with Partial Labels.” KDD, 2009.
[Precup+,00] Doina Precup, Richard S. Sutton, and Satinder Singh. “Eligibility Traces
for Off-Policy Policy Evaluation.” ICML, 2000.
https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1079&context=cs_faculty_pubs
[Strehl+,10] Alex Strehl, John Langford, Sham Kakade, and Lihong Li. “Learning from
Logged Implicit Exploration Data.” NeurIPS, 2010. https://arxiv.org/abs/1003.0120
[Dudík+,14] Miroslav Dudík, Dumitru Erhan, John Langford, and Lihong Li. “Doubly
Robust Policy Evaluation and Optimization.” Statistical Science, 2014.
https://arxiv.org/abs/1503.02834
September 2022 Policy Adaptive Estimator Selection (PAS-IF) 56