Tree for Learning with Partial Labels.” KDD, 2009.
[Precup+,00] Doina Precup, Richard S. Sutton, and Satinder Singh. “Eligibility Traces for Off-Policy Policy Evaluation.” ICML, 2000. https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1079&context=cs_faculty_pubs
[Strehl+,10] Alex Strehl, John Langford, Sham Kakade, and Lihong Li. “Learning from Logged Implicit Exploration Data.” NeurIPS, 2010. https://arxiv.org/abs/1003.0120
[Dudík+,14] Miroslav Dudík, Dumitru Erhan, John Langford, and Lihong Li. “Doubly Robust Policy Evaluation and Optimization.” Statistical Science, 2014. https://arxiv.org/abs/1503.02834
February 2023 Policy Adaptive Estimator Selection @ AAAI2023