Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R. Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Tom Le Paine. “Benchmarks for Deep Off-Policy Evaluation.” ICLR, 2021. https://arxiv.org/abs/2103.16596 [Doroudi+,18] Shayan Doroudi, Philip S. Thomas, Emma Brunskill. “Importance Sampling for Fair Policy Selection.” IJCAI, 2018. https://people.cs.umass.edu/~pthomas/papers/Daroudi2017.pdf [Kiyohara+,23] Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito. “SCOPE-RL: A Python Library for Offline Reinforcement Learning, Off-Policy Evaluation, and Policy Selection.” 2023. [Hasselt+,16] Hado van Hasselt, Arthur Guez, and David Silver. “Deep Reinforcement Learning with Double Q-learning.” AAAI, 2016. https://arxiv.org/abs/1509.06461 May 2024 Towards assessing risk-return tradeoff of OPE 67