Slide 104
References (2/9)
[Fu+,21 (DOPE)] Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu
Wang, Alexander Novikov, Mengjiao Yang, Michael R. Zhang, Yutian Chen, Aviral
Kumar, Cosmin Paduraru, Sergey Levine, and Tom Le Paine. “Benchmarks for Deep
Off-Policy Evaluation.” ICLR, 2021. https://arxiv.org/abs/2103.16596
[Voloshin+,21 (COBS)] Cameron Voloshin, Hoang M. Le, Nan Jiang, and Yisong Yue.
“Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning.” NeurIPS
Datasets and Benchmarks Track, 2021. https://arxiv.org/abs/1911.06854
[Rohde+,18 (RecoGym)] David Rohde, Stephen Bonner, Travis Dunlop, Flavian Vasile,
and Alexandros Karatzoglou. “RecoGym: A Reinforcement Learning Environment for
the problem of Product Recommendation in Online Advertising.” 2018.
https://arxiv.org/abs/1808.00720
May 2024 SCOPE-RL package description 104