for slate bandits with abstraction @ WWW2024 17 ① ② ③ ① 潜在表現から slate がどの程度判別できるか ② ⼆つの slates の間で期待報酬がどの程度異なるか ③ ⼆つの slates の間で元々の slate の重みがどの程度異なるか 潜在表現の持つ情報量が⼤きいほど 偏りは⼩さくなる
Li. “Learning from Logged Implicit Exploration Data.” NeurIPS, 2010. https://arxiv.org/abs/1003.0120 [Swamminathan+,17] Adith Swaminathan, Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudík, John Langford, Damien Jose, Imed Zitouni. “Off-policy evaluation for slate recommendation.” NeurIPS, 2017. https://arxiv.org/abs/1605.04812 [Beygelzimer&Langford,09] Alina Beygelzimer, John Langford. “The Offset Tree for Learning with Partial Labels.” KDD, 2009. https://arxiv.org/abs/0812.4044 [Saito&Joachims,22] Yuta Saito, Thorsten Joachims. “Off-Policy Evaluation for Large Action Spaces via Embeddings.” ICML, 2022. https://arxiv.org/abs/2202.06317 [Dudík+,14] Miroslav Dudík, Dumitru Erhan, John Langford, and Lihong Li. “Doubly Robust Policy Evaluation and Optimization.” ICML, 2011. https://arxiv.org/abs/1503.02834 April 2024 OPE for slate bandits with abstraction @ WWW2024 37