
Counterfactual Machine Learning 入門 / Introduction to Counterfactual ML

Kazuki Taniguchi
September 29, 2018


This material was presented at the 28th "Machine Learning 15minutes!" meetup (https://machine-learning15minutes.connpass.com/event/97195/).



Transcript

1. Introduction
• Role: Research Scientist
• Research areas: Basics of Machine Learning, Response Prediction, Counterfactual ML
• Previous work (outside research): development of an MLaaS platform, DSP algorithm development
2. Supervised Learning
• Feature (context): $x_i$
• Prediction: $\hat{y}_i = f(x_i)$
• Label: $y_i$
• (Slide figure: example inputs with predictions marked "correct" or "miss" against the labels.)
3. Interactive Learning [2]
• Feature (context): $x_i$
• Action: $a_i = \pi(x_i)$
• Reward: $r_i$
• (Slide figure: an ad slot is shown to a user; the action is the ad served and the reward is whether the user clicks or not.)
4. Comparison with Supervised Learning
• In supervised learning the true label is observed for every example; in interactive learning only the feedback (e.g. a click) for the action that was actually taken is observed.
• The evaluation of actions that were not selected is counterfactual.
• When evaluating a new policy, its counterfactual actions cannot be evaluated directly.
5. Comparison with Contextual Bandit
• The problem setting is the same.
• Counterfactual ML mainly deals with offline (batch) learning.
• Unlike the online setting, evaluation is easier to carry out, which is an advantage.
• Contextual bandits update the policy online.
• The idea behind counterfactual ML is the same as offline evaluation of a contextual bandit policy [3].
• Evaluation is covered in detail in an AI Lab Research Blog article [3], so it is omitted from this talk.
6. Definitions
• Data: $D = ((x_1, y_1, \delta_1, p_1), \ldots, (x_n, y_n, \delta_n, p_n))$
  - $x_i$ : context
  - $y_i$ : labels (multi-label setting)
  - $\delta_i$ : reward
  - $p_i$ : propensity score (described later)
• Policy: $y_i = \pi(x_i)$, where $\pi$ maps a context to an action.
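As a concrete picture of the logged data above, here is a minimal sketch in Python; the class and field names are illustrative (not from the talk), but each record carries exactly the quantities defined on this slide.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class LoggedExample:
    """One record (x_i, y_i, delta_i, p_i) of logged bandit feedback."""
    x: Sequence[float]   # context features
    y: int               # action (label) chosen by the logging policy pi_0
    delta: float         # observed reward/loss for that action only
    p: float             # propensity: pi_0(y | x), probability pi_0 chose y

# A dataset D is simply a list of such records collected under pi_0;
# feedback for actions that were *not* chosen is never observed.
D = [
    LoggedExample(x=[0.2, 1.3], y=3, delta=1.0, p=0.25),
    LoggedExample(x=[0.7, 0.1], y=1, delta=0.0, p=0.60),
]
```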
7. Counterfactual Risk Minimization
• Unbiased estimation via importance sampling:
  $\hat{R}(\pi) = \frac{1}{n}\sum_{i=1}^{n} \delta_i \frac{\pi(y_i \mid x_i)}{\pi_0(y_i \mid x_i)} = \frac{1}{n}\sum_{i=1}^{n} \delta_i \frac{\pi(y_i \mid x_i)}{p_i}$
  - $\delta_i$ : loss
  - $\pi_0$ : logging policy (→ propensity score)
• Introducing a clipping constant $M$ gives the IPS (Inverse Propensity Score) estimator [4]:
  $\hat{R}^M(\pi) = \frac{1}{n}\sum_{i=1}^{n} \delta_i \min\left\{M,\ \frac{\pi(y_i \mid x_i)}{p_i}\right\}$
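A minimal sketch of the clipped IPS estimate above, assuming the `LoggedExample` records from the earlier sketch and a callable `pi(y, x)` returning the new policy's probability of the logged action (both names are illustrative):

```python
def ips_estimate(D, pi, M=100.0):
    """Clipped IPS estimate of the risk R(pi) from logged data D.

    D  : iterable of LoggedExample (x, y, delta, p) collected under pi_0
    pi : callable pi(y, x) -> probability that the new policy picks y given x
    M  : clipping constant bounding the importance weight pi(y|x) / p
    """
    total, n = 0.0, 0
    for ex in D:
        weight = min(M, pi(ex.y, ex.x) / ex.p)  # clipped importance weight
        total += ex.delta * weight
        n += 1
    return total / n
```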
8. Counterfactual Risk Minimization
• CRM (Counterfactual Risk Minimization): minimize an upper bound on the generalization error bound (see the paper for details):
  $h^{\mathrm{CRM}} = \operatorname*{arg\,min}_{h} \left\{ \hat{R}(h) + \lambda \sqrt{\frac{\widehat{\mathrm{Var}}(u)}{n}} \right\}$
• The variance term is a data-dependent regularizer.
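In code, the CRM objective simply adds the empirical-variance penalty to the clipped IPS estimate. A small illustrative sketch (using numpy; the function name and default $\lambda$ are assumptions, not from the talk), where `u` holds the per-example clipped importance-weighted losses $u_i$:

```python
import numpy as np

def crm_objective(u, lam=0.1):
    """CRM objective: mean of the clipped importance-weighted losses u_i
    plus the data-dependent regularizer lambda * sqrt(Var(u) / n)."""
    u = np.asarray(u, dtype=float)
    n = len(u)
    return u.mean() + lam * np.sqrt(u.var(ddof=1) / n)
```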
9. POEM [5]
• Uses the same kind of policy as classification (linear + softmax).
• Trained according to the following objective:
  $\pi_w(y \mid x) = \frac{\exp(w^\top \phi(x, y))}{\sum_{y' \in \mathcal{Y}} \exp(w^\top \phi(x, y'))}$
  $w^* = \operatorname*{arg\,min}_{w \in \mathbb{R}^d} \left\{ \bar{u}^w + \lambda \sqrt{\frac{\mathrm{Var}^w(u)}{n}} \right\}$
  $u_i^w \equiv \delta_i \min\left\{M,\ \frac{\exp(w^\top \phi(x_i, y_i))}{p_i \sum_{y' \in \mathcal{Y}} \exp(w^\top \phi(x_i, y'))}\right\}$
  $\bar{u}^w \equiv \frac{1}{n}\sum_{i=1}^{n} u_i^w, \qquad \mathrm{Var}^w(u) \equiv \frac{1}{n-1}\sum_{i=1}^{n} (u_i^w - \bar{u}^w)^2$
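A small sketch of the POEM objective for a linear softmax policy, written directly from the equations above (numpy; `phi`, `labels`, and the default hyperparameters are assumptions for illustration; the paper additionally describes efficient optimization techniques not shown here):

```python
import numpy as np

def poem_objective(w, data, phi, labels, M=100.0, lam=0.1):
    """POEM objective for a linear softmax policy pi_w(y|x).

    data   : list of (x, y, delta, p) tuples logged under pi_0
    phi    : phi(x, y) -> joint feature vector, np.ndarray of shape (d,)
    labels : list of possible actions/labels Y
    """
    u = []
    for x, y, delta, p in data:
        scores = np.array([w @ phi(x, yp) for yp in labels])
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()                      # softmax over Y
        pi_w = probs[labels.index(y)]             # pi_w(y | x)
        u.append(delta * min(M, pi_w / p))        # clipped weighted loss u_i^w
    u = np.array(u)
    return u.mean() + lam * np.sqrt(u.var(ddof=1) / len(u))
```

Minimizing this objective over $w$ (e.g. with a gradient-based optimizer) yields the POEM solution $w^*$.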
10. Experiments
• Dataset: multi-label classification datasets
• Supervised-to-Bandit Conversion [6]:
  ① Train the logging policy $\pi_0$ (a CRF) on 5% of the full data.
  ② Use the obtained logging policy to assign labels $y$ to the remaining 95% of the data.
  ③ Compute the feedback $\delta$ from $y$ and $y^*$ (Hamming loss).
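The conversion can be sketched as follows; this is a rough outline under assumed names: `logging_policy.predict_proba` is a hypothetical interface returning candidate label vectors and their probabilities, not an actual library call, and the Hamming-loss feedback follows the recipe above.

```python
import numpy as np

def supervised_to_bandit(X, Y_star, logging_policy, rng=np.random.default_rng(0)):
    """Turn a fully labeled multi-label dataset (X, Y_star) into bandit feedback.

    logging_policy : model trained on a small split (e.g. 5%) that, for a
                     context x, returns (candidates, probs) -- assumed interface.
    Returns logged tuples (x, y, delta, p), where delta is the Hamming loss
    between the sampled label vector y and the true label vector y*.
    """
    log = []
    for x, y_star in zip(X, Y_star):
        candidates, probs = logging_policy.predict_proba(x)   # assumed interface
        idx = rng.choice(len(candidates), p=probs)            # sample y ~ pi_0(.|x)
        y = candidates[idx]
        delta = np.mean(np.asarray(y) != np.asarray(y_star))  # Hamming loss
        log.append((x, y, delta, probs[idx]))                 # keep propensity p
    return log
```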
11. Note
• Accurate predictions cannot be made for labels (actions) that do not appear in the log.
  ex) adding a new ad: if the log contains only ads A, B, and C, counterfactual ML can evaluate an action that appears in the log (OK), but not a newly added ad (NG).
• The above is an extreme example; there are also methods that make this possible.
12. More
• The research team behind [5] continues to publish in this area:
  - "The Self-Normalized Estimator for Counterfactual Learning"
  - "Recommendations as Treatments: Debiasing Learning and Evaluation"
  - "Unbiased Learning-to-Rank with Biased Feedback"
  - "Deep Learning with Logged Bandit Feedback"
• Many researchers at Microsoft Research are also working on this topic. If you are interested, please look into it!
13. AI Lab is also strengthening its research in this area
• Paper related to today's content: Yusuke Narita, Shota Yasui, Kohei Yata, "Efficient Counterfactual Learning from Bandit Feedback", arXiv, 2018 (https://arxiv.org/abs/1809.03084)
• For details, see our website: https://adtech.cyberagent.io/ailab/
14. References
1. SIGIR 2016 Tutorial on Counterfactual Evaluation and Learning (http://www.cs.cornell.edu/~adith/CfactSIGIR2016/)
2. ICML 2017 Tutorial on Real World Interactive Learning (http://hunch.net/~rwil/)
3. Evaluation of Bandit Algorithms and Causal Inference (AI Lab Research Blog, in Japanese) (https://adtech.cyberagent.io/research/archives/199)
4. Counterfactual Reasoning and Learning Systems, 2017 (https://arxiv.org/abs/1209.2355)
5. Counterfactual Risk Minimization: Learning from Logged Bandit Feedback (https://arxiv.org/abs/1502.02362)