Counterfactual Machine Learning 入門 / Introduction to Counterfactual ML

Kazuki Taniguchi
September 29, 2018


This material is from a talk given at "Machine Learning 15minutes! #28" (https://machine-learning15minutes.connpass.com/event/97195/).


Transcript

  1. Introduction to Counterfactual Machine Learning — CyberAgent, Inc., Ad Tech Division, AI Lab — Kazuki Taniguchi

  2. Introduction
     • Role: Research Scientist
     • Research areas: Basics of Machine Learning, Response Prediction, Counterfactual ML
     • Past work (outside research): development of an MLaaS platform, DSP algorithm development
  3. What is Counterfactual ML? (Counterfactual ML = Counterfactual Machine Learning [1])

  4. Counterfactual ML
     • Defined here as "algorithms that evaluate, or learn models from, data in which counterfactuals arise" (no guarantee this is a rigorous definition)
     • Algorithms that handle data with counterfactuals:
       • Interactive Learning ← (used as the running example in this talk)
       • (Contextual) Bandit Algorithms
       • Reinforcement Learning
       • Covariate Shift
  5. Supervised Learning
     [Figure: for each context x_i, the model produces a prediction ŷ_i = f(x_i), which is compared against the full label y_i, so every prediction can be scored as correct or a miss.]
  6. Problem Setting: Ad Selection
     • Show an ad image to the user
     • The user and the placement each carry a context (time of day, gender, topic, etc.)
     • Only one ad image is displayed, chosen from the candidates
     • We want to serve ads so that the displayed ad gets clicked
     [Figure: a policy π(x) selects one ad for a placement/user]
  7. Interactive Learning [2]
     • Feature (Context): x_i — Action: a_i = π(x_i) — Reward: r_i (click or not)
     [Figure: the policy observes the placement/user context, shows one ad, and receives click feedback from the user.]
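The interaction loop on this slide (context → action chosen by the policy → click/no-click reward, with only the chosen ad's feedback logged) can be sketched as follows. The ad set, propensities, and click probabilities are toy assumptions for illustration, not values from the talk:

```python
import random

ADS = ["ad_A", "ad_B", "ad_C"]

def pi_0(x, rng):
    """Stochastic logging policy: sample one ad, return it with its propensity p_i."""
    probs = {"ad_A": 0.5, "ad_B": 0.3, "ad_C": 0.2}
    ad = rng.choices(ADS, weights=[probs[a] for a in ADS])[0]
    return ad, probs[ad]

def click_prob(x, ad):
    """Unknown environment: probability that the user clicks `ad` in context x."""
    return 0.10 if ad == "ad_A" else 0.05

rng = random.Random(0)
log = []
for i in range(1000):
    x = {"hour": rng.randrange(24)}                   # context: time, gender, topic, ...
    ad, p = pi_0(x, rng)                              # only one ad is actually shown
    r = 1 if rng.random() < click_prob(x, ad) else 0  # reward r_i: click or not
    log.append((x, ad, r, p))                         # logged as (x_i, a_i, r_i, p_i)
```

The key point of the logged data is that each row records feedback only for the one ad that was shown; what would have happened for the other ads is the counterfactual.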
  8. Comparison with Supervised Learning
     • In supervised learning the full label is available, so every prediction can be evaluated; in interactive learning only the feedback (click) for the selected action is observed
     • Evaluating the actions that were not selected is counterfactual
     • When evaluating a new policy, its counterfactual actions cannot be evaluated directly
  9. Comparison with Contextual Bandit
     • The problem setting is the same
     • Counterfactual ML mainly deals with offline (batch) learning
     • Unlike the online setting, evaluation is easy to carry out, which is a merit
     • Contextual bandits update the policy online
     • The counterfactual-ML viewpoint is the same as offline evaluation of a contextual-bandit policy [3]
     • Evaluation is covered in detail in the AI Lab Research Blog article [3], so it is omitted from this talk
  10. Algorithms

  11. Definitions
      • Data: D = ((x_1, y_1, δ_1, p_1), …, (x_n, y_n, δ_n, p_n))
        • x_i : context
        • y_i : labels (multi-label setting)
        • δ_i : reward (loss)
        • p_i : propensity score (described later)
      • Policy: π : context → action, with y_i = π(x_i)
  12. Counterfactual Risk Minimization
      • Unbiased estimation via importance sampling:
        R̂(π) = (1/n) Σ_{i=1..n} δ_i · π(y_i|x_i) / π_0(y_i|x_i) = (1/n) Σ_{i=1..n} δ_i · π(y_i|x_i) / p_i
        where δ is the loss and π_0 is the logging policy (whose action probabilities give the propensity scores)
      • Introducing a clipping constant M gives the IPS (Inverse Propensity Score) estimator [4]:
        R̂^M(π) = (1/n) Σ_{i=1..n} δ_i · min{M, π(y_i|x_i) / p_i}
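A minimal Python sketch of the clipped IPS estimator above; the logged tuples and the two toy policies are assumptions for illustration:

```python
def ips_estimate(log, target_prob, M=10.0):
    """Clipped IPS: (1/n) * sum_i delta_i * min(M, pi(y_i|x_i) / p_i).

    log         -- list of (x, y, delta, p) tuples logged under pi_0
    target_prob -- target_prob(x, y): probability that the new policy pi picks y given x
    M           -- clipping constant bounding the importance weight pi(y|x)/p
    """
    n = len(log)
    total = 0.0
    for x, y, delta, p in log:
        total += delta * min(M, target_prob(x, y) / p)
    return total / n

# Toy log: uniform logging policy over two ads (p = 0.5 each), delta = observed feedback.
log = [
    ("u1", "ad_A", 1.0, 0.5),
    ("u2", "ad_B", 0.0, 0.5),
    ("u3", "ad_A", 1.0, 0.5),
    ("u4", "ad_B", 0.0, 0.5),
]

always_A = lambda x, y: 1.0 if y == "ad_A" else 0.0   # deterministic target policy

print(ips_estimate(log, always_A))         # -> 1.0 (weight 1/0.5 = 2 on the two ad_A rows)
print(ips_estimate(log, always_A, M=1.5))  # -> 0.75 (weights clipped at M = 1.5)
```

The second call shows the bias introduced by clipping: the true importance weight 2.0 is capped at M = 1.5, trading bias for lower variance.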
  13. Counterfactual Risk Minimization
      • CRM (Counterfactual Risk Minimization) objective:
        ĥ = argmin_h R̂(h) + λ √(Var_h(u) / n)
      • The second term is a data-dependent regularizer; CRM minimizes an upper bound on the generalization error (see the paper for details)
  14. POEM [5]
      • Uses the same linear softmax policy as in classification:
        π_w(y|x) = exp(w·φ(x, y)) / Σ_{y'∈Y} exp(w·φ(x, y'))
      • Learned by solving:
        w* = argmin_{w∈R^d} ū_w + λ √(Var_w(u) / n)
        where u_i^w ≡ δ_i · min{M, exp(w·φ(x_i, y_i)) / (p_i Σ_{y'∈Y} exp(w·φ(x_i, y')))}
              ū_w ≡ (1/n) Σ_{i=1..n} u_i^w
              Var_w(u) ≡ (1/(n−1)) Σ_{i=1..n} (u_i^w − ū_w)²
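The policy and objective above can be written down directly; the feature map `phi`, the toy log, and the hyperparameters below are illustrative assumptions (a real implementation would minimize this objective with a gradient method such as AdaGrad or L-BFGS, as in the experiments):

```python
import math

def softmax_policy(w, phi, x, labels):
    """pi_w(y|x) = exp(w.phi(x,y)) / sum_y' exp(w.phi(x,y')), for every candidate y."""
    scores = {y: math.exp(sum(wj * fj for wj, fj in zip(w, phi(x, y)))) for y in labels}
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

def poem_objective(w, phi, log, labels, M=10.0, lam=0.1):
    """Clipped IPS risk u_bar plus the variance regularizer lam * sqrt(Var(u)/n)."""
    n = len(log)
    u = []
    for x, y, delta, p in log:
        probs = softmax_policy(w, phi, x, labels)
        u.append(delta * min(M, probs[y] / p))          # u_i^w from the slide
    u_bar = sum(u) / n
    var = sum((ui - u_bar) ** 2 for ui in u) / (n - 1)  # sample variance of the u_i
    return u_bar + lam * math.sqrt(var / n)

# Toy instance: 1-D context, two labels, uniform logging policy (p = 0.5).
phi = lambda x, y: [x, 1.0 if y == "A" else 0.0]        # assumed feature map
log = [(0.5, "A", 1.0, 0.5), (1.0, "B", 0.0, 0.5), (0.2, "A", 1.0, 0.5)]
objective = poem_objective([0.1, -0.2], phi, log, ["A", "B"])
```

Since δ here is a loss, minimizing this objective over w favors policies whose clipped importance-weighted loss is both low on average and low-variance.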
  15. Experiments
      • Dataset (multi-label experiments): Supervised-to-Bandit Conversion [6]
        (1) Train a logging policy π_0 (a CRF) on 5% of the data
        (2) Use the learned logging policy to assign labels to the remaining 95% of the data
        (3) Compute the feedback δ from y and y* (Hamming loss)
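The three-step conversion can be sketched as below; this toy version uses a frequency-based logging policy and 0/1 loss instead of the CRF and Hamming loss used in the paper (both simplifying assumptions):

```python
import random

rng = random.Random(0)
LABELS = ["A", "B", "C"]
data = [(i, rng.choice(LABELS)) for i in range(1000)]    # supervised pairs (x, y*)

# (1) "Train" a logging policy pi_0 on 5% of the data
#     (here: label frequencies with add-one smoothing, instead of a CRF).
split = int(0.05 * len(data))
train, rest = data[:split], data[split:]
counts = {y: 1 for y in LABELS}
for _, y in train:
    counts[y] += 1
total = sum(counts.values())
probs = {y: c / total for y, c in counts.items()}

# (2) Use pi_0 to assign a label to the remaining 95% of the data, and
# (3) compute the feedback delta from y and y* (0/1 loss instead of Hamming loss).
bandit_log = []
for x, y_star in rest:
    y = rng.choices(LABELS, weights=[probs[l] for l in LABELS])[0]
    delta = 0.0 if y == y_star else 1.0                  # loss of chosen vs. true label
    bandit_log.append((x, y, delta, probs[y]))           # (x_i, y_i, delta_i, p_i)

print(len(bandit_log))   # -> 950
```

The resulting `bandit_log` has exactly the (x, y, δ, p) shape defined on the Definitions slide, which is what makes supervised datasets usable as benchmarks for counterfactual learning.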
  16. Experimental Results [5]
      • Test-set Hamming loss
      • Computational time in seconds (S: AdaGrad, B: L-BFGS)
  17. Note
      • Labels that do not appear in the log cannot be predicted accurately
        e.g. when a new ad is added
      [Figure: the log contains ads A, B, C; a logged ad can be handled (OK), but a newly added ad cannot (NG)]
      ※ This is an extreme example; methods exist that make it possible
  18. More
      • The research team behind [5] continues to publish on this topic:
        • "The Self-Normalized Estimator for Counterfactual Learning"
        • "Recommendations as Treatments: Debiasing Learning and Evaluation"
        • "Unbiased Learning-to-Rank with Biased Feedback"
        • "Deep Learning with Logged Bandit Feedback"
      • Microsoft Research also has many researchers working in this area
      • If you are interested, please look into it!
  19. Summary

  20. Summary
      • Counterfactual ML
        • Evaluates and learns from counterfactuals
        • Ad banner selection is a typical example
      • Algorithms
        • IPS Estimator
        • POEM
      • Experiments
  21. AI Lab is also strengthening its research in this area
      • Paper related to this talk: Yusuke Narita, Shota Yasui, Kohei Yata, "Efficient Counterfactual Learning from Bandit Feedback", arXiv, 2018 (https://arxiv.org/abs/1809.03084)
      • For details, see our website: https://adtech.cyberagent.io/ailab/
  22. fin.

  23. References
      1. SIGIR 2016 Tutorial on Counterfactual Evaluation and Learning (http://www.cs.cornell.edu/~adith/CfactSIGIR2016/)
      2. ICML 2017 Tutorial on Real World Interactive Learning (http://hunch.net/~rwil/)
      3. Evaluation of Bandit Algorithms and Causal Inference (in Japanese) (https://adtech.cyberagent.io/research/archives/199)
      4. Counterfactual Reasoning and Learning Systems, 2017 (https://arxiv.org/abs/1209.2355)
      5. Counterfactual Risk Minimization: Learning from Logged Bandit Feedback (https://arxiv.org/abs/1502.02362)