Introduction to CFML-related libraries / cfml #3 libraries

Slides for CFML Study Meetup #3. ( https://cfml.connpass.com/event/150818/ )

Kazuki Taniguchi

October 30, 2019

Transcript

  1. Introduction to CFML-related libraries, Kazuki Taniguchi

  2. Self-introduction: Kazuki Taniguchi (@kazk1018)
     • Career
       • 2014.4-2019.3: CyberAgent, Inc., AdTech Division, AI Lab (Research Scientist / ML Engineer / Data Scientist)
       • 2019.4-: IT venture (product development / marketing) and freelance (AI/ML research and development)
     • Research areas
       • Pattern Recognition / Image Restoration
       • Recommendation / Response Prediction
       • Counterfactual Machine Learning
  3. Libraries introduced in this talk
     • DoWhy (Microsoft)
     • EconML (Microsoft)
     • CausalML (Uber)
     • Vowpal Wabbit (OSS)
  4. Libraries introduced in this talk
     • Causal Inference (uplift modeling): DoWhy (Microsoft), EconML (Microsoft), CausalML (Uber)
     • Contextual Bandit (ML): Vowpal Wabbit (OSS)
  5. Causal Inference Libraries

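  6. Causal Inference
     • Problem setting
       • $x_i \in \mathcal{X}$: feature vector
       • $T_i \in \mathcal{T} = \{0, 1\}$: treatment assignment
       • $Y_i^{(T)} \in \mathbb{R}$: potential outcome
     • The observed outcome can be written as
       $Y_i = T_i Y_i^{(1)} + (1 - T_i) Y_i^{(0)}$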
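The observed-outcome equation on this slide can be illustrated with a few lines of NumPy. This is only a minimal sketch on synthetic data: the sample size, the constant effect of 2.0, and all variable names are assumptions made for illustration and do not come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

y0 = rng.normal(size=n)         # potential outcome without treatment, Y^(0)
y1 = y0 + 2.0                   # potential outcome with treatment, Y^(1) (assumed effect)
t = rng.integers(0, 2, size=n)  # treatment assignment T_i in {0, 1}

# Observed outcome: for each unit only one of the two potential outcomes is seen.
y_obs = t * y1 + (1 - t) * y0
```

The unobserved potential outcome of each unit is exactly what the libraries discussed later have to reason about.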
  7. Treatment Effects
     • Average Treatment Effect (ATE)
       $\tau = \mathbb{E}[Y^{(1)} - Y^{(0)}]$
     • Conditional Average Treatment Effect (CATE)
       $\tau(x) = \mathbb{E}[Y^{(1)} - Y^{(0)} \mid X = x]$
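As a rough illustration of the two definitions, the sketch below estimates the ATE and a per-group CATE by a difference in means. It assumes a randomized treatment and a single binary feature, which is the easy case; the data-generating process and the helper name `diff_in_means` are hypothetical and not part of the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

x = rng.integers(0, 2, size=n)      # one binary feature X
t = rng.integers(0, 2, size=n)      # randomized treatment assignment
tau_x = np.where(x == 1, 3.0, 1.0)  # assumed true CATE: 1 for x=0, 3 for x=1
y = 0.5 * x + tau_x * t + rng.normal(size=n)

def diff_in_means(y, t):
    """Mean outcome of treated units minus mean outcome of control units."""
    return y[t == 1].mean() - y[t == 0].mean()

ate_hat = diff_in_means(y, t)  # should be close to the true ATE of 2
cate_hat = {v: diff_in_means(y[x == v], t[x == v]) for v in (0, 1)}
```

With observational (non-randomized) data this simple difference is generally biased, which is where the estimators in the libraries below come in.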
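  8. In what situations is it used?
     • Personalized Pricing
       • Purpose
         • Promote purchases by offering a discounted price
         • The discount should pay for itself through the increase in purchases
       • Problem setting: we want to see the causal effect of the discount-price campaign
         • Treatment: whether or not the offer is made
         • Outcome: whether or not a purchase is made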
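  9. In what situations is it used?
     • Personalized Pricing
       • How large is the effect of this campaign overall? → ATE
       • How large is the effect of this campaign, and for whom? → CATE
     • Knowing the causal effect lets us measure the effect of the campaign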
  10. Libraries introduced in this talk
      • DoWhy (Microsoft)
      • EconML (Microsoft)
      • CausalML (Uber)

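  11. DoWhy
      • A Python causal inference library developed by Microsoft
      • Carries the analysis through to the final estimate while checking the assumptions of causal inference (backdoor, IV)
      • The graph (DAG) has to be supplied by the user
      • Impression: the best library for people learning causal inference for the first time to learn hands-on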
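  12. EconML
      • A Python causal inference library developed by Microsoft
      • Provides multiple models for estimating the CATE, including some published quite recently (described later)
      • Compared with DoWhy, a more practical library for the task of estimating causal effects from observational data alone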

  13. CausalML
      • A Python causal inference library developed by Uber
      • Implements two models for estimating the CATE (described later)
      • Plays almost the same role as EconML

  14. Differences between EconML and CausalML
      • EconML: estimator.fit(Y, T, X, W)
        Y: Outcome, T: Treatment, X: Features, W: Controls
      • CausalML: estimator.estimate_ate(Y, T, X)
        Y: Outcome, T: Treatment, X: Features
      • EconML assumes that the features X that act as the condition in the CATE are kept separate from the remaining features (Controls)
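To make the X versus W distinction concrete, here is a small self-contained sketch. `ToyLinearCate` is a hypothetical class written for this note, not EconML's implementation or API; it only mirrors the `fit(Y, T, X, W)` argument order from the slide. It assumes a linear model in which the treatment effect varies with X only, while W enters purely as a control.

```python
import numpy as np

class ToyLinearCate:
    """Hypothetical estimator for illustration only (not EconML code)."""

    def fit(self, Y, T, X, W):
        n = len(Y)
        Xc = np.hstack([np.ones((n, 1)), X])  # effect modifiers plus intercept
        # Treatment is interacted with X only; X and W both enter the baseline.
        design = np.hstack([T[:, None] * Xc, Xc, W])
        coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
        self.theta_ = coef[: Xc.shape[1]]     # coefficients of T * [1, X]
        return self

    def effect(self, X):
        """Estimated CATE tau(x) for each row of X."""
        Xc = np.hstack([np.ones((len(X), 1)), X])
        return Xc @ self.theta_

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 1))                   # features the CATE is conditioned on
W = rng.normal(size=(n, 2))                   # controls
T = rng.integers(0, 2, size=n).astype(float)
# Assumed data-generating process: tau(x) = 1 + 2x
Y = (1.0 + 2.0 * X[:, 0]) * T + W @ np.array([0.5, -0.5]) + rng.normal(scale=0.1, size=n)

est = ToyLinearCate().fit(Y, T, X, W)
cate_at_zero_and_one = est.effect(np.array([[0.0], [1.0]]))
```

The point of the design is that `effect` takes only X: the controls W help adjust the baseline but are never something the effect is reported conditional on.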
  15. Algorithms

      Algorithm                              DoWhy   EconML   CausalML
      Basic Algorithms (Matching, IV, RD)      ○
      Deep IV [1]                                      ○
      Double Machine Learning [2]                      ○
      Orthogonal Random Forests [3]                    ○
      Meta-Learners [4]                                ○         ○
      Uplift Tree [5]                                            ○
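Of the algorithms in the table, the meta-learners of [4] are the easiest to sketch. The block below shows the idea behind the T-learner: fit one outcome model on treated units, one on control units, and take the difference of their predictions. It is a minimal sketch with ordinary least squares as the base learner; the helper names and the synthetic data are assumptions for illustration and are not taken from EconML or CausalML.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    A = np.hstack([np.ones((len(X), 1)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.hstack([np.ones((len(X), 1)), X]) @ coef

def t_learner_cate(X, t, y, X_new):
    """T-learner: separate outcome models for treated and control units."""
    mu1 = fit_linear(X[t == 1], y[t == 1])
    mu0 = fit_linear(X[t == 0], y[t == 0])
    return predict_linear(mu1, X_new) - predict_linear(mu0, X_new)

rng = np.random.default_rng(0)
n = 4000
X = rng.uniform(-1, 1, size=(n, 1))
t = rng.integers(0, 2, size=n)
# Assumed data-generating process: tau(x) = 2 - x
y = X[:, 0] + t * (2.0 - X[:, 0]) + rng.normal(scale=0.1, size=n)

cate_hat = t_learner_cate(X, t, y, np.array([[0.0], [1.0]]))
```

Any regression model could stand in for `fit_linear`; the linear base learner is chosen here only to keep the sketch dependency-free.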
  16. Summary
      • People who are about to start learning causal inference → DoWhy
      • People who want to estimate causal effects from observational data → EconML / CausalML

  17. Contextual Bandit Libraries

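  18. Contextual Bandit
      • Problem setting
        • $x_t \in \mathcal{X}$: feature vector
        • $a_t \in \mathcal{A} = \{a_1, \ldots, a_K\}$: arm selected at round $t$
        • $r_{a_t} \in \mathbb{R}$: reward obtained by choosing $a_t$
      • The policy that selects an arm is defined as
        $a_t \sim \pi(x_t)$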
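  19. Contextual Bandit
      • Let $\pi^*(x)$ be the policy that keeps choosing the optimal arm; the regret can then be written as
        $R(\pi, T) = \mathbb{E}\left[\sum_{t=1}^{T} r_{t,a^*}\right] - \mathbb{E}\left[\sum_{t=1}^{T} r_{t,\pi(x)}\right]$
      • We want to find the policy that minimizes the regret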
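The regret definition can be made concrete with a toy environment. The sketch below assumes a table of expected rewards per (context, arm), which a real problem would not expose, and compares a uniformly random policy against the optimal one. All names and numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 3, 1000

# Assumed toy environment: context x_t in {0, 1}; expected reward per (context, arm).
mean_reward = np.array([[0.9, 0.1, 0.2],
                        [0.2, 0.3, 0.8]])

def optimal_policy(x):
    """pi*(x): always pick the arm with the highest expected reward."""
    return int(np.argmax(mean_reward[x]))

def uniform_policy(x):
    """A baseline that ignores the context and picks an arm at random."""
    return int(rng.integers(K))

def regret(policy, contexts):
    """Expected reward of pi* minus expected reward of `policy` over all rounds."""
    best = sum(mean_reward[x, optimal_policy(x)] for x in contexts)
    got = sum(mean_reward[x, policy(x)] for x in contexts)
    return best - got

contexts = rng.integers(0, 2, size=T)
regret_optimal = regret(optimal_policy, contexts)
regret_uniform = regret(uniform_policy, contexts)
```

Because the sketch works with expected rewards, the regret of the optimal policy is zero by construction and any other policy can only do worse.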
  20. Vowpal Wabbit (VW)
      • A machine learning library written in C++ that runs from the CLI
      • Provides basic machine learning as well as bandits
      • Python bindings are also available

  21. Approaches
      • Inverse Propensity Score [6]
      • Doubly Robust Estimator [7]
      • Direct Method
        • A simple regression on the reward (biased)
      • Multi Task Regression [8]
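The first three approaches can be written down in a few lines when they are used to evaluate a target policy from logged data. This is a minimal sketch, not VW's implementation: the function names, the uniform logging policy, and the deliberately poor reward model (a constant 0.5) are all assumptions chosen to show that the direct method inherits the model's bias while IPS and the doubly robust estimator do not.

```python
import numpy as np

def ips(r, a, p, pi_a):
    """Inverse propensity score: reweight rewards where the logged arm matches the target."""
    w = (a == pi_a) / p
    return float(np.mean(w * r))

def direct_method(r_hat, pi_a):
    """Direct method: average the reward model's prediction for the target arm."""
    return float(np.mean(r_hat[np.arange(len(pi_a)), pi_a]))

def doubly_robust(r, a, p, pi_a, r_hat):
    """Doubly robust: direct method plus an IPS correction of the model's residual."""
    idx = np.arange(len(r))
    w = (a == pi_a) / p
    return float(np.mean(r_hat[idx, pi_a] + w * (r - r_hat[idx, a])))

rng = np.random.default_rng(0)
n = 20_000
mean_reward = np.array([[0.9, 0.1], [0.2, 0.8]])  # assumed toy environment
x = rng.integers(0, 2, size=n)                    # contexts
a = rng.integers(0, 2, size=n)                    # logged arms (uniform logging policy)
p = np.full(n, 0.5)                               # logging probabilities
r = rng.binomial(1, mean_reward[x, a]).astype(float)

pi_a = x.copy()                   # target policy: arm 0 in context 0, arm 1 in context 1
r_hat = np.full((n, 2), 0.5)      # deliberately poor reward model
true_value = float(np.mean(mean_reward[x, pi_a]))

v_ips = ips(r, a, p, pi_a)
v_dm = direct_method(r_hat, pi_a)
v_dr = doubly_robust(r, a, p, pi_a, r_hat)
```

In this setup `v_dm` equals the constant 0.5 regardless of the data, which is the "(biased)" remark on the slide in its most extreme form.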
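  22. Exploration
      • There is an option for whether the action chosen at inference time includes exploration
      • When exploration is used, the maximum number of arms has to be known in advance (it is passed as an argument at training time)
      • If the arm information may change, the format of the training input file has to be changed (adf)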
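One reason the number of arms has to be known up front is that exploration returns a probability distribution over all arms rather than a single arm. The sketch below shows this for epsilon-greedy exploration; it is a generic illustration with hypothetical names, not VW's code. (VW exposes comparable behaviour through its `--cb_explore` and `--epsilon` options.)

```python
import numpy as np

def epsilon_greedy_pmf(greedy_arm, num_arms, epsilon=0.1):
    """Spread `epsilon` uniformly over all arms and give the rest to the greedy arm."""
    pmf = np.full(num_arms, epsilon / num_arms)
    pmf[greedy_arm] += 1.0 - epsilon
    return pmf

rng = np.random.default_rng(0)
pmf = epsilon_greedy_pmf(greedy_arm=2, num_arms=4, epsilon=0.2)
chosen_arm = int(rng.choice(len(pmf), p=pmf))  # sample the action to play
```

The probability of the sampled arm is what later goes into the `Probability` field of the logged training example.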
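  23. Input File Format
      • Format
          Action:Cost:Probability | {Features}
      • Example) train.dat (the numeric Action:Cost:Probability prefixes did not survive in this transcript)
          … | a c
          … | b d
          … | a b c
          … | b c
          … | a d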
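A line in the `Action:Cost:Probability | {Features}` format shown on this slide can be produced with a one-line helper. The function name and the numeric values below are illustrative assumptions, not the values from the original slide.

```python
def format_cb_line(action, cost, probability, features):
    """Build one training line: 'action:cost:probability | feature feature ...'."""
    return f"{action}:{cost}:{probability} | {' '.join(features)}"

# Illustrative values only: arm 1 was played with probability 0.4 and incurred cost 2.
line = format_cb_line(1, 2, 0.4, ["a", "c"])
```

Here `line` is the string `1:2:0.4 | a c`; an unlabeled example for prediction would carry only the `| ...` feature part.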
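  24. Input File Format
      • Example of the input format and the corresponding features
        Columns: Text | Feature vector | Action | Cost | Probability
        Rows (numeric cells did not survive in this transcript):
          | a c      [ … ]
          | b d      [ … ]
          | a b c    {a b c}
          | a b  /  | a b c    {a b} {a b c}    (… is ignored)
      • ※ With adf: arms are expressed over multiple lines, and the cost and probability are written on the line of the selected arm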
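The multi-line (adf) layout described in the note above can be sketched as follows. It assumes the layout used by VW's `--cb_adf` mode as commonly documented: an optional `shared |` line for context features, one line per arm, and a `0:cost:probability` label only on the arm that was actually chosen (the leading action id is a placeholder in this mode). The function name and example values are hypothetical.

```python
def format_cb_adf_example(arm_features, chosen=None, cost=None,
                          probability=None, shared=None):
    """Build one multi-line adf example; examples are separated by a blank line."""
    lines = []
    if shared:
        lines.append("shared | " + " ".join(shared))
    for i, feats in enumerate(arm_features):
        # Only the chosen arm carries a label; the other arms are feature-only lines.
        label = f"0:{cost}:{probability} " if i == chosen else ""
        lines.append(f"{label}| " + " ".join(feats))
    return "\n".join(lines) + "\n"

example = format_cb_adf_example(
    arm_features=[["a", "b"], ["a", "b", "c"]],
    chosen=1, cost=1.0, probability=0.5, shared=["u"],
)
```

Since every example lists its own arms, the set of arms can differ from one example to the next, which is the case the Exploration slide refers to.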
  25. Example
      • Training
      • Just build the text in the format shown earlier and pass it to the methods (a Python binding that very much follows the C++ side…)

  26. Example
      • Prediction
      • Save & Load Model
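The code shown on the Example slides is not part of this transcript, so the following is only a sketch of the text-in, method-call flow they describe. `RecordingWorkspace` is a hypothetical stand-in that mimics the `learn` / `predict` / `save` method names of VW's Python binding so the flow can be run without VW installed; it does no learning, and the training lines use illustrative values.

```python
class RecordingWorkspace:
    """Hypothetical stand-in for a VW workspace; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def learn(self, example):
        self.calls.append(("learn", example))

    def predict(self, example):
        self.calls.append(("predict", example))
        return 1  # placeholder action, not a real prediction

    def save(self, path):
        self.calls.append(("save", path))

def run_cb(workspace, train_lines, test_line, model_path):
    """Feed text examples to the workspace, predict once, then save the model."""
    for line in train_lines:
        workspace.learn(line)                # training: one text line per example
    action = workspace.predict(test_line)    # prediction: unlabeled feature line
    workspace.save(model_path)               # persist the model file
    return action

ws = RecordingWorkspace()
action = run_cb(ws, ["1:2:0.4 | a c", "3:0.5:0.2 | b d"], "| a b", "cb.model")
```

With the `vowpalwabbit` package installed, the stand-in would be replaced by a real workspace (for example `pyvw.vw("--cb 4")` in releases from around the time of this talk, which is an assumption about the installed version), and a saved model can be loaded again by passing VW's `-i cb.model` option when creating the workspace.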

  27. Summary

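  28. Summary
      • Introduced libraries for causal inference
        • People who are about to start learning causal inference
          • DoWhy
        • People who want to estimate causal effects from observational data
          • EconML / CausalML
      • Introduced a library for Contextual Bandit
        • Vowpal Wabbit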
  29. CFML JP Slack group
      • Please join via the QR code above

  30. References

  31. Reference
      1. J. Hartford, G. Lewis, K. Leyton-Brown, and M. Taddy, "Deep IV: A flexible approach for counterfactual prediction", Proceedings of the 34th International Conference on Machine Learning (ICML), 2017.
      2. V. Chernozhukov, D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, W. Newey, and J. Robins, "Double/Debiased Machine Learning for Treatment and Structural Parameters", Econometrics Journal, 21, pp. C1–C68, 2018.
      3. M. Oprescu, V. Syrgkanis, and Z. S. Wu, "Orthogonal Random Forest for Causal Inference", Proceedings of the 36th International Conference on Machine Learning (ICML), 2019.
      4. S. R. Künzel, J. S. Sekhon, P. J. Bickel, and B. Yu, "Meta-learners for estimating heterogeneous treatment effects using machine learning", arXiv preprint arXiv:1706.03461, 2017.
      5. P. Rzepakowski and S. Jaroszewicz, "Decision trees for uplift modeling with single and multiple treatments", Knowl. Inf. Syst., 32(2), pp. 303–327, August 2012.
      6. D. G. Horvitz and D. J. Thompson, "A Generalization of Sampling Without Replacement from a Finite Universe", Journal of the American Statistical Association, 47(260), pp. 663–685, 1952.
      7. M. Dudík, J. Langford, and L. Li, "Doubly Robust Policy Evaluation and Learning", Proceedings of the 28th International Conference on Machine Learning (ICML), Bellevue, pp. 1097–1104, 2011.
      8. N. Karampatziakis and J. Langford, "Online Importance Weight Aware Updates", Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence (UAI), pp. 392–399, 2011.
  32. Reference
      • DoWhy
        • DoWhy (https://microsoft.github.io/dowhy/index.html)
        • An explanation of DoWhy, a Python library for statistical causal inference: what it can do and what to watch out for (https://www.krsk-phs.com/entry/2018/08/22/060844)
      • EconML
        • EconML (https://github.com/microsoft/EconML)
        • Introduction to the EconML package (meta-learners edition) (https://usaito.hatenablog.com/entry/2019/04/07/205756)
      • CausalML
        • CausalML (https://github.com/uber/causalml)
      • Vowpal Wabbit
        • Vowpal Wabbit (https://vowpalwabbit.org/index.html)