Introduction to CFML-Related Libraries / cfml #3 libraries

Introduction to CFML-Related Libraries
Kazuki Taniguchi

Self-introduction: Kazuki Taniguchi (@kazk1018)
• Career
  • 2014.4-2019.3
    • CyberAgent, Inc., AdTech Division, AI Lab: Research Scientist / ML Engineer / Data Scientist
  • 2019.4-
    • IT startup (product development / marketing)
    • Freelance (AI/ML research and development)
• Research areas
  • Pattern Recognition / Image Restoration
  • Recommendation / Response Prediction
  • Counterfactual Machine Learning

Libraries introduced in this talk
• DoWhy (Microsoft)
• EconML (Microsoft)
• CausalML (Uber)
• Vowpal Wabbit (OSS)

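Libraries introduced in this talk, by category
• Causal Inference (uplift modeling): DoWhy (Microsoft), EconML (Microsoft), CausalML (Uber)
• Contextual Bandit (ML): Vowpal Wabbit (OSS)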

Causal Inference Libraries

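Causal Inference
• Problem setting
  • $x_i \in \mathcal{X}$: feature vector
  • $T_i \in \mathcal{T} = \{0, 1\}$: treatment assignment
  • $Y_i^{(T)} \in \mathbb{R}$: potential outcome
• The observed outcome can be written as
  $Y_i = T_i Y_i^{(1)} + (1 - T_i) Y_i^{(0)}$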

Treatment Effects
• Average Treatment Effect (ATE)
  $\tau = \mathbb{E}[Y^{(1)} - Y^{(0)}]$
• Conditional Average Treatment Effect (CATE)
  $\tau(x) = \mathbb{E}[Y^{(1)} - Y^{(0)} \mid X = x]$
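The definitions above can be checked on simulated data, where both potential outcomes are known. The following is a minimal sketch using only the Python standard library. The data-generating process (a binary feature, an effect of 0.5 or 2.0 depending on it, and a fair-coin treatment assignment) is an assumption made for illustration and does not come from the talk.

```python
import random

random.seed(0)

def simulate(n=20000):
    """Draw units with a binary feature, both potential outcomes, and a randomized treatment."""
    rows = []
    for _ in range(n):
        x = random.randint(0, 1)              # feature x_i
        y0 = random.gauss(1.0, 1.0)           # potential outcome Y_i^(0)
        y1 = y0 + (2.0 if x == 1 else 0.5)    # potential outcome Y_i^(1); effect depends on x
        t = random.randint(0, 1)              # treatment T_i assigned by a fair coin
        y = t * y1 + (1 - t) * y0             # observed outcome
        rows.append((x, t, y, y0, y1))
    return rows

def mean(values):
    values = list(values)
    return sum(values) / len(values)

rows = simulate()

# Ground truth, available only because both potential outcomes were simulated.
true_ate = mean(y1 - y0 for _, _, _, y0, y1 in rows)
true_cate = {x: mean(y1 - y0 for xi, _, _, y0, y1 in rows if xi == x) for x in (0, 1)}

def diff_in_means(data):
    """Estimate from observed data only; valid here because T is randomized."""
    treated = [y for _, t, y, _, _ in data if t == 1]
    control = [y for _, t, y, _, _ in data if t == 0]
    return mean(treated) - mean(control)

est_ate = diff_in_means(rows)
est_cate = {x: diff_in_means([r for r in rows if r[0] == x]) for x in (0, 1)}
```

Because the treatment is randomized in this sketch, the difference in means recovers the ATE, and the same difference within each value of x recovers the CATE. With observational data that shortcut no longer holds, which is where the libraries below come in.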

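When is it used?
• Personalized Pricing
  • Goal
    • Promote purchases by making offers at a discounted price
    • Recover the cost of the discount through the increase in purchases
  • Problem setting
    • Treatment: whether to make the offer
    • Outcome: whether the customer purchases
• We want to see the causal effect of the discount campaign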

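When is it used?
• Personalized Pricing
  • How much effect does this campaign have overall? → ATE
  • How much effect does this campaign have, and on whom? → CATE
• Knowing the causal effect lets us measure the effect of the campaign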

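DoWhy
• A Python causal inference library developed by Microsoft
• Carries the analysis through to the final estimate while checking the assumptions of causal inference (backdoor, IV)
• The graph (DAG) must be supplied by the user
• My impression is that it is the best library for people who are learning causal inference for the first time and want to study hands-on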
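To make the "backdoor" idea concrete, here is a small sketch of backdoor adjustment by stratification on a single binary confounder. It deliberately does not use DoWhy's API. The simulated data, the variable names, and the true effect of 1.0 are all assumptions chosen for illustration.

```python
import random

random.seed(1)

def simulate(n=40000):
    """Confounded data: z raises both the chance of treatment and the outcome."""
    data = []
    for _ in range(n):
        z = random.randint(0, 1)                               # confounder
        t = 1 if random.random() < (0.8 if z else 0.2) else 0  # treatment depends on z
        y = 1.0 * t + 3.0 * z + random.gauss(0.0, 1.0)         # true effect of t is 1.0
        data.append((z, t, y))
    return data

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def naive_difference(data):
    """Treated mean minus control mean, ignoring the confounder."""
    return mean(y for _, t, y in data if t == 1) - mean(y for _, t, y in data if t == 0)

def backdoor_adjusted(data):
    """Average the within-stratum differences, weighted by the share of each stratum."""
    total = 0.0
    for z in (0, 1):
        stratum = [row for row in data if row[0] == z]
        total += naive_difference(stratum) * len(stratum) / len(data)
    return total

data = simulate()
naive = naive_difference(data)      # inflated by confounding
adjusted = backdoor_adjusted(data)  # close to the true effect
```

The naive comparison mixes the effect of the treatment with the effect of z, while conditioning on z (the backdoor set in this toy graph) removes that bias. Which variables belong in the adjustment set depends on the DAG, which is why the user has to supply one.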

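EconML
• A Python causal inference library developed by Microsoft
• Provides several models for estimating CATE, some of them published quite recently (see below)
• Compared with DoWhy, a more practical library for the task of estimating causal effects from observational data alone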

CausalML
• A Python causal inference library developed by Uber
• Implements two models for estimating CATE (see below)
• Plays almost the same role as EconML

Differences between EconML and CausalML
• EconML: estimator.fit(Y, T, X, W)
  • Y: Outcome, T: Treatment, X: Features, W: Controls
• CausalML: estimator.estimate_ate(Y, T, X)
  • Y: Outcome, T: Treatment, X: Features
• EconML assumes a split between X (Features), the variables that the CATE is conditioned on, and the remaining features W (Controls)

Algorithms
                                       DoWhy   EconML   CausalML
Basic Algorithms (Matching, IV, RD)      ○       -         -
Deep IV [1]                              -       ○         -
Double Machine Learning [2]              -       ○         -
Orthogonal Random Forests [3]            -       ○         -
Meta-Learners [4]                        -       ○         ○
Uplift Tree [5]                          -       -         ○
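As an illustration of what a meta-learner does, the sketch below implements a T-learner, one of the meta-learners described in [4]: fit one outcome model per treatment arm and take the difference of their predictions as the CATE estimate. It is a standard-library sketch and not the EconML or CausalML implementation. The one-dimensional least-squares base model and the synthetic data with $\tau(x) = 1 + 2x$ are assumptions for the demo.

```python
import random

random.seed(2)

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def t_learner(xs, ts, ys):
    """Fit one outcome model per arm; the CATE estimate is the difference of predictions."""
    mu1 = fit_line([x for x, t in zip(xs, ts) if t == 1],
                   [y for y, t in zip(ys, ts) if t == 1])
    mu0 = fit_line([x for x, t in zip(xs, ts) if t == 0],
                   [y for y, t in zip(ys, ts) if t == 0])
    return lambda x: mu1(x) - mu0(x)

# Synthetic randomized data with true CATE tau(x) = 1 + 2x (an assumption for the demo).
xs = [random.uniform(0.0, 1.0) for _ in range(5000)]
ts = [random.randint(0, 1) for _ in xs]
ys = [0.5 * x + t * (1.0 + 2.0 * x) + random.gauss(0.0, 0.5) for x, t in zip(xs, ts)]

cate = t_learner(xs, ts, ys)  # cate(0.0) should be near 1.0 and cate(1.0) near 3.0
```

In the libraries the base model is pluggable, so the per-arm regressions would typically be flexible learners rather than a straight line.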

Summary
• For those about to start learning causal inference → DoWhy
• For those who want to estimate causal effects from observational data → EconML / CausalML

Contextual Bandit Libraries

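Contextual Bandit
• Problem setting
  • The policy that selects an arm is defined as follows
    • $x_t \in \mathcal{X}$: feature vector
    • $a_t \in \mathcal{A} = \{a_1, \dots, a_K\}$: arm selected at time $t$
    • $r_{a_t} \in \mathbb{R}$: reward obtained by choosing $a_t$
    • $a_t \sim \pi(x_t)$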

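Contextual Bandit
• Let $\pi^*(x)$ be the policy that always selects the optimal arm; the regret can then be written as
  $R(\pi, T) = \mathbb{E}\left[\sum_{t=1}^{T} r_{t,a^*}\right] - \mathbb{E}\left[\sum_{t=1}^{T} r_{t,\pi(x)}\right]$
• We want to find the policy that minimizes the regret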
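The regret definition can be illustrated with a small simulation. The sketch below runs an epsilon-greedy policy on a hypothetical environment with two contexts and three arms, and accumulates the gap between the expected reward of the optimal arm and that of the chosen arm. The reward table, the horizon, and the epsilon values are assumptions for the demo. Epsilon-greedy is used only as a simple stand-in policy.

```python
import random

random.seed(3)

# Hypothetical environment: expected reward of each arm for each of two contexts.
MEAN_REWARD = {0: [0.2, 0.5, 0.8], 1: [0.7, 0.4, 0.1]}
K = 3

def run(epsilon, horizon=5000):
    """Epsilon-greedy over per-context running means; returns the cumulative regret."""
    counts = {x: [0] * K for x in MEAN_REWARD}
    sums = {x: [0.0] * K for x in MEAN_REWARD}
    regret = 0.0
    for _ in range(horizon):
        x = random.randint(0, 1)                                  # context x_t
        if random.random() < epsilon or 0 in counts[x]:
            a = random.randrange(K)                               # explore
        else:
            a = max(range(K), key=lambda k: sums[x][k] / counts[x][k])  # exploit
        r = 1.0 if random.random() < MEAN_REWARD[x][a] else 0.0   # Bernoulli reward
        counts[x][a] += 1
        sums[x][a] += r
        regret += max(MEAN_REWARD[x]) - MEAN_REWARD[x][a]         # gap to the optimal arm
    return regret

regret_uniform = run(epsilon=1.0)  # always explore: regret grows linearly
regret_greedy = run(epsilon=0.1)   # mostly exploit: much smaller regret
```

The regret here is computed from the expected rewards rather than the sampled ones, which matches the expectations in the formula above.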

Vowpal Wabbit (VW)
• A machine learning library written in C++ that runs as a CLI
• Provides basic machine learning algorithms and bandit algorithms
• Python bindings are also available

Approaches
• Inverse Propensity Score [6]
• Doubly Robust Estimator [7]
• Direct Method
  • A simple regression on the reward (biased)
• Multi Task Regression [8]
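The first three approaches can be compared on logged bandit data. The sketch below evaluates a fixed target policy with the inverse propensity score (IPS) estimator, the direct method, and the doubly robust estimator. It is not VW's implementation. The reward table, the logging probabilities, and the deliberately crude reward model (one average per action, ignoring the context) are assumptions chosen so that the bias of the direct method is visible.

```python
import random

random.seed(4)

MEAN = {0: [0.9, 0.1], 1: [0.2, 0.6]}  # hypothetical E[r | x, a]
LOGGING_P = [0.7, 0.3]                 # logging policy, independent of the context

def target_policy(x):
    return x                           # policy to evaluate: play arm x in context x

def collect(n=50000):
    """Log (context, action, reward, probability of the logged action)."""
    logs = []
    for _ in range(n):
        x = random.randint(0, 1)
        a = 0 if random.random() < LOGGING_P[0] else 1
        r = 1.0 if random.random() < MEAN[x][a] else 0.0
        logs.append((x, a, r, LOGGING_P[a]))
    return logs

logs = collect()

def action_mean(a):
    """Crude reward model: average reward of action a, ignoring the context."""
    rewards = [r for _, ai, r, _ in logs if ai == a]
    return sum(rewards) / len(rewards)

R_HAT = [action_mean(0), action_mean(1)]

def ips(logs):
    """Reweight rewards of logged actions that agree with the target policy."""
    return sum(r / p for x, a, r, p in logs if a == target_policy(x)) / len(logs)

def direct_method(logs):
    """Plug in the reward model for the action the target policy would take."""
    return sum(R_HAT[target_policy(x)] for x, _, _, _ in logs) / len(logs)

def doubly_robust(logs):
    """Reward-model prediction plus an IPS correction of its residual."""
    total = 0.0
    for x, a, r, p in logs:
        total += R_HAT[target_policy(x)]
        if a == target_policy(x):
            total += (r - R_HAT[a]) / p
    return total / len(logs)

true_value = (MEAN[0][0] + MEAN[1][1]) / 2  # 0.75 with equally likely contexts
```

With this misspecified reward model the direct method lands far from the true value, while IPS and the doubly robust estimator stay close to it because the logging probabilities are known exactly.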

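Exploration
• There is an option for whether to include exploration in the action chosen at inference time
• When exploration is used, the maximum number of arms must be known in advance (it is passed as an argument at training time)
• If the arm information may change, the format of the training input file must be changed (adf)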

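Input File Format
• Format: Action:Cost:Probability | {Features}
• Example (train.dat); the numeric Action:Cost:Probability prefix of each line is not preserved in this transcript
  … | a c
  … | b d
  … | a b c
  … | b c
  … | a d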
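A line in this format is easy to build programmatically. The helper below is a hypothetical convenience function, not part of VW. The action, cost, and probability values in the usage lines are illustrative numbers and are not the ones from the slide.

```python
def cb_line(features, action=None, cost=None, probability=None):
    """Build one example line in the form 'action:cost:probability | features'."""
    feature_text = " ".join(features)
    if action is None:
        return f"| {feature_text}"  # unlabeled line, e.g. for prediction
    return f"{action}:{cost}:{probability} | {feature_text}"

# Illustrative values only.
train_lines = [
    cb_line(["a", "c"], action=1, cost=2, probability=0.4),
    cb_line(["b", "d"], action=3, cost=0.5, probability=0.2),
]
test_line = cb_line(["a", "c"])
```

Here the cost is what the learner minimizes, and the probability is the chance with which the logging policy chose that action.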

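Input File Format
• Examples of input text and the resulting features (table columns: text, feature vector, action, cost, probability; the numeric values are not preserved in this transcript)
  • | a c → [ … ]
  • | b d → [ … ]
  • | a b c → {a b c}
  • | a b and | a b c (※) → {a b}, {a b c}; … is ignored
• ※ (With adf) the arms are expressed over multiple lines, and the cost and probability are written on the selected arm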
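The multi-line adf layout can be sketched in the same way. The helper below is hypothetical and follows my understanding of VW's documented cb_adf text format: one line per arm, an optional `shared |` line for context features, and a `0:cost:probability` label on the chosen arm only. The `shared` line and the numeric values are assumptions that do not appear on the slide.

```python
def cb_adf_example(arms, chosen=None, cost=None, probability=None, shared=None):
    """Build one multi-line cb_adf example as a newline-joined string.

    arms: list of feature lists, one per arm.
    chosen: index into arms of the logged action, or None for an unlabeled example.
    shared: optional list of context features common to all arms.
    """
    lines = []
    if shared is not None:
        lines.append("shared | " + " ".join(shared))
    for index, features in enumerate(arms):
        text = "| " + " ".join(features)
        if index == chosen:
            # Only the selected arm carries a label.
            text = f"0:{cost}:{probability} " + text
        lines.append(text)
    return "\n".join(lines)

# Illustrative values only: two arms, the second one was chosen.
example = cb_adf_example([["a", "b"], ["a", "b", "c"]], chosen=1, cost=1.0, probability=0.5)
```

When several examples are written to one file, each multi-line example is followed by a blank line so that the parser can tell where the next example starts.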

Example: Training
• Just build the text in the format shown earlier and pass it to the methods (a Python binding that very clearly mirrors the C++ side…)

Example: Prediction, Save & Load Model

Summary

Summary
• Causal inference libraries
  • For those about to start learning causal inference
    • DoWhy
  • For those who want to estimate causal effects from observational data
    • EconML / CausalML
• Contextual bandit libraries
  • Vowpal Wabbit

CFML JP Slack group
Please join via the QR code above.

References

Reference
1. Jason Hartford, Greg Lewis, Kevin Leyton-Brown, and Matt Taddy, "Deep IV: A flexible approach for counterfactual prediction", Proceedings of the 34th International Conference on Machine Learning, 2017.
2. Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins, "Double/Debiased Machine Learning for Treatment and Structural Parameters", Econometrics Journal, 21, pp. C1–C68.
3. M. Oprescu, V. Syrgkanis, and Z. S. Wu, "Orthogonal Random Forest for Causal Inference", Proceedings of the 36th International Conference on Machine Learning (ICML), 2019.
4. Sören R. Künzel, Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu, "Meta-learners for estimating heterogeneous treatment effects using machine learning", arXiv preprint arXiv:1706.03461, 2017.
5. Piotr Rzepakowski and Szymon Jaroszewicz, "Decision trees for uplift modeling with single and multiple treatments", Knowl. Inf. Syst., 32(2):303–327, August 2012.
6. D. G. Horvitz and D. J. Thompson, "A Generalization of Sampling Without Replacement from a Finite Universe", Journal of the American Statistical Association, 47(260), pp. 663–685.
7. M. Dudík, J. Langford, and L. Li, "Doubly Robust Policy Evaluation and Learning", Proceedings of the 28th International Conference on Machine Learning, Bellevue, 2011, pp. 1097–1104.
8. N. Karampatziakis and J. Langford, "Online Importance Weight Aware Updates", Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, pp. 392–399.

Reference
• DoWhy
  • DoWhy (https://microsoft.github.io/dowhy/index.html)
  • An explanation of DoWhy, a Python library for statistical causal inference: what it can do and what to watch out for (https://www.krsk-phs.com/entry/2018/08/22/060844)
• EconML
  • EconML (https://github.com/microsoft/EconML)
  • Introduction to the EconML package (meta-learners edition) (https://usaito.hatenablog.com/entry/2019/04/07/205756)
• CausalML
  • CausalML (https://github.com/uber/causalml)
• Vowpal Wabbit
  • Vowpal Wabbit (https://vowpalwabbit.org/index.html)
