Machine Learning and Interpretability

Sinhrks
January 22, 2018

Transcript

  1. Machine Learning and Interpretability
     Masaaki Horikoshi @ ARISE analytics

  2. About me
     • R
     • Package development, etc.
     • Git Awards: ranked 1st in Japan
     • Python
     • http://git-awards.com/users/search?login=sinhrks

  3. Interpretability

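  4. What is interpretability?
     • There is no clear definition, but:
     • Explaining not only a model's output (what) but also the reason behind it (why)
     • To that end, interpreting the model using some method or criterion
     The Mythos of Model Interpretability (Lipton, 2016)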
  5. What are the benefits?
     • Trust
     • Causality
     • Transferability
     • Informativeness
     • Fair and Ethical Decision Making
     The Mythos of Model Interpretability (Lipton, 2016)
  6. Approaches to interpretation
     1. Choose a machine learning method that is easy to explain
        • Accuracy may be insufficient
     2. Use some interpretation technique

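  7. Interpretability
     • Global Interpretability
       • Interprets overall trends of the model or the data
       • Relies on approximations and summary statistics => may be inaccurate locally
     • Local Interpretability
       • Interprets a limited region of the model or the data
       • Allows more accurate explanations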
  8. Interpretability
     • The appropriate method depends on what you want to interpret

     Global Interpretability
       Model Specific: Regression Coefficients, Feature Importance, …
       Model Agnostic: Surrogate Models, Sensitivity Analysis, …
     Local Interpretability
       Model Specific: Maximum Activation Analysis, …
       Model Agnostic: LIME, LOCO, SHAP, …
  9. Regression Coefficients
     • Standardized partial regression coefficients

         library(dplyr)
         library(mlbench)

         data(BostonHousing)
         # Convert factors to numeric, then standardize every column except medv (column 14)
         df <- BostonHousing %>%
           mutate_if(is.factor, as.numeric) %>%
           mutate_at(-14, scale)
         head(df, 3)

                 crim         zn      indus       chas        nox        rm        age
         1 -0.4193669  0.2845483 -1.2866362 -0.2723291 -0.1440749 0.4132629 -0.1198948
         2 -0.4169267 -0.4872402 -0.5927944 -0.2723291 -0.7395304 0.1940824  0.3668034
         3 -0.4169290 -0.4872402 -0.5927944 -0.2723291 -0.7395304 1.2814456 -0.2655490
                dis        rad        tax    ptratio         b      lstat medv
         1 0.140075 -0.9818712 -0.6659492 -1.4575580 0.4406159 -1.0744990 24.0
         2 0.556609 -0.8670245 -0.9863534 -0.3027945 0.4406159 -0.4919525 21.6
         3 0.556609 -0.8670245 -0.9863534 -0.3027945 0.3960351 -1.2075324 34.7
  10. Regression Coefficients
     • Standardized partial regression coefficients
     • The coefplot package can visualize the partial regression coefficients

         # Regress medv on all standardized predictors
         lm.fit <- lm(medv ~ ., data = df)
         coef(lm.fit)

         (Intercept)        crim          zn       indus        chas         nox          rm
         22.53280632 -0.92906457  1.08263896  0.14103943  0.68241438 -2.05875361  2.67687661
                 age         dis         rad         tax     ptratio           b       lstat
          0.01948534 -3.10711605  2.66485220 -2.07883689 -2.06264585  0.85010886 -3.74733185
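Why standardizing makes the coefficients comparable can be checked numerically. The sketch below is an illustration only: it assumes the built-in `mtcars` data and three arbitrarily chosen predictors instead of BostonHousing, so that it runs without extra packages. It shows that a standardized coefficient is the raw coefficient multiplied by the predictor's standard deviation.

```r
# Assumption: built-in mtcars stands in for BostonHousing so no extra package is needed.
predictors <- mtcars[, c("wt", "hp", "disp")]

raw <- data.frame(mpg = mtcars$mpg, predictors)
std <- data.frame(mpg = mtcars$mpg, scale(predictors))

raw_coef <- coef(lm(mpg ~ ., data = raw))[-1]   # drop the intercept
std_coef <- coef(lm(mpg ~ ., data = std))[-1]

# Dividing a predictor by its standard deviation multiplies its slope by that sd,
# so the standardized coefficient equals the raw coefficient times sd(x).
raw_times_sd <- raw_coef * sapply(predictors, sd)

print(round(rbind(raw = raw_coef, standardized = std_coef, raw_times_sd), 4))
```

The last two rows of the printed matrix should agree, which is why the standardized values can be compared across predictors measured in different units.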
  11. Regression Coefficients
     • coefplot

         library(coefplot)
         coefplot(lm.fit)

     Reference: "Statistically spotting flaws in data (Gelman's Secret Weapon)"
     http://d.hatena.ne.jp/hoxo_m/20150617/p1
  12. Feature Importance
     • Feature Importance (Random Forest)

         library(dplyr)
         library(mlbench)

         data(Sonar)
         head(Sonar, 3)

               V1     V2     V3     V4     V5     V6     V7     V8     V9
         1 0.0200 0.0371 0.0428 0.0207 0.0954 0.0986 0.1539 0.1601 0.3109
         2 0.0453 0.0523 0.0843 0.0689 0.1183 0.2583 0.2156 0.3481 0.3337
         3 0.0262 0.0582 0.1099 0.1083 0.0974 0.2280 0.2431 0.3771 0.5598
         …
              V55    V56    V57    V58    V59    V60 Class
         1 0.0072 0.0167 0.0180 0.0084 0.0090 0.0032     R
         2 0.0094 0.0191 0.0140 0.0049 0.0052 0.0044     R
         3 0.0180 0.0244 0.0316 0.0164 0.0095 0.0078     R
  13. Feature Importance
     • Feature Importance (Random Forest)

         library(caret)

         # Train a random forest through caret
         ca.fit <- train(Class ~ ., data = Sonar, method = "rf", ntree = 100)
         varImp(ca.fit)

         rf variable importance

           only 20 most important variables shown (out of 60)

             Overall
         V11  100.00
         V48   96.25
         V45   75.45
         V13   71.74
         V10   60.50
         …
  14. Feature Importance
     • Feature Importance (Random Forest)

         plot(varImp(ca.fit), top = 10)
  15. Feature Importance
     • By default, caret's varImp uses a model-specific calculation
     • See help(varImp)

         # Unscaled importance as reported by caret
         varImp(ca.fit, scale = FALSE)

             Overall
         V11   3.793
         V48   3.684
         V45   3.078
         V13   2.970
         V10   2.643
         …

         # The same values taken directly from the underlying randomForest model
         randomForest::importance(ca.fit$finalModel) %>%
           as.data.frame %>%
           tibble::rownames_to_column(var = 'variable') %>%
           arrange(desc(MeanDecreaseGini)) %>%
           head(20)

           variable MeanDecreaseGini
         1      V11         3.792926
         2      V48         3.683850
         3      V45         3.077996
         4      V13         2.970039
         5      V10         2.642881
         …
  16. Feature Importance
     • randomForest::importance computes feature importance using the following combinations:
     • Mean decrease in accuracy, computed from permuting OOB data
         Classification: Error Rate (MeanDecreaseAccuracy)
         Regression: MSE (%IncMSE)
     • Mean decrease in node impurity
         Classification: Gini index (MeanDecreaseGini)
         Regression: RSS (IncNodePurity)
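  17. Reference: computing feature importance from OOB data
     [Diagram: sampling with replacement (bootstrap) → train each tree → predict on its OOB data → shuffle the values of each column, predict again, and compare with the OOB error rate]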
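The shuffle-and-compare step in the diagram can be sketched in base R. This is a simplified illustration, not the randomForest implementation: it assumes a linear model in place of the forest, a held-out split in place of the OOB data, and synthetic columns named `x1`, `x2` and `noise` that do not come from the slides.

```r
set.seed(1)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
d$y <- 3 * d$x1 + 1 * d$x2 + rnorm(n, sd = 0.5)

train <- d[1:350, ]
held_out <- d[351:n, ]   # assumption: plays the role of the OOB data

fit <- lm(y ~ x1 + x2 + noise, data = train)
mse <- function(data) mean((data$y - predict(fit, newdata = data))^2)
baseline <- mse(held_out)

# Shuffle one column at a time and record how much the error grows.
permutation_importance <- sapply(c("x1", "x2", "noise"), function(col) {
  increases <- replicate(20, {
    shuffled <- held_out
    shuffled[[col]] <- sample(shuffled[[col]])
    mse(shuffled) - baseline
  })
  mean(increases)
})

print(sort(permutation_importance, decreasing = TRUE))
```

A column the model relies on heavily produces a large error increase when shuffled, while an irrelevant column leaves the error almost unchanged.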
  18. Feature Importance (Model Agnostic)
     • caret also supports importance measures that do not depend on the model

         varImp(ca.fit, useModel = FALSE)

         ROC curve variable importance

           only 20 most important variables shown (out of 60)

             Importance
         V11     100.00
         V12      86.30
         V10      82.64
         V49      82.12
         V9       81.97
         …
  19. Feature Importance (Model Agnostic)
     • Classification
       • ROC analysis for each explanatory variable
       • For multiclass problems, for each pair of classes
     • Regression
       • Linear regression / LOESS for each explanatory variable
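The per-variable ROC idea for classification can be illustrated without any model at all. The following is a rough sketch in base R, assuming the built-in `iris` data restricted to two classes; the helper `auc` is a hypothetical name and this is not caret's own code.

```r
# Assumption: two iris species stand in for a generic two-class problem.
two_class <- droplevels(subset(iris, Species != "setosa"))
is_virginica <- two_class$Species == "virginica"

# AUC from the rank-sum (Mann-Whitney) identity; ties receive average ranks.
auc <- function(score, is_positive) {
  r <- rank(score)
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Treat each explanatory variable on its own as a classification score.
feature_auc <- sapply(two_class[, 1:4], auc, is_positive = is_virginica)

# A variable that separates the classes in either direction is useful,
# so fold the AUC around 0.5.
importance <- pmax(feature_auc, 1 - feature_auc)

print(sort(importance, decreasing = TRUE))
```

Because no fitted model enters the calculation, the ranking depends only on the data, which is what makes this kind of importance model agnostic.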
  20. Interpretability

     Global Interpretability
       Model Specific: Regression Coefficients, Feature Importance, …
       Model Agnostic: Surrogate Models, Sensitivity Analysis, …
     Local Interpretability
       Model Specific: Maximum Activation Analysis, …
       Model Agnostic: LIME, LOCO, SHAP, …
  21. Surrogate Models
     • Examine a simple model that stands in for the complex model

     Ideas on interpreting machine learning
     https://www.oreilly.com/ideas/ideas-on-interpreting-machine-learning
     Figure 14. An illustration of surrogate models for explaining a complex neural network. Figure courtesy of Patrick Hall and the H2O.ai team.
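A global surrogate can be sketched in a few lines of base R. The example assumes a `loess` fit as a stand-in for the complex model and synthetic data with hypothetical columns `x1` and `x2`; the point is only that the simple model is trained on the complex model's predictions rather than on the original target.

```r
set.seed(2)
n <- 400
d <- data.frame(x1 = runif(n, -2, 2), x2 = runif(n, -2, 2))
d$y <- sin(d$x1) + 0.5 * d$x2 + rnorm(n, sd = 0.1)

# Assumption: loess plays the role of the complex model.
black_box <- loess(y ~ x1 + x2, data = d)
d$black_box_pred <- predict(black_box, newdata = d)

# The surrogate is fitted to the black box's predictions, not to y.
surrogate <- lm(black_box_pred ~ x1 + x2, data = d)

# R-squared against the black-box output measures how faithful the surrogate is.
fidelity <- summary(surrogate)$r.squared

print(coef(surrogate))
print(fidelity)
```

The surrogate's coefficients are only worth reading when the fidelity is high; a low value means the simple model does not describe the complex one well.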
  22. Sensitivity Analysis
     • Examine the model's output as a given feature is varied

     Sensitivity analysis for neural networks
     https://beckmw.wordpress.com/2013/10/07/sensitivity-analysis-for-neural-networks/
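  23. Partial Dependence
     • Examine the model's output as a given feature is varied

         library(pdp)
         library(ggplot2)

         # Partial dependence on V11 and V9, restricted to the convex hull of their training values
         pd <- partial(ca.fit, pred.var = c("V11", "V9"), chull = TRUE)
         autoplot(pd, contour = TRUE)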
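What a partial dependence curve computes can be written out by hand. The sketch below is not the pdp implementation: it assumes a small synthetic data set, a polynomial `lm` as a stand-in for any fitted model, and a hypothetical helper named `partial_dependence`.

```r
set.seed(3)
n <- 300
d <- data.frame(x1 = runif(n, -2, 2), x2 = rnorm(n))
d$y <- d$x1^2 + d$x2 + rnorm(n, sd = 0.1)

# Assumption: any model with a predict() method could be used here.
fit <- lm(y ~ poly(x1, 2) + x2, data = d)

partial_dependence <- function(model, data, feature, grid) {
  sapply(grid, function(value) {
    modified <- data
    modified[[feature]] <- value               # force the feature to one value in every row
    mean(predict(model, newdata = modified))   # average over the other features
  })
}

grid <- seq(-2, 2, length.out = 9)
pd <- partial_dependence(fit, d, "x1", grid)

print(data.frame(x1 = grid, yhat = round(pd, 2)))
```

Each grid value replaces the feature in every row while the other columns keep their observed values, so the curve shows the average prediction as that one feature changes.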
  24. Interpretability

     Global Interpretability
       Model Specific: Regression Coefficients, Feature Importance, …
       Model Agnostic: Partial Dependence, Surrogate Models, Sensitivity Analysis, …
     Local Interpretability
       Model Specific: Maximum Activation Analysis, …
       Model Agnostic: LIME, LOCO, SHAP, …
  25. What is LIME?

  26. Local Interpretable Model-agnostic Explanations

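  27. LIME
     • Build a model that stands in for the original model in the neighborhood of a given data point, then examine the characteristics of that model
     • Local Surrogate Model
     "Why Should I Trust You?" Explaining the Predictions of Any Classifier (Ribeiro, Singh, Guestrin 2016)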
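  28. LIME
     • LIME obtains an explanation of a data point x from the following function:

         ξ(x) = argmin_{g ∈ G} L(f, g, Πx) + Ω(g)

     • G: the set of explanation models
     • L: the discrepancy between the model to be explained and the explanation model, measured under Πx
     • f: the model to be explained
     • Πx: similarity to the data point x
     • Ω: a penalty term on the complexity of the explanation model
     • The concrete procedure depends on the domain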
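  29. Example: tabular data, classification problem
     • Sample around the data point x
       • 5,000 samples by default
       • The sampling method depends on the variable type
     • Weight the samples with an Exponential Kernel
     • Variable selection
       • Forward/Backward, LARS, etc.
     • Ridge regression, etc.
     * Based on the Python implementation (described later)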
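The three steps above (sample, weight, fit) can be sketched in base R. This is a simplified illustration and not the lime package: the black-box function, the Gaussian noise scale, the kernel width and the ridge penalty are all assumptions, `explain_locally` is a hypothetical helper, and the variable selection step is omitted.

```r
set.seed(4)

# Hypothetical black box: any function returning a class probability would do.
black_box <- function(X) plogis(X[, 1]^2 - X[, 2])

explain_locally <- function(x, n_samples = 5000, noise_sd = 0.5,
                            kernel_width = 0.75, lambda = 1e-3) {
  p <- length(x)
  # 1. Sample around x (assumption: plain Gaussian noise for numeric features).
  Z <- sweep(matrix(rnorm(n_samples * p, sd = noise_sd), ncol = p), 2, x, "+")
  # 2. Weight each sample with an exponential kernel on its distance to x.
  dist <- sqrt(rowSums(sweep(Z, 2, x, "-")^2))
  w <- exp(-dist^2 / kernel_width^2)
  # 3. Fit a weighted ridge regression to the black-box output.
  A <- cbind(1, Z)
  penalty <- diag(c(0, rep(lambda, p)))   # leave the intercept unpenalized
  beta <- solve(t(A) %*% (w * A) + penalty, t(A) %*% (w * black_box(Z)))
  setNames(as.vector(beta), c("(Intercept)", paste0("x", seq_len(p))))
}

print(explain_locally(c(1, 0)))
print(explain_locally(c(-1, 0)))
```

The two calls explain the same black box at two different points. The weight on `x1` changes sign between them, which is the kind of local behavior a single global summary cannot show.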
  30. Packages
     • Python
       • Written by the paper's authors
       • https://github.com/marcotcr/lime
     • R
       • A port of the above
       • https://github.com/thomasp85/lime

         install.packages('lime')
  31. LIME (R)
     • Sample

         library(caret)
         library(lime)

         # Train the learner
         model <- train(iris[-5], iris[[5]], method = 'rf')
         # Create the explainer
         explainer <- lime(iris[-5], model)
         # Output the explanations
         explanations <- explain(iris[1, -5], explainer, n_labels = 1, n_features = 2)
         explanations

               model_type case  label label_prob  model_r2 model_intercept
         1 classification    1 setosa          1 0.3776584       0.2544468
         2 classification    1 setosa          1 0.3776584       0.2544468
           model_prediction      feature feature_value feature_weight         feature_desc
         1        0.7113922  Sepal.Width           3.5     0.02101138    3.3 < Sepal.Width
         2        0.7113922 Petal.Length           1.4     0.43593404 Petal.Length <= 1.60
                         data prediction
         1 5.1, 3.5, 1.4, 0.2    1, 0, 0
         2 5.1, 3.5, 1.4, 0.2    1, 0, 0
  32. LIME (R)

               model_type case  label label_prob  model_r2 model_intercept
         1 classification    1 setosa          1 0.3776584       0.2544468
         2 classification    1 setosa          1 0.3776584       0.2544468
           model_prediction      feature feature_value feature_weight         feature_desc
         1        0.7113922  Sepal.Width           3.5     0.02101138    3.3 < Sepal.Width
         2        0.7113922 Petal.Length           1.4     0.43593404 Petal.Length <= 1.60

     Explanation
     • model_type: The type of the model used for prediction
     • case: The case being explained (the rowname in cases)
     • model_r2: The quality of the model used for the explanation
     • model_intercept: The intercept of the model used for the explanation
     • model_prediction: The prediction of the observation based on the model used for the explanation
     • feature: The feature used for the explanation
     • feature_value: The value of the feature used
     • feature_weight: The weight of the feature in the explanation
     • feature_desc: A human readable description of the feature importance
  33. LIME (R)
     • Plot the explanations

         plot_features(explanations)

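  34. Example: text data
     • Generate data by randomly removing words
     • Weight by cosine similarity passed through an Exponential Kernel
     • The rest is the same as for tabular data
     * Based on the Python implementation

     LIME - Local Interpretable Model-Agnostic Explanations
     https://homes.cs.washington.edu/~marcotcr/blog/lime/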
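The perturbation and weighting for text can be sketched in base R. The sentence, the number of samples, the keep probability and the kernel width below are assumptions chosen for illustration, and the code does not reproduce the Python implementation.

```r
set.seed(5)
words <- strsplit("this movie was not good at all", " ")[[1]]
n_samples <- 6
kernel_width <- 0.25   # hypothetical value, chosen only for illustration

# Each row is one perturbed copy: 1 keeps the word, 0 removes it.
keep <- matrix(rbinom(n_samples * length(words), 1, 0.7), nrow = n_samples)
keep[1, ] <- 1         # assumption: keep the original text as the first sample

texts <- apply(keep, 1, function(k) paste(words[k == 1], collapse = " "))

# Cosine similarity between each binary vector and the all-ones original.
kept <- rowSums(keep)
similarity <- ifelse(kept > 0, kept / (sqrt(kept) * sqrt(length(words))), 0)

# Exponential kernel on the cosine distance.
weights <- exp(-(1 - similarity)^2 / kernel_width^2)

print(data.frame(text = texts, weight = round(weights, 3)))
```

The binary `keep` matrix would then serve as the input to the weighted linear model, so each fitted coefficient describes the effect of keeping one word.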
  35. Example: image data
     • Generate data by randomly perturbing image segments
     • Methods from skimage.segmentation can be used (quickshift, etc.)
     • Weight by cosine similarity
     • The rest is the same as for tabular data
     * Based on the Python implementation
     * Not yet implemented in R

     Ideas on interpreting machine learning
     https://www.oreilly.com/ideas/ideas-on-interpreting-machine-learning
     Figure 15. An illustration of the LIME process in which a weighted linear model is used to explain a single prediction from a complex neural network. Figure courtesy of Marco Tulio Ribeiro; image used with permission.
  36. Summary

     Global Interpretability
       Model Specific: Regression Coefficients, Feature Importance, …
       Model Agnostic: Surrogate Models, Sensitivity Analysis, …
     Local Interpretability
       Model Specific: Maximum Activation Analysis, …
       Model Agnostic: LIME, LOCO, SHAP, …
  37. Enjoy!