Slide 1

Machine Learning and Interpretability
Masaaki Horikoshi @ ARISE analytics

Slide 2

About Me
• R
• Package development, etc.
• Git Awards: No. 1 in Japan
• Python
• http://git-awards.com/users/search?login=sinhrks

Slide 3

Interpretability

Slide 4

What is interpretability?
• There is no single clear definition, but broadly:
• Explaining not only the model's output (what) but also the reason behind it (why)
• Interpreting the model through some method or criterion to that end
The Mythos of Model Interpretability (Lipton, 2016)

Slide 5

What are the benefits?
• Trust
• Causality
• Transferability
• Informativeness
• Fair and Ethical Decision Making
The Mythos of Model Interpretability (Lipton, 2016)

Slide 6

Approaches to interpretation
1. Choose a machine learning method that is easy to explain
• Its accuracy may be insufficient
2. Use some interpretation method

Slide 7

Interpretability
• Global Interpretability
• Interprets overall trends of the model or the data
• Relies on approximations or summary statistics => may be locally inaccurate
• Local Interpretability
• Interprets a limited region of the model or the data
• Allows more accurate explanations

Slide 8

Interpretability
• The appropriate method depends on what you want to interpret:
• Global Interpretability / Model-Specific: Regression Coefficients, Feature Importance, …
• Global Interpretability / Model-Agnostic: Surrogate Models, Sensitivity Analysis, …
• Local Interpretability / Model-Specific: Maximum Activation Analysis, …
• Local Interpretability / Model-Agnostic: LIME, LOCO, SHAP, …

Slide 9

Regression Coefficients
• Standardized partial regression coefficients

library(dplyr)
library(mlbench)
data(BostonHousing)
df <- BostonHousing %>%
  mutate_if(is.factor, as.numeric) %>%
  mutate_at(-14, scale)   # standardize all columns except the target (medv, column 14)
head(df, 3)

        crim         zn      indus       chas        nox        rm        age
1 -0.4193669  0.2845483 -1.2866362 -0.2723291 -0.1440749 0.4132629 -0.1198948
2 -0.4169267 -0.4872402 -0.5927944 -0.2723291 -0.7395304 0.1940824  0.3668034
3 -0.4169290 -0.4872402 -0.5927944 -0.2723291 -0.7395304 1.2814456 -0.2655490
       dis        rad        tax    ptratio         b      lstat medv
1 0.140075 -0.9818712 -0.6659492 -1.4575580 0.4406159 -1.0744990 24.0
2 0.556609 -0.8670245 -0.9863534 -0.3027945 0.4406159 -0.4919525 21.6
3 0.556609 -0.8670245 -0.9863534 -0.3027945 0.3960351 -1.2075324 34.7

Slide 10

Regression Coefficients
• Standardized partial regression coefficients
• The coefplot package can visualize the partial regression coefficients

lm.fit <- lm(medv ~ ., data = df)
coef(lm.fit)

(Intercept)        crim          zn       indus        chas         nox          rm
22.53280632 -0.92906457  1.08263896  0.14103943  0.68241438 -2.05875361  2.67687661
        age         dis         rad         tax     ptratio           b       lstat
 0.01948534 -3.10711605  2.66485220 -2.07883689 -2.06264585  0.85010886 -3.74733185

Slide 11

Regression Coefficients
• coefplot

library(coefplot)
coefplot(lm.fit)

Statistically spotting flaws in the data (Gelman's Secret Weapon)
http://d.hatena.ne.jp/hoxo_m/20150617/p1

Slide 12

Feature Importance
• Feature Importance (Random Forest)

library(dplyr)
library(mlbench)
data(Sonar)
head(Sonar, 3)

      V1     V2     V3     V4     V5     V6     V7     V8     V9
1 0.0200 0.0371 0.0428 0.0207 0.0954 0.0986 0.1539 0.1601 0.3109
2 0.0453 0.0523 0.0843 0.0689 0.1183 0.2583 0.2156 0.3481 0.3337
3 0.0262 0.0582 0.1099 0.1083 0.0974 0.2280 0.2431 0.3771 0.5598
…
     V55    V56    V57    V58    V59    V60 Class
1 0.0072 0.0167 0.0180 0.0084 0.0090 0.0032     R
2 0.0094 0.0191 0.0140 0.0049 0.0052 0.0044     R
3 0.0180 0.0244 0.0316 0.0164 0.0095 0.0078     R

Slide 13

Feature Importance
• Feature Importance (Random Forest)

library(caret)
ca.fit <- train(Class ~ ., data = Sonar, method = "rf", ntree = 100)
varImp(ca.fit)

rf variable importance
only 20 most important variables shown (out of 60)
    Overall
V11  100.00
V48   96.25
V45   75.45
V13   71.74
V10   60.50
…

Slide 14

Feature Importance
• Feature Importance (Random Forest)

plot(varImp(ca.fit), top = 10)

Slide 15

Feature Importance
• caret's varImp uses a model-dependent computation by default
• See help(varImp)

varImp(ca.fit, scale = F)

    Overall
V11   3.793
V48   3.684
V45   3.078
V13   2.970
V10   2.643
…

randomForest::importance(ca.fit$finalModel) %>%
  as.data.frame %>%
  tibble::rownames_to_column(var = 'variable') %>%
  arrange(desc(MeanDecreaseGini)) %>%
  head(20)

  variable MeanDecreaseGini
1      V11         3.792926
2      V48         3.683850
3      V45         3.077996
4      V13         2.970039
5      V10         2.642881
…

Slide 16

Feature Importance
• randomForest::importance computes feature importance from the following combinations:
• Mean decrease in accuracy, computed from permuting OOB data
  • Classification: error rate (MeanDecreaseAccuracy)
  • Regression: MSE (%IncMSE)
• Mean decrease in node impurity
  • Classification: Gini index (MeanDecreaseGini)
  • Regression: RSS (IncNodePurity)

Slide 17

Reference: computing feature importance from OOB data
• Each tree is trained on a bootstrap sample (sampling with replacement); the observations left out are the OOB data
• Predict on the OOB data, then shuffle the values of each column in turn and compare against the original OOB error rate (a minimal sketch follows)
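To make the permutation idea concrete, here is a minimal sketch (our own construction; the perm_importance helper is hypothetical, and randomForest does this bookkeeping per tree on the true OOB samples rather than on the full training set as below):

library(randomForest)
library(mlbench)
data(Sonar)

set.seed(1)
rf <- randomForest(Class ~ ., data = Sonar, ntree = 100)

# Shuffle one column at a time and measure how much the error rate rises.
perm_importance <- function(model, data, target, n_rep = 10) {
  X <- data[, setdiff(names(data), target)]
  y <- data[[target]]
  base_err <- mean(predict(model, X) != y)
  sapply(names(X), function(v) {
    errs <- replicate(n_rep, {
      Xp <- X
      Xp[[v]] <- sample(Xp[[v]])  # break the link between feature v and the target
      mean(predict(model, Xp) != y)
    })
    mean(errs) - base_err         # increase in error = importance of v
  })
}

head(sort(perm_importance(rf, Sonar, "Class"), decreasing = TRUE), 5)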

Slide 18

Feature Importance (Model Agnostic)
• caret also supports model-independent importance

varImp(ca.fit, useModel = F)

ROC curve variable importance
only 20 most important variables shown (out of 60)
    Importance
V11     100.00
V12      86.30
V10      82.64
V49      82.12
V9       81.97
…

Slide 19

Feature Importance (Model Agnostic)
• Classification
• ROC analysis per explanatory variable
• For multiclass problems, per pair of classes
• Regression
• Linear regression / LOESS per explanatory variable
(see the filterVarImp sketch below)
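Under the hood this appears to be caret::filterVarImp; a minimal sketch of calling it directly, assuming the Sonar data from the earlier slides:

library(caret)
library(mlbench)
data(Sonar)

# Model-free importance: area under the per-variable ROC curve,
# returned with one column per class level (identical in the two-class case).
roc_imp <- filterVarImp(x = Sonar[, -61], y = Sonar$Class)
head(roc_imp[order(roc_imp$M, decreasing = TRUE), , drop = FALSE])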

Slide 20

Interpretability
• Global Interpretability / Model-Specific: Regression Coefficients, Feature Importance, …
• Global Interpretability / Model-Agnostic: Surrogate Models, Sensitivity Analysis, …
• Local Interpretability / Model-Specific: Maximum Activation Analysis, …
• Local Interpretability / Model-Agnostic: LIME, LOCO, SHAP, …

Slide 21

Surrogate Models
• Examine a simple model that stands in for the complex model (a sketch follows below)
Ideas on interpreting machine learning
https://www.oreilly.com/ideas/ideas-on-interpreting-machine-learning
Figure 14. An illustration of surrogate models for explaining a complex neural network. Figure courtesy of Patrick Hall and the H2O.ai team.
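A minimal sketch of the idea (our own construction, not the H2O workflow in the figure): fit a shallow decision tree to the predictions of the random forest from the Feature Importance slides, and read the tree as an approximate global explanation:

library(rpart)

# Global surrogate: train an interpretable tree on the *predictions* of
# the complex model (ca.fit from the earlier slides).
df <- Sonar[, -61]
df$rf_pred <- predict(ca.fit, Sonar)
surrogate <- rpart(rf_pred ~ ., data = df, maxdepth = 3)
print(surrogate)

# Fidelity: how often the surrogate agrees with the model it approximates.
mean(predict(surrogate, df, type = "class") == df$rf_pred)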

Slide 22

Sensitivity Analysis
• Examine the model's output as a given feature is varied (a sketch follows below)
Sensitivity analysis for neural networks
https://beckmw.wordpress.com/2013/10/07/sensitivity-analysis-for-neural-networks/
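A minimal sketch of that idea under our own simplifying choices: sweep one feature across its observed range while pinning every other feature at its median, again using ca.fit from the earlier slides:

# Sensitivity of the predicted probability to V11, other features at their medians.
med  <- as.data.frame(lapply(Sonar[, -61], median))
grid <- med[rep(1, 50), ]
grid$V11 <- seq(min(Sonar$V11), max(Sonar$V11), length.out = 50)

prob <- predict(ca.fit, grid, type = "prob")
plot(grid$V11, prob$M, type = "l", xlab = "V11", ylab = "P(Class = M)")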

Slide 23

Partial Dependence
• Examine the model's output as a given feature is varied

library(pdp)
library(ggplot2)
pd <- partial(ca.fit, pred.var = c("V11", "V9"), chull = TRUE)
autoplot(pd, contour = TRUE)

Slide 24

Interpretability
• Global Interpretability / Model-Specific: Regression Coefficients, Feature Importance, …
• Global Interpretability / Model-Agnostic: Partial Dependence, Surrogate Models, Sensitivity Analysis, …
• Local Interpretability / Model-Specific: Maximum Activation Analysis, …
• Local Interpretability / Model-Agnostic: LIME, LOCO, SHAP, …

Slide 25

What is LIME?

Slide 26

Local Interpretable Model-agnostic Explanations

Slide 27

LIME
• Build a model that stands in for the original model in the neighborhood of a given data point, then examine that model's characteristics
• A local surrogate model
"Why Should I Trust You?" Explaining the Predictions of Any Classifier (Ribeiro, Singh, Guestrin 2016)

Slide 28

LIME
• LIME derives the explanation of a data point x from the objective below, where:
• G: the set of interpretable learners
• L: the discrepancy between the learner being explained and the interpretable learner, measured under the weighting Πx
• f: the learner being explained
• Πx: the similarity to the data point x
• Ω: a penalty on the complexity of the interpretable learner
• The concrete choices depend on the domain
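The objective itself (an image in the original slide) is, in the notation of the Ribeiro et al. paper, with \pi_x corresponding to the Πx above:

\xi(x) = \operatorname*{arg\,min}_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g)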

Slide 29

Example: tabular data, classification
• Sample around the data point x
• 5,000 samples by default
• The sampling scheme depends on the variable type
• Weight the samples with an exponential kernel
• Feature selection
• Forward/backward selection, LARS, etc.
• Ridge regression, etc.
(Based on the Python implementation, described later.) A simplified sketch follows.
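To make those steps concrete, a heavily simplified sketch of the tabular case on iris (our own toy construction, not the lime package's code; feature selection is skipped, and a plain weighted linear model stands in for the ridge fit):

library(randomForest)
set.seed(1)

rf <- randomForest(Species ~ ., data = iris)
x  <- iris[1, 1:4]                                  # the observation to explain

# 1. Sample perturbations around x (Gaussian noise, scaled per feature).
n <- 5000
Z <- as.data.frame(mapply(function(xi, col) rnorm(n, xi, sd(col)),
                          as.numeric(x), iris[1:4], SIMPLIFY = FALSE))
names(Z) <- names(x)

# 2. Weight each perturbation with an exponential kernel on its distance to x.
d <- sqrt(rowSums(scale(Z, center = as.numeric(x),
                        scale = sapply(iris[1:4], sd))^2))
w <- exp(-d^2 / 0.75^2)                             # kernel width chosen arbitrarily

# 3. Query the black box and fit a weighted linear model as the local surrogate.
Z$p <- predict(rf, Z, type = "prob")[, "setosa"]
coef(lm(p ~ ., data = Z, weights = w))              # local feature effects around x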

Slide 30

Packages
• Python
• Written by the paper's authors
• https://github.com/marcotcr/lime
• R
• A port of the above
• https://github.com/thomasp85/lime

install.packages('lime')

Slide 31

LIME (R)
• Example

library(caret)
library(lime)
model <- train(iris[-5], iris[[5]], method = 'rf')    # train the learner
explainer <- lime(iris[-5], model)                    # create the explainer
explanations <- explain(iris[1, -5], explainer,
                        n_labels = 1, n_features = 2) # produce the explanation
explanations

      model_type case  label label_prob  model_r2 model_intercept
1 classification    1 setosa          1 0.3776584       0.2544468
2 classification    1 setosa          1 0.3776584       0.2544468
  model_prediction      feature feature_value feature_weight         feature_desc
1        0.7113922  Sepal.Width           3.5     0.02101138    3.3 < Sepal.Width
2        0.7113922 Petal.Length           1.4     0.43593404 Petal.Length <= 1.60
                data prediction
1 5.1, 3.5, 1.4, 0.2    1, 0, 0
2 5.1, 3.5, 1.4, 0.2    1, 0, 0

Slide 32

LIME (R)

      model_type case  label label_prob  model_r2 model_intercept
1 classification    1 setosa          1 0.3776584       0.2544468
2 classification    1 setosa          1 0.3776584       0.2544468
  model_prediction      feature feature_value feature_weight         feature_desc
1        0.7113922  Sepal.Width           3.5     0.02101138    3.3 < Sepal.Width
2        0.7113922 Petal.Length           1.4     0.43593404 Petal.Length <= 1.60

Explanation fields:
• model_type: the type of the model used for prediction
• case: the case being explained (the row name in cases)
• model_r2: the quality of the model used for the explanation
• model_intercept: the intercept of the model used for the explanation
• model_prediction: the prediction of the observation based on the model used for the explanation
• feature: the feature used for the explanation
• feature_value: the value of the feature used
• feature_weight: the weight of the feature in the explanation
• feature_desc: a human-readable description of the feature importance

Slide 33

LIME (R)

plot_features(explanations)   # plot the explanation

Slide 34

Example: text data
• Generate perturbed data by randomly removing words
• Weight by cosine similarity through an exponential kernel
• From there on, the same as for tabular data
LIME - Local Interpretable Model-Agnostic Explanations
https://homes.cs.washington.edu/~marcotcr/blog/lime/
(Based on the Python implementation)

Slide 35

Example: image data
• Generate perturbed data from randomly segmented images
• Any method in skimage.segmentation can be used (quickshift, etc.)
• Weight by cosine similarity
• From there on, the same as for tabular data
• Not yet implemented in R
(Based on the Python implementation)
Ideas on interpreting machine learning
https://www.oreilly.com/ideas/ideas-on-interpreting-machine-learning
Figure 15. An illustration of the LIME process in which a weighted linear model is used to explain a single prediction from a complex neural network. Figure courtesy of Marco Tulio Ribeiro; image used with permission.

Slide 36

Summary
• Global Interpretability / Model-Specific: Regression Coefficients, Feature Importance, …
• Global Interpretability / Model-Agnostic: Surrogate Models, Sensitivity Analysis, …
• Local Interpretability / Model-Specific: Maximum Activation Analysis, …
• Local Interpretability / Model-Agnostic: LIME, LOCO, SHAP, …

Slide 37

Enjoy!