Slide 1

Slide 1 text

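Shinya Uryu (Center for Design-Oriented AI Education and Research, Tokushima University)
Machine Learning Tutorial for R Users
Part 1: An Introduction to Machine Learning Frameworks in R, with a Focus on tidymodels
Japan Statistical Society Spring Meeting, organized session
https://github.com/uribo/240309_jss18tutorial
𝕏 u_ribo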

Slide 2

Slide 2 text

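About me
Shinya Uryu, Assistant Professor, Center for Design-Oriented AI Education and Research, Tokushima University
Interests and research: data visualization, spatial epidemiology, human mobility, wildlife
Courses taught: "AI Comprehensive Exercises", "Introduction to Information Science", "Invitation to Data Science"
Package development: jpmesh, zipangu, jmastats, fgdr, and others
Writing and translation
(Photo: my dog "Lucky" and me)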

Slide 3

Slide 3 text

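Tutorial contents
Part 1
- The machine learning workflow
- Introducing machine learning frameworks
- Machine learning in R: a demonstration
Break
Part 2: Geospatial data × machine learning models
- Characteristics of geospatial data
- Challenges when applying machine learning to geospatial data
- Accounting for spatial dependence
- The spatial area of applicability of a model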

Slide 4

Slide 4 text

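Goals
1. Know that R has packages for applying a wide range of machine learning methods and for the operations before and after model building.
2. Learn about frameworks that bring these packages together under a unified interface, and how the steps of machine learning modeling connect.
3. Understand the pitfalls of handling geospatial data with machine learning models, and be able to consider which countermeasures are available.
After the tutorial: be able to use tidymodels to build machine learning models suited to your own data and to the model you want to apply.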

Slide 5

Slide 5 text

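Out of scope
- Statistical methods and the mathematical background of models are omitted → the participants know these better than I do
- R language basics → it is assumed you are not allergic to the pipe operator (|>)
- Machine learning on unstructured data and deep learning → tabular data is assumed
Scope of this tutorial
How artificial intelligence, machine learning, and deep learning relate (nested diagram):
artificial intelligence (including expert systems and other "good old-fashioned AI") ⊃ machine learning ⊃ deep learning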

Slide 6

Slide 6 text

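Where to find the materials
The source code and slides used in this tutorial are published at the URL below.
https://github.com/uribo/240309_jss18tutorial
Tutorial ≠ workshop: please run the R code on your own.
A reproducible environment is provided, but it may not be perfect.
If you contact me later (for example via Issues), I will follow up.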

Slide 7

Slide 7 text

Where to find the materials
③ Open the repository's .Rproj file in RStudio

Slide 8

Slide 8 text

The Machine Learning Workflow

Slide 9

Slide 9 text

The general data analysis workflow
(Figure adapted from Garrett and Hadley)

Slide 10

Slide 10 text

The data analysis workflow
(Figure adapted from Garrett and Hadley)
This tutorial focuses on the part inside the frame.

Slide 11

Slide 11 text

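The machine learning modeling workflow
(Figure adapted from Max Kuhn and Kjell Johnson)
- Its hallmark is that several steps are carried out iteratively.
- Each step requires a decision.
  Example: How should feature engineering be done? Which model should be used?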

Slide 12

Slide 12 text

Machine Learning Frameworks

Slide 13

Slide 13 text

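To use a framework or not?
Not using one (or not being able to)
- Flexible processing is possible
- The cost of learning non-unified interfaces is high
- Different packages end up being used even within a single step
- It is unclear whether the results of each step can be passed on seamlessly
Using one
- The cost of learning the interface goes down
- In principle, one package per step
- Objects (results) produced between steps are easy to pass along and combine
- A model provided by a given package is not necessarily supported
(Reference) scikit-learn in Python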

Slide 14

Slide 14 text

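ML frameworks in R (1): tidymodels
https://www.tidymodels.org/
Features
- Developed by the tidymodels team, which includes Posit members
- Led by Max Kuhn, the developer of the caret package
- Follows the philosophy and design principles of the tidyverse → pipe-operator friendly
- Provided as a bundle through the tidymodels package; installing it also installs the packages it depends on
- Easy to extend with additional functionality for resampling, learners, feature engineering, and so on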

Slide 15

Slide 15 text

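ML frameworks in R (1): tidymodels
https://www.tidymodels.org/
Features
- Developed by the tidymodels team, which includes Posit members
- Led by Max Kuhn, the developer of the caret package
- Follows the philosophy and design principles of the tidyverse → pipe-operator friendly
- Provided as a bundle through the tidymodels package; installing it also installs the packages it depends on
- Easy to extend with additional functionality for resampling, learners, and feature engineering
Extensions of parsnip
  {censored}: survival analysis
  {poissonreg}: Poisson regression
  {tidyclust}: clustering
  {bonsai}: tree-based models
Extensions of recipes
  {embed}: embeddings
  {textrecipes}: text processing
  {themis}: imbalanced data
  {uribo/washoku}: Japanese text
Model deployment → {vetiver}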

Slide 16

Slide 16 text

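ML frameworks in R (1): tidymodels
Main packages and what they are for
  {rsample}    data splitting, resampling
  {recipes}    data preprocessing, feature engineering
  {parsnip}    building and fitting models
  {yardstick}  evaluating model performance
  {workflows}  turning the steps into a workflow, efficient processing
  {tune}       parameter search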

Slide 17

Slide 17 text

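ML frameworks in R (2): mlr3
https://mlr-org.com/
Built around the mlr3 package; the packages needed for the steps of machine learning are bundled and provided as the mlr3verse package.
Core concepts: task, learner, resampler, tuner, performance
Features
- Developed by mlr-org, which is active mainly in Germany
- Implemented with R6 classes
- Handles data as data.table
- Extensions of mlr3 include packages for handling geospatial data
Possibly puts more effort into geospatial processing than tidymodels? (Uryu's personal impression)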

Slide 18

Slide 18 text

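ML frameworks in R (3): DALEX
https://dalex.drwhy.ai/
Designed for explainable machine learning models; the name refers both to the single package and to the family of packages.
Features
- Developed by MI2.AI at Warsaw University of Technology (Poland)
- The DALEXtra package is provided as an extension for models built with tidymodels or mlr3
- A Python version is also available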

Slide 19

Slide 19 text

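Key points so far
- Data analysis and machine learning involve many steps, most of which are carried out iteratively.
  → Using a workflow that streamlines the whole procedure is desirable.
- Machine learning frameworks in R include tidymodels and mlr3. DALEX is also available to help interpret models.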

Slide 20

Slide 20 text

Machine Learning in R: A Demonstration

Slide 21

Slide 21 text

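Building a binary classification model of gas supply status
Data: [National Land Numerical Information] Prefectural Land Price Survey data
https://nlftp.mlit.go.jp/ksj/gml/datalist/KsjTmplt-L02-2020.html
Based on the Enforcement Order of the National Land Use Planning Act, this dataset compiles the results of the land price survey conducted by each prefectural governor as of the reference date each year: the location (point) of each reference site, the surveyed price, current land use, zoning, land area, and so on.
→ We use a processed subset covering the Kanto region.
Data structure: lp_supply (lp = land price, supply)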

Slide 22

Slide 22 text

Building a binary classification model of gas supply status
Geospatial attributes have been removed; they are covered in Part 2.

    dplyr::glimpse(lp_supply)
    #> Rows: 1,812
    #> Columns: 8
    #> $ price        115000, 61400, 57200, 51600, 177000, 208000, 57600, 68600…
    #> $ water        TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRU…
    #> $ gas          TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TR…
    #> $ sewer        TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRU…
    #> $ above_floor  8, 0, 3, 3, 5, 12, 3, 3, 3, 2, 6, 2, 7, 2, 2, 3, 2, 3, 3,…
    #> $ under_floor  1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, …
    #> $ dist_from_st 1300, 900, 270, 2100, 150, 0, 150, 850, 1200, 400, 640, 1…
    #> $ fire         "防火", "準防", "準防", "準防", "防火", "防火", "準防", "…

(The fire values are Japanese labels: 防火 = fire-prevention district, 準防 = quasi-fire-prevention district.)

Slide 23

Slide 23 text

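Building a binary classification model of gas supply status
dplyr::glimpse(lp_supply) again (1,812 rows, 8 columns), annotated with the meaning of each column:
  price         surveyed land price
  water         water supply available
  gas           gas supply available (the outcome to be predicted)
  sewer         sewerage available
  above_floor   number of floors above ground
  under_floor   number of basement floors
  dist_from_st  distance from the station
  fire          fire-prevention zoning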

Slide 24

Slide 24 text

Exploratory data analysis
- Grasp the characteristics of the data and deepen your understanding of it
- Develop ideas for the model

Slide 25

Slide 25 text

Loading the tidymodels package

    library(tidymodels)
    #> ── Attaching packages ────────────────────────────────────── tidymodels 1.1.1 ──
    #> ✔ broom        1.0.5      ✔ recipes      1.0.10
    #> ✔ dials        1.2.1      ✔ rsample      1.2.0
    #> ✔ dplyr        1.1.4      ✔ tibble       3.2.1
    #> ✔ ggplot2      3.5.0      ✔ tidyr        1.3.1
    #> ✔ infer        1.0.6      ✔ tune         1.1.2
    #> ✔ modeldata    1.3.0      ✔ workflows    1.1.4
    #> ✔ parsnip      1.2.0      ✔ workflowsets 1.0.1
    #> ✔ purrr        1.0.2      ✔ yardstick    1.3.0
    #>
    #> ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
    #> ✖ purrr::discard() masks scales::discard()
    #> ✖ dplyr::filter()  masks stats::filter()
    #> ✖ dplyr::lag()     masks stats::lag()
    #> ✖ recipes::step()  masks stats::step()
    #> • Dig deeper into tidy modeling with R at https://www.tmwr.org

Slide 26

Slide 26 text

Loading the tidymodels package (same code and output as Slide 25)
Callouts on the output: the attached tidymodels packages, and the package::function notation used in the conflict list.

Slide 27

Slide 27 text

Function name conflicts
- When packages use the same function name, a "conflict" occurs.
- The function that was loaded later takes precedence.
  Example: dplyr::lag() and stats::lag()
- Even with the same name, the functions behave differently across packages, so you may get unintended results or the operation may not be performed at all.

    lag(ldeaths, 1)
    #>       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
    #> 1973                                                        3035
    #> 1974 2552 2704 2554 2014 1655 1721 1524 1596 2074 2199 2512 2933
    #> 1975 2889 2938 2497 1870 1726 1607 1545 1396 1787 2076 2837 2787
    #> 1976 3891 3179 2011 1636 1580 1489 1300 1356 1653 2013 2823 3102
    #> 1977 2294 2385 2444 1748 1554 1498 1361 1346 1564 1640 2293 2815
    #> 1978 3137 2679 1969 1870 1633 1529 1366 1357 1570 1535 2491 3084
    #> 1979 2605 2573 2143 1693 1504 1461 1354 1333 1492 1781 1915

    library(dplyr)
    lag(ldeaths, 1)
    #> Error in `lag()`:
    #> ! `x` must be a vector, not a <ts>, do you want `stats::lag()`?
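The conflict described on this slide can be sidestepped by always naming the package. The following is a minimal sketch, assuming only that dplyr is installed; the small numeric vector is an illustrative input that does not appear in the slides.

```r
# stats::lag() shifts the time base of a ts object; the values are unchanged.
lagged <- stats::lag(ldeaths, 1)

# dplyr::lag() shifts the values of a plain vector and pads with NA.
shifted <- dplyr::lag(c(10, 20, 30), n = 1)

print(start(ldeaths))  # the original series starts in January 1974
print(start(lagged))   # the lagged series starts one month earlier
print(shifted)         # NA 10 20
```

Because both calls are namespace-qualified, the result does not depend on the order in which packages were attached.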

Slide 28

Slide 28 text

Avoiding function name conflicts
- Call functions with an explicit namespace → the package::function() form
- Use the conflicted package to state explicitly which function takes precedence
- Running tidymodels_prefer() makes the functions of the packages tidymodels depends on take precedence

    library(tidymodels)
    tidymodels_prefer()

    library(conflicted)
    # When a conflict occurs, conflicted prints a message asking you to
    # state which function should take precedence
    conflict_scout() # check the current conflicts (conflicts within tidymodels)
    #> 4 conflicts
    #> • `discard()`: purrr and scales
    #> • `filter()`: dplyr
    #> • `lag()`: dplyr and stats
    #> • `step()`: recipes

Slide 29

Slide 29 text

{rsample}: Splitting the data
(Diagram: the data is split into training data and test data)

    set.seed(123)
    lp_split <- initial_split(lp_supply, prop = 0.8, strata = gas)
    lp_split
    #> <Training/Testing/Total>
    #> <1449/363/1812>

    lp_train <- training(lp_split)
    lp_test <- testing(lp_split)

Slide 30

Slide 30 text

{rsample}: Splitting the data (same code and output as Slide 29)
Callout: prop sets the share of rows assigned to the training data, and strata = gas keeps the ratio of the gas classes similar in both parts.
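To see what the stratified split does without access to lp_supply, here is a minimal sketch on synthetic data. The data frame toy and its columns are made up for illustration; only initial_split(), training(), and testing() come from the slides.

```r
library(rsample)

set.seed(123)
# Hypothetical stand-in for lp_supply: one numeric predictor and a binary outcome
toy <- data.frame(
  price = runif(500, min = 5e4, max = 3e5),
  gas = factor(sample(c("FALSE", "TRUE"), 500, replace = TRUE, prob = c(0.3, 0.7)))
)

# 80% of the rows go to training; strata = gas samples within each class
toy_split <- initial_split(toy, prop = 0.8, strata = gas)
toy_train <- training(toy_split)
toy_test <- testing(toy_split)

# Class shares in the full data and in the training data should be close
print(prop.table(table(toy$gas)))
print(prop.table(table(toy_train$gas)))
```

Note that rsample pools very small strata (by default, classes making up less than 10% of the rows), which is why this sketch uses a 30/70 class balance.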

Slide 31

Slide 31 text

{parsnip}: Building a learner
1. Choose a model
2. Specify the engine (≈ package)
3. Specify the mode (task)

Slide 32

Slide 32 text

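{parsnip}: Building a learner
1. Choose a model → decision tree
2. Specify the engine
3. Specify the mode

    logistic_reg()
    #> Logistic Regression Model Specification (classification)
    #>
    #> Computational engine: glm

    linear_reg()
    #> Linear Regression Model Specification (regression)

    decision_tree()
    #> Decision Tree Model Specification (unknown mode)
    #>
    #> Computational engine: rpart

For models with a single possible mode, the mode is set automatically (classification for logistic_reg(), regression for linear_reg()). decision_tree() supports more than one mode, so it must be specified explicitly.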

Slide 33

Slide 33 text

{parsnip}: Building a learner
1. Choose a model → decision tree
2. Specify the engine → the rpart package
3. Specify the mode

    show_engines("decision_tree")
    #> # A tibble: 5 × 2
    #>   engine mode
    #>
    #> 1 rpart  classification
    #> 2 rpart  regression
    #> 3 C5.0   classification
    #> 4 spark  classification
    #> 5 spark  regression

    decision_tree() |>
      set_engine("rpart")
    #> Decision Tree Model Specification (unknown mode)
    #>
    #> Computational engine: rpart

Slide 34

Slide 34 text

{parsnip}: Building a learner
1. Choose a model → decision tree
2. Specify the engine → the rpart package
3. Specify the mode → a classification problem (the choices are classification and regression)

    tree_spec <-
      decision_tree(cost_complexity = 0.002) |>
      set_engine("rpart") |>
      set_mode("classification")

    tree_spec
    #> Decision Tree Model Specification (classification)
    #>
    #> Main Arguments:
    #>   cost_complexity = 0.002
    #>
    #> Computational engine: rpart

Slide 35

Slide 35 text

{parsnip}: Building a learner
Example: fitting a random forest with different packages while keeping the number of features tried at each split (mtry) and the number of trees (ntree) the same.

    ranger::ranger(x, y, mtry = min_cols(~10, x), num.trees = 2000, ...)

    randomForest::randomForest(x, y, mtry = min_cols(~10, x), ntree = 2000)

    sparklyr::ml_random_forest(x, formula, type = "regression",
                               feature_subset_strategy = "10", num_trees = 2000, ...)

The way parameters are specified differs from package to package…

Slide 36

Slide 36 text

{parsnip}: Building a learner
- The model function unifies parameter names that differ between engines.
- You only have to change the engine; there is no need to worry about how each package names its parameters.

    rand_forest(mtry = 10, trees = 2000) |>
      set_engine("ranger") |>
      set_mode("regression") |>
      translate()
    #> Random Forest Model Specification (regression)
    #>
    #> Main Arguments:
    #>   mtry = 10
    #>   trees = 2000
    #>
    #> Computational engine: ranger
    #>
    #> Model fit template:
    #> ranger::ranger(x = missing_arg(), y = missing_arg(), weights = missing_arg(),
    #>     mtry = min_cols(~10, x), num.trees = 2000, num.threads = 1,
    #>     verbose = FALSE, seed = sample.int(10^5, 1))
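The unification shown above can be checked by translating one specification for two engines and comparing the engine-level argument names. This is a sketch that assumes only that parsnip is installed. It inspects the `method$fit$args` element of the translated specification, which is an internal structure and may change between parsnip versions.

```r
library(parsnip)

# One engine-independent specification
rf_spec <- rand_forest(mtry = 10, trees = 2000) |>
  set_mode("regression")

# Translate the same specification for two different engines
as_ranger <- rf_spec |> set_engine("ranger") |> translate()
as_random_forest <- rf_spec |> set_engine("randomForest") |> translate()

# `trees` becomes `num.trees` for ranger and `ntree` for randomForest
ranger_args <- names(as_ranger$method$fit$args)
random_forest_args <- names(as_random_forest$method$fit$args)
print(ranger_args)
print(random_forest_args)
```

translate() only builds the call template, so no model is fitted here.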

Slide 37

Slide 37 text

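Key points so far
- Data analysis and machine learning involve many steps, most of which are carried out iteratively.
  → Using a workflow that streamlines the whole procedure is desirable.
- Machine learning frameworks in R include tidymodels and mlr3. DALEX is also available to help interpret models.
- With the tidymodels packages, the steps of machine learning modeling can be carried out under a unified interface.
    Data splitting → {rsample}
    Building a learner → {parsnip}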

Slide 38

Slide 38 text

{workflows}: Building a workflow
Putting together what we have done so far…

    tree_wflow <-
      workflow(gas ~ ., spec = tree_spec)

or

    tree_wflow <-
      workflow() |>
      add_formula(gas ~ .) |>
      add_model(tree_spec)

Slide 39

Slide 39 text

{workflows}: Building a workflow

    tree_wflow <-
      workflow(gas ~ ., spec = tree_spec)

or

    tree_wflow <-
      workflow() |>
      add_formula(gas ~ .) |>
      add_model(tree_spec)

Using a workflow makes it easy to swap the learner.

Fit the model using the training data:

    tree_fit <-
      tree_wflow |>
      fit(data = lp_train)

Predict on the test data:

    predict(tree_fit, new_data = lp_test)

Attach the predictions to the training data:

    lp_tree_augment <-
      augment(tree_fit, new_data = lp_train)
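The fit, predict, and augment steps above can be tried end to end without lp_supply. The sketch below assumes tidymodels and rpart are installed; the tibble toy, its columns, and the rule that generates gas are invented for illustration only.

```r
library(tidymodels)

set.seed(123)
# Hypothetical stand-in for lp_supply
toy <- tibble(
  price = runif(600, min = 5e4, max = 3e5),
  dist_from_st = runif(600, min = 0, max = 6000)
) |>
  mutate(gas = factor(price > 1e5 | dist_from_st < 1000, levels = c(FALSE, TRUE)))

toy_split <- initial_split(toy, prop = 0.8, strata = gas)

# Same learner specification as in the slides
tree_spec <- decision_tree(cost_complexity = 0.002) |>
  set_engine("rpart") |>
  set_mode("classification")

# Bundle the formula and the learner, then fit on the training data
tree_wflow <- workflow() |>
  add_formula(gas ~ .) |>
  add_model(tree_spec)
tree_fit <- fit(tree_wflow, data = training(toy_split))

# augment() adds .pred_class and one .pred_<level> column per class
toy_pred <- augment(tree_fit, new_data = testing(toy_split))
print(accuracy(toy_pred, truth = gas, estimate = .pred_class))
```

Swapping the learner only requires replacing tree_spec in add_model(); the rest of the pipeline stays the same.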

Slide 40

Slide 40 text

{workflows}: Building a workflow
The model fitted on the training data:

    tree_fit
    #> ══ Workflow [trained] ══════════════════════════════════════════════════════════
    #> Preprocessor: Formula
    #> Model: decision_tree()
    #>
    #> ── Preprocessor ────────────────────────────────────────────────────────────────
    #> gas ~ .
    #>
    #> ── Model ───────────────────────────────────────────────────────────────────────
    #> n= 1449
    #>
    #> node), split, n, loss, yval, (yprob)
    #>       * denotes terminal node
    #>
    #>  1) root 1449 122 TRUE (0.08419600 0.91580400)
    #>    2) price< 124500 181 78 TRUE (0.43093923 0.56906077)
    #>      4) dist_from_st>=5550 10 0 FALSE (1.00000000 0.00000000) *
    #>      5) dist_from_st< 5550 171 68 TRUE (0.39766082 0.60233918)
    #>      … (omitted)
    #>    3) price>=124500 1268 44 TRUE (0.03470032 0.96529968) *

Slide 41

Slide 41 text

{workflows}: Building a workflow
Predictions attached to the training data:

    lp_tree_augment |>
      # The model predictions have been added to the training data as columns
      # Show only some of the columns to check
      select(starts_with(".pred_"), gas)
    #> # A tibble: 1,449 × 4
    #>    .pred_class .pred_FALSE .pred_TRUE gas
    #>
    #>  1 TRUE             0           1     TRUE
    #>  2 TRUE             0.0833      0.917 TRUE
    #>  3 TRUE             0.25        0.75  TRUE
    #>  4 TRUE             0.0347      0.965 TRUE
    #>  5 TRUE             0.0347      0.965 TRUE
    #>  6 TRUE             0.0833      0.917 TRUE
    #>  7 TRUE             0.0833      0.917 TRUE
    #>  8 FALSE            0.889       0.111 TRUE
    #>  9 TRUE             0           1     TRUE
    #> 10 TRUE             0.0833      0.917 TRUE
    #> # ℹ 1,439 more rows

Slide 42

Slide 42 text

{workflows}: Building a workflow
Predictions attached to the training data (same code and output as Slide 41)

Slide 43

Slide 43 text

Interpreting the fitted model: plotting the decision tree

    tree_fit |>
      extract_fit_engine() |>
      rpart.plot::rpart.plot(roundint = FALSE)

Slide 44

Slide 44 text

{vip} and {DALEX}: Interpreting the fitted model
Models created with tidymodels can be handled by {DALEX}.

    tree_exp <-
      DALEXtra::explain_tidymodels(tree_fit,
                                   data = lp_test,
                                   y = as.numeric(lp_test$gas) - 1,
                                   label = "lp_tree")

or

    DALEX::explain(tree_fit,
                   data = lp_test,
                   y = as.numeric(lp_test$gas) - 1,
                   label = "lp_tree")

Slide 45

Slide 45 text

Interpreting the fitted model: global explanations
Variable importance

    tree_fit |>
      extract_fit_engine() |>
      vip::vip()

    tree_effect <-
      DALEX::model_parts(tree_exp, type = "variable_importance")
    plot(tree_effect, show_boxplots = TRUE)

Slide 46

Slide 46 text

Interpreting the fitted model: local explanations
Check how the model behaves for a single observation.

    predict(tree_fit, new_data = lp_test[1, ])
    #> # A tibble: 1 × 1
    #>   .pred_class
    #>
    #> 1 FALSE

    predict(tree_exp, lp_test[1, ])
    #>      TRUE
    #> 0.4736842

    predict_parts(tree_exp, new_observation = lp_test[1, ]) |>
      plot()

Slide 47

Slide 47 text

{yardstick}: Evaluating model performance
Confusion matrix of the training data against the model predictions

    lp_tree_augment |>
      conf_mat(truth = gas, estimate = .pred_class)
    #>           Truth
    #> Prediction FALSE TRUE
    #>      FALSE    69   32
    #>      TRUE     53 1295

    lp_tree_augment |>
      conf_mat(truth = gas, estimate = .pred_class) |>
      autoplot(type = "heatmap")

Slide 48

Slide 48 text

{yardstick}: Evaluating model performance
Computing various metrics for the training data against the model predictions

    lp_tree_augment |>
      accuracy(truth = gas, estimate = .pred_class)
    #> # A tibble: 1 × 3
    #>   .metric  .estimator .estimate
    #>
    #> 1 accuracy binary         0.941

    lp_tree_augment |>
      sensitivity(truth = gas, estimate = .pred_class)
    #> # A tibble: 1 × 3
    #>   .metric     .estimator .estimate
    #>
    #> 1 sensitivity binary         0.566

    lp_tree_augment |>
      specificity(truth = gas, estimate = .pred_class)
    #> # A tibble: 1 × 3
    #>   .metric     .estimator .estimate
    #>
    #> 1 specificity binary         0.976
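These three metrics can be recomputed by hand from the training-data confusion matrix on Slide 47 (69, 32, 53, 1295). The sketch below uses base R only. It relies on the fact that yardstick treats the first factor level as the event by default, so here FALSE (no gas supply) is the "positive" class.

```r
# Confusion matrix from Slide 47: rows = prediction, columns = truth
cm <- matrix(
  c(69, 53, 32, 1295),
  nrow = 2,
  dimnames = list(Prediction = c("FALSE", "TRUE"), Truth = c("FALSE", "TRUE"))
)

# Share of all observations that were predicted correctly
acc <- sum(diag(cm)) / sum(cm)
# Share of truly FALSE observations predicted as FALSE
sens <- cm["FALSE", "FALSE"] / sum(cm[, "FALSE"])
# Share of truly TRUE observations predicted as TRUE
spec <- cm["TRUE", "TRUE"] / sum(cm[, "TRUE"])

print(round(c(accuracy = acc, sensitivity = sens, specificity = spec), 3))
# accuracy 0.941, sensitivity 0.566, specificity 0.976
```

The low sensitivity shows that the tree misses many of the rare sites without gas supply, even though overall accuracy looks high.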

Slide 49

Slide 49 text

{yardstick}: Evaluating model performance (same code and output as Slide 48)
Callout: the metric functions used are accuracy, sensitivity, and specificity.

Slide 50

Slide 50 text

{yardstick}: Evaluating model performance
Instead of calling accuracy(), sensitivity(), and specificity() one at a time, combine the metrics into a single function with metric_set():

    lp_metrics <-
      metric_set(accuracy, sensitivity, specificity)

    lp_tree_augment |>
      lp_metrics(truth = gas, estimate = .pred_class)
    #> # A tibble: 3 × 3
    #>   .metric     .estimator .estimate
    #>
    #> 1 accuracy    binary         0.941
    #> 2 sensitivity binary         0.566
    #> 3 specificity binary         0.976

Slide 51

Slide 51 text

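{yardstick}: Evaluating model performance
Computing various metrics for the test data against the model predictions
Review question: which function adds the model's predictions (the label and the probability of each label) to a data frame as columns?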

Slide 52

Slide 52 text

{yardstick}: Evaluating model performance
Computing various metrics for the test data against the model predictions
Review question: which function adds the model's predictions (the label and the probability of each label) to a data frame as columns? → augment()

    tree_fit |>
      augment(new_data = lp_test) |>
      lp_metrics(truth = gas, estimate = .pred_class)
    #> # A tibble: 3 × 3
    #>   .metric     .estimator .estimate
    #>
    #> 1 accuracy    binary         0.923
    #> 2 sensitivity binary         0.417
    #> 3 specificity binary         0.959

Slide 53

Slide 53 text

{yardstick}: Evaluating model performance

Test data:

    tree_fit |>
      augment(new_data = lp_test) |>
      lp_metrics(truth = gas, estimate = .pred_class)
    #> # A tibble: 3 × 3
    #>   .metric     .estimator .estimate
    #>
    #> 1 accuracy    binary         0.923
    #> 2 sensitivity binary         0.417
    #> 3 specificity binary         0.959

Training data:

    lp_tree_augment |>
      lp_metrics(truth = gas, estimate = .pred_class)
    #> # A tibble: 3 × 3
    #>   .metric     .estimator .estimate
    #>
    #> 1 accuracy    binary         0.941
    #> 2 sensitivity binary         0.566
    #> 3 specificity binary         0.976

Slide 54

Slide 54 text

Evaluating model performance with {DALEX}
Computing metrics for the test data

    DALEX::model_performance(tree_exp)
    #> Measures for:  classification
    #> recall     : 0.9587021
    #> precision  : 0.9587021
    #> f1         : 0.9587021
    #> accuracy   : 0.922865
    #> auc        : 0.8038963
    #>
    #> Residuals:
    #>          0%         10%         20%         30%         40%         50%
    #> -0.96529968  0.03470032  0.03470032  0.03470032  0.03470032  0.03470032
    #>         60%         70%         80%         90%        100%
    #>  0.03470032  0.03470032  0.03470032  0.03470032  0.88888889

Slide 55

Slide 55 text

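Key points so far
- Data analysis and machine learning involve many steps, most of which are carried out iteratively.
  → Using a workflow that streamlines the whole procedure is desirable.
- Machine learning frameworks in R include tidymodels and mlr3. DALEX is also available to help interpret models.
- With the tidymodels packages, the steps of machine learning modeling can be carried out under a unified interface.
    Data splitting → {rsample}
    Building a learner → {parsnip}
    Building a workflow → {workflows}
    Performance evaluation → {yardstick}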

Slide 56

Slide 56 text

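{rsample}: Introducing cross-validation
(Diagram: the data is split into training data and test data; the training data is then split into Resample 1, Resample 2, …, Resample N, each consisting of analysis data and assessment data.)
Resamples, each made up of analysis data and assessment data, are derived from the training data.
{rsample} provides a variety of resampling methods.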

Slide 57

Slide 57 text

{rsample}: Introducing cross-validation

    # 10-fold cross-validation, repeated 2 times
    lp_folds <- vfold_cv(lp_train, v = 10, repeats = 2, strata = gas)

    lp_folds
    #> # 10-fold cross-validation repeated 2 times using stratification
    #> # A tibble: 20 × 3
    #>    splits  id      id2
    #>
    #>  1         Repeat1 Fold01
    #>  2         Repeat1 Fold02
    #>  3         Repeat1 Fold03
    #>  4         Repeat1 Fold04
    #>  5         Repeat1 Fold05
    #>  6         Repeat1 Fold06
    #>  7         Repeat1 Fold07
    #>  8         Repeat1 Fold08
    #>  9         Repeat1 Fold09
    #> 10         Repeat1 Fold10
    #> 11         Repeat2 Fold01
    #> 12         Repeat2 Fold02
    #> 13         Repeat2 Fold03
    #> 14         Repeat2 Fold04
    #> 15         Repeat2 Fold05
    #> 16         Repeat2 Fold06
    #> 17         Repeat2 Fold07
    #> 18         Repeat2 Fold08
    #> 19         Repeat2 Fold09
    #> 20         Repeat2 Fold10

Slide 58

Slide 58 text

{rsample}: Introducing cross-validation (same code as Slide 57, first 10 rows of the output)
Callout: the splits column.

Slide 59

Slide 59 text

{rsample}: Introducing cross-validation (same lp_folds code and output as Slide 57)
Each element of the splits column holds one resample:

    lp_folds$splits[1:3]
    #> [[1]]
    #>
    #> <1304/145/1449>
    #>
    #> [[2]]
    #>
    #> <1304/145/1449>
    #>
    #> [[3]]
    #>
    #> <1304/145/1449>
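To inspect what a single resample contains, rsample provides the accessors analysis() and assessment(). The following is a minimal sketch on made-up data (the data frame toy is hypothetical); only vfold_cv() with the same arguments comes from the slides.

```r
library(rsample)

set.seed(123)
# Hypothetical data: 200 rows with a 30/70 class balance
toy <- data.frame(
  x = rnorm(200),
  gas = factor(rep(c("FALSE", "TRUE"), times = c(60, 140)))
)

# 10-fold cross-validation repeated 2 times, stratified by the outcome
toy_folds <- vfold_cv(toy, v = 10, repeats = 2, strata = gas)

# Each split holds an analysis set (for fitting) and an assessment set (for evaluation)
first_split <- toy_folds$splits[[1]]
n_analysis <- nrow(analysis(first_split))
n_assessment <- nrow(assessment(first_split))
print(c(analysis = n_analysis, assessment = n_assessment))

# Within one repeat, the assessment sets together cover every row exactly once
repeat1 <- toy_folds$splits[toy_folds$id == "Repeat1"]
n_covered <- sum(vapply(repeat1, function(s) nrow(assessment(s)), integer(1)))
print(n_covered)
```

This mirrors the `<1304/145/1449>` display on the slide: analysis rows, assessment rows, and the total.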

Slide 60

Slide 60 text

{tune}: Working with resampled data
Fit the model on each resample.

    lp_fit_rs <-
      fit_resamples(tree_wflow, lp_folds)

    lp_fit_rs
    #> # Resampling results
    #> # 10-fold cross-validation repeated 2 times using stratification
    #> # A tibble: 20 × 5
    #>    splits  id      id2    .metrics .notes
    #>
    #>  1         Repeat1 Fold01
    #>  2         Repeat1 Fold02
    #>  3         Repeat1 Fold03
    #> … (omitted)
    #> 18         Repeat2 Fold08
    #> 19         Repeat2 Fold09
    #> 20         Repeat2 Fold10

Slide 61

Slide 61 text

{tune}: Working with resampled data (same code and output as Slide 60)
Callout: the .metrics column.

Slide 62

Slide 62 text

{tune}: Working with resampled data

    lp_fit_rs$.metrics[[1]]
    #> # A tibble: 2 × 4
    #>   .metric  .estimator .estimate .config
    #>
    #> 1 accuracy binary         0.917 Preprocessor1_Model1
    #> 2 roc_auc  binary         0.800 Preprocessor1_Model1

Compute the mean performance over all resamples:

    lp_fit_rs |>
      collect_metrics()
    #> # A tibble: 2 × 6
    #>   .metric  .estimator  mean     n std_err .config
    #>
    #> 1 accuracy binary     0.909    20 0.00566 Preprocessor1_Model1
    #> 2 roc_auc  binary     0.776    20 0.0195  Preprocessor1_Model1

Slide 63

Slide 63 text

{tune}: Working with resampled data

    lp_ctrl <-
      # Save the predictions for each resample in a column
      control_resamples(save_pred = TRUE)

    lp_preds <-
      fit_resamples(tree_wflow, lp_folds, control = lp_ctrl) |>
      # Unnest the resamples into a single data frame
      collect_predictions()

    lp_preds
    #> # A tibble: 2,898 × 8
    #>   id      id2    .pred_FALSE .pred_TRUE  .row .pred_class gas   .config
    #>
    #> 1 Repeat1 Fold01      0.0359      0.964     4 TRUE        TRUE  Preprocessor1_…
    #> 2 Repeat1 Fold01      0           1         9 TRUE        TRUE  Preprocessor1_…
    #> 3 Repeat1 Fold01      0.889       0.111    22 FALSE       FALSE Preprocessor1_…
    #> … (omitted)

Slide 64

Slide 64 text

{tune}: Working with resampled data

    lp_preds |>
      group_by(id) |>
      lp_metrics(truth = gas, estimate = .pred_class)
    #> # A tibble: 6 × 4
    #>   id      .metric     .estimator .estimate
    #>
    #> 1 Repeat1 accuracy    binary         0.906
    #> 2 Repeat2 accuracy    binary         0.912
    #> 3 Repeat1 sensitivity binary         0.270
    #> 4 Repeat2 sensitivity binary         0.270
    #> 5 Repeat1 specificity binary         0.965
    #> 6 Repeat2 specificity binary         0.971

Slide 65

Slide 65 text

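Key points so far
- Data analysis and machine learning involve many steps, most of which are carried out iteratively.
  → Using a workflow that streamlines the whole procedure is desirable.
- Machine learning frameworks in R include tidymodels and mlr3. DALEX is also available to help interpret models.
- With the tidymodels packages, the steps of machine learning modeling can be carried out under a unified interface.
    Data splitting, resampling → {rsample}
    Building a learner → {parsnip}
    Building a workflow → {workflows}
    Performance evaluation → {yardstick}
    Fitting models to resampled data → {tune}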

Slide 66

Slide 66 text

Running the same steps with {mlr3verse}

    library(mlr3verse)

    lp_supply_chr2fct <-
      lp_supply |>
      mutate(across(where(is.character), as.factor))

    # Define the task
    lp_task <-
      as_task_classif(lp_supply_chr2fct, target = "gas")

    # Split the data (stratified sampling, 80% for training)
    set.seed(123)
    lp_split_mlr <-
      partition(lp_task, stratify = TRUE, ratio = 0.8)

    # Build the learner
    tree_learner <-
      lrn("classif.rpart", cp = 0.002)

Slide 67

Slide 67 text

Running the same steps with {mlr3verse}

    # Prepare the resamples (5-fold cross-validation, repeated 2 times)
    lp_folds_mlr <-
      rsmp("repeated_cv", repeats = 2, folds = 5)

    rr <- resample(lp_task, tree_learner, lp_folds_mlr)

    rr$aggregate(msr("classif.acc"))
    #> classif.acc
    #>   0.9213476
    rr$aggregate(msr("classif.sensitivity"))
    #> classif.sensitivity
    #>           0.3157344
    rr$aggregate(msr("classif.specificity"))
    #> classif.specificity
    #>           0.9747444

Slide 68

Slide 68 text

{recipes}: Feature engineering
Revisit the model
→ the variables (features) used in the model
→ the algorithm used in the model
From the insights gained from the first model…

    lp_supply$water |> unique()
    #> [1] TRUE

Slide 69

Slide 69 text

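{recipes}: Feature engineering
Operations on features are defined as a "recipe". Each operation is specified with a step_*() function.

    lp_supply$water |> unique()
    #> [1] TRUE

    recipe(gas ~ ., data = lp_train) |>
      step_zv(all_predictors())

step_zv() removes variables that contain only a single value from the model formula (zero variance filter).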

Slide 70

Slide 70 text

{recipes}: Feature engineering
Add step_*() functions as needed.

    lp_rec <-
      recipe(gas ~ ., data = lp_train) |>
      step_zv(all_predictors()) |>
      step_log(price, dist_from_st, offset = 0.01) |>
      step_normalize(all_numeric_predictors()) |>
      step_dummy(all_nominal_predictors())

    lp_rec
    #>
    #> ── Recipe ──────────────────────────────────────────────────────────────────────
    #>
    #> ── Inputs
    #> Number of variables by role
    #> outcome:   1
    #> predictor: 7
    #>
    #> ── Operations
    #> • Zero variance filter on: all_predictors()
    #> • Log transformation on: price and dist_from_st
    #> • Centering and scaling for: all_numeric_predictors()
    #> • Dummy variables from: all_nominal_predictors()

Slide 71

Slide 71 text

{recipes}: Feature engineering

    prep(lp_rec) |>
      # Check which variables are affected and how
      # (number selects the step by its position)
      tidy(number = 1)
    #> # A tibble: 1 × 2
    #>   terms id
    #>
    #> 1 water zv_KsRlw

Slide 72

Slide 72 text

{recipes}: Feature engineering
Apply the prepared recipe to the data:

    prep(lp_rec) |>
      bake(new_data = lp_train) |>
      glimpse()
    #> Rows: 1,449
    #> Columns: 7
    #> $ price        -1.0975606, -1.6677181, -1.8257103, -0.7057626, -0.559127…
    #> $ sewer        TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRU…
    #> $ above_floor  1.4569564, -1.3153743, -0.2757503, 0.4173324, 2.8431217, …
    #> $ under_floor  1.9139251, -0.3518245, -0.3518245, 1.9139251, -0.3518245,…
    #> $ dist_from_st 0.63568930, 0.46296422, 0.86095169, -0.37863009, -4.89537…
    #> $ gas          TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRU…
    #> $ fire_防火    1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, …
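The define, prep, bake cycle can be reproduced on a tiny data frame to see each step's effect. This is a sketch assuming only that recipes is installed; the four-row data frame toy is invented, with a constant water column to trigger the zero variance filter.

```r
library(recipes)

# Hypothetical miniature of lp_train: `water` is constant, `price` is skewed
toy <- data.frame(
  gas = factor(c("TRUE", "FALSE", "TRUE", "TRUE")),
  water = c(TRUE, TRUE, TRUE, TRUE),
  price = c(115000, 61400, 57200, 51600)
)

# Define the recipe: nothing is computed yet
toy_rec <- recipe(gas ~ ., data = toy) |>
  step_zv(all_predictors()) |>
  step_log(price, offset = 0.01) |>
  step_normalize(all_numeric_predictors())

# prep() estimates what each step needs (for example mean and sd);
# bake(new_data = NULL) returns the processed training data
toy_baked <- prep(toy_rec) |>
  bake(new_data = NULL)

print(names(toy_baked))  # `water` has been removed by step_zv()
print(toy_baked$price)   # log-transformed, then centered and scaled
```

Passing a different data frame to new_data applies the statistics estimated on the training data, which is what keeps test data from leaking into preprocessing.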

Slide 73

Slide 73 text

{workflows}: Modifying a workflow

    tree_wflow |>
      remove_formula() |>
      add_recipe(lp_rec)
    #> ══ Workflow ════════════════════════════════════════════════════════════════════
    #> Preprocessor: Recipe
    #> Model: decision_tree()
    #>
    #> ── Preprocessor ────────────────────────────────────────────────────────────────
    #> 4 Recipe Steps
    #>
    #> • step_zv()
    #> • step_log()
    #> • step_normalize()
    #> • step_dummy()
    #>
    #> ── Model ───────────────────────────────────────────────────────────────────────
    #> Decision Tree Model Specification (classification)
    #>
    #> Main Arguments:
    #>   cost_complexity = 0.002
    #>
    #> Computational engine: rpart

Slide 74

Slide 74 text

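Key points so far
- Data analysis and machine learning involve many steps, most of which are carried out iteratively.
  → Using a workflow that streamlines the whole procedure is desirable.
- Machine learning frameworks in R include tidymodels and mlr3. DALEX is also available to help interpret models.
- With the tidymodels packages, the steps of machine learning modeling can be carried out under a unified interface.
    Data splitting, resampling → {rsample}
    Building a learner → {parsnip}
    Building a workflow → {workflows}
    Performance evaluation → {yardstick}
    Fitting models to resampled data → {tune}
    Feature engineering → {recipes}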

Slide 75

Slide 75 text

Implementing a random forest and comparing models

    rf_spec <-
      rand_forest(trees = 1000, mode = "classification") |>
      set_engine("randomForest")

    rf_spec
    #> Random Forest Model Specification (classification)
    #>
    #> Main Arguments:
    #>   trees = 1000
    #>
    #> Computational engine: randomForest

Slide 76

Slide 76 text

{workflowsets}: Handling multiple models and recipes

    wf_set_fit <-
      workflow_set(preproc = list(none = gas ~ ., prep = lp_rec),
                   models = list(tree_spec, rf_spec),
                   cross = TRUE) |>
      workflow_map("fit_resamples", resamples = lp_folds)

    wf_set_fit |>
      rank_results()
    #> # A tibble: 8 × 9
    #>   wflow_id          .config .metric  mean std_err     n preprocessor model  rank
    #>
    #> 1 prep_rand_forest  Prepro… accura… 0.924 0.00475    20 recipe       rand…     1
    #> 2 prep_rand_forest  Prepro… roc_auc 0.873 0.0155     20 recipe       rand…     1
    #> 3 none_rand_forest  Prepro… accura… 0.927 0.00460    20 formula      rand…     2
    #> 4 none_rand_forest  Prepro… roc_auc 0.870 0.0165     20 formula      rand…     2
    #> 5 none_decision_tr… Prepro… accura… 0.909 0.00566    20 formula      deci…     3
    #> 6 none_decision_tr… Prepro… roc_auc 0.776 0.0195     20 formula      deci…     3
    #> 7 prep_decision_tr… Prepro… accura… 0.909 0.00566    20 recipe       deci…     4
    #> 8 prep_decision_tr… Prepro… roc_auc 0.776 0.0195     20 recipe       deci…     4

Slide 77

Slide 77 text

{workflowsets}: Handling multiple models and recipes (same code and output as Slide 76)

Slide 78

Slide 78 text

{tune}: Searching for hyperparameters
Mark each parameter to be tuned with tune().

    rf_tune_spec <-
      rand_forest(min_n = tune(), trees = tune(), mode = "classification") |>
      set_engine("randomForest")

    rf_tune_spec
    #> Random Forest Model Specification (classification)
    #>
    #> Main Arguments:
    #>   trees = tune()
    #>   min_n = tune()
    #>
    #> Computational engine: randomForest

Slide 79

Slide 79 text

{tune}: Searching for hyperparameters

    rf_tune_wflow <-
      workflow() |>
      add_model(rf_tune_spec) |>
      add_recipe(lp_rec)

    rf_fit_tune_res <-
      tune_grid(rf_tune_wflow, lp_folds, grid = 5)

    rf_fit_tune_res
    #> # A tibble: 20 × 5
    #>   splits  id      id2    .metrics .notes
    #>
    #> 1         Repeat1 Fold01
    #> 2         Repeat1 Fold02
    #> 3         Repeat1 Fold03
    #> … (omitted)

Slide 80

Slide 80 text

{tune}: Searching for hyperparameters

    lp_tune_best_parameter <-
      select_best(rf_fit_tune_res, metric = "roc_auc")

    lp_tune_best_parameter
    #> # A tibble: 1 × 3
    #>   trees min_n .config
    #>
    #> 1  1951    15 Preprocessor1_Model5

    finalize_workflow(rf_tune_wflow, lp_tune_best_parameter) |>
      last_fit(lp_split) |>
      collect_metrics()
    #> # A tibble: 2 × 4
    #>   .metric  .estimator .estimate .config
    #>
    #> 1 accuracy binary         0.950 Preprocessor1_Model1
    #> 2 roc_auc  binary         0.909 Preprocessor1_Model1
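The whole tuning loop (mark parameters, search, pick the best, refit, evaluate on the test data) is sketched below on synthetic data. To keep it light, this sketch swaps the random forest for a decision tree with the rpart engine and uses a formula instead of a recipe; the tibble toy and its columns are hypothetical, and tidymodels and rpart are assumed to be installed.

```r
library(tidymodels)

set.seed(123)
# Hypothetical two-class data with a noisy linear boundary
toy <- tibble(x1 = rnorm(300), x2 = rnorm(300)) |>
  mutate(y = factor(ifelse(x1 + x2 + rnorm(300, sd = 0.5) > 0, "yes", "no")))

toy_split <- initial_split(toy, prop = 0.8, strata = y)
toy_folds <- vfold_cv(training(toy_split), v = 5, strata = y)

# tune() marks the parameters to be searched
tune_spec <- decision_tree(cost_complexity = tune(), min_n = tune()) |>
  set_engine("rpart") |>
  set_mode("classification")

tune_wflow <- workflow() |>
  add_formula(y ~ .) |>
  add_model(tune_spec)

# Evaluate 5 candidate parameter combinations on every resample
tune_res <- tune_grid(tune_wflow, resamples = toy_folds, grid = 5)

# Pick the best candidate, plug it into the workflow,
# refit on the full training data and evaluate once on the test data
best_params <- select_best(tune_res, metric = "roc_auc")
final_metrics <- finalize_workflow(tune_wflow, best_params) |>
  last_fit(toy_split) |>
  collect_metrics()

print(best_params)
print(final_metrics)
```

last_fit() touches the test data only once, after the parameters are fixed, so the reported metrics are not biased by the search.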

Slide 81

Slide 81 text

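Key points so far
- Data analysis and machine learning involve many steps, most of which are carried out iteratively.
  → Using a workflow that streamlines the whole procedure is desirable.
- Machine learning frameworks in R include tidymodels and mlr3. DALEX is also available to help interpret models.
- With the tidymodels packages, the steps of machine learning modeling can be carried out under a unified interface.
    Data splitting, resampling → {rsample}
    Building a learner → {parsnip}
    Building a workflow → {workflows}
    Performance evaluation → {yardstick}
    Fitting models to resampled data, hyperparameter search → {tune}
    Feature engineering → {recipes}
    Managing and comparing workflows → {workflowsets}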

Slide 82

Slide 82 text

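Summary of Part 1
- Introduced tidymodels as a machine learning framework for R.
- The processing done with tidymodels can also be achieved with mlr3verse.
- DALEX, which helps interpret models, supports both tidymodels and mlr3.
- Machine learning modeling requires iterative operations and the comparison of multiple models and recipes, so managing objects is important (building a workflow is recommended).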

Slide 83

Slide 83 text

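References
- Max Kuhn and Kjell Johnson, "Feature Engineering and Selection". https://bookdown.org/max/FES
- Max Kuhn and Julia Silge, "Tidy Modeling with R". https://www.tmwr.org
- Yuya Matsumura, Shinya Uryu, and Hiroshi Yoshimura, 「Rユーザのためのtidymodels[実践]入門〜モダンな統計・機械学習モデリングの世界」 (A practical introduction to tidymodels for R users: the world of modern statistical and machine learning modeling), Gijutsu-Hyoron (技術評論社)
- Przemysław Biecek and Tomasz Burzykowski, "Explanatory Model Analysis". https://ema.drwhy.ai
- Max Kuhn and Kjell Johnson, "Applied Machine Learning for Tabular Data". https://aml4td.org
- Bernd Bischl, Raphael Sonabend, Lars Kotthoff, and Michel Lang, "Applied Machine Learning Using mlr3 in R". https://mlr3book.mlr-org.com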