Slide 1

Slide 1 text

Uryu Shinya @u_ribo
Learning tidymodels: Model Building and Operation in R

Slide 2

Slide 2 text

Contents
1 The data modeling workflow
2 Building models with tidymodels
3 Improving and operating models

Slide 3

Slide 3 text

The data modeling workflow

Slide 4

Slide 4 text

The data analysis workflow
tidymodels belongs here.
Adapted from Garrett and Hadley (2016).

Slide 5

Slide 5 text

A typical framework for model building
The model is refined through iterative work.
Adapted from Max and Kjell (2019).

Slide 6

Slide 6 text

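You cannot jump straight into modeling
a Visualization to learn the characteristics of the data and the relationships within it, and to find a "starting point" for the initial model.
b Summarize statistics and identify variables that are strongly correlated with the outcome, then form hypotheses about the model. Keep visualizing relationships and running further quantitative analyses until the data can be said to be well understood.
c Prepare the data so that the initial model can be applied to it.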

Slide 7

Slide 7 text

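Compare performance across multiple models
d Run the initial model. Several other models are also fitted to the same data and compared. Hyperparameter search is done here as well.
e Analyze the results of the repeated parameter tuning runs.
f Visualize the model results.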

Slide 8

Slide 8 text

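Search for features that improve the model
g Feature engineering to improve the initial model.
h Tuning of the final candidate models.
i Evaluation of generalization performance using the evaluation set.
j Operation.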

Slide 9

Slide 9 text

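Multiple steps are carried out iteratively
Work item: decisions to make
Task definition: regression, classification
Exploratory data analysis: handling missing values (deletion, imputation)
Feature engineering: scaling, encoding, PCA
Choosing, running, and comparing models: random forest, GBDT, neural network
Parameter search: grid search, Bayesian optimization
Evaluation metrics: RMSE, AUC, precision
Priorities: generalization performance, interpretability, light weight
Evaluating and improving generalization performance: validation, stacking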

Slide 10

Slide 10 text

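Challenges of data modeling in R
Many packages have been developed, but their interfaces are not consistent.
Handling of model objects
Idiosyncrasies of the formula interface
The R Formula Method: The Good Parts · R Views https://rviews.rstudio.com/…/the-r-formula-method-the-good-parts
The R Formula Method: The Bad Parts · R Views https://rviews.rstudio.com/…/the-r-formula-method-the-bad-parts

Random forest arguments by package
Package        Features given to each tree   Number of trees   Minimum samples per node
ranger         mtry                          num.trees         min.node.size
randomForest   mtry                          ntree             nodesize
sparklyr       mtry                          num.trees         min_instances_per_node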
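The naming mismatch in the table above can be inspected from R. The following is a minimal sketch, assuming {parsnip} is installed: `rand_forest()` takes the unified names `mtry`, `trees`, and `min_n`, and `translate()` reports the engine-specific call. Reading the argument names from `$method$fit$args` relies on the internal structure of the translated object, which is an assumption and may differ between parsnip versions.

```r
library(parsnip)

# One unified specification: mtry, trees and min_n are parsnip's argument names
rf_spec <- rand_forest(mtry = 3, trees = 500, min_n = 5)
rf_spec <- set_mode(rf_spec, "regression")

# translate() maps the unified names to each engine's own argument names
ranger_call  <- translate(set_engine(rf_spec, "ranger"))
rforest_call <- translate(set_engine(rf_spec, "randomForest"))

# Assumption: the translated arguments are stored in $method$fit$args
ranger_args  <- names(ranger_call$method$fit$args)
rforest_args <- names(rforest_call$method$fit$args)

print(ranger_args)
print(rforest_args)
```

Printing `ranger_call` or `rforest_call` directly shows the full model fit template for each engine.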

Slide 11

Slide 11 text

{tidymodels}
A collection of packages for statistical analysis and modeling

Slide 12

Slide 12 text

tidymodels
library(tidymodels)
✓ broom     0.7.1      ✓ recipes   0.1.13
✓ dials     0.0.9      ✓ rsample   0.0.8
✓ dplyr     1.0.2      ✓ tibble    3.0.4
✓ infer     0.5.3      ✓ tidyr     1.1.2
✓ modeldata 0.0.2      ✓ tune      0.1.1
✓ parsnip   0.1.3      ✓ workflows 0.2.1
✓ purrr     0.3.4      ✓ yardstick 0.0.7
Loading one package makes multiple packages available.
Developed under the same philosophy as the tidyverse packages.
Provides a unified interface: pipe-operator friendly, with clear function and argument names.

Slide 13

Slide 13 text

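Packages included in {tidymodels}
{rsample}: splitting and resampling
{recipes}: data preprocessing and feature generation
{parsnip}: model building and fitting
{yardstick}: model performance evaluation
{dials}, {tune}: parameter search and tuning
{workflows}: turning the processing up to model fitting into a workflow
These are the packages covered in this deck.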

Slide 14

Slide 14 text

Building models with tidymodels

Slide 15

Slide 15 text

Exercise: National Land Numerical Information, Land Price Publication data
Build a model that predicts land prices (a regression problem).
dplyr::glimpse(df_lp)
#> Rows: 8,476
#> Columns: 8
#> $ log_lp                3.618048, 4.591065, 4.754348…
#> $ distance_from_station 8700, 13000, 13000, 5500, 80…
#> $ acreage               317, 166, 226, 274, 357, 173, 661…
#> $ current_use           "住宅,その他", "住宅", "店舗"…
#> $ building_coverage     0, 60, 80, 70, 70, 60, 70, 70, 70…
#> $ building_structure    W, W, W, W, W, W, W, W, W, W, W…
#> $ .longitude            138.5383, 138.5921, 138.5933…
#> $ .latitude             36.46920, 36.61913, 36.62025…
Source: Ministry of Land, Infrastructure, Transport and Tourism, National Land Numerical Information, Land Price Publication data, version 2.4, L01, fiscal year Heisei 30 (2018)
https://nlftp.mlit.go.jp/ksj/jpgis/datalist/KsjTmplt-L01-v1_1.html

Slide 16

Slide 16 text

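Exercise: National Land Numerical Information, Land Price Publication data
Variable name (type): description
log_lp (real): common logarithm (base 10) of the land price
distance_from_station (integer): distance from the station (m)
acreage (integer): land area (m2)
current_use (factor): current use; a category describing how the standard site is currently used. A site can fall into several categories.
building_coverage (real): building coverage ratio; the ratio of the building's footprint area to the site area
building_structure (factor): building structure of the standard site. SRC: steel-framed reinforced concrete, RC: reinforced concrete, S: steel frame, B: concrete block, W: wood. UNKNOWN when not recorded.
.longitude (real): longitude of the published land price standard site
.latitude (real): latitude of the published land price standard site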

Slide 17

Slide 17 text

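Log-transforming land prices
Raw prices: contain outliers and are influenced by very expensive land.
Log-transformed prices: the variance is stabilized, and prices (after back-transformation) never become negative.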
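The effect of the log transformation can be illustrated with base R alone. This is a sketch on simulated, right-skewed "prices" (hypothetical data, not the land price data): on the raw scale the mean is pulled far above the median by a few large values, while on the log10 scale the two nearly coincide, and any value on the log scale maps back to a positive price.

```r
# Simulated right-skewed "prices" (hypothetical data, not the land price data)
set.seed(1)
price <- rlnorm(1000, meanlog = 11, sdlog = 1)

# Common logarithm, as used for log_lp in the slides
log_price <- log10(price)

# Mean / median: values well above 1 indicate a long right tail
skew_raw <- mean(price) / median(price)
skew_log <- mean(log_price) / median(log_price)

# Back-transforming any value from the log scale gives a positive price
back <- 10^(min(log_price) - 5)

print(c(raw = skew_raw, log = skew_log))
```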

Slide 18

Slide 18 text

{rsample}
Splitting and resampling datasets

Slide 19

Slide 19 text

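Data splitting
The whole dataset is split into a training set (train) and an evaluation set (test).
Training set: used to train the model.
Evaluation set: given to the model as unseen data in order to measure its performance.
Reference: "The trendy way to split data in R: cross-validation with the rsample package", Hoxo-m Inc. blog, https://blog.hoxo-m.com/entry/…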

Slide 20

Slide 20 text

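Data splitting
The split is random; 80% of the dataset becomes the analysis (training) set.
strata specifies stratified sampling according to the distribution of land prices.
lp_split <- initial_split(df_lp, prop = 0.8, strata = log_lp)
lp_split
#> <Analysis/Assess/Total>
#> <6358/2118/8476>
lp_train <- training(lp_split) # training set
lp_test <- testing(lp_split) # evaluation set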

Slide 21

Slide 21 text

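Data splitting: stratified sampling of land prices
The data are divided at the quartiles (the strata), and resampling is then carried out within each stratum (four times in total).
This keeps the training and evaluation sets free of bias and makes their distributions similar.
It is also effective in classification problems when the label counts are imbalanced.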
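A small sketch of what stratification buys, assuming {rsample} is installed. The data frame `df` and its skewed outcome `y` are hypothetical stand-ins for `df_lp` and `log_lp`. After a stratified split, the quartiles of the outcome in the training and evaluation sets should be close to each other.

```r
library(rsample)

set.seed(123)
# Hypothetical skewed outcome standing in for log_lp
df <- data.frame(y = rexp(2000), x = rnorm(2000))

# Stratify on the outcome; rsample bins a numeric strata variable by quantiles
split_strat <- initial_split(df, prop = 0.8, strata = y)
train <- training(split_strat)
test  <- testing(split_strat)

# Compare the quartiles of y in the two sets
quartiles <- rbind(
  train = quantile(train$y, c(0.25, 0.5, 0.75)),
  test  = quantile(test$y,  c(0.25, 0.5, 0.75))
)
print(quartiles)
```

Dropping `strata = y` and rerunning with different seeds gives a feel for how much the evaluation set can drift without stratification.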

Slide 22

Slide 22 text

{recipes}
Preprocessing and feature engineering to get data ready for a model

Slide 23

Slide 23 text

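Preprocessing and feature engineering
The data-processing procedure used for a model is turned into a "recipe".
1 recipe(): define the relationship between the variables to use → specify the ingredients
2 step_*(): specify the data-processing steps → write down the cooking method
3 prep(): combine the step_*() processing → check the recipe
4 bake(): apply to a dataset → do the cooking
Reference: "Preprocessing model data with recipes", Hoxo-m Inc. blog, https://blog.hoxo-m.com/entry/…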

Slide 24

Slide 24 text

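Processing steps are added with the pipe operator:
init_lp_recipe <- lp_train %>%
  # Model with log_lp as the outcome and all other variables as predictors
  recipe(formula = log_lp ~ .) %>%
  # Step 1: convert acreage to its common logarithm
  step_log(acreage, base = 10)
Of course, writing the functions as nested calls is also OK:
step_log(
  recipe(lp_train, log_lp ~ .),
  acreage, base = 10)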

Slide 25

Slide 25 text

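What kinds of processing can be specified?
Processing is provided as step_*() functions: scaling, encoding, dates and times, filtering, dimensionality reduction, and more.
ls("package:recipes", pattern = "^step_")
#> # 77 step_* functions (version 0.1.14)
There are also specialized packages: {textrecipes} for text, {embed} for categorical variables, {themis} for imbalanced data.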

Slide 26

Slide 26 text

For details, see:
https://uribo.github.io/dpp-cookbook/
http://bit.ly/slide-fe-recipes
http://bit.ly/practical-ds

Slide 27

Slide 27 text

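How to specify variables in step_*()
1 By name: "acreage", "building_structure"
2 tidyselect functions: starts_with() (names starting with d), contains() (names containing d), etc.
3 Role in the model: all_predictors() (predictors), all_outcomes() (outcomes)
4 Data type of the variable: all_nominal() (categorical), all_numeric() (numeric)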

Slide 28

Slide 28 text

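Adding step_*() steps
1 The variable current_use has too many categories. Every category that makes up less than 1% of the data is lumped into "other".
2 distance_from_station should be log-transformed as well. When the distance is 0, however, the result must not become infinite, so log(0.1) is used.
3 Convert categorical variables to dummy variables.
4 Standardize all variables to mean 0 and variance 1.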

Slide 29

Slide 29 text

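Adding step_*() steps
init_lp_recipe <- init_lp_recipe %>%
  # Replace a distance of 0 with 0.1 so that the log stays finite
  step_mutate(distance_from_station = if_else(distance_from_station == 0, 0.1, as.double(distance_from_station))) %>%
  step_log(distance_from_station, base = 10) %>%
  # Lump categories below 1% of the data into "other"
  step_other(current_use, threshold = 0.01) %>%
  # Convert categorical variables to dummy variables
  step_dummy(all_nominal()) %>%
  # Standardize all predictors
  step_normalize(all_predictors())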

Slide 30

Slide 30 text

Preprocessing and feature engineering (recap)
1 recipe(): specify the ingredients
2 step_*(): write down the cooking method
3 prep(): check the recipe
4 bake(): do the cooking

Slide 31

Slide 31 text

The recipe is complete
lp_rec_prepped <- prep(init_lp_recipe)
#> Data Recipe
#>
#> Inputs:
#>       role #variables
#>    outcome          1
#>  predictor          7
#>
#> Training data contained 6358 data points and no missing data.
#>
#> Operations:
#> Log transformation on acreage [trained]
#> Variable mutation for distance_from_station [trained]
#> Log transformation on distance_from_station [trained]
#> Collapsing factor levels for current_use [trained]
#> Dummy variables from current_use, building_structure [trained]
#> Centering and scaling for distance_from_station, acreage, ... [trained]

Slide 32

Slide 32 text

Applying the recipe to the datasets
# Evaluation set
lp_test_prepped <- lp_rec_prepped %>% bake(new_data = lp_test)
# Analysis (training) set
lp_train_prepped <- lp_rec_prepped %>% bake(new_data = NULL)
glimpse(lp_train_prepped)
#> Observations: 6,358
#> Variables: 22
#> $ distance_from_station 1.48883723, 1.74347636, …
#> $ acreage               0.348700377, -0.368317326, -0.026333976, …
#> …
#> $ current_use_住宅.店舗  -0.2244556, -0.2244556, -0.2244556, …
#> …
#> $ current_use_other     -0.2876676, -0.2876676, -0.2876676, …
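The point of separating prep() from bake() is that the statistics are estimated once, on the training data, and then reused for any new data. The sketch below shows this on hypothetical toy data (the columns `area` and `use` are illustrative, not from the land price data), assuming {recipes} is installed.

```r
library(recipes)

# Hypothetical toy data; column names are illustrative only
set.seed(42)
train <- data.frame(
  y    = rnorm(100),
  area = runif(100, 50, 500),
  use  = factor(sample(c("house", "shop", "office"), 100, replace = TRUE))
)
new_obs <- data.frame(
  y    = 0,
  area = 1000,
  use  = factor("shop", levels = levels(train$use))
)

rec <- recipe(y ~ ., data = train) %>%
  step_log(area, base = 10) %>%
  step_dummy(all_nominal()) %>%
  step_normalize(all_predictors())

# prep() estimates means and standard deviations from the training data
rec_prepped <- prep(rec)

# bake() applies the trained steps: NULL returns the processed training data
train_baked <- bake(rec_prepped, new_data = NULL)
new_baked   <- bake(rec_prepped, new_data = new_obs)

print(head(train_baked, 3))
print(new_baked)
```

The new observation is scaled with the training mean and standard deviation, so its `area` lands outside the range seen in training rather than being re-centered to zero.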

Slide 33

Slide 33 text

{parsnip}
Creating models; working with a variety of modeling packages

Slide 34

Slide 34 text

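Model building
1 Define the specification: choose a model suited to the problem, e.g. linear_reg(), rand_forest(), logistic_reg()
2 Specify the engine (package): set_engine()
3 Fit the model: fit() applies the training set; predict() makes predictions on the evaluation set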

Slide 35

Slide 35 text

Linear regression model
lm_model <- linear_reg() %>%
  set_engine("lm")
class(lm_model)
#> [1] "linear_reg" "model_spec"
lm_formula_fit <- lm_model %>%
  fit(log_lp ~ ., data = lp_train_prepped)
# Gives the same result as:
lm(log_lp ~ ., data = lp_train_prepped)

Slide 36

Slide 36 text

Linear regression model
# Bind the observed outcome and the predictions for the evaluation set
df_lm_model_predict <- lp_test_prepped %>%
  select(log_lp) %>%
  bind_cols(
    predict(lm_formula_fit, new_data = lp_test_prepped))
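The claim that the parsnip fit matches a plain lm() call can be checked directly. A minimal sketch on the built-in `mtcars` data (used here only as a stand-in for the land price data), assuming {parsnip} is installed:

```r
library(parsnip)

# Specification and engine, written without the pipe for clarity
lm_spec <- set_engine(linear_reg(), "lm")

# Fit through parsnip and through base R
lm_fit   <- fit(lm_spec, mpg ~ wt + hp, data = mtcars)
base_fit <- lm(mpg ~ wt + hp, data = mtcars)

# The underlying lm object is stored in the $fit element
coef_parsnip <- coef(lm_fit$fit)
coef_base    <- coef(base_fit)

# predict() on a parsnip fit returns a tibble with a .pred column
preds <- predict(lm_fit, new_data = mtcars[1:3, ])
print(preds)
```

What parsnip adds is not a different estimate but a predictable return type: `predict()` gives a tibble with one row per row of `new_data`.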

Slide 37

Slide 37 text

Changing the model and the engine
1 Model-specific options can be specified
# Gradient boosting
gb_model <- boost_tree(trees = 1000, mtry = 3, tree_depth = 4) %>%
  set_mode("regression")
# Random forest
rf_model <- rand_forest(trees = 1000, mtry = 3) %>%
  set_mode("regression")
2 Choose the engine
rf_model %>% set_engine("ranger")
rf_model %>% set_engine("randomForest")
gb_model %>% set_engine("xgboost")
3 fit() and predict() are used in the same way as before

Slide 38

Slide 38 text

{yardstick}
Computing metrics for model performance evaluation

Slide 39

Slide 39 text

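Model performance evaluation
Use evaluation metrics appropriate to the task.
Regression problems: coefficient of determination (R2, RSQ), root mean square error (RMSE), mean absolute error (MAE)
Classification problems: confusion matrix, accuracy, precision and recall, ROC curve and AUC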

Slide 40

Slide 40 text

Model performance evaluation
1 Define the specification: linear_reg(), rand_forest(), logistic_reg()
2 Specify the engine (package): set_engine()
3 Fit the model: fit() on the training set, then predict() on the evaluation set

Slide 41

Slide 41 text

Model performance evaluation
Specify a particular metric, or a combination of metrics.
rmse(df_lm_model_predict, truth = log_lp, estimate = .pred)
#>   .metric .estimator .estimate
#> 1 rmse    standard       0.357
rsq(df_lm_model_predict, truth = log_lp, estimate = .pred)
#>   .metric .estimator .estimate
#> 1 rsq     standard       0.595
The RMSE of the linear regression model is 0.357.
lp_metrics <- metric_set(rmse, rsq, mae)
lp_metrics(df_lm_model_predict, truth = log_lp, estimate = .pred)
#> # A tibble: 3 x 3
#>   .metric .estimator .estimate
#> 1 rmse    standard       0.357
#> 2 rsq     standard       0.595
#> 3 mae     standard       0.271
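To make the metrics concrete, the sketch below computes RMSE, MAE, and R2 by hand and compares them with {yardstick}. The four truth/prediction pairs are made up for illustration. Note that yardstick's `rsq()` is the squared correlation between truth and estimate.

```r
library(yardstick)

# Hypothetical truth / prediction pairs
res <- data.frame(truth = c(3.0, 4.5, 5.0, 4.0),
                  .pred = c(2.5, 4.5, 5.5, 3.0))

# Metrics computed by hand
err <- res$truth - res$.pred
rmse_manual <- sqrt(mean(err^2))            # root mean square error
mae_manual  <- mean(abs(err))               # mean absolute error
rsq_manual  <- cor(res$truth, res$.pred)^2  # squared correlation

# The same metrics through a yardstick metric set
reg_metrics <- metric_set(rmse, mae, rsq)
out <- reg_metrics(res, truth = truth, estimate = .pred)
print(out)
```

The rows of `out` come back in the order the metrics were passed to `metric_set()`.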

Slide 42

Slide 42 text

ggplot-based visualization
For models that handle classification problems, autoplot() draws ROC curves and confusion matrices.

Slide 43

Slide 43 text

{workflows}
Turning feature engineering and model fitting into a workflow

Slide 44

Slide 44 text

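Combining a recipe and a model
Work item: decision (package)
Task: regression problem
Data splitting ({rsample})
Feature engineering ({recipes})
Type of model: linear regression, random forest, gradient boosting ({parsnip})
Specifying the engine and fitting ({parsnip})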

Slide 45

Slide 45 text

Putting it into a workflow
1 workflow(): declare the workflow
2 add_recipe(): the processing applied to the dataset
3 add_model(): specify the model

Slide 46

Slide 46 text

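A workflow for the land price regression problem
lp_wflow <- workflow() %>%
  add_recipe(init_lp_recipe) %>%
  add_model(lm_model) # linear regression model
fit(lp_wflow, data = lp_train)
Changing the recipe or the model, and choosing the data to fit, is easy:
# 1 Swap in the random forest built with {parsnip}
#   (the engine is set here because rf_model was defined without one)
lp_wflow %>%
  update_model(rf_model %>% set_engine("ranger")) %>%
  fit(data = lp_train)
# 2 Swap in gradient boosting with {xgboost}; the data can be swapped too, here to the evaluation set
lp_wflow %>%
  update_model(gb_model %>% set_engine("xgboost")) %>%
  fit(data = lp_test)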
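A self-contained sketch of the same pattern, using the built-in `mtcars` data in place of the land price data and assuming {tidymodels} is installed. The recipe and model are bundled once, and a single component is swapped afterwards with `update_recipe()`.

```r
library(tidymodels)

# Recipe: standardize the predictors
rec <- recipe(mpg ~ wt + hp + disp, data = mtcars) %>%
  step_normalize(all_predictors())

lm_spec <- linear_reg() %>% set_engine("lm")

# Bundle preprocessing and model
wflow <- workflow() %>%
  add_recipe(rec) %>%
  add_model(lm_spec)

# fit() runs prep() and the model fit; predict() runs bake() and prediction
wflow_fit <- fit(wflow, data = mtcars[1:24, ])
preds <- predict(wflow_fit, new_data = mtcars[25:32, ])

# Swap only the recipe; the model specification is left untouched
rec2 <- recipe(mpg ~ wt + hp + disp, data = mtcars) %>%
  step_log(disp, base = 10) %>%
  step_normalize(all_predictors())
wflow2_fit <- wflow %>%
  update_recipe(rec2) %>%
  fit(data = mtcars[1:24, ])
preds2 <- predict(wflow2_fit, new_data = mtcars[25:32, ])

print(bind_cols(preds, preds2))
```

Because the recipe lives inside the workflow, the raw data can be passed to both `fit()` and `predict()` without calling `prep()` or `bake()` by hand.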

Slide 47

Slide 47 text

Improving and operating models

Slide 48

Slide 48 text

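Prospects for improving the model
With the earlier linear regression model, the RMSE was …
What about the results of random forest and gradient boosting?
A further twist on feature engineering: What is the effect of interaction terms? What is the influence of latitude and longitude?
Hyperparameter search has not been done yet either.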

Slide 49

Slide 49 text

Interaction terms
The relationship between acreage and price differs by building_structure.

Slide 50

Slide 50 text

Capturing nonlinear behavior
Spline smoothing of latitude and longitude
How should the degree (number of knots) be chosen?

Slide 51

Slide 51 text

Updating the recipe and the workflow
1 Add processing to the first recipe
second_lp_recipe <- init_lp_recipe %>%
  step_interact( ~ acreage:starts_with("building_structure")) %>% # interaction terms
  step_ns(.latitude, .longitude, deg_free = 20) # the number of knots is arbitrary
2 Update the workflow and refit
lp_wflow <- lp_wflow %>%
  update_recipe(second_lp_recipe)
lp_fit <- fit(lp_wflow, lp_train)
RMSE: …

Slide 52

Slide 52 text

Comparing models
Random forest or gradient boosting looks promising.

Slide 53

Slide 53 text

{dials}, {tune}
Parameter search and model tuning
{rsample}
Creating validation sets

Slide 54

Slide 54 text

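Hyperparameter search
A random forest model, for example, has three hyperparameters that can be specified:
rand_forest(mtry, trees, min_n)
mtry: number of features given to each decision tree
trees: number of decision trees to build
min_n: minimum number of samples in a node
The hyperparameter values affect the accuracy of the model, so performance has to be evaluated while varying them.
This is done on validation sets prepared from the training set.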

Slide 55

Slide 55 text

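Creating validation sets
The dataset is split into a training set and an evaluation set. The training set is split further; the evaluation set is not used here.
The way the validation sets are created depends on the data and the purpose.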

Slide 56

Slide 56 text

Parameter search for the random forest
1 Prepare 10 folds with K-fold cross-validation
set.seed(55)
val_set <- vfold_cv(lp_train, v = 10)
#> # 10-fold cross-validation
#> # A tibble: 10 x 2
#>    splits id
#>  1        Fold01
#>  2        Fold02
#>  3        Fold03
#>  4        Fold04
#>  5        Fold05
#>  6        Fold06
#>  7        Fold07
#>  8        Fold08
#>  9        Fold09
#> 10        Fold10
2 Specify tune() for the parameters to be tuned
cores <- parallel::detectCores()
rf_wflow <- workflow() %>%
  add_model(
    rand_forest(mtry = tune(), trees = tune()) %>%
      set_engine("ranger", num.threads = cores) %>%
      set_mode("regression")) %>%
  add_recipe(second_lp_recipe)
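What a fold object contains can be inspected with `analysis()` and `assessment()`. A minimal sketch on the built-in `mtcars` data (32 rows, 4 folds), assuming {rsample} is installed:

```r
library(rsample)

set.seed(55)
folds <- vfold_cv(mtcars, v = 4)

# Each split has an analysis part (for fitting) and an assessment part (for validation)
sizes <- data.frame(
  id         = folds$id,
  analysis   = sapply(folds$splits, function(s) nrow(analysis(s))),
  assessment = sapply(folds$splits, function(s) nrow(assessment(s)))
)
print(sizes)
```

Every row of the data appears in exactly one assessment part, so the assessment sizes add up to the size of the original data.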

Slide 57

Slide 57 text

Searching with a grid search
3 Pass the validation sets and specify the evaluation metric
set.seed(345)
rf_res <- rf_wflow %>%
  tune_grid(val_set, # validation sets
            grid = 25,
            control = control_grid(save_pred = TRUE),
            metrics = metric_set(rmse)) # evaluation metric
autoplot(rf_res)

Slide 58

Slide 58 text

4 Parameters of the best models
rf_best <- rf_res %>%
  show_best(metric = "rmse")
#> # A tibble: 5 x 8
#>    mtry trees .metric .estimator  mean     n std_err .config
#> 1    26  1283 rmse    standard   0.135     1      NA Model17
#> 2    20   524 rmse    standard   0.135     1      NA Model19
#> 3    23  1190 rmse    standard   0.135     1      NA Model08
#> 4    29   239 rmse    standard   0.135     1      NA Model01
#> 5    34  1118 rmse    standard   0.136     1      NA Model25

Slide 59

Slide 59 text

Model tuning
Choosing the optimal number of knots
The current recipe hard-codes step_ns(deg_free = 20). This value should also be searched and optimized.

Slide 60

Slide 60 text

Searching arbitrary parameters and ranges
tune_lp_recipe <- init_lp_recipe %>%
  step_interact( ~ acreage:starts_with("building_structure")) %>%
  step_ns(.latitude, .longitude, deg_free = tune("coords")) # parameter to tune
# val_set is the 10-fold cross-validation object created earlier;
# the engine is set here because rf_model was defined without one
spline_res <- tune_grid(rf_model %>% set_engine("ranger"),
                        tune_lp_recipe,
                        resamples = val_set,
                        grid = expand.grid(coords = c(2, 5, 20, 200)))
spline_res %>%
  show_best(metric = "rmse")
#> # A tibble: 4 x 7
#>   coords .metric .estimator  mean     n std_err .config
#> 1      5 rmse    standard   0.160    10 0.00288 Recipe2
#> …
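The same idea in a form that runs without the land price data: a sketch that tunes the spline degrees of freedom of a recipe step with a linear model. The data `sim` (a noisy sine curve) and the candidate grid are hypothetical; {tidymodels} is assumed to be installed.

```r
library(tidymodels)

set.seed(1)
# Hypothetical data with a nonlinear signal in x
sim <- data.frame(x = runif(300, 0, 10))
sim$y <- sin(sim$x) + rnorm(300, sd = 0.2)

# Mark the spline degrees of freedom as a parameter to tune
rec <- recipe(y ~ x, data = sim) %>%
  step_ns(x, deg_free = tune())

wflow <- workflow() %>%
  add_recipe(rec) %>%
  add_model(linear_reg() %>% set_engine("lm"))

folds <- vfold_cv(sim, v = 5)

# Evaluate each candidate value on every fold
res <- tune_grid(wflow,
                 resamples = folds,
                 grid = data.frame(deg_free = c(1, 4, 8)),
                 metrics = metric_set(rmse))

best <- select_best(res, metric = "rmse")
print(collect_metrics(res))
print(best)
```

With `deg_free = 1` the spline is a straight line, so the cross-validated RMSE should be clearly worse than for the more flexible candidates.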

Slide 61

Slide 61 text

The final model
Apply the search results to the random forest parameters (the parameters of the best model):
rf_model_tuned <- rand_forest(mtry = rf_best$mtry[1],
                              trees = rf_best$trees[1]) %>%
  set_engine("ranger", num.threads = cores, importance = "impurity") %>%
  set_mode("regression")
Use the tuned degrees of freedom in the recipe:
last_lp_recipe <- init_lp_recipe %>%
  step_interact( ~ acreage:starts_with("building_structure")) %>%
  step_ns(.latitude, .longitude, deg_free = 5)

Slide 62

Slide 62 text

Obtaining the final result from the training and evaluation sets
Specify the tuned recipe and model:
rf_wflow_tuned <- rf_wflow %>%
  update_recipe(last_lp_recipe) %>%
  update_model(rf_model_tuned)
Pass the dataset split made at the beginning:
last_rf_fit <- rf_wflow_tuned %>%
  last_fit(lp_split)
last_rf_fit %>%
  collect_metrics()
#> # A tibble: 2 x 3
#>   .metric .estimator .estimate
#> 1 rmse    standard       0.134
#> 2 rsq     standard       0.943
RMSE: … (initial model: …)
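`last_fit()` fits the workflow on the training part of an initial split and evaluates it once on the evaluation part. A minimal sketch on the built-in `mtcars` data, with a plain formula preprocessor added via `add_formula()` instead of a recipe; {tidymodels} is assumed to be installed.

```r
library(tidymodels)

set.seed(10)
split <- initial_split(mtcars, prop = 0.75)

wflow <- workflow() %>%
  add_formula(mpg ~ wt + hp) %>%
  add_model(linear_reg() %>% set_engine("lm"))

# Fit on training(split), evaluate on testing(split)
final <- last_fit(wflow, split)

final_metrics <- collect_metrics(final)
final_preds   <- collect_predictions(final)
print(final_metrics)
```

Since the evaluation set is touched only in this last step, the reported metrics are an estimate of generalization performance rather than a tuning signal.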

Slide 63

Slide 63 text

Summary
{tidymodels} provides a unified interface.
It simplifies the processing needed for statistical modeling and machine learning.

Slide 64

Slide 64 text

Summary
{rsample}: splitting and resampling. initial_split(), vfold_cv(), initial_time_split(), nested_cv()
{recipes}: data preprocessing and feature generation. recipe(), step_*(), prep(), bake()
{parsnip}: model building and fitting. set_engine(), rand_forest(), boost_tree()
{yardstick}: model performance evaluation. metrics(), rmse(), roc_auc()
{dials}, {tune}: parameter search and tuning. tune(), grid_random()
{workflows}: building workflows. workflow(), add_*(), update_*()