Learning model building and operation in R with tidymodels (tidymodelsで覚えるRでのモデル構築と運用) / tidymodels2020

Uryu Shinya
October 21, 2020

Transcript

  1. Uryu Shinya @u_ribo
    Learning model building and operation in R with tidymodels

  2. Contents
    1 The data modeling workflow
    2 Building models with tidymodels
    3 Improving and operating models

  3. The data modeling workflow

  4. The data analysis workflow
    tidymodels sits here
    Adapted from Garrett and Hadley (2016)

  5. A typical framework for model building
    The model is refined through iterative work
    Adapted from Max and Kjell (2019)

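  6. Modeling cannot simply start right away
    a: Visualization to learn the characteristics of the data and the relationships within it, and to find a "starting point" for the initial model.
    b: Summarize statistics and identify variables strongly correlated with the outcome, then form hypotheses about the model. Keep visualizing relationships and repeating further quantitative analysis until the data can be said to be sufficiently understood.
    c: Preparation for applying the data to the initial model.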

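  7. Comparing performance across multiple models
    d: Fit the initial model. Several other models are also fitted to the same data and compared. Hyperparameter search also takes place here.
    e: Analyze the results of the repeated parameter tuning.
    f: Visualize the model results.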

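  8. Looking for features that improve the model
    g: Feature engineering to improve the initial model.
    h: Tuning of the final candidate models.
    i: Evaluation of generalization performance using the evaluation set.
    j: Operation.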

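  9. Repeating multiple steps iteratively
    Work items: task definition; exploratory data analysis; evaluation metrics; feature engineering; model selection, fitting, and comparison; parameter search; evaluation and improvement of generalization performance
    Decisions:
    Regression, classification
    Handling missing values: deletion, imputation
    RMSE, AUC, precision
    Priorities: generalization performance, interpretability, light weight
    Validation, stacking
    Grid search, Bayesian optimization
    Random forest, GBDT, neural network
    Scaling, encoding, PCA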

  10. Challenges of data modeling in R
    Many packages have been developed, but their interfaces are not consistent
    Handling of model objects
    Idiosyncrasies of formula
    The R Formula Method: The Good Parts · R Views
    https://rviews.rstudio.com/…/the-r-formula-method-the-good-parts
    The R Formula Method: The Bad Parts · R Views
    https://rviews.rstudio.com/…/the-r-formula-method-the-bad-parts
    Package        Features given to each tree    Number of trees    Minimum samples in a node
    ranger         mtry                           num.trees          min.node.size
    randomForest   mtry                           ntree              nodesize
    sparklyr       feature_subset_strategy        num_trees          min_instances_per_node
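The naming mismatch in the table above is what parsnip's common argument names (mtry, trees, min_n) are meant to hide. The following is a minimal sketch, assuming the parsnip, ranger, and randomForest packages are installed; the argument values are arbitrary and chosen only for illustration. It declares one specification and asks parsnip how it would translate it for each engine.

```r
library(parsnip)

# One specification written with parsnip's engine-independent argument names
rf_spec <- rand_forest(mode = "regression", mtry = 3, trees = 500, min_n = 5)

# translate() builds the engine-specific call template without fitting anything
ranger_call  <- translate(set_engine(rf_spec, "ranger"))
rforest_call <- translate(set_engine(rf_spec, "randomForest"))

# Argument names as each engine expects them
names(ranger_call$method$fit$args)
names(rforest_call$method$fit$args)
```

Printing either translated object also shows the full fit template, which is a convenient way to check what an engine will actually receive.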

  11. {tidymodels}
    A collection of packages for statistical analysis and modeling

  12. tidymodels v…
    library(tidymodels)
    ✓ broom 0.7.1 ✓ recipes 0.1.13
    ✓ dials 0.0.9 ✓ rsample 0.0.8
    ✓ dplyr 1.0.2 ✓ tibble 3.0.4
    ✓ infer 0.5.3 ✓ tidyr 1.1.2
    ✓ modeldata 0.0.2 ✓ tune 0.1.1
    ✓ parsnip 0.1.3 ✓ workflows 0.2.1
    ✓ purrr 0.3.4 ✓ yardstick 0.0.7
    Developed with the same philosophy as the tidyverse packages
    Loading one package makes multiple packages available
    Provides a unified interface
    Pipe-operator friendly
    Clear function and argument names

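  13. Packages included in {tidymodels}
    Covered in this deck:
    {rsample}: splitting and resampling
    {recipes}: data preprocessing and feature generation
    {parsnip}: model building and fitting
    {yardstick}: model performance evaluation
    {dials}, {tune}: parameter search and tuning
    {workflows}: turning the steps up to model fitting into a workflow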

  14. Building models with tidymodels

  15. Exercise: National Land Numerical Information, Land Price Publication data
    Build a model that predicts land prices (a regression problem)
    dplyr::glimpse(df_lp)
    #> Rows: 8,476
    #> Columns: 8
    #> $ log_lp 3.618048, 4.591065, 4.754348…
    #> $ distance_from_station 8700, 13000, 13000, 5500, 80…
    #> $ acreage 317, 166, 226, 274, 357, 173, 661…
    #> $ current_use "住宅,その他", "住宅", "店舗"…
    #> $ building_coverage 0, 60, 80, 70, 70, 60, 70, 70, 70…
    #> $ building_structure W, W, W, W, W, W, W, W, W, W, W…
    #> $ .longitude 138.5383, 138.5921, 138.5933…
    #> $ .latitude 36.46920, 36.61913, 36.62025…
    Source: Ministry of Land, Infrastructure, Transport and Tourism (MLIT), National Land Numerical Information, Land Price Publication data, version 2.4, L01, fiscal year 2018 (Heisei 30)
    https://nlftp.mlit.go.jp/ksj/jpgis/datalist/KsjTmplt-L01-v1_1.html

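  16. Exercise: National Land Numerical Information, Land Price Publication data
    Variable: description (type)
    log_lp: common logarithm of the land price (real)
    distance_from_station: distance from the station in m (integer)
    acreage: land area in m2 (integer)
    current_use: current use; a category describing how the standard site is currently used, which may consist of several categories (factor)
    building_coverage: building coverage ratio; the ratio of building area to site area (real)
    building_structure: building structure; classification by the structure of the building on the standard site. SRC: steel-framed reinforced concrete, RC: reinforced concrete, S: steel frame, B: block, W: wood. UNKNOWN when not recorded (factor)
    .longitude: longitude; location of the standard site (real)
    .latitude: latitude; location of the standard site (real)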

  17. Log-transforming the land price
    Raw scale: outliers; affected by high-priced land
    Log scale: the variance is stabilized; values do not become negative
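The effect of the log transform can be checked on simulated data. This is a small base R sketch using hypothetical log-normal "prices" (not the land price data itself): on the raw scale a few very large values pull the mean far above the median, while on the log10 scale the two nearly coincide, and any prediction made on the log scale maps back to a positive price.

```r
set.seed(1)

# Hypothetical right-skewed prices standing in for land prices
price     <- rlnorm(1000, meanlog = 11, sdlog = 1)
log_price <- log10(price)

# Mean-to-median ratio: well above 1 on the raw scale, close to 1 after the log
raw_ratio <- mean(price) / median(price)
log_ratio <- mean(log_price) / median(log_price)
c(raw = raw_ratio, log10 = log_ratio)

# Back-transforming any log-scale value always gives a positive price
back_transformed <- 10^c(-2, 0, 3)
back_transformed
```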

  18. {rsample}
    Dataset splitting and resampling

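  19. Data splitting
    Split the whole dataset into a training set (train) and an evaluation set (test)
    Training set: used to train the model
    Evaluation set: given to the model as unseen information in order to measure its performance
    The modern way to split data in R: cross-validation with the rsample package (HOXO-M Inc. blog)
    https://blog.hoxo-m.com/entry/…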

  20. Data splitting
    The split is random
    80% of the dataset is used as the analysis set
    lp_split <- initial_split(df_lp,
                              prop = 0.8,
                              strata = log_lp) # stratified sampling according to the land price distribution
    lp_split
    #>
    #> <6358/2118/8476>
    lp_train <- training(lp_split) # training set
    lp_test <- testing(lp_split) # evaluation set

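  21. Data splitting
    Stratified sampling on the land price
    The data are divided at the quartiles (strata) → sampling is then carried out within each stratum
    A technique for keeping the training and evaluation sets free of bias and their distributions similar
    Also effective in classification problems when the labels are imbalanced
    Training / Evaluation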
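The stratified split can be tried on simulated data. The sketch below assumes the rsample package is installed and uses a hypothetical data frame `d` in place of `df_lp`; the outcome column is named `log_lp` only to mirror the slides. With a numeric `strata` variable, rsample bins the outcome into quantile groups before sampling, so the two sets end up with similar outcome distributions.

```r
library(rsample)

set.seed(123)
# Hypothetical data standing in for df_lp
d <- data.frame(
  x      = runif(1000),
  log_lp = rnorm(1000, mean = 4.5, sd = 0.5)
)

# 80% analysis (training) set, stratified on the outcome
split <- initial_split(d, prop = 0.8, strata = log_lp)
train <- training(split)
test  <- testing(split)

# Quantiles of the outcome in each set, for a side-by-side comparison
rbind(train = quantile(train$log_lp), test = quantile(test$log_lp))
```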

  22. {recipes}
    Preprocessing and feature engineering to prepare data for a model

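  23. Preprocessing and feature engineering
    Turn the data-processing procedure used for the model into a "recipe"
    1 recipe(): define the relationships among the variables to use → specify the ingredients
    2 step_*(): specify the data-processing procedures → describe the cooking method
    3 prep(): consolidate the step_*() operations → check the recipe
    4 bake(): apply to a dataset → do the cooking
    Preprocessing the data used in models with recipes (HOXO-M Inc. blog)
    https://blog.hoxo-m.com/entry/…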

  24. Adding processing to the model with the pipe operator
    init_lp_recipe <-
      lp_train %>%
      # Model with log_lp as the outcome and all other variables as predictors
      recipe(formula = log_lp ~ .) %>%
      # Step 1: transform acreage to its common logarithm
      step_log(acreage, base = 10)
    Of course, writing the functions as nested calls is also OK
    step_log(
      recipe(lp_train, log_lp ~ .),
      acreage, base = 10)

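  25. What kinds of processing can be specified?
    Functions are provided as step_*()
    Scaling, encoding, dates and times, filtering, dimensionality reduction, and more
    ls("package:recipes", pattern = "^step_")
    #> # 77 step_* functions (version 0.1.14)
    There are also packages for specialized tasks
    {textrecipes}: text
    {embed}: categorical variables
    {themis}: class imbalance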

  26. For details, see
    https://uribo.github.io/dpp-cookbook/
    http://bit.ly/slide-fe-recipes
    http://bit.ly/practical-ds

  27. How to specify variables in step_*()
    1 By character string: "acreage" "building_structure"
    2 tidyselect functions: starts_with() (starts with d), contains() (contains d), etc.
    3 Role in the model: all_predictors() (predictors), all_outcomes() (outcome)
    4 Data type of the variable: all_nominal() (categorical), all_numeric() (numeric)
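The selector styles listed above can be combined in one recipe. The following is a minimal sketch, assuming the recipes package is installed; the four-row data frame is hypothetical and only borrows column names from the slides. Each step picks its columns in a different way: by name pattern, by data type minus a role, and by data type.

```r
library(recipes)

# Hypothetical miniature of the land price data
d <- data.frame(
  log_lp                = c(4.1, 4.8, 5.2, 4.4),
  distance_from_station = c(100, 2500, 800, 40),
  acreage               = c(120, 300, 95, 210),
  building_structure    = factor(c("W", "RC", "W", "S"))
)

rec <- recipe(log_lp ~ ., data = d) %>%
  step_log(starts_with("d"), base = 10) %>%            # tidyselect: name pattern
  step_normalize(all_numeric(), -all_outcomes()) %>%   # data type, excluding the outcome role
  step_dummy(all_nominal())                            # data type: categorical columns

baked <- bake(prep(rec), new_data = NULL)
names(baked)
```

The outcome `log_lp` is left untouched because `-all_outcomes()` removes it from the normalization step.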

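  28. Adding step_*()
    1 The variable current_use has too many levels
      Levels that make up less than 1% of the data are all set to "other"
    2 distance_from_station should be log-transformed as well
      However, a distance of 0 is replaced with 0.1 before taking the log so that the result does not become infinite
    3 Convert categorical variables into dummy variables
    4 Standardize all variables to mean 0 and variance 1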

  29. Adding step_*()
    init_lp_recipe <-
      init_lp_recipe %>%
      # Replace a distance of 0 with 0.1 so that the logarithm stays finite
      step_mutate(distance_from_station =
                    if_else(distance_from_station == 0,
                            0.1,
                            as.double(distance_from_station))) %>%
      step_log(distance_from_station, base = 10) %>%
      # Pool current_use levels below 1% of the data into "other"
      step_other(current_use, threshold = 0.01) %>%
      step_dummy(all_nominal()) %>%
      step_normalize(all_predictors())

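  30. Preprocessing and feature engineering
    Turn the data-processing procedure used for the model into a "recipe"
    1 recipe(): define the relationships among the variables to use → specify the ingredients
    2 step_*(): specify the data-processing procedures → describe the cooking method
    3 prep(): consolidate the step_*() operations → check the recipe
    4 bake(): apply to a dataset → do the cooking
    Preprocessing the data used in models with recipes (HOXO-M Inc. blog)
    https://blog.hoxo-m.com/entry/…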

  31. The completed recipe
    lp_rec_prepped <-
      prep(init_lp_recipe)
    #> Data Recipe
    #>
    #> Inputs:
    #> role #variables
    #> outcome 1
    #> predictor 7
    #>
    #> Training data contained 6358 data points and no missing data.
    #>
    #> Operations:
    #> Log transformation on acreage [trained]
    #> Variable mutation for distance_from_station [trained]
    #> Log transformation on distance_from_station [trained]
    #> Collapsing factor levels for current_use [trained]
    #> Dummy variables from current_use, building_structure [trained]
    #> Centering and scaling for distance_from_station, acreage, ... [trained]

  32. Applying the recipe to the datasets
    # Evaluation set
    lp_test_prepped <-
      lp_rec_prepped %>%
      bake(new_data = lp_test)
    # Analysis set: new_data = NULL returns the processed training data
    lp_train_prepped <-
      lp_rec_prepped %>%
      bake(new_data = NULL)
    glimpse(lp_train_prepped)
    #> Observations: 6,358
    #> Variables: 22
    #> $ distance_from_station 1.48883723, 1.74347636, …
    #> $ acreage 0.348700377, -0.368317326, -0.026333976, …
    #> …
    #> $ current_use_住宅.店舗 -0.2244556, -0.2244556, -0.2244556, …
    #> …
    #> $ current_use_other -0.2876676, -0.2876676, -0.2876676, …
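The whole recipe() → step_*() → prep() → bake() cycle can be reproduced on simulated data. This is a sketch under stated assumptions: the recipes package is installed, and the data frame with columns `y`, `area`, and `use` is hypothetical. The point it illustrates is that prep() estimates the centering and scaling values from the training data only, and bake() then reuses those values on any other dataset.

```r
library(recipes)

set.seed(1)
# Hypothetical stand-in for the land price data
d <- data.frame(
  y    = rnorm(200),
  area = runif(200, 50, 500),
  use  = factor(sample(c("house", "shop", "office"), 200, replace = TRUE))
)
train <- d[1:150, ]
test  <- d[151:200, ]

rec <- recipe(y ~ ., data = train) %>%
  step_log(area, base = 10) %>%
  step_dummy(all_nominal()) %>%
  step_normalize(all_predictors())

rec_prepped <- prep(rec)                            # statistics estimated on train only
train_baked <- bake(rec_prepped, new_data = NULL)   # processed training data
test_baked  <- bake(rec_prepped, new_data = test)   # same transformations applied to test

# Exactly standardized on train; only approximately so on test
c(train_mean = mean(train_baked$area), train_sd = sd(train_baked$area))
c(test_mean  = mean(test_baked$area),  test_sd  = sd(test_baked$area))
```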

  33. {parsnip}
    Creating models
    Works with a wide variety of modeling packages

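  34. Model building
    1 Define the specification: choose a model suited to the problem
      linear_reg() rand_forest() logistic_reg()
    2 Specify the engine (package): set_engine()
    3 Fit the model: fit() applies the training set
      predict() makes predictions on the evaluation set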

  35. Linear regression model
    lm_model <-
      linear_reg() %>%
      set_engine("lm")
    class(lm_model)
    #> [1] "linear_reg" "model_spec"
    lm_formula_fit <-
      lm_model %>%
      fit(log_lp ~ ., data = lp_train_prepped)
    lm(log_lp ~ ., data = lp_train_prepped)
    ☝ Same result

  36. Linear regression model
    # Put the observed values and the predictions on the evaluation set side by side
    df_lm_model_predict <-
      lp_test_prepped %>%
      select(log_lp) %>%
      bind_cols(
        predict(
          lm_formula_fit,
          new_data = lp_test_prepped))
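The claim that the parsnip fit matches a direct lm() call can be checked on a built-in dataset. This is a small sketch assuming the parsnip package is installed; it uses `mtcars` instead of the land price data. The engine's own model object is stored in the `fit` element, and predict() returns a tibble with a `.pred` column.

```r
library(parsnip)

lm_spec <- set_engine(linear_reg(), "lm")
lm_fit  <- fit(lm_spec, mpg ~ wt + hp, data = mtcars)

# The same model fitted directly with base R
base_fit <- lm(mpg ~ wt + hp, data = mtcars)

# Compare coefficients of the underlying lm object with the base fit
same_coefs <- all.equal(coef(lm_fit$fit), coef(base_fit))
same_coefs

# Predictions come back as a tibble, one row per row of new_data
preds <- predict(lm_fit, new_data = mtcars[1:5, ])
preds
```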

  37. Changing the model and engine to apply
    Model-specific options can be specified
    # Gradient boosting
    gb_model <-
      boost_tree(trees = 1000,
                 mtry = 3,
                 tree_depth = 4) %>%
      set_mode("regression")
    gb_model %>%
      set_engine("xgboost")
    # Random forest
    rf_model <-
      rand_forest(trees = 1000,
                  mtry = 3) %>%
      set_mode("regression")
    rf_model %>%
      set_engine("ranger")
    rf_model %>%
      set_engine("randomForest")
    The remaining steps are unchanged: fit(), then predict()

  38. {yardstick}
    Computing metrics for model performance evaluation

  39. Model performance evaluation
    Use evaluation metrics suited to the task
    Regression problems:
    Coefficient of determination (R2, RSQ)
    Root mean squared error (RMSE)
    Mean absolute error (MAE)
    Classification problems:
    Confusion matrix
    Accuracy
    Precision and recall
    ROC curve and AUC
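The three regression metrics can be written out by hand and compared with yardstick. This is a minimal sketch assuming the yardstick package is installed; the two short vectors are made-up values. Note that yardstick's rsq() is defined as the squared correlation between truth and estimate, which is what the manual version below computes.

```r
library(yardstick)

# Made-up observed and predicted values
truth    <- c(3.6, 4.6, 4.8, 5.1, 4.0)
estimate <- c(3.9, 4.4, 4.9, 4.7, 4.2)

rmse_manual <- sqrt(mean((truth - estimate)^2))  # root mean squared error
mae_manual  <- mean(abs(truth - estimate))       # mean absolute error
rsq_manual  <- cor(truth, estimate)^2            # squared correlation

# The *_vec() variants take plain vectors instead of a data frame
c(rmse = rmse_vec(truth, estimate),
  mae  = mae_vec(truth, estimate),
  rsq  = rsq_vec(truth, estimate))
```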

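  40. Model performance evaluation
    1 Define the specification: choose a model suited to the problem
      linear_reg() rand_forest() logistic_reg()
    2 Specify the engine (package): set_engine()
    3 Fit the model: fit() applies the training set
      predict() makes predictions on the evaluation set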

  41. Model performance evaluation
    Specify a particular metric or a combination of metrics
    rmse(df_lm_model_predict,
         truth = log_lp,
         estimate = .pred)
    #> .metric .estimator .estimate
    #> 1 rmse standard 0.357
    rsq(df_lm_model_predict,
        truth = log_lp,
        estimate = .pred)
    #> .metric .estimator .estimate
    #> 1 rsq standard 0.595
    RMSE of the linear regression model: 0.357
    lp_metrics <-
      metric_set(rmse, rsq, mae)
    lp_metrics(df_lm_model_predict,
               truth = log_lp,
               estimate = .pred)
    #> # A tibble: 3 x 3
    #> .metric .estimator .estimate
    #>
    #> 1 rmse standard 0.357
    #> 2 rsq standard 0.595
    #> 3 mae standard 0.271

  42. ggplot-based visualization
    For models that handle classification problems
    ROC curve
    Confusion Matrix
    autoplot()

  43. {workflows}
    Turning feature engineering and model fitting into a workflow

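  44. Combining a recipe and a model
    Task: regression problem
    Work item → decision
    Data splitting → {rsample}
    Feature engineering → {recipes}
    Type of model → {parsnip}: linear regression model, random forest, gradient boosting
    Specifying the execution engine and fitting → {parsnip}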

  45. Putting it into a workflow
    1 Declare the workflow: workflow()
    2 Processing applied to the dataset: add_recipe()
    3 Specify the model: add_model()

  46. Workflow for the land price regression problem
    lp_wflow <-
      workflow() %>%
      add_recipe(init_lp_recipe) %>%
      add_model(lm_model) # linear regression model
    fit(lp_wflow, data = lp_train)
    Changing the recipe or model, or specifying the data to apply, is easy
    # Random forest built with {parsnip}
    lp_wflow %>%
      update_model(rf_model) %>%
      fit(data = lp_train)
    # Gradient boosting built with {xgboost}, here given the evaluation set
    lp_wflow %>%
      update_model(gb_model) %>%
      fit(data = lp_test)
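The workflow pattern can be run end to end on a built-in dataset. The sketch below assumes the tidymodels meta-package and rpart are installed; it uses `mtcars` and a decision tree as the swapped-in model, both chosen only to keep the example light, not because the slides use them. The fitted workflow applies the recipe to new data automatically at prediction time.

```r
suppressPackageStartupMessages(library(tidymodels))

set.seed(1)
split <- initial_split(mtcars, prop = 0.75)
train <- training(split)
test  <- testing(split)

rec <- recipe(mpg ~ wt + hp + disp, data = train) %>%
  step_normalize(all_predictors())

# Recipe and model bundled into one object
wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(linear_reg() %>% set_engine("lm"))

wf_fit   <- fit(wf, data = train)
lm_preds <- predict(wf_fit, new_data = test)   # raw test data; the recipe is applied internally

# Swap only the model; the recipe stays as it is
tree_spec <- decision_tree() %>%
  set_engine("rpart") %>%
  set_mode("regression")
tree_fit   <- wf %>% update_model(tree_spec) %>% fit(data = train)
tree_preds <- predict(tree_fit, new_data = test)
```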

  47. Improving and operating models

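  48. Prospects for improving the model
    The RMSE of the earlier linear regression model was …
    What about the results of random forest and gradient boosting?
    A little extra feature engineering: what is the effect of interaction terms? What is the influence of latitude and longitude?
    Hyperparameter search has not been done yet either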

  49. Interaction terms
    The relationship between land area and price differs depending on building_structure

  50. Capturing non-linear behavior
    Spline smoothing of latitude and longitude
    How should the degree (number of knots) be chosen?
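What an interaction term and a spline expansion add to the design matrix can be seen with base R alone. This sketch uses the splines package that ships with R and simulated values; the variable names are hypothetical and only echo the slides. It is meant as an illustration of the kind of columns such features create, not of the recipes implementation.

```r
library(splines)

set.seed(1)
lat            <- runif(100, 36, 37)
area           <- runif(100, 50, 500)
structure_type <- factor(sample(c("W", "RC"), 100, replace = TRUE))

# Natural spline basis for latitude: one column per degree of freedom
basis <- ns(lat, df = 5)
dim(basis)

# Interaction between a numeric and a categorical variable:
# the area column is split by structure type, allowing a separate slope for each
mm <- model.matrix(~ area:structure_type)
colnames(mm)
```

A larger `df` lets the spline follow finer spatial patterns at the cost of more columns, which is why the number of knots is treated as something to tune.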

  51. Updating the recipe and workflow
    Add processing to the initial recipe
    second_lp_recipe <-
      init_lp_recipe %>%
      step_interact(
        ~ acreage:starts_with("building_structure")) %>% # << interaction term
      step_ns(.latitude, .longitude, deg_free = 20) # << number of knots chosen arbitrarily
    lp_wflow <-
      lp_wflow %>%
      update_recipe(second_lp_recipe)
    lp_fit <-
      fit(lp_wflow, lp_train)
    RMSE: …

  52. Comparing models
    Random forest or gradient boosting looks promising

  53. {dials},{tune}
    Parameter search and model tuning
    {rsample}
    Creating validation sets

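  54. Hyperparameter search
    For example, a random forest model lets you specify three hyperparameters
    rand_forest(mtry, trees, min_n)
    mtry: number of features given to each decision tree
    trees: number of decision trees to build
    min_n: minimum number of samples in a node
    The hyperparameter values affect the accuracy of the model
    Performance must be evaluated while varying the parameter values
    This is done by preparing validation sets from the training set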

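  55. Creating validation sets
    Change the approach according to the data and the purpose
    Dataset → training set and evaluation set
    Split the training set
    The evaluation set is not used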

  56. Parameter search for random forest
    Prepare 10 folds with K-fold cross-validation
    set.seed(55)
    val_set <-
      vfold_cv(lp_train,
               v = 10)
    #> # 10-fold cross-validation
    #> # A tibble: 10 x 2
    #> splits id
    #>
    #> 1 Fold01
    #> 2 Fold02
    #> 3 Fold03
    #> 4 Fold04
    #> 5 Fold05
    #> 6 Fold06
    #> 7 Fold07
    #> 8 Fold08
    #> 9 Fold09
    #> 10 Fold10
    Specify tune() for the parameters to be tuned
    cores <- parallel::detectCores()
    rf_wflow <-
      workflow() %>%
      add_model(
        rand_forest(mtry = tune(),
                    trees = tune()) %>%
          set_engine("ranger",
                     num.threads = cores) %>%
          set_mode("regression")) %>%
      add_recipe(second_lp_recipe)

  57. Searching with grid search
    set.seed(345)
    rf_res <-
      rf_wflow %>%
      tune_grid(val_set, # validation sets
                grid = 25,
                control = control_grid(save_pred = TRUE),
                metrics = metric_set(rmse)) # evaluation metric
    autoplot(rf_res)

  58. Parameters of the best model
    rf_best <-
      rf_res %>%
      show_best(metric = "rmse")
    #> # A tibble: 5 x 8
    #> mtry trees .metric .estimator mean n std_err .config
    #>
    #> 1 26 1283 rmse standard 0.135 1 NA Model17
    #> 2 20 524 rmse standard 0.135 1 NA Model19
    #> 3 23 1190 rmse standard 0.135 1 NA Model08
    #> 4 29 239 rmse standard 0.135 1 NA Model01
    #> 5 34 1118 rmse standard 0.136 1 NA Model25
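The tune() → tune_grid() → show_best() sequence can be tried in a few seconds on a small dataset. The sketch below makes several assumptions that differ from the slides: it uses `mtcars`, 5 folds, and a decision tree with the rpart engine instead of a ranger random forest, purely to keep dependencies and run time small. It also assumes the tidymodels meta-package is installed.

```r
suppressPackageStartupMessages(library(tidymodels))

set.seed(55)
folds <- vfold_cv(mtcars, v = 5)

# Parameters marked with tune() are left open for the search
tree_wf <- workflow() %>%
  add_formula(mpg ~ .) %>%
  add_model(
    decision_tree(cost_complexity = tune(), min_n = tune()) %>%
      set_engine("rpart") %>%
      set_mode("regression")
  )

set.seed(345)
tree_res <- tune_grid(tree_wf,
                      resamples = folds,
                      grid = 5,                      # 5 candidate combinations
                      metrics = metric_set(rmse))

top3 <- show_best(tree_res, metric = "rmse", n = 3)  # best candidates, averaged over folds
best <- select_best(tree_res, metric = "rmse")       # single best combination

# Fill the tune() placeholders with the selected values
final_wf <- finalize_workflow(tree_wf, best)
```

Because each candidate is evaluated on every fold, the `n` and `std_err` columns of the result reflect the number of folds used.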

  59. Model tuning
    Choosing the optimal number of knots
    The current recipe hard-codes step_ns(deg_free = 20). This value can also be searched and optimized

  60. Searching arbitrary parameters and ranges
    tune_lp_recipe <-
      init_lp_recipe %>%
      step_interact( ~ acreage:starts_with("building_structure")) %>%
      step_ns(.latitude, .longitude, deg_free = tune("coords")) # <<
    spline_res <-
      tune_grid(rf_model,
                tune_lp_recipe,
                resamples = val_set, # the 10-fold cross-validation object created earlier
                grid = expand.grid(coords = c(2, 5, 20, 200)))
    spline_res %>%
      show_best(metric = "rmse")
    #> # A tibble: 4 x 7
    #> coords .metric .estimator mean n std_err .config
    #>
    #> 1 5 rmse standard 0.160 10 0.00288 Recipe2
    #> …

  61. The final model
    Apply the search results to the random forest parameters
    # Parameters of the best model
    rf_model_tuned <-
      rand_forest(mtry = rf_best$mtry[1],
                  trees = rf_best$trees[1]) %>%
      set_engine("ranger",
                 num.threads = cores,
                 importance = "impurity") %>%
      set_mode("regression")
    last_lp_recipe <-
      init_lp_recipe %>%
      step_interact( ~ acreage:starts_with("building_structure")) %>%
      step_ns(.latitude, .longitude, deg_free = 5)

  62. Obtaining the final result from the training and evaluation sets
    # Specify the tuned recipe and model
    rf_wflow_tuned <-
      rf_wflow %>%
      update_recipe(last_lp_recipe) %>%
      update_model(rf_model_tuned)
    last_rf_fit <-
      rf_wflow_tuned %>%
      last_fit(lp_split) # the dataset split made at the beginning
    last_rf_fit %>%
      collect_metrics()
    #> # A tibble: 2 x 3
    #> .metric .estimator .estimate
    #>
    #> 1 rmse standard 0.134
    #> 2 rsq standard 0.943
    RMSE: …
    Initial model: …
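The final step, last_fit(), takes the split object rather than a data frame: it fits on the training portion and evaluates on the evaluation portion in one call. This is a minimal sketch assuming the tidymodels meta-package is installed; it uses `mtcars` and a plain linear model instead of the tuned random forest from the slides.

```r
suppressPackageStartupMessages(library(tidymodels))

set.seed(1)
split <- initial_split(mtcars, prop = 0.75)

wf <- workflow() %>%
  add_formula(mpg ~ wt + hp) %>%
  add_model(linear_reg() %>% set_engine("lm"))

# Fit on training(split), then predict and score on testing(split)
final_res <- last_fit(wf, split)

final_metrics <- collect_metrics(final_res)      # rmse and rsq by default for regression
final_preds   <- collect_predictions(final_res)  # one row per evaluation-set observation
final_metrics
```

Since the evaluation set is touched only here, the reported metrics are an estimate of generalization performance rather than a quantity to tune against.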

  63. Summary
    {tidymodels} provides a unified interface
    and simplifies the processing required for statistical modeling and machine learning

  64. Summary
    {rsample} (splitting, resampling): initial_split() vfold_cv() initial_time_split() nested_cv()
    {recipes} (data preprocessing, feature generation): recipe() step_*() prep() bake()
    {parsnip} (model building and fitting): set_engine() rand_forest() boost_tree()
    {yardstick} (model performance evaluation): metrics() rmse() roc_auc()
    {dials}, {tune} (parameter search and tuning): tune() grid_random()
    {workflows} (turning the steps into a workflow): workflow() add_*() update_*()