Applied machine learning with tidymodels
Julia Silge
June 22, 2022
useR! 2022 keynote
Transcript
Applied Machine Learning with tidymodels
Julia Silge
@juliasilge
Hello
h ://x .c /1 /
I a c : h ://v .c /b /m _l
g/
I a c : h ://v .c /b /m _l
g/
What's the hardest part about machine learning in practice?
library(tidymodels)
#> ── Attaching packages ────────────────────────────────────────────── tidymodels 0.2.0 ──
#> ✔ broom        0.8.0     ✔ rsample      0.1.1
#> ✔ dials        1.0.0     ✔ tibble       3.1.7
#> ✔ dplyr        1.0.9     ✔ tidyr        1.2.0
#> ✔ infer        1.0.2     ✔ tune         0.2.0
#> ✔ modeldata    0.1.1     ✔ workflows    0.2.6
#> ✔ parsnip      1.0.0     ✔ workflowsets 0.2.1
#> ✔ purrr        0.3.4     ✔ yardstick    1.0.0
#> ✔ recipes      0.2.0
#> ── Conflicts ───────────────────────────────────────────────── tidymodels_conflicts() ──
#> ✖ purrr::discard() masks scales::discard()
#> ✖ dplyr::filter()  masks stats::filter()
#> ✖ dplyr::lag()     masks stats::lag()
#> ✖ recipes::step()  masks stats::step()
#> • Dig deeper into tidy modeling with R at https://www.tmwr.org
tmwr.org
Three topics for today
• Spending your data budget
• Where does your model start and end?
• Get your model off your laptop
Spending your data budget
rsample
https://rsample.tidymodels.org
Data splitting
initial_split()
S t r y t a t n t g s
penguins_split <- initial_split(penguins, prop = 0.75)
penguins_split
#> <Training/Testing/Total>
#> <249/84/333>
training() and testing()
C t g n t t o rsplit
penguins_train <- training(penguins_split)
penguins_train
#> # A tibble: 249 × 8
#>   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
#>   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
#> 1 Chinst… Dream            47.6          18.3              195        3850 fema…
#> 2 Adelie  Torge…           35.7          17                189        3350 fema…
#> 3 Gentoo  Biscoe           45.5          15                220        5000 male
#> 4 Gentoo  Biscoe           48.7          15.7              208        5350 male
#> 5 Gentoo  Biscoe           46.5          13.5              210        4550 fema…
#> # … with 244 more rows, and 1 more variable: year <int>
training() and testing()
C t g n t t o rsplit
penguins_test <- testing(penguins_split)
penguins_test
#> # A tibble: 84 × 8
#>   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
#>   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
#> 1 Adelie  Torge…           40.3          18                195        3250 fema…
#> 2 Adelie  Torge…           36.7          19.3              193        3450 fema…
#> 3 Adelie  Torge…           36.6          17.8              185        3700 fema…
#> 4 Adelie  Torge…           34.4          18.4              184        3325 fema…
#> 5 Adelie  Torge…           46            21.5              194        4200 male
#> # … with 79 more rows, and 1 more variable: year <int>
The testing data is precious!
How can we use the training set to compare, evaluate, and tune models?
Cross-validation
[Figure: 30 numbered data points shown in shuffled order]
Cross-validation
[Figure: 30 numbered data points shown for the Fold 1, Fold 2, and Fold 3 iterations; legend: Model Fit Using, Estimate Performance Using]
Cross-validation
set.seed(123)
vfold_cv(penguins_train, strata = species)
#> # 10-fold cross-validation using stratification
#> # A tibble: 10 × 2
#>    splits           id
#>    <list>           <chr>
#>  1 <split [223/26]> Fold01
#>  2 <split [223/26]> Fold02
#>  3 <split [223/26]> Fold03
#>  4 <split [224/25]> Fold04
#>  5 <split [224/25]> Fold05
#>  6 <split [224/25]> Fold06
#>  7 <split [225/24]> Fold07
#>  8 <split [225/24]> Fold08
#>  9 <split [225/24]> Fold09
#> 10 <split [225/24]> Fold10
Bootstrapping
[Figure: numbered data points shown for Bootstrap Iterations 1, 2, and 3; legend: Model Fit Using, Estimate Performance Using]
Bootstrapping
set.seed(123)
bootstraps(penguins_train, strata = species)
#> # Bootstrap sampling using stratification
#> # A tibble: 25 × 2
#>    splits           id
#>    <list>           <chr>
#>  1 <split [249/91]> Bootstrap01
#>  2 <split [249/93]> Bootstrap02
#>  3 <split [249/96]> Bootstrap03
#>  4 <split [249/88]> Bootstrap04
#>  5 <split [249/89]> Bootstrap05
#>  6 <split [249/82]> Bootstrap06
#>  7 <split [249/87]> Bootstrap07
#>  8 <split [249/87]> Bootstrap08
#>  9 <split [249/85]> Bootstrap09
#> 10 <split [249/95]> Bootstrap10
#> # … with 15 more rows
Resampling methods
S u t w t create simulated validation set(s)
vfold_cv()
loo_cv()
mc_cv()
bootstraps()
validation_split()
Where does your model start and end?
workflows
https://workflows.tidymodels.org/
Where does your model start and end?
rf_spec <- rand_forest(mode = "classification")
penguin_formula <- species ~ bill_length_mm + bill_depth_mm + sex
Where does your model start and end?
workflow(penguin_formula, rf_spec)
#> ══ Workflow ════════════════════════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: rand_forest()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────────────
#> species ~ bill_length_mm + bill_depth_mm + sex
#>
#> ── Model ───────────────────────────────────────────────────────────────────────────────
#> Random Forest Model Specification (classification)
#>
#> Computational engine: ranger
Where does your model start and end?
workflow(penguin_formula, rf_spec) %>%
  fit(data = penguins_train)
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: rand_forest()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────────────
#> species ~ bill_length_mm + bill_depth_mm + sex
#>
#> ── Model ───────────────────────────────────────────────────────────────────────────────
#> Ranger result
#>
#> Call:
#>  ranger::ranger(x = maybe_data_frame(x), y = y, num.threads = 1,
#>    verbose = FALSE, seed = sample.int(10^5, 1), probability = TRUE)
#>
#> Type:                             Probability estimation
#> Number of trees:                  500
#> Sample size:                      249
#> Number of independent variables:  3
#> Mtry:                             1
#> Target node size:                 10
#> Variable importance mode:         none
#> Splitrule:                        gini
#> OOB prediction error (Brier s.):  0.05585744
I a A H
Where does your model start and end?
penguin_rec <- recipe(species ~ bill_length_mm + bill_depth_mm + sex, data = penguins_train) %>%
  step_dummy(sex) %>%
  step_normalize(all_numeric_predictors())
penguin_rec
#> Recipe
#>
#> Inputs:
#>
#>       role #variables
#>    outcome          1
#>  predictor          3
#>
#> Operations:
#>
#> Dummy variables from sex
#> Centering and scaling for all_numeric_predictors()
Where does your model start and end?
svm_spec <- svm_linear(mode = "classification")
workflow(penguin_rec, svm_spec)
#> ══ Workflow ════════════════════════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: svm_linear()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────────────
#> 2 Recipe Steps
#>
#> • step_dummy()
#> • step_normalize()
#>
#> ── Model ───────────────────────────────────────────────────────────────────────────────
#> Linear Support Vector Machine Specification (classification)
#>
#> Computational engine: LiblineaR
Where does your model start and end?
penguin_fit <- workflow(penguin_rec, svm_spec) %>%
  fit(data = penguins_train)
Get your model off your laptop
vetiver
https://vetiver.rstudio.com
Get your model off your laptop
library(vetiver)
v <- vetiver_model(penguin_fit, "svm_penguins")
v
#>
#> ── svm_penguins ─ <butchered_workflow> model for deployment
#> A LiblineaR classification modeling workflow using 3 features
Get your model off your laptop
library(plumber)
pr() %>%
  vetiver_api(v)
#> # Plumber router with 2 endpoints, 4 filters, and 1 sub-router.
#> # Use `pr_run()` on this object to start the API.
#> ├──[queryString]
#> ├──[body]
#> ├──[cookieParser]
#> ├──[sharedSecret]
#> ├──/logo
#> ├──/ping (GET)
#> └──/predict (POST)
Get your model off your laptop
• P -b d R C
• G e D l o o d e t
Get your model off your laptop
# Generated by the vetiver package; edit with care
FROM rocker/r-ver:4.2.0
ENV RENV_CONFIG_REPOS_OVERRIDE https://packagemanager.rstudio.com/cran/latest
RUN apt-get update -qq && apt-get install -y --no-install-recommends \
  libcurl4-openssl-dev \
  libicu-dev \
  libsodium-dev \
  libssl-dev \
  make
COPY vetiver_renv.lock renv.lock
RUN Rscript -e "install.packages('renv')"
RUN Rscript -e "renv::restore()"
COPY plumber.R /opt/ml/plumber.R
EXPOSE 8000
ENTRYPOINT ["R", "-e", "pr <- plumber::plumb('/opt/ml/plumber.R'); pr$run(host = '0.0.0.0', port = 8000)"]
More to learn!
Thank you!
h ://y .c /j l /
h ://j l .c /
h ://t e .o /
h ://t .o /
P a M U h