Applied machine learning with tidymodels
Julia Silge
June 22, 2022
useR! 2022 keynote
Transcript
Applied Machine Learning with tidymodels
Julia Silge @juliasilge
Hello
h ://x .c /1 /
I a c : h ://v .c /b /m _l
g/
I a c : h ://v .c /b /m _l
g/
What's the hardest part about machine learning in practice?
library(tidymodels)
#> ── Attaching packages ────────────────────────────────────────────── tidymodels 0.2.0 ──
#> ✔ broom        0.8.0     ✔ rsample      0.1.1
#> ✔ dials        1.0.0     ✔ tibble       3.1.7
#> ✔ dplyr        1.0.9     ✔ tidyr        1.2.0
#> ✔ infer        1.0.2     ✔ tune         0.2.0
#> ✔ modeldata    0.1.1     ✔ workflows    0.2.6
#> ✔ parsnip      1.0.0     ✔ workflowsets 0.2.1
#> ✔ purrr        0.3.4     ✔ yardstick    1.0.0
#> ✔ recipes      0.2.0
#> ── Conflicts ───────────────────────────────────────────────── tidymodels_conflicts() ──
#> ✖ purrr::discard() masks scales::discard()
#> ✖ dplyr::filter()  masks stats::filter()
#> ✖ dplyr::lag()     masks stats::lag()
#> ✖ recipes::step()  masks stats::step()
#> • Dig deeper into tidy modeling with R at https://www.tmwr.org
tmwr.org
Three topics for today
- Spending your data budget
- Where does your model start and end?
- Get your model off your laptop
Spending your data budget
rsample https://rsample.tidymodels.org
Data splitting
initial_split()
S t r y t a t n t g s
penguins_split <- initial_split(penguins, prop = 0.75)
penguins_split
#> <Training/Testing/Total>
#> <249/84/333>
training() and testing()
C t g n t t o rsplit
penguins_train <- training(penguins_split)
penguins_train
#> # A tibble: 249 × 8
#>   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
#>   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
#> 1 Chinst… Dream            47.6          18.3              195        3850 fema…
#> 2 Adelie  Torge…           35.7          17                189        3350 fema…
#> 3 Gentoo  Biscoe           45.5          15                220        5000 male
#> 4 Gentoo  Biscoe           48.7          15.7              208        5350 male
#> 5 Gentoo  Biscoe           46.5          13.5              210        4550 fema…
#> # … with 244 more rows, and 1 more variable: year <int>
penguins_test <- testing(penguins_split)
penguins_test
#> # A tibble: 84 × 8
#>   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
#>   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
#> 1 Adelie  Torge…           40.3          18                195        3250 fema…
#> 2 Adelie  Torge…           36.7          19.3              193        3450 fema…
#> 3 Adelie  Torge…           36.6          17.8              185        3700 fema…
#> 4 Adelie  Torge…           34.4          18.4              184        3325 fema…
#> 5 Adelie  Torge…           46            21.5              194        4200 male
#> # … with 79 more rows, and 1 more variable: year <int>
The testing data is precious!
How can we use the training set to compare, evaluate, and tune models?
Cross-validation
[Diagram: 30 numbered data points in shuffled order]
[Diagram: Fold 1, Fold 2, and Fold 3 iterations, each marking which of the 30 points the model is fit using and which the performance estimate is computed using]
set.seed(123)
vfold_cv(penguins_train, strata = species)
#> # 10-fold cross-validation using stratification
#> # A tibble: 10 × 2
#>    splits           id
#>    <list>           <chr>
#>  1 <split [223/26]> Fold01
#>  2 <split [223/26]> Fold02
#>  3 <split [223/26]> Fold03
#>  4 <split [224/25]> Fold04
#>  5 <split [224/25]> Fold05
#>  6 <split [224/25]> Fold06
#>  7 <split [225/24]> Fold07
#>  8 <split [225/24]> Fold08
#>  9 <split [225/24]> Fold09
#> 10 <split [225/24]> Fold10
Bootstrapping
[Diagram: Bootstrap Iterations 1, 2, and 3, each marking which of the 30 points the model is fit using (some points appear more than once) and which the performance estimate is computed using]
set.seed(123)
bootstraps(penguins_train, strata = species)
#> # Bootstrap sampling using stratification
#> # A tibble: 25 × 2
#>    splits           id
#>    <list>           <chr>
#>  1 <split [249/91]> Bootstrap01
#>  2 <split [249/93]> Bootstrap02
#>  3 <split [249/96]> Bootstrap03
#>  4 <split [249/88]> Bootstrap04
#>  5 <split [249/89]> Bootstrap05
#>  6 <split [249/82]> Bootstrap06
#>  7 <split [249/87]> Bootstrap07
#>  8 <split [249/87]> Bootstrap08
#>  9 <split [249/85]> Bootstrap09
#> 10 <split [249/95]> Bootstrap10
#> # … with 15 more rows
Resampling methods
S u t w t create simulated validation set(s)
- vfold_cv()
- loo_cv()
- mc_cv()
- bootstraps()
- validation_split()
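To make the `<split [analysis/assessment]>` notation in the printed resamples concrete, here is a minimal sketch using rsample's `analysis()` and `assessment()` accessors. The 30-row `toy` data frame is an assumption for illustration (it mirrors the 30 numbered points in the diagrams) and is not part of the talk.

```r
library(rsample)

# Assumption: toy data standing in for the 30 numbered points in the diagrams
set.seed(123)
toy <- data.frame(id = 1:30, y = rnorm(30))

# 3-fold cross-validation: each row is held out exactly once
folds <- vfold_cv(toy, v = 3)
first_fold <- folds$splits[[1]]
nrow(analysis(first_fold))    # rows used to fit the model
nrow(assessment(first_fold))  # rows held out to estimate performance

# Bootstrap: the analysis set is drawn with replacement, same size as the data
boots <- bootstraps(toy, times = 3)
first_boot <- boots$splits[[1]]
nrow(analysis(first_boot))

# The assessment set holds only rows that were never drawn ("out-of-bag"),
# so the two sets share no ids
intersect(analysis(first_boot)$id, assessment(first_boot)$id)
```

Because the bootstrap assessment set depends on which rows happened to be drawn, its size varies from resample to resample, which is why the second number differs across the bootstrap splits printed above.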
Where does your model start and end?
workflows https://workflows.tidymodels.org/
rf_spec <- rand_forest(mode = "classification")
penguin_formula <- species ~ bill_length_mm + bill_depth_mm + sex
workflow(penguin_formula, rf_spec)
#> ══ Workflow ════════════════════════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: rand_forest()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────────────
#> species ~ bill_length_mm + bill_depth_mm + sex
#>
#> ── Model ───────────────────────────────────────────────────────────────────────────────
#> Random Forest Model Specification (classification)
#>
#> Computational engine: ranger
workflow(penguin_formula, rf_spec) %>%
  fit(data = penguins_train)
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: rand_forest()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────────────
#> species ~ bill_length_mm + bill_depth_mm + sex
#>
#> ── Model ───────────────────────────────────────────────────────────────────────────────
#> Ranger result
#>
#> Call:
#>  ranger::ranger(x = maybe_data_frame(x), y = y, num.threads = 1,
#>    verbose = FALSE, seed = sample.int(10^5, 1), probability = TRUE)
#>
#> Type:                             Probability estimation
#> Number of trees:                  500
#> Sample size:                      249
#> Number of independent variables:  3
#> Mtry:                             1
#> Target node size:                 10
#> Variable importance mode:         none
#> Splitrule:                        gini
#> OOB prediction error (Brier s.):  0.05585744
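One way to connect the workflow to the resampling ideas from earlier in the talk is to pass it to `fit_resamples()` from the tune package. This is a sketch under assumptions, not code from the slides: it assumes the 333-row `penguins` data is `palmerpenguins::penguins` with incomplete rows dropped, and the names `penguin_folds`, `penguin_wf`, `rf_res`, and `final_res` are introduced here for illustration. The ranger package must be installed.

```r
library(tidymodels)

# Assumption: the slides' 333-row `penguins` matches palmerpenguins with NAs dropped
penguins <- palmerpenguins::penguins %>% drop_na()

set.seed(123)
penguins_split <- initial_split(penguins, prop = 0.75)
penguins_train <- training(penguins_split)
penguin_folds <- vfold_cv(penguins_train, strata = species)

rf_spec <- rand_forest(mode = "classification")
penguin_formula <- species ~ bill_length_mm + bill_depth_mm + sex
penguin_wf <- workflow(penguin_formula, rf_spec)

# Fit the whole workflow on each analysis set and score it on the
# matching assessment set; the testing data is not touched here
rf_res <- fit_resamples(penguin_wf, resamples = penguin_folds)
collect_metrics(rf_res)

# Spend the testing data once, at the very end: fit on the training set
# and evaluate on the testing set
final_res <- last_fit(penguin_wf, penguins_split)
collect_metrics(final_res)
```

Resampling the workflow rather than the bare model means any preprocessing is re-estimated inside each resample, which is the point of asking where the model starts and ends.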
I a A H
penguin_rec <- recipe(species ~ bill_length_mm + bill_depth_mm + sex,
                      data = penguins_train) %>%
  step_dummy(sex) %>%
  step_normalize(all_numeric_predictors())
penguin_rec
#> Recipe
#>
#> Inputs:
#>
#>       role #variables
#>    outcome          1
#>  predictor          3
#>
#> Operations:
#>
#> Dummy variables from sex
#> Centering and scaling for all_numeric_predictors()
svm_spec <- svm_linear(mode = "classification")
workflow(penguin_rec, svm_spec)
#> ══ Workflow ════════════════════════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: svm_linear()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────────────
#> 2 Recipe Steps
#>
#> • step_dummy()
#> • step_normalize()
#>
#> ── Model ───────────────────────────────────────────────────────────────────────────────
#> Linear Support Vector Machine Specification (classification)
#>
#> Computational engine: LiblineaR
penguin_fit <- workflow(penguin_rec, svm_spec) %>%
  fit(data = penguins_train)
Get your model off your laptop
vetiver https://vetiver.rstudio.com
library(vetiver)
v <- vetiver_model(penguin_fit, "svm_penguins")
v
#>
#> ── svm_penguins ─ <butchered_workflow> model for deployment
#> A LiblineaR classification modeling workflow using 3 features
library(plumber)
pr() %>%
  vetiver_api(v)
#> # Plumber router with 2 endpoints, 4 filters, and 1 sub-router.
#> # Use `pr_run()` on this object to start the API.
#> ├──[queryString]
#> ├──[body]
#> ├──[cookieParser]
#> ├──[sharedSecret]
#> ├──/logo
#> ├──/ping (GET)
#> └──/predict (POST)
- P -b d R C
- G e D l o o d e t
# Generated by the vetiver package; edit with care

FROM rocker/r-ver:4.2.0
ENV RENV_CONFIG_REPOS_OVERRIDE https://packagemanager.rstudio.com/cran/latest

RUN apt-get update -qq && apt-get install -y --no-install-recommends \
  libcurl4-openssl-dev \
  libicu-dev \
  libsodium-dev \
  libssl-dev \
  make

COPY vetiver_renv.lock renv.lock
RUN Rscript -e "install.packages('renv')"
RUN Rscript -e "renv::restore()"
COPY plumber.R /opt/ml/plumber.R
EXPOSE 8000
ENTRYPOINT ["R", "-e", "pr <- plumber::plumb('/opt/ml/plumber.R'); pr$run(host = '0.0.0.0', port = 8000)"]
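The header comment says the Dockerfile above was generated by the vetiver package. The slides do not show the call that produced it, so the following is only a sketch of one way to generate such files, assuming a temporary pins board and a temporary output directory (both hypothetical stand-ins for real deployment targets) and the same penguins assumption as before (`palmerpenguins::penguins` with incomplete rows dropped). The LiblineaR and renv packages must be installed.

```r
library(tidymodels)
library(vetiver)
library(pins)

# Assumption: the slides' 333-row `penguins` matches palmerpenguins with NAs dropped
penguins <- palmerpenguins::penguins %>% drop_na()
set.seed(123)
penguins_train <- training(initial_split(penguins, prop = 0.75))

penguin_rec <- recipe(species ~ bill_length_mm + bill_depth_mm + sex,
                      data = penguins_train) %>%
  step_dummy(sex) %>%
  step_normalize(all_numeric_predictors())
penguin_fit <- workflow(penguin_rec, svm_linear(mode = "classification")) %>%
  fit(data = penguins_train)
v <- vetiver_model(penguin_fit, "svm_penguins")

# Assumption: a throwaway board; a real deployment would use a shared board
board <- board_temp(versioned = TRUE)
vetiver_pin_write(board, v)

# Assumption: write everything to a temporary directory for illustration
docker_dir <- tempfile("vetiver-docker-")
dir.create(docker_dir)
plumber_file <- file.path(docker_dir, "plumber.R")

# Write the plumber API file, then the Dockerfile and renv lockfile beside it
vetiver_write_plumber(board, "svm_penguins", file = plumber_file)
vetiver_write_docker(v, plumber_file = plumber_file, path = docker_dir)
list.files(docker_dir)
```

The base image and system libraries in the generated file depend on the installed R and package versions, so the output will not necessarily match the slide line for line.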
More to learn!
Thank you!
h ://y .c /j l / h ://j l .c / h ://t e .o / h ://t .o / P a M U h