Data wrangling & manipulation in R - Day 3 slides
Ruan van Mazijk
July 03, 2019
Transcript
data_wrangling() && ("manipulation" %in% R)
> day[3]
Ruan van Mazijk
Notes & slides will go up here: tinyurl.com/r-with-ruan
(But I encourage you to make your own notes!)
> workshop$outline[1:3]
DAY 1 Tidy data principles & tidyr
DAY 2 Manipulating data & an intro to dplyr
DAY 3 Extending your data with mutate(), summarise() & friends
dplyr::  # Verbs to manipulate your data
select()  # operates on columns
filter()  # operates on rows
data %>%
  gather(key = veg_type, value = fix) %>%
  separate(fix, into = c("lon", "lat")) %>%
  select(veg_type, lon, lat, soil, plant_height) %>%
  filter(plant_height %>% between(0.5, 10),
         veg_type %in% c("fynbos", "strandveld", "renosterveld")) %>%
  ???()
Summary statistics for each vegetation type?
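A runnable sketch of the pipeline above, on made-up data. The table `wide`, its column names, and the "lon_lat" text format of each fix are assumptions for illustration only; they are not the workshop's data. Two additions are also assumptions: the `-soil, -plant_height` arguments keep those columns out of `gather()`, and `sep = "_"` stops `separate()` from also splitting on the decimal points.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one column per vegetation type, holding a
# "lon_lat" GPS fix as text (NA where a plot is not of that type)
wide <- data.frame(
  soil         = c("sand", "clay", "sand", "shale"),
  plant_height = c(1.2, 0.3, 2.5, 12.0),
  fynbos       = c("18.4_-33.9", NA, NA, "18.6_-34.1"),
  strandveld   = c(NA, "18.3_-33.5", NA, NA),
  forest       = c(NA, NA, "18.9_-34.0", NA),
  stringsAsFactors = FALSE
)

tidy <- wide %>%
  # Stack the vegetation-type columns, leaving soil & plant_height alone
  gather(key = veg_type, value = fix, -soil, -plant_height, na.rm = TRUE) %>%
  # Split each fix into two columns and convert them to numbers
  separate(fix, into = c("lon", "lat"), sep = "_", convert = TRUE) %>%
  select(veg_type, lon, lat, soil, plant_height) %>%
  # Keep plausible heights in the vegetation types of interest
  filter(
    plant_height %>% between(0.5, 10),
    veg_type %in% c("fynbos", "strandveld", "renosterveld")
  )

print(tidy)
```

In tidyr 1.0.0 and later, `pivot_longer()` is the recommended replacement for `gather()`, which still works.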
dplyr::  # Verbs to extend your data
mutate()  # operates on columns
group_by()  # operates on rows
summarise()  # rows & columns
data %>% mutate(...)
(Cheat sheet figure: CC BY SA RStudio, https://www.rstudio.com/resources/cheatsheets/)
data %>% mutate(BMI = weight / height^2)
data %>% mutate(BMI = weight / height^2,
                BMI_std = scale(BMI))
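To try `mutate()` out, here is a small sketch on invented data. The data frame `people` and its units (weight in kg, height in m) are assumptions, and BMI is taken as weight divided by height squared. Because `scale()` returns a one-column matrix, the sketch wraps it in `as.numeric()` to get a plain numeric column.

```r
library(dplyr)

# Hypothetical toy data: weight in kg, height in m (assumed units)
people <- data.frame(
  name   = c("A", "B", "C"),
  weight = c(60, 75, 90),
  height = c(1.60, 1.75, 1.80),
  stringsAsFactors = FALSE
)

people_bmi <- people %>%
  mutate(
    BMI     = weight / height^2,
    # A column made earlier in the same mutate() can be used straight away
    BMI_std = as.numeric(scale(BMI))
  )

print(people_bmi)
```

The existing columns are kept and the two new ones are added on the right.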
data %>% mutate_all(.funs, ...)
data %>% mutate_all(scale)
data %>% mutate_all(list(log, log1p))
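A short sketch of the difference between passing one function and a list of functions to `mutate_all()`. The data frame `traits` is invented for illustration, and naming the list elements is an addition not shown on the slide; it gives the new columns readable suffixes.

```r
library(dplyr)

# Hypothetical all-numeric toy data (log() needs positive numbers)
traits <- data.frame(
  leaf_area = c(1, 10, 100),
  seed_mass = c(2, 4, 8)
)

# One function: every column is transformed in place
logged <- traits %>% mutate_all(log10)

# A named list of functions: the original columns are kept and new
# columns are added, named "<column>_<function name>"
both <- traits %>% mutate_all(list(log = log, log1p = log1p))

print(logged)
print(names(both))
```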
data %>% mutate_if(.predicate, .funs, ...)
data %>% mutate_if(is.numeric, scale)
data %>% mutate_if(is.numeric, list(log, log1p))
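A sketch of `mutate_if()` on a table that mixes text and numbers, so that only the numeric columns change. The data frame `plots` and the helper `standardise()` are hypothetical. The last step shows what should be the equivalent call with `across()`, which needs dplyr 1.0.0 or later (released after these slides).

```r
library(dplyr)

# Hypothetical plot data with one text column and two numeric columns
plots <- data.frame(
  veg_type     = c("fynbos", "fynbos", "strandveld", "renosterveld"),
  plant_height = c(1.2, 0.8, 2.5, 1.5),
  soil_ph      = c(4.5, 5.0, 7.8, 6.2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: scale() returns a matrix, so flatten it to a vector
standardise <- function(x) as.numeric(scale(x))

# Only the columns for which is.numeric() is TRUE are transformed
scaled_if <- plots %>% mutate_if(is.numeric, standardise)

# Same idea with across(), assuming dplyr >= 1.0.0
scaled_across <- plots %>% mutate(across(where(is.numeric), standardise))

print(scaled_if)
```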
data %>% group_by(veg_type) %>% summarise(mean_plant_height = mean(plant_height))
data %>% group_by(veg_type) %>% summarise(mean_plant_height = mean(plant_height),
                                          st_plant_height = sd(plant_height))
data %>% group_by(veg_type) %>% summarise_if(is.numeric, mean)
data %>% group_by(veg_type) %>% summarise_if(is.numeric, mean, na.rm = TRUE)
data %>% group_by(veg_type) %>% summarise_if(is.numeric, list(mean, sd))
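A sketch of grouped summaries on invented data, including what the extra `na.rm = TRUE` argument changes. The data frame `plots` and its values are assumptions for illustration; one height is deliberately missing.

```r
library(dplyr)

# Hypothetical plot data with one missing plant height
plots <- data.frame(
  veg_type     = c("fynbos", "fynbos", "strandveld", "strandveld"),
  plant_height = c(1.0, 3.0, 2.0, NA),
  soil_ph      = c(4.0, 6.0, 8.0, 7.0),
  stringsAsFactors = FALSE
)

# One row per vegetation type, one column per named summary
by_veg <- plots %>%
  group_by(veg_type) %>%
  summarise(
    mean_plant_height = mean(plant_height, na.rm = TRUE),
    sd_plant_height   = sd(plant_height, na.rm = TRUE)
  )

# Without na.rm, a single NA makes that group's mean NA
means_no_rm <- plots %>%
  group_by(veg_type) %>%
  summarise_if(is.numeric, mean)

# Extra arguments after the function are passed on to it
means_rm <- plots %>%
  group_by(veg_type) %>%
  summarise_if(is.numeric, mean, na.rm = TRUE)

print(by_veg)
print(means_no_rm)
print(means_rm)
```

Note that `sd()` of a single remaining value is `NA`, so the strandveld group still has no standard deviation here even with `na.rm = TRUE`.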
> demo()