Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Data wrangling & manipulation in R - Day 3 slides
Search
Ruan van Mazijk
July 03, 2019
Programming
0
27
Data wrangling & manipulation in R - Day 3 slides
Ruan van Mazijk
July 03, 2019
Tweet
Share
More Decks by Ruan van Mazijk
See All by Ruan van Mazijk
Data wrangling & manipulation in R - Day 2 slides
rvanmazijk
0
50
Data wrangling & manipulation in R - Day 1 slides
rvanmazijk
0
23
Biodiversity, evolution & taxonomy - Teaching Biodiversity Short Course for FET Life Sciences Teachers
rvanmazijk
0
99
An introduction to R Markdown
rvanmazijk
0
120
Does genome size affect plant water-use? - Ecophysiology & phenology in Cape Schoenoid sedges
rvanmazijk
1
39
Environmental turnover predicts plant species richness & turnover - Comparing the Greater Cape Floristic Region & the Southwest Australia Floristic Region
rvanmazijk
0
18
Other Decks in Programming
See All in Programming
なんとなくわかった気になるブロックテーマ入門/contents.nagoya 2025 6.28
chiilog
1
270
データの民主化を支える、透明性のあるデータ利活用への挑戦 2025-06-25 Database Engineering Meetup#7
y_ken
0
340
A full stack side project webapp all in Kotlin (KotlinConf 2025)
dankim
0
100
Modern Angular with Signals and Signal Store:New Rules for Your Architecture @enterJS Advanced Angular Day 2025
manfredsteyer
PRO
0
200
なぜ適用するか、移行して理解するClean Architecture 〜構造を超えて設計を継承する〜 / Why Apply, Migrate and Understand Clean Architecture - Inherit Design Beyond Structure
seike460
PRO
3
750
ニーリーにおけるプロダクトエンジニア
nealle
0
780
今ならAmazon ECSのサービス間通信をどう選ぶか / Selection of ECS Interservice Communication 2025
tkikuc
21
3.9k
20250704_教育事業におけるアジャイルなデータ基盤構築
hanon52_
5
680
来たるべき 8.0 に備えて React 19 新機能と React Router 固有機能の取捨選択とすり合わせを考える
oukayuka
2
910
PipeCDのプラグイン化で目指すところ
warashi
1
260
第9回 情シス転職ミートアップ 株式会社IVRy(アイブリー)の紹介
ivry_presentationmaterials
1
270
コードの90%をAIが書く世界で何が待っているのか / What awaits us in a world where 90% of the code is written by AI
rkaga
50
33k
Featured
See All Featured
The Language of Interfaces
destraynor
158
25k
Faster Mobile Websites
deanohume
307
31k
How STYLIGHT went responsive
nonsquared
100
5.6k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
331
22k
Optimizing for Happiness
mojombo
379
70k
Understanding Cognitive Biases in Performance Measurement
bluesmoon
29
1.8k
Producing Creativity
orderedlist
PRO
346
40k
Fireside Chat
paigeccino
37
3.5k
jQuery: Nuts, Bolts and Bling
dougneiner
63
7.8k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
161
15k
Java REST API Framework Comparison - PWX 2021
mraible
31
8.7k
Stop Working from a Prison Cell
hatefulcrawdad
270
21k
Transcript
data_wrangling() && ("manipulation" %in% R) %>% %>% %>% > day[3]
Ruan van Mazijk
tinyurl.com/r-with-ruan Notes & slides will go up here: (But I
encourage you to make your own notes!)
> workshop$outline[1:3] DAY 1 Tidy data principles & tidyr DAY
2 Manipulating data & an intro to dplyr DAY 3 Extending your data with mutate(), summarise() & friends
> workshop$outline[2:3] DAY 2 Manipulating data & an intro to
dplyr DAY 3 Extending your data with mutate(), summarise() & friends
dplyr:: # Verbs to manipulate your data select() # operates
on columns filter() # operates on rows
data %>%
data %>% gather(key = veg_type, value = fix) %>%
data %>% gather(key = veg_type, value = fix) %>% separate(fix,
into = c("lon", "lat")) %>%
data %>% gather(key = veg_type, value = fix) %>% separate(fix,
into = c("lon", "lat")) %>% select(veg_type, lon, lat, soil, plant_height) %>%
data %>% gather(key = veg_type, value = fix) %>% separate(fix,
into = c("lon", "lat")) %>% select(veg_type, lon, lat, soil, plant_height) %>% filter(plant_height %>% between(0.5, 10),
data %>% gather(key = veg_type, value = fix) %>% separate(fix,
into = c("lon", "lat")) %>% select(veg_type, lon, lat, soil, plant_height) %>% filter(plant_height %>% between(0.5, 10), veg_type %in% c("fynbos", "strandveld", "renosterveld"))
data %>% gather(key = veg_type, value = fix) %>% separate(fix,
into = c("lon", "lat")) %>% select(veg_type, lon, lat, soil, plant_height) %>% filter(plant_height %>% between(0.5, 10), veg_type %in% c("fynbos", "strandveld", "renosterveld")) Summary statistics for each vegetation type?
data %>% gather(key = veg_type, value = fix) %>% separate(fix,
into = c("lon", "lat")) %>% select(veg_type, lon, lat, soil, plant_height) %>% filter(plant_height %>% between(0.5, 10), veg_type %in% c("fynbos", "strandveld", "renosterveld")) %>% ???() Summary statistics for each vegetation type?
dplyr:: # Verbs to manipulate your data select() # operates
on columns filter() # operates on rows
dplyr:: # Verbs to extend your data mutate() # operates
on columns group_by() # operates on rows summarise() # rows & columns
data %>% mutate(...) CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/
data %>% mutate(...)
data %>% mutate(...) data %>% mutate(BMI = height / weight)
data %>% mutate(...) data %>% mutate(BMI = height / weight)
data %>% mutate(BMI = height / weight, BMI_std = scale(BMI))
data %>% mutate_all(...) CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/
data %>% mutate_all(.funs, ...) data %>% mutate_all(scale) data %>% mutate_all(list(log,
log1p))
data %>% mutate_if(.predicate, .funs) CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/
data %>% mutate_if(.predicate, .funs, ...) data %>% mutate_if(is.numeric, scale) data
%>% mutate_if(is.numeric, list(log, log1p))
dplyr:: # Verbs to extent your data mutate() # operates
on columns group_by() # operates on rows summarise() # rows & columns
dplyr:: # Verbs to extent your data mutate() # operates
on columns group_by() # operates on rows summarise() # rows & columns
CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/ data
CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/ data %>% group_by(veg_type)
CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/ data %>% group_by(veg_type) %>% summarise(mean_plant_height
= mean(plant_height))
data %>% group_by(veg_type) %>% summarise(mean_plant_height = mean(plant_height),
data %>% group_by(veg_type) %>% summarise(mean_plant_height = mean(plant_height), st_plant_height = sd(plant_height))
data %>% group_by(veg_type) %>% summarise(mean_plant_height = mean(plant_height), st_plant_height = sd(plant_height))
data %>% group_by(veg_type) %>% summarise_if(is.numeric, mean)
data %>% group_by(veg_type) %>% summarise(mean_plant_height = mean(plant_height), st_plant_height = sd(plant_height))
data %>% group_by(veg_type) %>% summarise_if(is.numeric, mean) data %>% group_by(veg_type) %>% summarise_if(is.numeric, mean, na.rm = TRUE)
data %>% group_by(veg_type) %>% summarise(mean_plant_height = mean(plant_height), st_plant_height = sd(plant_height))
data %>% group_by(veg_type) %>% summarise_if(is.numeric, mean) data %>% group_by(veg_type) %>% summarise_if(is.numeric, mean, na.rm = TRUE) data %>% group_by(veg_type) %>% summarise_if(is.numeric, list(mean, sd))
> demo()