Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Data wrangling & manipulation in R - Day 3 slides
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
Ruan van Mazijk
July 03, 2019
Programming
0
28
Data wrangling & manipulation in R - Day 3 slides
Ruan van Mazijk
July 03, 2019
Tweet
Share
More Decks by Ruan van Mazijk
See All by Ruan van Mazijk
Data wrangling & manipulation in R - Day 2 slides
rvanmazijk
0
57
Data wrangling & manipulation in R - Day 1 slides
rvanmazijk
0
26
Biodiversity, evolution & taxonomy - Teaching Biodiversity Short Course for FET Life Sciences Teachers
rvanmazijk
0
110
An introduction to R Markdown
rvanmazijk
0
130
Does genome size affect plant water-use? - Ecophysiology & phenology in Cape Schoenoid sedges
rvanmazijk
1
42
Environmental turnover predicts plant species richness & turnover - Comparing the Greater Cape Floristic Region & the Southwest Australia Floristic Region
rvanmazijk
0
21
Other Decks in Programming
See All in Programming
AIによるイベントストーミング図からのコード生成 / AI-powered code generation from Event Storming diagrams
nrslib
2
1.9k
例外処理とどう使い分ける?Result型を使ったエラー設計 #burikaigi
kajitack
16
6.1k
Honoを使ったリモートMCPサーバでAIツールとの連携を加速させる!
tosuri13
1
180
15年続くIoTサービスのSREエンジニアが挑む分散トレーシング導入
melonps
2
230
フロントエンド開発の勘所 -複数事業を経験して見えた判断軸の違い-
heimusu
7
2.8k
AI によるインシデント初動調査の自動化を行う AI インシデントコマンダーを作った話
azukiazusa1
1
750
HTTPプロトコル正しく理解していますか? 〜かわいい猫と共に学ぼう。ฅ^•ω•^ฅ ニャ〜
hekuchan
2
690
今こそ知るべき耐量子計算機暗号(PQC)入門 / PQC: What You Need to Know Now
mackey0225
3
380
組織で育むオブザーバビリティ
ryota_hnk
0
180
AIで開発はどれくらい加速したのか?AIエージェントによるコード生成を、現場の評価と研究開発の評価の両面からdeep diveしてみる
daisuketakeda
1
2.5k
CSC307 Lecture 04
javiergs
PRO
0
660
CSC307 Lecture 07
javiergs
PRO
1
560
Featured
See All Featured
Raft: Consensus for Rubyists
vanstee
141
7.3k
Evolving SEO for Evolving Search Engines
ryanjones
0
130
Test your architecture with Archunit
thirion
1
2.2k
A Guide to Academic Writing Using Generative AI - A Workshop
ks91
PRO
0
210
Six Lessons from altMBA
skipperchong
29
4.2k
Agile Actions for Facilitating Distributed Teams - ADO2019
mkilby
0
120
Jamie Indigo - Trashchat’s Guide to Black Boxes: Technical SEO Tactics for LLMs
techseoconnect
PRO
0
63
Producing Creativity
orderedlist
PRO
348
40k
Color Theory Basics | Prateek | Gurzu
gurzu
0
200
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
55
3.2k
Winning Ecommerce Organic Search in an AI Era - #searchnstuff2025
aleyda
1
1.9k
Understanding Cognitive Biases in Performance Measurement
bluesmoon
32
2.8k
Transcript
data_wrangling() && ("manipulation" %in% R) %>% %>% %>% > day[3]
Ruan van Mazijk
tinyurl.com/r-with-ruan Notes & slides will go up here: (But I
encourage you to make your own notes!)
> workshop$outline[1:3] DAY 1 Tidy data principles & tidyr DAY
2 Manipulating data & an intro to dplyr DAY 3 Extending your data with mutate(), summarise() & friends
> workshop$outline[2:3] DAY 2 Manipulating data & an intro to
dplyr DAY 3 Extending your data with mutate(), summarise() & friends
dplyr:: # Verbs to manipulate your data select() # operates
on columns filter() # operates on rows
data %>%
data %>% gather(key = veg_type, value = fix) %>%
data %>% gather(key = veg_type, value = fix) %>% separate(fix,
into = c("lon", "lat")) %>%
data %>% gather(key = veg_type, value = fix) %>% separate(fix,
into = c("lon", "lat")) %>% select(veg_type, lon, lat, soil, plant_height) %>%
data %>% gather(key = veg_type, value = fix) %>% separate(fix,
into = c("lon", "lat")) %>% select(veg_type, lon, lat, soil, plant_height) %>% filter(plant_height %>% between(0.5, 10),
data %>% gather(key = veg_type, value = fix) %>% separate(fix,
into = c("lon", "lat")) %>% select(veg_type, lon, lat, soil, plant_height) %>% filter(plant_height %>% between(0.5, 10), veg_type %in% c("fynbos", "strandveld", "renosterveld"))
data %>% gather(key = veg_type, value = fix) %>% separate(fix,
into = c("lon", "lat")) %>% select(veg_type, lon, lat, soil, plant_height) %>% filter(plant_height %>% between(0.5, 10), veg_type %in% c("fynbos", "strandveld", "renosterveld")) Summary statistics for each vegetation type?
data %>% gather(key = veg_type, value = fix) %>% separate(fix,
into = c("lon", "lat")) %>% select(veg_type, lon, lat, soil, plant_height) %>% filter(plant_height %>% between(0.5, 10), veg_type %in% c("fynbos", "strandveld", "renosterveld")) %>% ???() Summary statistics for each vegetation type?
dplyr:: # Verbs to manipulate your data select() # operates
on columns filter() # operates on rows
dplyr:: # Verbs to extend your data mutate() # operates
on columns group_by() # operates on rows summarise() # rows & columns
data %>% mutate(...) CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/
data %>% mutate(...)
data %>% mutate(...) data %>% mutate(BMI = height / weight)
data %>% mutate(...) data %>% mutate(BMI = height / weight)
data %>% mutate(BMI = height / weight, BMI_std = scale(BMI))
data %>% mutate_all(...) CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/
data %>% mutate_all(.funs, ...) data %>% mutate_all(scale) data %>% mutate_all(list(log,
log1p))
data %>% mutate_if(.predicate, .funs) CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/
data %>% mutate_if(.predicate, .funs, ...) data %>% mutate_if(is.numeric, scale) data
%>% mutate_if(is.numeric, list(log, log1p))
dplyr:: # Verbs to extent your data mutate() # operates
on columns group_by() # operates on rows summarise() # rows & columns
dplyr:: # Verbs to extent your data mutate() # operates
on columns group_by() # operates on rows summarise() # rows & columns
CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/ data
CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/ data %>% group_by(veg_type)
CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/ data %>% group_by(veg_type) %>% summarise(mean_plant_height
= mean(plant_height))
data %>% group_by(veg_type) %>% summarise(mean_plant_height = mean(plant_height),
data %>% group_by(veg_type) %>% summarise(mean_plant_height = mean(plant_height), st_plant_height = sd(plant_height))
data %>% group_by(veg_type) %>% summarise(mean_plant_height = mean(plant_height), st_plant_height = sd(plant_height))
data %>% group_by(veg_type) %>% summarise_if(is.numeric, mean)
data %>% group_by(veg_type) %>% summarise(mean_plant_height = mean(plant_height), st_plant_height = sd(plant_height))
data %>% group_by(veg_type) %>% summarise_if(is.numeric, mean) data %>% group_by(veg_type) %>% summarise_if(is.numeric, mean, na.rm = TRUE)
data %>% group_by(veg_type) %>% summarise(mean_plant_height = mean(plant_height), st_plant_height = sd(plant_height))
data %>% group_by(veg_type) %>% summarise_if(is.numeric, mean) data %>% group_by(veg_type) %>% summarise_if(is.numeric, mean, na.rm = TRUE) data %>% group_by(veg_type) %>% summarise_if(is.numeric, list(mean, sd))
> demo()