Upgrade to PRO for Only $50/Year—Limited-Time Offer! 🔥
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Data wrangling & manipulation in R - Day 3 slides
Search
Ruan van Mazijk
July 03, 2019
Programming
0
28
Data wrangling & manipulation in R - Day 3 slides
Ruan van Mazijk
July 03, 2019
Tweet
Share
More Decks by Ruan van Mazijk
See All by Ruan van Mazijk
Data wrangling & manipulation in R - Day 2 slides
rvanmazijk
0
56
Data wrangling & manipulation in R - Day 1 slides
rvanmazijk
0
24
Biodiversity, evolution & taxonomy - Teaching Biodiversity Short Course for FET Life Sciences Teachers
rvanmazijk
0
110
An introduction to R Markdown
rvanmazijk
0
120
Does genome size affect plant water-use? - Ecophysiology & phenology in Cape Schoenoid sedges
rvanmazijk
1
41
Environmental turnover predicts plant species richness & turnover - Comparing the Greater Cape Floristic Region & the Southwest Australia Floristic Region
rvanmazijk
0
21
Other Decks in Programming
See All in Programming
非同期処理の迷宮を抜ける: 初学者がつまづく構造的な原因
pd1xx
1
700
30分でDoctrineの仕組みと使い方を完全にマスターする / phpconkagawa 2025 Doctrine
ttskch
3
830
React Native New Architecture 移行実践報告
taminif
1
150
大体よく分かるscala.collection.immutable.HashMap ~ Compressed Hash-Array Mapped Prefix-tree (CHAMP) ~
matsu_chara
1
220
ローターアクトEクラブ アメリカンナイト:川端 柚菜 氏(Japan O.K. ローターアクトEクラブ 会長):2720 Japan O.K. ロータリーEクラブ2025年12月1日卓話
2720japanoke
0
730
Go コードベースの構成と AI コンテキスト定義
andpad
0
120
WebRTC と Rust と8K 60fps
tnoho
2
2k
FluorTracer / RayTracingCamp11
kugimasa
0
230
ゲームの物理 剛体編
fadis
0
330
UIデザインに役立つ 2025年の最新CSS / The Latest CSS for UI Design 2025
clockmaker
18
7.4k
WebRTC、 綺麗に見るか滑らかに見るか
sublimer
1
160
手が足りない!兼業データエンジニアに必要だったアーキテクチャと立ち回り
zinkosuke
0
630
Featured
See All Featured
Embracing the Ebb and Flow
colly
88
4.9k
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
25
1.6k
A Tale of Four Properties
chriscoyier
162
23k
We Have a Design System, Now What?
morganepeng
54
7.9k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
31
9.8k
Unsuck your backbone
ammeep
671
58k
Facilitating Awesome Meetings
lara
57
6.7k
Into the Great Unknown - MozCon
thekraken
40
2.2k
GitHub's CSS Performance
jonrohan
1032
470k
Six Lessons from altMBA
skipperchong
29
4.1k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
36
6.2k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
128
54k
Transcript
data_wrangling() && ("manipulation" %in% R) %>% %>% %>% > day[3]
Ruan van Mazijk
tinyurl.com/r-with-ruan Notes & slides will go up here: (But I
encourage you to make your own notes!)
> workshop$outline[1:3] DAY 1 Tidy data principles & tidyr DAY
2 Manipulating data & an intro to dplyr DAY 3 Extending your data with mutate(), summarise() & friends
> workshop$outline[2:3] DAY 2 Manipulating data & an intro to
dplyr DAY 3 Extending your data with mutate(), summarise() & friends
dplyr:: # Verbs to manipulate your data select() # operates
on columns filter() # operates on rows
data %>%
data %>% gather(key = veg_type, value = fix) %>%
data %>% gather(key = veg_type, value = fix) %>% separate(fix,
into = c("lon", "lat")) %>%
data %>% gather(key = veg_type, value = fix) %>% separate(fix,
into = c("lon", "lat")) %>% select(veg_type, lon, lat, soil, plant_height) %>%
data %>% gather(key = veg_type, value = fix) %>% separate(fix,
into = c("lon", "lat")) %>% select(veg_type, lon, lat, soil, plant_height) %>% filter(plant_height %>% between(0.5, 10),
data %>% gather(key = veg_type, value = fix) %>% separate(fix,
into = c("lon", "lat")) %>% select(veg_type, lon, lat, soil, plant_height) %>% filter(plant_height %>% between(0.5, 10), veg_type %in% c("fynbos", "strandveld", "renosterveld"))
data %>% gather(key = veg_type, value = fix) %>% separate(fix,
into = c("lon", "lat")) %>% select(veg_type, lon, lat, soil, plant_height) %>% filter(plant_height %>% between(0.5, 10), veg_type %in% c("fynbos", "strandveld", "renosterveld")) Summary statistics for each vegetation type?
data %>% gather(key = veg_type, value = fix) %>% separate(fix,
into = c("lon", "lat")) %>% select(veg_type, lon, lat, soil, plant_height) %>% filter(plant_height %>% between(0.5, 10), veg_type %in% c("fynbos", "strandveld", "renosterveld")) %>% ???() Summary statistics for each vegetation type?
dplyr:: # Verbs to manipulate your data select() # operates
on columns filter() # operates on rows
dplyr:: # Verbs to extend your data mutate() # operates
on columns group_by() # operates on rows summarise() # rows & columns
data %>% mutate(...) CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/
data %>% mutate(...)
data %>% mutate(...) data %>% mutate(BMI = height / weight)
data %>% mutate(...) data %>% mutate(BMI = height / weight)
data %>% mutate(BMI = height / weight, BMI_std = scale(BMI))
data %>% mutate_all(...) CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/
data %>% mutate_all(.funs, ...) data %>% mutate_all(scale) data %>% mutate_all(list(log,
log1p))
data %>% mutate_if(.predicate, .funs) CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/
data %>% mutate_if(.predicate, .funs, ...) data %>% mutate_if(is.numeric, scale) data
%>% mutate_if(is.numeric, list(log, log1p))
dplyr:: # Verbs to extent your data mutate() # operates
on columns group_by() # operates on rows summarise() # rows & columns
dplyr:: # Verbs to extent your data mutate() # operates
on columns group_by() # operates on rows summarise() # rows & columns
CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/ data
CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/ data %>% group_by(veg_type)
CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/ data %>% group_by(veg_type) %>% summarise(mean_plant_height
= mean(plant_height))
data %>% group_by(veg_type) %>% summarise(mean_plant_height = mean(plant_height),
data %>% group_by(veg_type) %>% summarise(mean_plant_height = mean(plant_height), st_plant_height = sd(plant_height))
data %>% group_by(veg_type) %>% summarise(mean_plant_height = mean(plant_height), st_plant_height = sd(plant_height))
data %>% group_by(veg_type) %>% summarise_if(is.numeric, mean)
data %>% group_by(veg_type) %>% summarise(mean_plant_height = mean(plant_height), st_plant_height = sd(plant_height))
data %>% group_by(veg_type) %>% summarise_if(is.numeric, mean) data %>% group_by(veg_type) %>% summarise_if(is.numeric, mean, na.rm = TRUE)
data %>% group_by(veg_type) %>% summarise(mean_plant_height = mean(plant_height), st_plant_height = sd(plant_height))
data %>% group_by(veg_type) %>% summarise_if(is.numeric, mean) data %>% group_by(veg_type) %>% summarise_if(is.numeric, mean, na.rm = TRUE) data %>% group_by(veg_type) %>% summarise_if(is.numeric, list(mean, sd))
> demo()