Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Data wrangling & manipulation in R - Day 3 slides
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
Ruan van Mazijk
July 03, 2019
Programming
29
0
Share
Data wrangling & manipulation in R - Day 3 slides
Ruan van Mazijk
July 03, 2019
More Decks by Ruan van Mazijk
See All by Ruan van Mazijk
Data wrangling & manipulation in R - Day 2 slides
rvanmazijk
0
62
Data wrangling & manipulation in R - Day 1 slides
rvanmazijk
0
28
Biodiversity, evolution & taxonomy - Teaching Biodiversity Short Course for FET Life Sciences Teachers
rvanmazijk
0
120
An introduction to R Markdown
rvanmazijk
0
130
Does genome size affect plant water-use? - Ecophysiology & phenology in Cape Schoenoid sedges
rvanmazijk
1
43
Environmental turnover predicts plant species richness & turnover - Comparing the Greater Cape Floristic Region & the Southwest Australia Floristic Region
rvanmazijk
0
24
Other Decks in Programming
See All in Programming
How We Benchmarked Quarkus: Patterns and anti-patterns
hollycummins
1
150
実用!Hono RPC2026
yodaka
2
240
PicoRuby for IoT: Connecting to the Cloud with MQTT
yuuu
2
620
SREに優しいTerraform構成 modulesとstateの組み方
hiyanger
2
140
検索設計から 推論設計への重心移動と Recall-First Retrieval
po3rin
2
640
Programming with a DJ Controller — not vibe coding
m_seki
3
130
Server-Side Kotlin LT大会 vol.18 [Kotlin-lspの最新情報と Neovimのlsp設定例]
yasunori0418
1
160
Claude Codeをカスタムして自分だけのClaude Codeを作ろう
terisuke
0
140
The Less-Told Story of Socket Timeouts
coe401_
3
460
Angular Signal Forms
debug_mode
0
110
GitHubCopilotCLIをはじめよう.pdf
htkym
0
200
Oxlintとeslint-plugin-react-hooks 明日から始められそう?
t6adev
0
270
Featured
See All Featured
Docker and Python
trallard
47
3.8k
A brief & incomplete history of UX Design for the World Wide Web: 1989–2019
jct
1
350
My Coaching Mixtape
mlcsv
0
100
Product Roadmaps are Hard
iamctodd
PRO
55
12k
YesSQL, Process and Tooling at Scale
rocio
174
15k
State of Search Keynote: SEO is Dead Long Live SEO
ryanjones
0
180
Designing for humans not robots
tammielis
254
26k
Keith and Marios Guide to Fast Websites
keithpitt
413
23k
The Straight Up "How To Draw Better" Workshop
denniskardys
239
140k
Unlocking the hidden potential of vector embeddings in international SEO
frankvandijk
0
770
How to Grow Your eCommerce with AI & Automation
katarinadahlin
PRO
1
170
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
25
1.9k
Transcript
data_wrangling() && ("manipulation" %in% R) %>% %>% %>% > day[3]
Ruan van Mazijk
tinyurl.com/r-with-ruan Notes & slides will go up here: (But I
encourage you to make your own notes!)
> workshop$outline[1:3] DAY 1 Tidy data principles & tidyr DAY
2 Manipulating data & an intro to dplyr DAY 3 Extending your data with mutate(), summarise() & friends
> workshop$outline[2:3] DAY 2 Manipulating data & an intro to
dplyr DAY 3 Extending your data with mutate(), summarise() & friends
dplyr:: # Verbs to manipulate your data select() # operates
on columns filter() # operates on rows
data %>%
data %>% gather(key = veg_type, value = fix) %>%
data %>% gather(key = veg_type, value = fix) %>% separate(fix,
into = c("lon", "lat")) %>%
data %>% gather(key = veg_type, value = fix) %>% separate(fix,
into = c("lon", "lat")) %>% select(veg_type, lon, lat, soil, plant_height) %>%
data %>% gather(key = veg_type, value = fix) %>% separate(fix,
into = c("lon", "lat")) %>% select(veg_type, lon, lat, soil, plant_height) %>% filter(plant_height %>% between(0.5, 10),
data %>% gather(key = veg_type, value = fix) %>% separate(fix,
into = c("lon", "lat")) %>% select(veg_type, lon, lat, soil, plant_height) %>% filter(plant_height %>% between(0.5, 10), veg_type %in% c("fynbos", "strandveld", "renosterveld"))
data %>% gather(key = veg_type, value = fix) %>% separate(fix,
into = c("lon", "lat")) %>% select(veg_type, lon, lat, soil, plant_height) %>% filter(plant_height %>% between(0.5, 10), veg_type %in% c("fynbos", "strandveld", "renosterveld")) Summary statistics for each vegetation type?
data %>% gather(key = veg_type, value = fix) %>% separate(fix,
into = c("lon", "lat")) %>% select(veg_type, lon, lat, soil, plant_height) %>% filter(plant_height %>% between(0.5, 10), veg_type %in% c("fynbos", "strandveld", "renosterveld")) %>% ???() Summary statistics for each vegetation type?
dplyr:: # Verbs to manipulate your data select() # operates
on columns filter() # operates on rows
dplyr:: # Verbs to extend your data mutate() # operates
on columns group_by() # operates on rows summarise() # rows & columns
data %>% mutate(...) CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/
data %>% mutate(...)
data %>% mutate(...) data %>% mutate(BMI = height / weight)
data %>% mutate(...) data %>% mutate(BMI = height / weight)
data %>% mutate(BMI = height / weight, BMI_std = scale(BMI))
data %>% mutate_all(...) CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/
data %>% mutate_all(.funs, ...) data %>% mutate_all(scale) data %>% mutate_all(list(log,
log1p))
data %>% mutate_if(.predicate, .funs) CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/
data %>% mutate_if(.predicate, .funs, ...) data %>% mutate_if(is.numeric, scale) data
%>% mutate_if(is.numeric, list(log, log1p))
dplyr:: # Verbs to extent your data mutate() # operates
on columns group_by() # operates on rows summarise() # rows & columns
dplyr:: # Verbs to extent your data mutate() # operates
on columns group_by() # operates on rows summarise() # rows & columns
CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/ data
CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/ data %>% group_by(veg_type)
CC BY SA RStudio https://www.rstudio.com/resources/cheatsheets/ data %>% group_by(veg_type) %>% summarise(mean_plant_height
= mean(plant_height))
data %>% group_by(veg_type) %>% summarise(mean_plant_height = mean(plant_height),
data %>% group_by(veg_type) %>% summarise(mean_plant_height = mean(plant_height), st_plant_height = sd(plant_height))
data %>% group_by(veg_type) %>% summarise(mean_plant_height = mean(plant_height), st_plant_height = sd(plant_height))
data %>% group_by(veg_type) %>% summarise_if(is.numeric, mean)
data %>% group_by(veg_type) %>% summarise(mean_plant_height = mean(plant_height), st_plant_height = sd(plant_height))
data %>% group_by(veg_type) %>% summarise_if(is.numeric, mean) data %>% group_by(veg_type) %>% summarise_if(is.numeric, mean, na.rm = TRUE)
data %>% group_by(veg_type) %>% summarise(mean_plant_height = mean(plant_height), st_plant_height = sd(plant_height))
data %>% group_by(veg_type) %>% summarise_if(is.numeric, mean) data %>% group_by(veg_type) %>% summarise_if(is.numeric, mean, na.rm = TRUE) data %>% group_by(veg_type) %>% summarise_if(is.numeric, list(mean, sd))
> demo()