Slide 1

Slide 1 text

dplyr + basic benchmark
Szilárd Pafka, PhD
Chief Scientist, Epoch
LA R Meetup, Nov 2014

Slide 2

Slide 2 text

dplyr
● awesome API for most common data munging
● fast
● data: data.frame, data.table, databases
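As a rough illustration of the API mentioned above, here is a minimal sketch of a typical munging chain on a plain data.frame. The `sales` table and its columns are made up for this example and do not come from the talk.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
sales <- data.frame(
  region = c("east", "east", "west", "west", "west"),
  amount = c(10, 20, 5, 15, 40)
)

# Filter rows, aggregate per group, then sort by the aggregate
by_region <- sales %>%
  filter(amount > 5) %>%
  group_by(region) %>%
  summarise(n = n(), total = sum(amount)) %>%
  arrange(desc(total))

print(by_region)
```

Each verb takes a table and returns a table, so steps can be added or removed from the chain without restructuring the rest.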

Slide 3

Slide 3 text

My main workflow for EDA

For last ~10 yrs:
● Do some munging in database (SQL)
● Get results in R (actually call SQL from R)
● Data munging, datavis, modeling (e.g. ML) in R (knitr)

Now experimenting:
● Get all data you need in R in RAM (~100 GB)
● Data munging with dplyr, rest the same

[RAM is cheap, large tables from the db now fit in RAM, R/dplyr makes fewer copies (less RAM), dplyr/data.table are fast]
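The two workflows above can be sketched side by side. This is only an illustration under assumptions: it uses an in-memory SQLite database through the DBI and RSQLite packages as a stand-in for a real database, and the `events` table is invented for the example.

```r
library(DBI)
library(dplyr)

# Assumption: in-memory SQLite stands in for the production database
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table, invented for illustration only
dbWriteTable(con, "events", data.frame(
  user_id = c(1, 1, 2, 3, 3, 3),
  value   = c(2.5, 1.0, 4.0, 0.5, 3.0, 1.5)
))

# Older workflow: aggregate in SQL, fetch only the result into R
agg_sql <- dbGetQuery(con, "
  SELECT user_id, COUNT(*) AS n, SUM(value) AS total
  FROM events
  GROUP BY user_id
  ORDER BY user_id
")

# Newer workflow: pull the whole table into RAM, then munge with dplyr
events <- dbGetQuery(con, "SELECT * FROM events")
agg_dplyr <- events %>%
  group_by(user_id) %>%
  summarise(n = n(), total = sum(value)) %>%
  arrange(user_id)

dbDisconnect(con)
```

Both paths should produce the same per-user aggregates. The difference is where the work happens: in the database, or in R once the full table is in memory.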

Slide 4

Slide 4 text

https://github.com/szilard/benchm-dplyr-dt
