Slide 3
My main workflow for EDA
For last ~10 yrs:
● Do some munging in database (SQL)
● Get results in R (actually call SQL from R)
● Data munging, datavis, modeling (e.g. ML) in R (knitr)
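
A minimal sketch of the SQL-then-R workflow above, not the exact setup used in the talk. It assumes the DBI and RSQLite packages are installed. The in-memory SQLite database, the `orders` table, and its columns are hypothetical stand-ins for a real database.

```r
# Sketch of the "munge in SQL, pull results into R" workflow.
# Assumptions: DBI and RSQLite are installed; the in-memory SQLite database,
# the `orders` table, and its columns are hypothetical stand-ins.
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Toy data standing in for a large database table.
orders <- data.frame(
  customer = c("a", "a", "b", "b", "c"),
  amount   = c(10, 20, 5, 15, 40)
)
dbWriteTable(con, "orders", orders)

# Step 1: do the heavy munging (here, an aggregation) inside the database.
# Step 2: call the SQL from R so only the small result crosses into R.
per_customer <- dbGetQuery(con, "
  SELECT customer, SUM(amount) AS total, COUNT(*) AS n_orders
  FROM orders
  GROUP BY customer
  ORDER BY customer
")

dbDisconnect(con)

# Step 3: continue in R (plots, models, knitr report) on the reduced table.
print(per_customer)
```

Only the aggregated result is held in R here, which is the point of doing the first pass in the database.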
Now experimenting:
● Get all data you need in R in RAM (~100 GB)
● Data munging with dplyr, rest the same
[RAM is cheap, large tables from the db now fit in RAM, R/dplyr makes fewer copies (less RAM), dplyr/data.table are fast]
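
A minimal sketch of the in-RAM variant, assuming the DBI, RSQLite, and dplyr packages are installed. As before, the in-memory SQLite database and the `orders` table are hypothetical stand-ins. In practice the table would be a large one read from a production database.

```r
# Sketch of the "load everything into RAM, munge with dplyr" variant.
# Assumptions: DBI, RSQLite and dplyr are installed; the database and the
# `orders` table are hypothetical stand-ins for real data.
library(DBI)
library(dplyr)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "orders", data.frame(
  customer = c("a", "a", "b", "b", "c"),
  amount   = c(10, 20, 5, 15, 40)
))

# Pull the full table into an in-memory data frame, then close the connection.
orders <- dbReadTable(con, "orders")
dbDisconnect(con)

# The aggregation that would otherwise be SQL is now a dplyr pipeline in R.
per_customer <- orders %>%
  group_by(customer) %>%
  summarise(total = sum(amount), n_orders = n()) %>%
  arrange(customer)

print(per_customer)
```

The trade-off is that the whole table must fit in RAM, which is what the ~100 GB figure above refers to.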