Szilard Pafka - dplyr + basic benchmark - LA R meetup - Nov 2014

Data Science LA

November 07, 2014

Transcript

  1. dplyr
     • awesome API for most common data munging
     • fast
     • data: data.frame, data.table, databases
  2. My main workflow for EDA
     For the last ~10 yrs:
     • Do some munging in the database (SQL)
     • Get the results into R (actually, call SQL from R)
     • Data munging, datavis, modeling (e.g. ML) in R (knitr)
     Now experimenting:
     • Get all the data you need into R, in RAM (~100 GB)
     • Data munging with dplyr, the rest stays the same
     [RAM is cheap, large tables from the db now fit in RAM, R/dplyr makes fewer copies (less RAM), dplyr/data.table are fast]
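
The "data munging with dplyr" step could look roughly like the sketch below. It is only an illustration, not code from the talk: the `orders` table, its columns, and the threshold are hypothetical stand-ins for a large table already pulled from the database into RAM.

```r
library(dplyr)

# Hypothetical toy data standing in for a large table loaded into RAM
orders <- data.frame(
  customer = c("a", "a", "b", "b", "c"),
  amount   = c(10, 20, 5, 15, 40),
  stringsAsFactors = FALSE
)

# A typical munging chain: filter rows, aggregate per group, sort
summary_tbl <- orders %>%
  filter(amount >= 10) %>%                          # keep orders of at least 10
  group_by(customer) %>%                            # one group per customer
  summarise(n_orders = n(), total = sum(amount)) %>%
  arrange(desc(total))                              # largest total first

print(summary_tbl)
```

The chain here runs on a plain data.frame; slide 1 lists data.table and databases as other data sources that dplyr can work with.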