Szilard Pafka - dplyr + basic benchmark - LA R meetup - Nov 2014

Data Science LA
November 07, 2014

Transcript

  1. dplyr + basic benchmark
    Szilárd Pafka, PhD, Chief Scientist, Epoch
    LA R Meetup, Nov 2014
  2. dplyr
    • awesome API for most common data munging
    • fast
    • data: data.frame, data.table, databases
  3. My main workflow for EDA
    For the last ~10 yrs:
    • Do some munging in database (SQL)
    • Get results in R (actually call SQL from R)
    • Data munging, datavis, modeling (e.g. ML) in R (knitr)
    Now experimenting:
    • Get all data you need in R in RAM (~100 GB)
    • Data munging with dplyr, rest the same
    [RAM is cheap, large tables from db now fit in RAM, R/dplyr makes fewer copies (less RAM), dplyr/data.table are fast]
  4. https://github.com/szilard/benchm-dplyr-dt

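
To make the "data munging with dplyr" step from slide 3 concrete, here is a minimal sketch of the kind of filter / group / aggregate / sort pipeline the dplyr API is built around. It is not taken from the slides or from the benchmark repo: the built-in `mtcars` data frame stands in for a table pulled into RAM, and the column choices, the `hp > 100` threshold, and the name `summary_by_cyl` are assumptions made purely for illustration.

```r
library(dplyr)

# mtcars ships with base R; here it is a stand-in for a table already in RAM.
# Roughly the same munging one might otherwise write in SQL:
#   SELECT cyl, COUNT(*) AS n, AVG(mpg) AS mean_mpg
#   FROM mtcars WHERE hp > 100
#   GROUP BY cyl ORDER BY mean_mpg DESC
summary_by_cyl <- mtcars %>%
  filter(hp > 100) %>%                 # keep rows above the (illustrative) threshold
  group_by(cyl) %>%                    # one group per cylinder count
  summarise(n = n(),                   # rows per group
            mean_mpg = mean(mpg)) %>%  # average fuel economy per group
  arrange(desc(mean_mpg))              # highest average first

print(summary_by_cyl)
```

The same verbs are intended to work when the data sits in a data.table or a database table rather than a plain data.frame, which is the point of the "data: data.frame, data.table, databases" bullet on slide 2.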