dplyr + basic benchmark - LA R meetup - Nov 2014

szilard
November 17, 2014

Transcript

  1. dplyr + basic benchmark

    Szilárd Pafka, PhD
    Chief Scientist, Epoch
    LA R Meetup, Nov 2014
  2. dplyr

    • awesome API for most common data munging
    • fast
    • data: data.frame, data.table, databases
  3. My main workflow for EDA

    For the last ~10 yrs:
    • Do some munging in the database (SQL)
    • Get results in R (actually call SQL from R)
    • Data munging, datavis, modeling (e.g. ML) in R (knitr)

    Now experimenting:
    • Get all the data you need into R, in RAM (~100 GB)
    • Data munging with dplyr, rest the same

    [RAM is cheap, large tables from the db now fit in RAM, R/dplyr makes fewer copies (less RAM), dplyr/data.table are fast]
  4. https://github.com/szilard/benchm-dplyr-dt
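
The dplyr slide above describes an API for the most common data munging operations. As a minimal sketch of what that looks like on a plain data.frame, the block below chains a few dplyr verbs. The data frame `d` and its columns `x` and `y` are hypothetical toy data made up for illustration; they do not come from the talk or from the benchmark repository.

```r
library(dplyr)

# Hypothetical toy data (an assumption, not from the talk).
d <- data.frame(
  x = c("a", "b", "a", "c", "b", "a"),
  y = c(1, 2, 3, 4, 5, 6)
)

res <- d %>%
  filter(y > 1) %>%                          # keep rows with y > 1
  group_by(x) %>%                            # one group per value of x
  summarise(n = n(), mean_y = mean(y)) %>%   # per-group count and mean
  arrange(desc(mean_y))                      # largest mean first

print(res)
```

Each verb takes a table and returns a table, which is what lets the steps be chained with `%>%`.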

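
The two workflows on the EDA slide can be sketched side by side: aggregating in the database and fetching only the result, versus pulling the whole table into RAM and aggregating with dplyr. This is only an illustration under stated assumptions: an in-memory SQLite database (via the DBI and RSQLite packages) stands in for whatever database was actually used, and the `sales` table with its `region` and `amount` columns is hypothetical.

```r
library(DBI)
library(dplyr)

# Assumption: in-memory SQLite stands in for the real database.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table, made up for illustration.
dbWriteTable(con, "sales", data.frame(
  region = c("east", "west", "east", "west", "east"),
  amount = c(10, 20, 30, 40, 50)
))

# Workflow 1: munge in the database (SQL called from R), fetch the result.
by_sql <- dbGetQuery(con, "
  SELECT region, SUM(amount) AS total
  FROM sales
  GROUP BY region
  ORDER BY region
")

# Workflow 2: read the whole table into RAM, then munge with dplyr.
sales <- dbReadTable(con, "sales")
by_dplyr <- sales %>%
  group_by(region) %>%
  summarise(total = sum(amount)) %>%
  arrange(region)

dbDisconnect(con)

# Both routes should produce the same per-region totals.
stopifnot(isTRUE(all.equal(by_sql$total, by_dplyr$total)))
```

The second route only makes sense when the full table fits in memory, which is the premise of the "now experimenting" part of the slide.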