Juan Natera - Highlights of useR conference - LA R meetup - Sep 2014

Data Science LA
September 06, 2014



  1. A short introduction to dplyr • Juan Natera • Los Angeles R Meetup • 09/04/2014
  2. A bit about me • Software Engineer • Interested in R and its use for gaining insights about data • Open Source enthusiast • Baseball fanatic
  3. About dplyr • Developed by Hadley Wickham, Chief Scientist @ RStudio • Part of a suite of packages meant to facilitate working on the “data pipeline”
  4. Why? • People spend a lot of time getting data ready for analysis • Almost no learning curve (just need to learn 5 verbs) • Improves readability • It's FAST
  5. The data pipeline • Tidy • Transform • Model • Visualize

  6. The 5 verbs • filter: remove rows • select: choose columns • arrange: reorder rows • mutate: change data • summarize: guess...
  7. No learning curve, how? • First parameter is always a data.frame • Other parameters describe what you want to do with it • Always returns a new data.frame
  8. It's Fast

  9. Let's see some code!

  10. A great book I picked up at useR 2014

  11. Questions or Comments? naterajj@gmail.com
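
The five verbs on slide 6 and the calling convention on slide 7 can be sketched in R roughly as follows. This is an illustration only, not the code shown in the talk: it assumes the dplyr package is installed, uses the built-in `mtcars` dataset as a stand-in, and all variable names are made up for the example.

```r
library(dplyr)

# Illustrative data only: mtcars ships with base R (32 cars, 11 columns).
cars <- mtcars

# filter: drop rows that do not satisfy the condition
four_cyl <- filter(cars, cyl == 4)

# select: choose columns
slim <- select(four_cyl, mpg, wt, hp)

# arrange: reorder rows, here from highest to lowest mpg
sorted <- arrange(slim, desc(mpg))

# mutate: change data by adding a computed column
with_ratio <- mutate(sorted, hp_per_wt = hp / wt)

# summarize: collapse the rows into a single summary row
result <- summarize(with_ratio, mean_mpg = mean(mpg), n = n())
```

Each call takes a data.frame as its first argument, describes the operation with the remaining arguments, and returns a new data.frame, so `cars` is left untouched and each result can be fed straight into the next verb.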