Data Science LA
September 06, 2014

Juan Natera - Highlights of useR conference - LA R meetup - Sep 2014

Transcript

  1. A bit about me
     • Software Engineer
     • Interested in R and its use for gaining insights about data
     • Open Source enthusiast
     • Baseball fanatic
  2. About dplyr
     • Developed by Hadley Wickham, Chief Scientist at RStudio
     • Part of a suite of packages meant to make working on the "data pipeline" easier
  3. Why?
     • People spend a lot of time getting data ready for analysis
     • Almost no learning curve (you only need to learn 5 verbs)
     • Improves readability
     • It's FAST
  4. The 5 verbs
     • filter: keep the rows that match a condition (removing the rest)
     • select: choose columns
     • arrange: reorder rows
     • mutate: add or change columns
     • summarize: collapse rows into summary values
  5. No learning curve, how?
     • The first argument is always a data.frame
     • The other arguments describe what you want to do with it
     • Always returns a new data.frame
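The five verbs listed above can be sketched as follows. This is a minimal example that assumes the dplyr package is installed. The built-in `mtcars` data set, the column choices, and the intermediate variable names are our own illustration and were not part of the talk.

```r
# Illustration only: mtcars and the variable names are our own choices.
library(dplyr)

# filter: keep only the rows where the condition holds (the rest are dropped)
four_cyl <- filter(mtcars, cyl == 4)

# select: choose a subset of columns
slim <- select(four_cyl, mpg, hp, wt)

# arrange: reorder rows, here by descending horsepower
by_hp <- arrange(slim, desc(hp))

# mutate: add a new column computed from existing ones
with_ratio <- mutate(by_hp, hp_per_wt = hp / wt)

# summarize: collapse all rows into a single row of summary values
overview <- summarize(with_ratio, n = n(), mean_mpg = mean(mpg))
print(overview)
```

Each step takes the previous result as its first argument, which is what lets the verbs be learned one at a time and then combined.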
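The calling convention described on the "No learning curve, how?" slide (data frame first, the remaining arguments describing the operation, a new data frame returned) can be illustrated with a small sketch. It assumes dplyr is installed. The toy data frame and variable names are hypothetical, and the use of the `%>%` pipe (which dplyr re-exports from magrittr) is our addition, not something taken from the slides.

```r
# Illustration only: the toy data and variable names are hypothetical.
library(dplyr)

original <- data.frame(x = c(3, 1, 2), y = c("c", "a", "b"))

# The data frame comes first; the result is a new data frame.
sorted <- arrange(original, x)

# The input is not modified in place.
print(original$x)  # still 3 1 2
print(sorted$x)    # 1 2 3

# Because the data frame is always the first argument, verbs compose.
# Nested calls read inside-out:
nested <- mutate(filter(original, x > 1), x2 = x * 2)

# The %>% pipe passes its left-hand side as the first argument,
# so the same steps read top to bottom:
piped <- original %>%
  filter(x > 1) %>%
  mutate(x2 = x * 2)
```

Both `nested` and `piped` hold the same result. The piped form is one concrete reason a consistent first argument helps readability.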