Juan Natera - Highlights of useR conference - LA R meetup - Sep 2014

Data Science LA
September 06, 2014

Transcript

  1. A short introduction
    to dplyr
    Juan Natera
    Los Angeles R Meetup
    09/04/2014

  2. A bit about me
    • Software Engineer
    • Interested in R and its use for gaining
    insights about data
    • Open Source enthusiast
    • Baseball fanatic

  3. About dplyr
    • Developed by Hadley Wickham, Chief
    Scientist @ RStudio.
    • Part of a suite of packages meant to
    facilitate working on the “data pipeline”.

  4. Why?
    • People spend a lot of time getting data
    ready for analysis
    • Almost no learning curve (just need to
    learn 5 verbs)
    • Improves readability
    • It's FAST

  5. The data pipeline
    • Tidy
    • Transform
    • Model
    • Visualize

  6. The 5 verbs
    • filter: keep the rows that match a condition
    • select: choose columns
    • arrange: reorder rows
    • mutate: add or change columns
    • summarize: guess... (collapse rows into summary values)
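
The talk's own demo code is not part of this transcript, so the following is only a minimal sketch of the five verbs. It assumes the `dplyr` package is installed and uses the built-in `mtcars` data set as a stand-in; the variable names (`four_cyl`, `slim`, and so on) are made up for illustration.

```r
library(dplyr)

# mtcars ships with base R; it is an assumed stand-in for the talk's data.
cars <- mtcars

# filter: keep only the rows that satisfy a condition
four_cyl <- filter(cars, cyl == 4)

# select: choose columns by name
slim <- select(four_cyl, mpg, hp, wt)

# arrange: reorder rows (desc() sorts in descending order)
sorted <- arrange(slim, desc(mpg))

# mutate: add a new column computed from existing ones
with_ratio <- mutate(sorted, hp_per_wt = hp / wt)

# summarize: collapse all rows into a single row of summary values
result <- summarize(with_ratio, mean_mpg = mean(mpg), n_cars = n())
print(result)
```

Each step takes the previous data frame as its first argument, so the intermediate results can be inspected one at a time.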

  7. No learning curve, how?
    • First parameter is always a data.frame
    • Other parameters describe what you want
    to do with it.
    • Always returns a new data.frame
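
The three points above can be sketched as follows. This is an illustrative example, not code from the talk: the `scores` data frame and its columns are hypothetical, and `dplyr` is assumed to be installed.

```r
library(dplyr)

# A small, made-up data frame for illustration only.
scores <- data.frame(player = c("A", "B", "C"), hits = c(3, 0, 2))

# The data frame goes first; the remaining arguments describe what to do.
hitters <- filter(scores, hits > 0)

# The result is a new data frame; the input is left unchanged.
print(nrow(scores))
print(nrow(hitters))
print(is.data.frame(hitters))

# Because every verb takes and returns a data frame, calls can be nested.
ranked <- arrange(select(hitters, player, hits), desc(hits))
print(ranked)
```

Since the input and output types match, learning one verb's calling pattern carries over to the others.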

  8. It's Fast

  9. Let's see some code!

  10. A great book I picked up at
    useR 2014

  11. Questions or Comments?
    [email protected]
