Slide 1

A short introduction to dplyr
Juan Natera
Los Angeles R Meetup
09/04/2014

Slide 2

A bit about me
• Software Engineer
• Interested in R and its use for gaining insights about data
• Open Source enthusiast
• Baseball fanatic

Slide 3

About dplyr
• Developed by Hadley Wickham, Chief Scientist at RStudio.
• Part of a suite of packages meant to facilitate working on the “data pipeline”.

Slide 4

Why?
• People spend a lot of time getting data ready for analysis
• Almost no learning curve (just need to learn 5 verbs)
• Improves readability
• It's FAST

Slide 5

The data pipeline
• Tidy
• Transform
• Model
• Visualize

Slide 6

The 5 verbs
• filter: keep the rows that match a condition (remove the rest)
• select: choose columns
• arrange: reorder rows
• mutate: add or change columns
• summarize: guess...
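As a minimal sketch of the five verbs, here is each one applied on its own. The built-in `mtcars` data set and the column choices are assumptions made for illustration; they are not taken from the talk.

```r
library(dplyr)

# mtcars ships with base R; it is only a stand-in data set here.
cars <- mtcars

# filter: keep only the rows that satisfy a condition
four_cyl <- filter(cars, cyl == 4)

# select: keep only the named columns
slim <- select(cars, mpg, cyl, wt)

# arrange: reorder rows, here from heaviest to lightest
by_weight <- arrange(cars, desc(wt))

# mutate: add a new column computed from existing ones
with_ratio <- mutate(cars, mpg_per_wt = mpg / wt)

# summarize: collapse many rows into a single row of summary values
overall <- summarize(cars, mean_mpg = mean(mpg), n_cars = n())
```

Each call leaves `cars` as it was and hands back a new data frame, so the results can be inspected or combined freely.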

Slide 7

No learning curve, how?
• First parameter is always a data.frame
• Other parameters describe what you want to do with it.
• Always returns a new data.frame
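The three points above can be checked with a short sketch. The `mtcars` data set and the weight threshold are assumptions chosen for illustration.

```r
library(dplyr)

cars <- mtcars

# First argument: the data frame. Remaining arguments: what to do with it.
light <- filter(cars, wt < 2.5)

# The result is a data frame in its own right...
is.data.frame(light)

# ...and the input has not been modified.
identical(cars, mtcars)
```

Because every verb follows this same shape, learning one call signature largely carries over to the other four.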

Slide 8

It's Fast

Slide 9

Let's see some code!
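The code shown live in the talk is not part of this transcript. As a stand-in, the following is a minimal sketch of the verbs chained with the `%>%` pipe that dplyr provides. The `mtcars` data set, the thresholds, and the use of `group_by` (a dplyr function not listed among the five verbs) are assumptions made for illustration.

```r
library(dplyr)

result <- mtcars %>%
  filter(hp > 100) %>%                     # keep the more powerful cars
  select(mpg, cyl, wt) %>%                 # keep only the columns needed
  mutate(mpg_per_wt = mpg / wt) %>%        # add a derived column
  group_by(cyl) %>%                        # summarize per cylinder count
  summarize(mean_mpg = mean(mpg),
            mean_ratio = mean(mpg_per_wt)) %>%
  arrange(desc(mean_mpg))                  # best average mileage first

result
```

Read top to bottom, each line is one step applied to the output of the previous one, which is where the readability benefit comes from.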

Slide 10

A great book I picked up at useR 2014

Slide 11

Questions or Comments? [email protected]