OmaymaS
November 08, 2018

# Data Manipulation with dplyr (First Steps)

A workshop for beginners on the #tidyverse, focusing on data manipulation using #dplyr along with hands-on exercises.

Delivered at DataFest Tbilisi 2018.

## Transcript

1. INTRO TO THE TIDYVERSE: DATA MANIPULATION USING dplyr
OMAYMA SAID
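4. > minions (dataframe/tbl)

| id | minion | leader | type | age | missions_internal | missions_external |
|---|---|---|---|---|---|---|
| 101 | | | yellow | 5 | 60 | 2 |
| 102 | | | yellow | 6 | 55 | 10 |
| 108 | | | purple | 10 | 48 | 3 |
| 120 | | | purple | 16 | 49 | 1 |
| 100 | | | yellow | 3 | 54 | 4 |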
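
To follow along, the table on this slide can be rebuilt as a data frame. This is a minimal sketch, assuming that the `minion` and `leader` columns have no recoverable values in the transcript (they are omitted here) and that the remaining columns are plain numbers and strings.

```r
library(dplyr)

# Rebuild the slide's table. Assumption: `minion` and `leader` are left out
# because the transcript preserves no values for them.
minions <- data.frame(
  id = c(101, 102, 108, 120, 100),
  type = c("yellow", "yellow", "purple", "purple", "yellow"),
  age = c(5, 6, 10, 16, 3),
  missions_internal = c(60, 55, 48, 49, 54),
  missions_external = c(2, 10, 3, 1, 4)
)

# One row per observation, one column per variable
glimpse(minions)
```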

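5. VARIABLES (columns) and OBSERVATIONS (rows)

| id | minion | leader | type | age | missions_internal | missions_external |
|---|---|---|---|---|---|---|
| 101 | | | yellow | 5 | 60 | 2 |
| 102 | | | yellow | 6 | 55 | 10 |
| 108 | | | purple | 10 | 48 | 3 |
| 120 | | | purple | 16 | 49 | 1 |
| 100 | | | yellow | 3 | 54 | 4 |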

6. kevin <-

7. kevin <-
kevin_new <- rotate(kevin,
                    direction = "clockwise",
                    angle = 90)
object
function arguments

8. kevin <-
kevin_new <- rotate(kevin,
                    direction = "clockwise",
                    angle = 90)
object
function arguments
What is the value of kevin_new?

9. kevin <-
kevin_new <- rotate(kevin,
                    direction = "clockwise",
                    angle = 90)
object
function arguments
kevin_new
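
The idea on these slides (a function takes an object plus arguments and returns a new object) can be sketched in plain R. The `rotate()` below is a hypothetical stand-in, since the slides apply it to a picture; here the object is assumed to be a simple list that records an angle.

```r
# Hypothetical stand-in for the slide's rotate(): it works on a list that
# stores an orientation, not on an actual image.
rotate <- function(object, direction = "clockwise", angle = 90) {
  sign <- if (direction == "clockwise") 1 else -1
  object$angle <- (object$angle + sign * angle) %% 360
  object
}

kevin <- list(name = "kevin", angle = 0)

# The result is stored under a new name; `kevin` itself is not modified
kevin_new <- rotate(kevin, direction = "clockwise", angle = 90)

kevin_new$angle
kevin$angle
```

Note that R is case-sensitive, so `kevin_new` and `Kevin_new` would be two different objects.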

10. A grammar of data manipulation

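11. > minions

| id | minion | leader | type | age | missions_internal | missions_external |
|---|---|---|---|---|---|---|
| 101 | | | yellow | 5 | 60 | 2 |
| 102 | | | yellow | 6 | 55 | 10 |
| 108 | | | purple | 10 | 48 | 3 |
| 120 | | | purple | 16 | 49 | 1 |
| 100 | | | yellow | 3 | 54 | 4 |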

12. select()
Return a subset of columns

13. select(minions, id, age)
Arguments: dataframe (minions), columns to select (id, age)

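14. select(minions, id, age)

| id | minion | leader | type | age | missions_internal | missions_external |
|---|---|---|---|---|---|---|
| 101 | | | yellow | 5 | 60 | 2 |
| 102 | | | yellow | 6 | 55 | 10 |
| 108 | | | purple | 10 | 48 | 3 |
| 120 | | | purple | 16 | 49 | 1 |
| 100 | | | yellow | 3 | 54 | 4 |

New dataframe/tbl:

| id | age |
|---|---|
| 101 | 5 |
| 102 | 6 |
| 108 | 10 |
| 120 | 16 |
| 100 | 3 |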

15. select(minions, -missions_external)
Arguments: dataframe (minions), column to exclude (missions_external)

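16. select(minions, -missions_external)

| id | minion | leader | type | age | missions_internal |
|---|---|---|---|---|---|
| 101 | | | yellow | 5 | 60 |
| 102 | | | yellow | 6 | 55 |
| 108 | | | purple | 10 | 48 |
| 120 | | | purple | 16 | 49 |
| 100 | | | yellow | 3 | 54 |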
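
The two `select()` calls shown so far can be tried on a small copy of the data. This is a sketch under the assumption that `minions` is an ordinary data frame holding only the columns with values in the transcript.

```r
library(dplyr)

# Assumed reconstruction of the slide data (without `minion` and `leader`)
minions <- data.frame(
  id = c(101, 102, 108, 120, 100),
  type = c("yellow", "yellow", "purple", "purple", "yellow"),
  age = c(5, 6, 10, 16, 3),
  missions_internal = c(60, 55, 48, 49, 54),
  missions_external = c(2, 10, 3, 1, 4)
)

# Keep only the named columns
id_age <- select(minions, id, age)

# Drop a column by prefixing its name with a minus sign
no_external <- select(minions, -missions_external)

names(id_age)
names(no_external)
```

In both cases `minions` is left unchanged; `select()` returns a new data frame.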

17. Arguments: dataframe, range of columns to select

101
102
108
120
100
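
The transcript does not preserve the code for selecting a range of columns. As a hedged illustration, `select()` accepts `first:last` to keep every column between two names; the range `id:age` below is a hypothetical choice, not necessarily the one used on the slide.

```r
library(dplyr)

# Assumed reconstruction of the slide data (without `minion` and `leader`)
minions <- data.frame(
  id = c(101, 102, 108, 120, 100),
  type = c("yellow", "yellow", "purple", "purple", "yellow"),
  age = c(5, 6, 10, 16, 3),
  missions_internal = c(60, 55, 48, 49, 54),
  missions_external = c(2, 10, 3, 1, 4)
)

# Hypothetical range: all columns from `id` through `age`, in table order
id_to_age <- select(minions, id:age)

names(id_to_age)
```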

19. filter()
Return a subset of rows

20. filter(minions, type == "yellow")
Arguments: dataframe (minions), condition (type == "yellow")

21. filter(minions, type == "yellow")

| id | minion | leader | type | age | missions_internal | missions_external |
|---|---|---|---|---|---|---|
| 101 | | | yellow | 5 | 60 | 2 |
| 102 | | | yellow | 6 | 55 | 10 |
| 100 | | | yellow | 3 | 54 | 4 |
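
A single-condition `filter()` can be checked against the slide's result. The sketch assumes the same reduced `minions` data frame (no `minion` or `leader` columns).

```r
library(dplyr)

# Assumed reconstruction of the slide data (without `minion` and `leader`)
minions <- data.frame(
  id = c(101, 102, 108, 120, 100),
  type = c("yellow", "yellow", "purple", "purple", "yellow"),
  age = c(5, 6, 10, 16, 3),
  missions_internal = c(60, 55, 48, 49, 54),
  missions_external = c(2, 10, 3, 1, 4)
)

# Keep only the rows where the condition is TRUE; note `==`, not `=`
yellow <- filter(minions, type == "yellow")

yellow$id
```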

22. MORE CONDITIONS
== equal
> greater than
< less than
>= greater than or equal
<= less than or equal
!= not equal

COMBINE WITH
& AND
| OR
, comma between conditions (treated as AND)

23. filter(minions, type == "yellow", age > 3)
Arguments: dataframe (minions), multiple conditions

24. filter(minions, type == "yellow", age > 3)

| id | minion | leader | type | age | missions_internal | missions_external |
|---|---|---|---|---|---|---|
| 101 | | | yellow | 5 | 60 | 2 |
| 102 | | | yellow | 6 | 55 | 10 |
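
The ways of combining conditions can be compared side by side. This is a small sketch on the assumed reduced `minions` data frame; the `|` example uses a condition chosen only for illustration.

```r
library(dplyr)

# Assumed reconstruction of the slide data (without `minion` and `leader`)
minions <- data.frame(
  id = c(101, 102, 108, 120, 100),
  type = c("yellow", "yellow", "purple", "purple", "yellow"),
  age = c(5, 6, 10, 16, 3),
  missions_internal = c(60, 55, 48, 49, 54),
  missions_external = c(2, 10, 3, 1, 4)
)

# Conditions separated by a comma must all hold ...
both_comma <- filter(minions, type == "yellow", age > 3)

# ... which is equivalent to joining them with &
both_and <- filter(minions, type == "yellow" & age > 3)

# With | a row is kept if at least one condition holds (illustrative condition)
either <- filter(minions, type == "purple" | age > 5)

both_comma$id
either$id
```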

25. mutate()

26. mutate(minions, missions = missions_internal + missions_external)
Arguments: dataframe (minions), new column name (missions), expression (missions_internal + missions_external)

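27. mutate(minions, missions = missions_internal + missions_external)

| id | minion | leader | type | age | missions_internal | missions_external | missions |
|---|---|---|---|---|---|---|---|
| 101 | | | yellow | 5 | 60 | 2 | 62 |
| 102 | | | yellow | 6 | 55 | 10 | 65 |
| 108 | | | purple | 10 | 48 | 3 | 51 |
| 120 | | | purple | 16 | 49 | 1 | 50 |
| 100 | | | yellow | 3 | 54 | 4 | 58 |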
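
The new `missions` column can be reproduced with a short sketch, again assuming the reduced `minions` data frame. Column names inside `mutate()` must match the data exactly, so `missions_external` is spelled with two s's.

```r
library(dplyr)

# Assumed reconstruction of the slide data (without `minion` and `leader`)
minions <- data.frame(
  id = c(101, 102, 108, 120, 100),
  type = c("yellow", "yellow", "purple", "purple", "yellow"),
  age = c(5, 6, 10, 16, 3),
  missions_internal = c(60, 55, 48, 49, 54),
  missions_external = c(2, 10, 3, 1, 4)
)

# Add a column computed row by row from existing columns
with_total <- mutate(minions, missions = missions_internal + missions_external)

with_total$missions
```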

28. summarize()
Calculate aggregate measures for groups

29. summarize(minions, age_median = median(age))
Arguments: dataframe (minions), new column name (age_median), expression (median(age))

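30. summarize(minions, age_median = median(age))

| id | minion | leader | type | age | missions_internal | missions_external |
|---|---|---|---|---|---|---|
| 101 | | | yellow | 5 | 60 | 2 |
| 102 | | | yellow | 6 | 55 | 10 |
| 108 | | | purple | 10 | 48 | 3 |
| 120 | | | purple | 16 | 49 | 1 |
| 100 | | | yellow | 3 | 54 | 4 |

Result:

| age_median |
|---|
| 6 |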

31. summarize(minions,
age_median = median(age),
missions_internal_all = sum(missions_internal),
missions_external_all = sum(missions_external))
Multiple expressions
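
Without grouping, `summarize()` collapses the whole table into a single row. The sketch below runs the multi-expression call on the assumed reduced `minions` data frame.

```r
library(dplyr)

# Assumed reconstruction of the slide data (without `minion` and `leader`)
minions <- data.frame(
  id = c(101, 102, 108, 120, 100),
  type = c("yellow", "yellow", "purple", "purple", "yellow"),
  age = c(5, 6, 10, 16, 3),
  missions_internal = c(60, 55, 48, 49, 54),
  missions_external = c(2, 10, 3, 1, 4)
)

# Each named expression becomes one column of the one-row result
overall <- summarize(
  minions,
  age_median = median(age),
  missions_internal_all = sum(missions_internal),
  missions_external_all = sum(missions_external)
)

overall
```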

32. group_by()
Group by one or more variables

33. minions %>%
  group_by(type) %>%
  summarize(missions_internal_all = sum(missions_internal),
            missions_external_all = sum(missions_external))
Annotations: dataframe (minions), group (type), new column names, expressions

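34. minions %>%
  group_by(type) %>%
  summarize(missions_internal_all = sum(missions_internal),
            missions_external_all = sum(missions_external))

| type | missions_internal_all | missions_external_all |
|---|---|---|
| yellow | 169 | 16 |
| purple | 97 | 4 |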
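
The per-group totals on this slide correspond to grouping by `type` before summarizing. A minimal sketch on the assumed reduced `minions` data frame follows; the output rows are ordered by the grouping variable, so `purple` is listed before `yellow`.

```r
library(dplyr)

# Assumed reconstruction of the slide data (without `minion` and `leader`)
minions <- data.frame(
  id = c(101, 102, 108, 120, 100),
  type = c("yellow", "yellow", "purple", "purple", "yellow"),
  age = c(5, 6, 10, 16, 3),
  missions_internal = c(60, 55, 48, 49, 54),
  missions_external = c(2, 10, 3, 1, 4)
)

# group_by() marks the groups; summarize() then returns one row per group
by_type <- minions %>%
  group_by(type) %>%
  summarize(missions_internal_all = sum(missions_internal),
            missions_external_all = sum(missions_external))

by_type
```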

35. arrange()
Reorder rows based on variables

36. arrange(minions, missions_internal)
Arguments: dataframe (minions), column name (missions_internal)

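37. arrange(minions, missions_internal)
Default order: ascending

| id | minion | leader | type | age | missions_internal | missions_external |
|---|---|---|---|---|---|---|
| 108 | | | purple | 10 | 48 | 3 |
| 120 | | | purple | 16 | 49 | 1 |
| 100 | | | yellow | 3 | 54 | 4 |
| 102 | | | yellow | 6 | 55 | 10 |
| 101 | | | yellow | 5 | 60 | 2 |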

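38. arrange(minions, desc(missions_internal))

| id | minion | leader | type | age | missions_internal | missions_external |
|---|---|---|---|---|---|---|
| 101 | | | yellow | 5 | 60 | 2 |
| 102 | | | yellow | 6 | 55 | 10 |
| 100 | | | yellow | 3 | 54 | 4 |
| 120 | | | purple | 16 | 49 | 1 |
| 108 | | | purple | 10 | 48 | 3 |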
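
Both sort orders can be verified with a short sketch on the assumed reduced `minions` data frame.

```r
library(dplyr)

# Assumed reconstruction of the slide data (without `minion` and `leader`)
minions <- data.frame(
  id = c(101, 102, 108, 120, 100),
  type = c("yellow", "yellow", "purple", "purple", "yellow"),
  age = c(5, 6, 10, 16, 3),
  missions_internal = c(60, 55, 48, 49, 54),
  missions_external = c(2, 10, 3, 1, 4)
)

# Ascending order is the default
ascending <- arrange(minions, missions_internal)

# Wrap the column in desc() to sort from largest to smallest
descending <- arrange(minions, desc(missions_internal))

ascending$id
descending$id
```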

39. %>%
The Pipe

40. <- %>% rotate("clockwise", 90)
object, pipe, function, arguments
=
<- rotate( , "clockwise", 90)
object, function, arguments

41. Successive commands
(1) <- scale( , 0.25)

42. Successive commands
(1) <- scale( , 0.25)
(2) <- rotate( , "clockwise", 90)

43. Successive commands
(1) <- scale( , 0.25)
(2) <- rotate( , "clockwise", 90)
(3) <- clone( , 1)

44. Successive commands
(1) <- scale( , 0.25)
(2) <- rotate( , "clockwise", 90)
(3) <- clone( , 1)

45. <- clone(rotate(scale( , 0.25), "clockwise", 90), 1)
One-line commands

46. k %>%
scale(0.25) %>%
rotate("clockwise", 90) %>%
clone(1)
<-
Piped commands
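
The slides illustrate the pipe with picture-manipulating functions (`scale()`, `rotate()`, `clone()`) that act on an image. As a hedged, runnable counterpart, the sketch below writes the same chain of dplyr verbs twice, once nested and once piped, on the assumed reduced `minions` data frame; the particular chain is chosen only for illustration.

```r
library(dplyr)

# Assumed reconstruction of the slide data (without `minion` and `leader`)
minions <- data.frame(
  id = c(101, 102, 108, 120, 100),
  type = c("yellow", "yellow", "purple", "purple", "yellow"),
  age = c(5, 6, 10, 16, 3),
  missions_internal = c(60, 55, 48, 49, 54),
  missions_external = c(2, 10, 3, 1, 4)
)

# Nested form: has to be read from the inside out
nested <- arrange(
  mutate(filter(minions, type == "yellow"),
         missions = missions_internal + missions_external),
  desc(missions)
)

# Piped form: x %>% f(y) is the same as f(x, y), so steps read top to bottom
piped <- minions %>%
  filter(type == "yellow") %>%
  mutate(missions = missions_internal + missions_external) %>%
  arrange(desc(missions))

piped$id
```

Both forms give the same result; the piped version avoids naming intermediate objects and keeps each step on its own line.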

MISSION ACCOMPLISHED