Data Manipulation with dplyr (First Steps)

OmaymaS
November 08, 2018

A workshop for beginners on the #tidyverse, focusing on data manipulation using #dplyr along with hands-on exercises.

Delivered at DataFest Tbilisi 2018.

Transcript

  1. INTRO TO THE TIDYVERSE: DATA MANIPULATION USING dplyr. OMAYMA SAID (OmaymaS)

  2. The Tidyverse Source: https://imgur.com/a/l7fNwP1

  3. The Tidyverse Source: https://imgur.com/a/l7fNwP1

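  4. > minions (dataframe/tbl)

    | id  | minion | leader | type   | age | missions_internal | missions_external |
    |-----|--------|--------|--------|-----|-------------------|-------------------|
    | 101 |        |        | yellow | 5   | 60                | 2                 |
    | 102 |        |        | yellow | 6   | 55                | 10                |
    | 108 |        |        | purple | 10  | 48                | 3                 |
    | 120 |        |        | purple | 16  | 49                | 1                 |
    | 100 |        |        | yellow | 3   | 54                | 4                 |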
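  5. Columns are VARIABLES; rows are OBSERVATIONS.

    | id  | minion | leader | type   | age | missions_internal | missions_external |
    |-----|--------|--------|--------|-----|-------------------|-------------------|
    | 101 |        |        | yellow | 5   | 60                | 2                 |
    | 102 |        |        | yellow | 6   | 55                | 10                |
    | 108 |        |        | purple | 10  | 48                | 3                 |
    | 120 |        |        | purple | 16  | 49                | 1                 |
    | 100 |        |        | yellow | 3   | 54                | 4                 |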
  6. kevin <-

  7. kevin <-

    kevin_new <- rotate(kevin, direction = "clockwise", angle = 90)

    object: kevin_new; function: rotate; arguments: kevin, direction = "clockwise", angle = 90
  8. kevin <-

    kevin_new <- rotate(kevin, direction = "clockwise", angle = 90)

    object: kevin_new; function: rotate; arguments: kevin, direction = "clockwise", angle = 90

    What is the value of kevin_new?
  9. kevin <-

    kevin_new <- rotate(kevin, direction = "clockwise", angle = 90)

    object: kevin_new; function: rotate; arguments: kevin, direction = "clockwise", angle = 90

    kevin_new
  10. A grammar of data manipulation

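  11. > minions

    | id  | minion | leader | type   | age | missions_internal | missions_external |
    |-----|--------|--------|--------|-----|-------------------|-------------------|
    | 101 |        |        | yellow | 5   | 60                | 2                 |
    | 102 |        |        | yellow | 6   | 55                | 10                |
    | 108 |        |        | purple | 10  | 48                | 3                 |
    | 120 |        |        | purple | 16  | 49                | 1                 |
    | 100 |        |        | yellow | 3   | 54                | 4                 |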
  12. select(): Return a subset of columns

  13. select(minions, id, age)

    dataframe: minions; columns to select: id, age

  14. select(minions, id, age) returns a new dataframe/tbl with only the selected columns of minions:

    | id  | age |
    |-----|-----|
    | 101 | 5   |
    | 102 | 6   |
    | 108 | 10  |
    | 120 | 16  |
    | 100 | 3   |
  15. select(minions, -missions_external)

    dataframe: minions; column to exclude: missions_external

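  16. select(minions, -missions_external)

    | id  | minion | leader | type   | age | missions_internal |
    |-----|--------|--------|--------|-----|-------------------|
    | 101 |        |        | yellow | 5   | 60                |
    | 102 |        |        | yellow | 6   | 55                |
    | 108 |        |        | purple | 10  | 48                |
    | 120 |        |        | purple | 16  | 49                |
    | 100 |        |        | yellow | 3   | 54                |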
  17. select(minions, id:leader)

    dataframe: minions; range of columns to select: id:leader

  18. select(minions, id:leader)

    | id  | minion | leader |
    |-----|--------|--------|
    | 101 |        |        |
    | 102 |        |        |
    | 108 |        |        |
    | 120 |        |        |
    | 100 |        |        |
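The three select() calls from slides 13 to 18 can be tried together in a minimal sketch like the one below. The id, type, age and missions values come from the slides. The minion and leader values are hypothetical placeholders, because the transcript does not capture them.

```r
library(dplyr)

# id, type, age and missions_* are taken from the slides.
# minion and leader values are HYPOTHETICAL placeholders (assumption).
minions <- tibble(
  id = c(101, 102, 108, 120, 100),
  minion = c("minion_1", "minion_2", "minion_3", "minion_4", "minion_5"),
  leader = c("leader_a", "leader_a", "leader_b", "leader_b", "leader_a"),
  type = c("yellow", "yellow", "purple", "purple", "yellow"),
  age = c(5, 6, 10, 16, 3),
  missions_internal = c(60, 55, 48, 49, 54),
  missions_external = c(2, 10, 3, 1, 4)
)

# Keep only the named columns
id_age <- select(minions, id, age)

# Drop one column with a leading minus
no_external <- select(minions, -missions_external)

# Keep a contiguous range of columns, from id through leader
id_to_leader <- select(minions, id:leader)

print(id_age)
```

Each call returns a new dataframe/tbl; minions itself is left unchanged.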

  19. filter(): Return a subset of rows

  20. filter(minions, type == "yellow")

    dataframe: minions; condition: type == "yellow"

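  21. filter(minions, type == "yellow")

    | id  | minion | leader | type   | age | missions_internal | missions_external |
    |-----|--------|--------|--------|-----|-------------------|-------------------|
    | 101 |        |        | yellow | 5   | 60                | 2                 |
    | 102 |        |        | yellow | 6   | 55                | 10                |
    | 100 |        |        | yellow | 3   | 54                | 4                 |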
  22. MORE CONDITIONS

    == equal; != not equal; > greater than; < less than; >= greater than or equal; <= less than or equal

    COMBINE WITH: , or & (AND), | (OR)
  23. filter(minions, type == "yellow", age > 3)

    dataframe: minions; multiple conditions: type == "yellow", age > 3
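  24. filter(minions, type == "yellow", age > 3)

    | id  | minion | leader | type   | age | missions_internal | missions_external |
    |-----|--------|--------|--------|-----|-------------------|-------------------|
    | 101 |        |        | yellow | 5   | 60                | 2                 |
    | 102 |        |        | yellow | 6   | 55                | 10                |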
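The filter() slides can be reproduced with a short sketch such as the following. It uses the values shown on the slides and leaves out the minion and leader columns, whose values the transcript does not capture. The last call, with |, is an illustrative condition that does not appear in the deck.

```r
library(dplyr)

# Values taken from the slides; minion and leader columns are omitted
# because their values are not captured in the transcript.
minions <- tibble(
  id = c(101, 102, 108, 120, 100),
  type = c("yellow", "yellow", "purple", "purple", "yellow"),
  age = c(5, 6, 10, 16, 3),
  missions_internal = c(60, 55, 48, 49, 54),
  missions_external = c(2, 10, 3, 1, 4)
)

# One condition
yellow <- filter(minions, type == "yellow")

# Two conditions separated by a comma: both must hold
yellow_over_3 <- filter(minions, type == "yellow", age > 3)

# The same filter written with an explicit AND
yellow_over_3_and <- filter(minions, type == "yellow" & age > 3)

# OR: rows matching either condition (illustrative, not from the slides)
purple_or_young <- filter(minions, type == "purple" | age < 5)

print(yellow_over_3)
```

Separating conditions with a comma gives the same rows as joining them with &.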
  25. mutate(): Add or modify columns

  26. mutate(minions, missions = missions_internal + missions_external)

    dataframe: minions; new column name: missions; expression: missions_internal + missions_external

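  27. mutate(minions, missions = missions_internal + missions_external)

    | id  | minion | leader | type   | age | missions_internal | missions_external | missions |
    |-----|--------|--------|--------|-----|-------------------|-------------------|----------|
    | 101 |        |        | yellow | 5   | 60                | 2                 | 62       |
    | 102 |        |        | yellow | 6   | 55                | 10                | 65       |
    | 108 |        |        | purple | 10  | 48                | 3                 | 51       |
    | 120 |        |        | purple | 16  | 49                | 1                 | 50       |
    | 100 |        |        | yellow | 3   | 54                | 4                 | 58       |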
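As a quick check of the mutate() slide, the sketch below rebuilds the table from the slide values (without the minion and leader columns, which the transcript does not capture) and adds the missions column.

```r
library(dplyr)

# Values taken from the slides; minion and leader columns are omitted.
minions <- tibble(
  id = c(101, 102, 108, 120, 100),
  type = c("yellow", "yellow", "purple", "purple", "yellow"),
  age = c(5, 6, 10, 16, 3),
  missions_internal = c(60, 55, 48, 49, 54),
  missions_external = c(2, 10, 3, 1, 4)
)

# Add a new column computed row by row from two existing columns
with_total <- mutate(minions, missions = missions_internal + missions_external)

print(with_total$missions)
# 62 65 51 50 58
```

Using an existing column name on the left-hand side would overwrite that column instead of adding a new one.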
  28. summarize(): Calculate aggregate measures for groups

  29. summarize(minions, age_median = median(age))

    dataframe: minions; new column name: age_median; expression: median(age)

  30. summarize(minions, age_median = median(age)) collapses minions into a single row:

    | age_median |
    |------------|
    | 6          |
  31. summarize(minions, age_median = median(age), missions_internal_all = sum(missions_internal), missions_external_all = sum(missions_external))

    Multiple expressions
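The two summarize() calls above can be run as a minimal sketch on the slide data (minion and leader columns omitted, since the transcript does not capture their values). The result names match the slides.

```r
library(dplyr)

# Values taken from the slides; minion and leader columns are omitted.
minions <- tibble(
  id = c(101, 102, 108, 120, 100),
  type = c("yellow", "yellow", "purple", "purple", "yellow"),
  age = c(5, 6, 10, 16, 3),
  missions_internal = c(60, 55, 48, 49, 54),
  missions_external = c(2, 10, 3, 1, 4)
)

# One expression: the whole table collapses to a single row
age_summary <- summarize(minions, age_median = median(age))

# Several expressions: one output column per expression
all_summary <- summarize(
  minions,
  age_median = median(age),
  missions_internal_all = sum(missions_internal),
  missions_external_all = sum(missions_external)
)

print(all_summary)
```

Without group_by(), summarize() treats the whole dataframe as one group, so the result has exactly one row.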
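  32. group_by(): Group by one or more variables

  33. minions %>% group_by(leader) %>% summarize(missions_internal_all = sum(missions_internal), missions_external_all = sum(missions_external))

    dataframe: minions; group: leader; new column names: missions_internal_all, missions_external_all; expressions: sum(missions_internal), sum(missions_external)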
  34. minions %>% group_by(leader) %>% summarize(missions_internal_all = sum(missions_internal), missions_external_all = sum(missions_external))

    | leader | missions_internal_all | missions_external_all |
    |--------|-----------------------|-----------------------|
    |        | 169                   | 16                    |
    |        | 97                    | 4                     |
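A sketch of the grouped summary follows. The leader labels are hypothetical, because the transcript does not capture the leader values. The assignment of rows to leaders is also an assumption, inferred from the totals on slide 34: the three yellow minions sum to 169 and 16, and the two purple minions to 97 and 4.

```r
library(dplyr)

# Numeric values and type are taken from the slides.
# Leader labels are HYPOTHETICAL, and the row-to-leader assignment is an
# assumption inferred from the per-leader totals shown on slide 34.
minions <- tibble(
  id = c(101, 102, 108, 120, 100),
  leader = c("leader_a", "leader_a", "leader_b", "leader_b", "leader_a"),
  type = c("yellow", "yellow", "purple", "purple", "yellow"),
  age = c(5, 6, 10, 16, 3),
  missions_internal = c(60, 55, 48, 49, 54),
  missions_external = c(2, 10, 3, 1, 4)
)

# One summary row per leader
by_leader <- minions %>%
  group_by(leader) %>%
  summarize(
    missions_internal_all = sum(missions_internal),
    missions_external_all = sum(missions_external)
  )

print(by_leader)
```

group_by() does not change the data on its own; it only tells the following summarize() to compute one row per group.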
  35. arrange(): Reorder rows based on variables

  36. arrange(minions, missions_internal)

    dataframe: minions; column name: missions_internal

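  37. arrange(minions, missions_internal) (DEFAULT: ascending)

    | id  | minion | leader | type   | age | missions_internal | missions_external |
    |-----|--------|--------|--------|-----|-------------------|-------------------|
    | 108 |        |        | purple | 10  | 48                | 3                 |
    | 120 |        |        | purple | 16  | 49                | 1                 |
    | 100 |        |        | yellow | 3   | 54                | 4                 |
    | 102 |        |        | yellow | 6   | 55                | 10                |
    | 101 |        |        | yellow | 5   | 60                | 2                 |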
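  38. arrange(minions, desc(missions_internal))

    | id  | minion | leader | type   | age | missions_internal | missions_external |
    |-----|--------|--------|--------|-----|-------------------|-------------------|
    | 101 |        |        | yellow | 5   | 60                | 2                 |
    | 102 |        |        | yellow | 6   | 55                | 10                |
    | 100 |        |        | yellow | 3   | 54                | 4                 |
    | 120 |        |        | purple | 16  | 49                | 1                 |
    | 108 |        |        | purple | 10  | 48                | 3                 |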
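Both orderings can be checked with a minimal sketch on the slide data (minion and leader columns omitted, since the transcript does not capture their values).

```r
library(dplyr)

# Values taken from the slides; minion and leader columns are omitted.
minions <- tibble(
  id = c(101, 102, 108, 120, 100),
  type = c("yellow", "yellow", "purple", "purple", "yellow"),
  age = c(5, 6, 10, 16, 3),
  missions_internal = c(60, 55, 48, 49, 54),
  missions_external = c(2, 10, 3, 1, 4)
)

# Ascending order is the default
ascending <- arrange(minions, missions_internal)

# Wrap the column in desc() for descending order
descending <- arrange(minions, desc(missions_internal))

print(ascending$id)
# 108 120 100 102 101
print(descending$id)
# 101 102 100 120 108
```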
  39. %>% The Pipe

  40. <- %>% rotate("clockwise", 90) (object, pipe, function, arguments)

    =

    <- rotate( , "clockwise", 90) (object, function, arguments)
  41. Successive commands: (1) <- scale( , 0.25)

  42. Successive commands: (1) <- scale( , 0.25); (2) <- rotate( , "clockwise", 90)

  43. Successive commands: (1) <- scale( , 0.25); (2) <- rotate( , "clockwise", 90); (3) <- clone( , 1)

  44. Successive commands: (1) <- scale( , 0.25); (2) <- rotate( , "clockwise", 90); (3) <- clone( , 1)
  45. One-line commands: <- clone(rotate(scale( , 0.25), "clockwise", 90), 1)

  46. k %>% scale(0.25) %>% rotate("clockwise", 90) %>% clone(1) <-

    Piped commands
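The scale(), rotate() and clone() calls on these slides act on pictures and are illustrative only. To show the same three styles (successive, one-line, piped) with code that runs, the sketch below swaps in dplyr verbs on the minions data from the earlier slides. The choice of steps is an assumption made for illustration and does not come from the deck.

```r
library(dplyr)

# Values taken from the slides; minion and leader columns are omitted.
minions <- tibble(
  id = c(101, 102, 108, 120, 100),
  type = c("yellow", "yellow", "purple", "purple", "yellow"),
  age = c(5, 6, 10, 16, 3),
  missions_internal = c(60, 55, 48, 49, 54),
  missions_external = c(2, 10, 3, 1, 4)
)

# Successive commands: one intermediate object per step
step_1 <- filter(minions, type == "yellow")
step_2 <- mutate(step_1, missions = missions_internal + missions_external)
successive <- arrange(step_2, desc(missions))

# One-line (nested) commands: read from the inside out
nested <- arrange(
  mutate(
    filter(minions, type == "yellow"),
    missions = missions_internal + missions_external
  ),
  desc(missions)
)

# Piped commands: x %>% f(y) is evaluated as f(x, y), so steps read top to bottom
piped <- minions %>%
  filter(type == "yellow") %>%
  mutate(missions = missions_internal + missions_external) %>%
  arrange(desc(missions))

print(piped)
```

All three versions produce the same result. The piped form avoids both the intermediate objects and the inside-out reading order.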
  47. MISSION ACCOMPLISHED