Tokyo.R#93 Data processing

Slides from my talk at the 93rd Tokyo.R meetup.

kilometer

July 03, 2021

Transcript

  1. #93
    @kilometer00
    2021.07.03
    BeginneR Session
    -- Data processing --

  2. Who!?
    Who?

  3. Who!?
    ・ @kilometer
    ・Postdoc Researcher (Ph.D. Eng.)
    ・Neuroscience
    ・Computational Behavior
    ・Functional brain imaging
    ・R: ~ 10 years

  4. Promotion!! (I took part in translating a book.)
    Now on sale!

  5. BeginneR Session

  6. BeginneR

  7. Before After
    BeginneR Session
    BeginneR BeginneR

  8. BeginneR Advanced Hoxo_m
    If I have seen further it is by standing on the
    shoulders of Giants.
    -- Sir Isaac Newton, 1676

  9. import Tidy Transform
    Visualise
    Model
    Communicate
    Modified from “R for Data Science”, H. Wickham, 2017

  10. import Tidy Transform
    Visualise
    Model
    Communicate
    Modified from “R for Data Science”, H. Wickham, 2017
    preprocessing
    Data processing
    Data science

  11. import Tidy Transform
    Visualise
    Model
    Communicate
    Modified from “R for Data Science”, H. Wickham, 2017
    preprocessing
    Data science
    Data
    [email protected] Hypothesis feedback
    Data processing
    Narrative of data

  12. import Tidy Transform
    Visualise
    Model
    Communicate
    Modified from “R for Data Science”, H. Wickham, 2017
    preprocessing
    Data science
    Data
    [email protected] Hypothesis
    Narrative of data
    feedback
    Data processing

  13. import Tidy Transform
    Visualise
    Model
    Communicate
    Modified from “R for Data Science”, H. Wickham, 2017
    Data processing

  14. read_csv()
    write_csv()
    Data
    Wide form Long form
    pivot_longer()
    Nested form
    pivot_wider()
    Figures
    group_nest() unnest()
    {ggplot2}
    {patchwork}
    Data processing

  15. read_csv()
    write_csv()
    Data
    Wide form Long form
    pivot_longer()
    Nested form
    pivot_wider()
    Figures
    group_nest() unnest()
    {ggplot2}
    {patchwork}
    data.frame
    tibble
    Data processing

  16. data.frame

  17. vector
    in Excel

  18. vector
    in R
    in Excel
    pre post
    > pre
    [1] 1 2 3 4 5
    > post
    [1] 5 10 15 20 25
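
The assignments that produced `pre` and `post` did not survive in this transcript. A minimal sketch of how two such "columns" could be built in R, assuming `post` was derived from `pre` by vectorised arithmetic (that derivation is an assumption, not taken from the slide):

```r
# A vector plays the role of one spreadsheet column.
pre <- c(1, 2, 3, 4, 5)

# Assumption: post is pre multiplied by 5. Arithmetic on a vector is
# applied element by element, so no loop or per-cell formula is needed.
post <- pre * 5

print(pre)
print(post)
```

Unlike a spreadsheet, the whole column is handled as a single object, which is the idea that data.frame builds on later in the deck.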

  19. vector
    vec1 vec2 vec3
    > vec1
    [1] 1 2 3 4 5
    > vec2
    [1] 1 2 3 4 5
    > vec3
    [1] 1 2 3 4 5
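
The three assignments behind this output are missing from the transcript. A sketch of three common base R idioms that all print `1 2 3 4 5`; which of them the slide actually used is an assumption:

```r
# Three ways to build the sequence 1, 2, 3, 4, 5.
vec1 <- c(1, 2, 3, 4, 5)                # combine explicit values (stored as double)
vec2 <- 1:5                             # colon operator (stored as integer)
vec3 <- seq(from = 1, to = 5, by = 1)   # seq() with an explicit step

print(vec1)
print(vec2)
print(vec3)

# The values agree element by element even where the storage type differs.
print(all(vec1 == vec2) && all(vec2 == vec3))
```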

  20. vector
    vec1 vec2
    > vec1
    [1] 1 2 3 4 5
    > vec2
    [1] 1 2 3 4 5

  21. > ?seq
    vector
    seq{base}
    Sequence Generation
    Description
    Generate regular sequences. seq is a standard
    generic with a default method. …
    Usage
    seq(...)
    ## Default S3 method:
    seq(from = 1, to = 1, by = ((to - from)/(length.out - 1)),
    length.out = NULL, along.with = NULL, ...)

  22. vector
    vec1 vec2 vec3
    > vec1
    [1] 1 2 3 1 2 3
    > vec2
    [1] 1 1 2 2 3 3
    > vec3
    [1] 1 1 2 2 3 3 1 1 2 2 3 3
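
The calls that generated these repeated patterns are not in the transcript. One way to reproduce the three outputs is base R's `rep()`; treating `rep()` as the function on the slide is an assumption:

```r
# times repeats the whole vector; each repeats every element in place.
vec1 <- rep(1:3, times = 2)             # 1 2 3 1 2 3
vec2 <- rep(1:3, each = 2)              # 1 1 2 2 3 3
vec3 <- rep(1:3, times = 2, each = 2)   # each is applied first, then times

print(vec1)
print(vec2)
print(vec3)
```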

  23. vector
    vec1
    > vec1
    [1] 11 12 13 14 15
    > vec1[1]
    [1] 11
    > vec1[3:5]
    [1] 13 14 15
    > vec1[c(1:2, 5)]
    [1] 11 12 15

  24. list
    list1
    > list1
    [[1]]
    [1] 1 2 3 4 5 6
    [[2]]
    [1] 11 12 13 14 15
    [[3]]
    [1] "a" "b" "c"

  25. list
    list1
    > list1[[1]]
    [1] 1 2 3 4 5 6
    > list1[[3]][2:3]
    [1] "b" "c"
    > list1[[2]] * 3
    [1] 33 36 39 42 45
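
The definition of `list1` is not shown in the transcript. A sketch that reproduces the printed output, assuming the list was built with `list()` from the three vectors shown:

```r
# A list can hold vectors of different lengths and types.
list1 <- list(1:6, 11:15, c("a", "b", "c"))

# [[ ]] extracts one element as the vector itself ...
print(list1[[1]])
print(list1[[3]][2:3])
print(list1[[2]] * 3)

# ... while [ ] returns a shorter list that still wraps the element.
print(class(list1[1]))     # "list"
print(class(list1[[1]]))   # "integer"
```

The difference between `[` and `[[` is worth internalising early, because a data.frame column behaves the same way.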

  26. named list
    list2
    > list2
    $A
    [1] 1 2 3 4 5 6
    $B
    [1] 11 12 13 14 15
    $C
    [1] "a" "b" "c"

  27. > list2$A
    [1] 1 2 3 4 5 6
    > list2$C[2:3]
    [1] "b" "c"
    > list2$B * 3
    [1] 33 36 39 42 45
    named list
    list2

  28. list1
    > class(list1)
    [1] "list"
    > names(list1)
    NULL
    list2
    > class(list2)
    [1] "list"
    > names(list2)
    [1] "A" "B" "C"
    named list
    list

  29. list3
    > class(list3)
    [1] "list"
    > names(list3)
    [1] "A" "B"
    df1
    > class(df1)
    [1] "data.frame"
    > names(df1)
    [1] "A" "B"
    named list & data.frame

  30. > str(list3)
    List of 2
    $ A: int [1:3] 1 2 3
    $ B: int [1:3] 11 12 13
    > str(df1)
    'data.frame': 3 obs. of 2 variables:
    $ A: int 1 2 3
    $ B: int 11 12 13
    list3 df1 named list & data.frame
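
The point of this comparison is that a data.frame is a named list whose elements all have the same length. A sketch that rebuilds both objects from the `str()` output above (the constructor calls are assumptions; only the printed structure comes from the slide):

```r
# Same contents, two containers.
list3 <- list(A = 1:3, B = 11:13)
df1   <- data.frame(A = 1:3, B = 11:13)

str(list3)
str(df1)

# A data.frame is still a list underneath, so list-style access works.
print(is.list(df1))
print(df1$A)
print(df1[["B"]])

# Converting the named list gives an equivalent data.frame.
print(all.equal(as.data.frame(list3), df1))
```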

  31. > list3
    $A
    [1] 1 2 3
    $B
    [1] 11 12 13
    > df1
    A B
    1 1 11
    2 2 12
    3 3 13
    named list & data.frame

  32. > list3
    $A
    [1] 1 2 3
    $B
    [1] 11 12 13
    > df1
    A B
    1 1 11
    2 2 12
    3 3 13
    named list & data.frame
    observation
    variable

  33. data.frame vs. matrix
    A B
    1 1 11
    2 2 12
    3 3 13
    [,1] [,2]
    [1,] 1 11
    [2,] 2 12
    [3,] 3 13
    df1
    > str(mat1)
    int [1:3, 1:2] 1 2 3 11 12 13
    > str(df1)
    'data.frame': 3 obs. of 2 vars.:
    $ A: int 1 2 3
    $ B: int 11 12 13
    mat1
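
A short sketch of why the two structures differ even when they print alike. The constructor calls and the extra objects `df2` and `mat2` are illustrative assumptions, not taken from the slide:

```r
# A matrix is one vector with a dim attribute: a single type throughout.
mat1 <- matrix(c(1:3, 11:13), nrow = 3)
str(mat1)

# A data.frame is a list of columns: each column keeps its own type.
df1 <- data.frame(A = 1:3, B = 11:13)
str(df1)

# Mixing types shows the difference.
df2  <- data.frame(A = 1:3, B = c("x", "y", "z"))
mat2 <- cbind(1:3, c("x", "y", "z"))

print(sapply(df2, class))   # A stays integer, B is character
print(typeof(mat2))         # everything was coerced to character
```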

  34. read_csv()
    write_csv()
    Data
    Wide form Long form
    pivot_longer()
    Nested form
    pivot_wider()
    Figures
    group_nest() unnest()
    {ggplot2}
    {patchwork}
    data.frame
    tibble
    Data processing

  35. read_csv()
    write_csv()
    Data
    Wide form Long form
    pivot_longer()
    Nested form
    pivot_wider()
    Figures
    group_nest() unnest()
    {ggplot2}
    {patchwork}
    data.frame
    tibble
    Data processing
    Transform
    (verb functions)
    {dplyr}

  36. vignette("dplyr")

  37. It (dplyr) provides simple “verbs”,
    functions that correspond to the most
    common data manipulation tasks,
    to help you translate your thoughts into code.
    Introduction to dplyr
    https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
    verbs {dplyr}

  38. dplyr provides "verbs" for translating
    your thoughts into code:
    a set of functions that carry out
    the basics of data manipulation simply.
    Introduction to dplyr
    https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
    verbs {dplyr}
    * A fairly loose translation

  39. 1. mutate()
    2. filter()
    3. select()
    4. group_by()
    5. summarize()
    6. left_join()
    7. arrange()
    Data.frame manipulation

  40. 1. mutate()
    2. filter()
    3. select()
    4. group_by()
    5. summarize()
    6. left_join()
    7. arrange()
    Data.frame manipulation
    0. %>%

  41. Pipe algebra
    X %>% f          f(X)
    X %>% f(y)       f(X, y)
    X %>% f %>% g    g(f(X))
    X %>% f(y, .)    f(y, X)
    %>% {magrittr}
    "Re-introduction to dplyr (Basics)" (「dplyr再入門(基本編)」), yutannihilation
    https://speakerdeck.com/yutannihilation/dplyrzai-ru-men-ji-ben-bian

  42. lift
    take
    pour
    put
    Bring milk from the kitchen!

  43. lift
    Bring milk from the kitchen!
    lift(Robot, glass, table) -> Robot'
    take
    take(Robot', fridge, milk) -> Robot''

  44. Bring milk from the kitchen!
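    Robot' <- lift(Robot, glass, table)       # ①
    Robot'' <- take(Robot', fridge, milk)     # ②
    Robot''' <- pour(Robot'', milk, glass)    # ③
    result <- put(Robot''', glass, table)     # ④
    by using pipe,
    result <- Robot %>%
      lift(glass, table) %>%    # ①
      take(fridge, milk) %>%    # ②
      pour(milk, glass) %>%     # ③
      put(glass, table)         # ④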
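
The robot example is pseudocode, so it cannot be run as written. A runnable sketch of the same idea, assuming {magrittr} is installed; the four action functions below are hypothetical stand-ins that simply append to a log of what the robot has done:

```r
library(magrittr)

# Hypothetical actions: each takes the robot's state (a character log)
# as its first argument and returns the updated state.
lift <- function(robot, what, from) c(robot, paste("lift", what, "from", from))
take <- function(robot, from, what) c(robot, paste("take", what, "from", from))
pour <- function(robot, what, into) c(robot, paste("pour", what, "into", into))
put  <- function(robot, what, on)   c(robot, paste("put", what, "on", on))

Robot <- character(0)

# Without the pipe, every intermediate state needs its own name.
robot1 <- lift(Robot, "glass", "table")
robot2 <- take(robot1, "fridge", "milk")
robot3 <- pour(robot2, "milk", "glass")
result_steps <- put(robot3, "glass", "table")

# With the pipe, the steps read top to bottom in the order they happen.
result_pipe <- Robot %>%
  lift("glass", "table") %>%
  take("fridge", "milk") %>%
  pour("milk", "glass") %>%
  put("glass", "table")

print(result_pipe)
print(identical(result_steps, result_pipe))
```

Both versions compute the same result; the pipe version only removes the need to invent names for the intermediate states.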

  45. The tidyverse style guide
    https://style.tidyverse.org/syntax.html#object-names
    "There are only two hard things in Computer Science:
    cache invalidation and naming things"

  46. Bring milk from the kitchen!
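    Robot' <- lift(Robot, glass, table)       # ①
    Robot'' <- take(Robot', fridge, milk)     # ②
    Robot''' <- pour(Robot'', milk, glass)    # ③
    result <- put(Robot''', glass, table)     # ④
    by using pipe,
    result <- Robot %>%
      lift(glass, table) %>%    # ①
      take(fridge, milk) %>%    # ②
      pour(milk, glass) %>%     # ③
      put(glass, table)         # ④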

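  47. Bring milk from the kitchen!
    Robot' <- lift(Robot, glass, table)       # ①
    Robot'' <- take(Robot', fridge, milk)     # ②
    Robot''' <- pour(Robot'', milk, glass)    # ③
    result <- put(Robot''', glass, table)     # ④
    by using pipe,
    result <- Robot %>%
      lift(glass, table) %>%    # ①
      take(fridge, milk) %>%    # ②
      pour(milk, glass) %>%     # ③
      put(glass, table)         # ④
    Thinking Reading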

  48. Programming
    Write
    Run
    Read
    Think
    Write
    Run
    Read
    Think
    Communicate
    Share

  49. Pipe algebra
    X %>% f          f(X)
    X %>% f(y)       f(X, y)
    X %>% f %>% g    g(f(X))
    X %>% f(y, .)    f(y, X)
    %>% {magrittr}
    "Re-introduction to dplyr (Basics)" (「dplyr再入門(基本編)」), yutannihilation
    https://speakerdeck.com/yutannihilation/dplyrzai-ru-men-ji-ben-bian
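
Each pipe form on this slide is just another way of writing an ordinary function call. A small check of the four equivalences, assuming {magrittr} is installed; `X`, `y`, `f` and `g` are hypothetical placeholders chosen only so that argument order matters:

```r
library(magrittr)

X <- c(1, 4, 9)
y <- 2
f <- function(a, b = 1) a / b   # not symmetric, so argument order is visible
g <- function(a) sum(a)

# The left-hand side becomes the first argument by default.
print(identical(X %>% f,       f(X)))
print(identical(X %>% f(y),    f(X, y)))
print(identical(X %>% f %>% g, g(f(X))))

# A dot places the left-hand side somewhere other than first.
print(identical(X %>% f(y, .), f(y, X)))
```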

  50. 1. mutate()
    2. filter()
    3. select()
    4. group_by()
    5. summarize()
    6. left_join()
    7. arrange()
    Data.frame manipulation
    0. %>%

  51. verbs {dplyr}
    mutate # add a column
    +
    mutate(dat, C = fun(A, B))

  52. verbs {dplyr}
    mutate # add a column
    +
    dat %>% mutate(C = fun(A, B))
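
A runnable version of this pattern, assuming {dplyr} is installed. The contents of `dat` and the body of `fun` are hypothetical; the slide only names them:

```r
library(dplyr)

# Hypothetical data and function standing in for the slide's dat and fun.
dat <- data.frame(A = 1:3, B = 11:13)
fun <- function(a, b) a + b

# mutate() adds column C, computed row-wise from the existing columns,
# and returns a new data.frame; dat itself is left unchanged.
res <- dat %>% mutate(C = fun(A, B))

print(res)
print(names(dat))
```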

  53. verbs {dplyr}
    filter # narrow down rows
    dat %>% filter(tag %in% c(1, 3, 5))
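
A runnable version of this call, assuming {dplyr} is installed. The contents of `dat` are hypothetical; only the column name `tag` comes from the slide:

```r
library(dplyr)

# Hypothetical data with a tag column to filter on.
dat <- data.frame(tag = 1:6, B = letters[1:6])

# Keep only the rows where the condition is TRUE.
res <- dat %>% filter(tag %in% c(1, 3, 5))

print(res)
```

The condition is evaluated once per row, so `filter()` accepts any expression that returns a logical vector as long as the data.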

  54. Boolean operators (Boolean algebra)
    A == B    # equal to
    A != B    # not equal to
    A | B     # or
    A & B     # and
    A %in% B  # is A in B?
    George Boole
    1815 - 1864
    wikipedia

  55. Boolean operators (Boolean algebra)
    "a" != "b"      # not equal to
    [1] TRUE
    1 %in% 10:100   # is A in B?
    [1] FALSE
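
These operators are vectorised, which is what makes them useful inside `filter()`. A base R sketch; the vectors `A` and `B` are hypothetical:

```r
# The two single-value examples from the slide.
print("a" != "b")       # TRUE
print(1 %in% 10:100)    # FALSE

# On vectors, each operator returns one logical value per element.
A <- c(1, 2, 3)
B <- c(1, 5, 3)

print(A == B)               # TRUE FALSE TRUE
print(A != B)               # FALSE TRUE FALSE
print((A > 1) & (B > 1))    # FALSE TRUE TRUE
print((A > 2) | (B > 4))    # FALSE TRUE TRUE

# %in% asks, for each element of A, whether it appears anywhere in B.
print(A %in% B)             # TRUE FALSE TRUE
```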

  56. George Boole
    1815 - 1864
    Mathematician & Philosopher
    A Class-Room Introduction to Logic
    https://niyamaklogic.wordpress.com/category/laws-of-thoughts/

  57. verbs {dplyr}
    select # choose columns
    dat %>% select(tag, B)

  58. verbs {dplyr}
    select # choose columns
    dat %>% select("tag", "B")

  59. verbs {dplyr}
    select # choose columns
    dat %>% select("tag", "B")
    dat %>% select(tag, B)
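
The two spellings pick the same columns. A sketch that checks this, assuming {dplyr} is installed; the contents of `dat` are hypothetical, only the column names `tag` and `B` come from the slide:

```r
library(dplyr)

# Hypothetical data with one extra column that should be dropped.
dat <- data.frame(tag = 1:3, A = c(0.1, 0.2, 0.3), B = 11:13)

by_name   <- dat %>% select(tag, B)       # bare column names
by_string <- dat %>% select("tag", "B")   # quoted column names

print(by_name)
print(all.equal(by_name, by_string))
```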

  60. verbs {dplyr}
    # Select helper functions
    starts_with("s") ends_with("s")
    contains("se") matches("^.e")
    one_of(c("tag", "B"))
    everything()
    https://kazutan.github.io/blog/2017/04/dplyr-select-memo/
    "Notes on practical uses of dplyr::select" (「dplyr::selectの活用例メモ」), kazutan
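
A sketch of what each helper matches, assuming {dplyr} is installed. The column names in `dat` and the small wrapper `pick()` are hypothetical, chosen so that every helper selects a different subset:

```r
library(dplyr)

# Hypothetical columns: tag, size, sets, rose, B.
dat <- data.frame(
  tag  = 1:2,
  size = c(10, 20),
  sets = c(3, 4),
  rose = c("x", "y"),
  B    = 11:12
)

# Hypothetical wrapper: return only the names of the selected columns.
pick <- function(...) names(select(dat, ...))

print(pick(starts_with("s")))        # "size" "sets"
print(pick(ends_with("s")))          # "sets"
print(pick(contains("se")))          # "sets" "rose"
print(pick(matches("^.e")))          # "sets" (regular expression)
print(pick(one_of(c("tag", "B"))))   # "tag" "B"
print(pick(everything()))            # all five columns
```

In current tidyselect, `one_of()` is superseded by `any_of()` and `all_of()`; it still works, but new code may prefer the newer pair.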

  61. 1. mutate()
    2. filter()
    3. select()
    4. group_by()
    5. summarize()
    6. left_join()
    7. arrange()
    Data.frame manipulation
    0. %>%

  62. Grammar of data manipulation
    By constraining your options,
    it helps you think about your data
    manipulation challenges.
    Introduction to dplyr
    https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html

  63. Grammar of data manipulation
    By limiting your choices,
    you can think about the steps of data analysis
    in a simple way.
    (a very loose translation)
    Introduction to dplyr
    https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
    * Truly a loose translation

  64. By imposing more constraints,
    one becomes freer from the shackles of the soul.
    Igor Stravinsky
    И́горь Ф Страви́нский
    The more constraints one imposes,
    the more one frees one's self of the
    chains that shackle the spirit.
    1882 - 1971
    * A somewhat loose translation

  65. Enjoy!!
