
TokyoR#104_DataProcessing


Slides from my talk at the 104th Tokyo.R meetup.

kilometer

March 04, 2023

Transcript

  1. #104
    @kilometer00
    2023.03.04
    BeginneR Session
    Data processing &
    visualization


  2. Who!?
    Who?


  3. Who!?
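    Name: Mimura (@kilometer)
    Occupation: postdoc (Ph.D. in engineering)
    Field: behavioral neuroscience (primates),
    brain imaging,
    medical systems engineering
    R experience: about 10 years
    Current craze: a brand-new chair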


  4. Plug!! (I took part in translating a book.)
    Now on sale!


  5. BeginneR Session


  6. Before After
    BeginneR Session
    BeginneR BeginneR


  7. BeginneR Advanced Hoxo_m
    If I have seen further it is by standing on the
    shoulders of Giants.
    -- Sir Isaac Newton, 1676


  8. #104
    @kilometer00
    2023.03.04
    BeginneR Session
    Data processing &
    visualization


  9. import Tidy
    Transform
    Visualize
    Model
    Communicate
    Modified from “R for Data Science”, H. Wickham, 2017
    Data Science



  10. Data
    Information that is reusable and suited to
    communication, interpretation, and processing.
    Definition by the International Electrotechnical Commission (IEC)


  11. Data
    Information that is reusable and suited to
    communication, interpretation, and processing.
    Information: a representation that encodes reality


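  12. Data
    Information that is reusable and suited to
    communication, interpretation, and processing.
    Information: a representation that encodes reality
    Reality:
    the thing itself, which exists
    whether or not it is observed
    Mapping (encoding)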


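  13. Set X, set Y
    Element x, element y
    Mapping f: X → Y, or f: x ⟼ y
    X: source set (domain); Y: target set (codomain)
    [Mapping]
    A rule that associates each element of one set
    with exactly one element of another set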


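  14. Reality → mapping → information
    Map space: [latitude, longitude]
    Species-name space: Homo sapiens
    Name space: coffee
    Monetary value space (yen): ¥290
    Monetary value space (dollars): $2.53
    [Mapping]
    A rule that associates each element of one set with exactly one element of another set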
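
The definition of a mapping can be tried out in R with a named vector. This is only a sketch: the coffee price of 290 yen comes from the slide, while `tea`, `juice`, and the variable names are made up for illustration.

```r
# A mapping sketched as a named vector: each name (an element of the source set)
# is associated with exactly one value (an element of the target set).
# "coffee" = 290 is from the slide; "tea" and "juice" are hypothetical.
price_yen <- c(coffee = 290, tea = 250, juice = 250)

# Look up the image of "coffee" under the mapping
coffee_price <- price_yen[["coffee"]]

# Two different elements may share the same image ...
same_price <- price_yen[["tea"]] == price_yen[["juice"]]

# ... but no element of the source set appears twice
unique_names <- anyDuplicated(names(price_yen)) == 0

print(coffee_price)
print(same_price)
print(unique_names)
```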


  15. Mapping
    Apple
    (reality)
    Apple
    (information)
    mapping


  16. Amount of information
    Reality
    Information
    Data: apple
    Encoding


  17. Amount of information
    Reality
    Information
    Data: apple
    Encoding
    Loss of information


  18. Experiment
    hypothesis observation
    principle phenotype
    model data
    Truth
    Knowledge f X
    (unknown)


  19. read_csv()
    write_csv()
    Table Data
    Wide form Long form
    pivot_longer()
    Nested form
    pivot_wider()
    Plot
    group_nest() unnest()
    {ggplot2}
    {patchwork}
    Image Files
    ggsave()
    Data Processing


  20. data.frame
    tibble
    read_csv()
    write_csv()
    Table Data
    Wide form Long form
    pivot_longer()
    Nested form
    pivot_wider()
    Plot
    group_nest() unnest()
    {ggplot2}
    {patchwork}
    Image Files
    ggsave()
    Data Processing


  21. vector
    in Excel


  22. vector
    in R
    in Excel
    pre <- c(1, 2, 3, 4, 5)
    post <- pre * 5
    > pre
    [1] 1 2 3 4 5
    > post
    [1] 5 10 15 20 25


  23. vector
    vec1 <- c(1, 2, 3, 4, 5)
    vec2 <- 1:5
    vec3 <- seq(from = 1, to = 5, by = 1)
    > vec1
    [1] 1 2 3 4 5
    > vec2
    [1] 1 2 3 4 5
    > vec3
    [1] 1 2 3 4 5


  24. vector
    vec1 <- seq(from = 1, to = 5, by = 1)
    vec2 <- seq(1, 5, 1)
    > vec1
    [1] 1 2 3 4 5
    > vec2
    [1] 1 2 3 4 5


  25. > ?seq
    vector
    seq{base}
    Sequence Generation
    Description
    Generate regular sequences. seq is a standard
    generic with a default method. …
    Usage
    seq(...)
    ## Default S3 method:
    seq(from = 1, to = 1, by = ((to - from)/(length.out - 1)),
    length.out = NULL, along.with = NULL, ...)
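
As a small sketch of the arguments listed in the help page, the following compares `by`, `length.out`, and `along.with`. The variable names and the `fruits` vector are illustrative only.

```r
# Three ways to control a sequence, following the arguments shown in ?seq
by_step   <- seq(from = 1, to = 9, by = 2)          # fixed step size
by_length <- seq(from = 0, to = 1, length.out = 5)  # fixed number of elements

fruits   <- c("apple", "banana", "cherry")          # hypothetical data
by_along <- seq(along.with = fruits)                # one index per element of fruits

print(by_step)
print(by_length)
print(by_along)
```

When only the number of elements matters, `length.out` spares you from computing the step by hand.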


  26. vector
    vec1 <- rep(1:3, times = 2)
    vec2 <- rep(1:3, each = 2)
    vec3 <- rep(1:3, times = 2, each = 2)
    > vec1
    [1] 1 2 3 1 2 3
    > vec2
    [1] 1 1 2 2 3 3
    > vec3
    [1] 1 1 2 2 3 3 1 1 2 2 3 3


  27. vector
    vec1 <- 11:15
    > vec1
    [1] 11 12 13 14 15
    > vec1[1]
    [1] 11
    > vec1[3:5]
    [1] 13 14 15
    > vec1[c(1:2, 5)]
    [1] 11 12 15
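
Beyond positive positions, base R also accepts negative and logical indices. The sketch below reuses `vec1` from the slide; the other variable names are made up.

```r
vec1 <- 11:15

# Negative index: everything except the first element
drop_first <- vec1[-1]

# Logical index: keep the elements for which the condition is TRUE
big_values <- vec1[vec1 > 13]

# Number of elements in the vector
n_elements <- length(vec1)

print(drop_first)
print(big_values)
print(n_elements)
```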


  28. list
    list1 <- list(1:6, 11:15, c("a", "b", "c"))
    > list1
    [[1]]
    [1] 1 2 3 4 5 6
    [[2]]
    [1] 11 12 13 14 15
    [[3]]
    [1] "a" "b" "c"


  29. list
    list1 <- list(1:6, 11:15, c("a", "b", "c"))
    > list1[[1]]
    [1] 1 2 3 4 5 6
    > list1[[3]][2:3]
    [1] "b" "c"
    > list1[[2]] * 3
    [1] 33 36 39 42 45


  30. named list
    list2 <- list(A = 1:6, B = 11:15, C = c("a", "b", "c"))
    > list2
    $A
    [1] 1 2 3 4 5 6
    $B
    [1] 11 12 13 14 15
    $C
    [1] "a" "b" "c"


  31. > list2$A
    [1] 1 2 3 4 5 6
    > list2$C[2:3]
    [1] "b" "c"
    > list2$B * 3
    [1] 33 36 39 42 45
    named list
    list2 <- list(A = 1:6, B = 11:15, C = c("a", "b", "c"))


  32. list1 <- list(1:6, 11:15, c("a", "b", "c"))
    > class(list1)
    [1] "list"
    > names(list1)
    NULL
    list2 <- list(A = 1:6, B = 11:15, C = c("a", "b", "c"))
    > class(list2)
    [1] "list"
    > names(list2)
    [1] "A" "B" "C"
    named list
    list


  33. list3 <- list(A = 1:3, B = 11:13)
    > class(list3)
    [1] "list"
    > names(list3)
    [1] "A" "B"
    df1 <- data.frame(A = 1:3, B = 11:13)
    > class(df1)
    [1] "data.frame"
    > names(df1)
    [1] "A" "B"
    named list & data.frame


  34. > str(list3)
    List of 2
    $ A: int [1:3] 1 2 3
    $ B: int [1:3] 11 12 13
    > str(df1)
    'data.frame': 3 obs. of 2 variables:
    $ A: int 1 2 3
    $ B: int 11 12 13
    list3 <- list(A = 1:3, B = 11:13)
    df1 <- data.frame(A = 1:3, B = 11:13)
    named list & data.frame


  35. > list3
    $A
    [1] 1 2 3
    $B
    [1] 11 12 13
    > df1
    A B
    1 1 11
    2 2 12
    3 3 13
    named list & data.frame


  36. data.frame vs. matrix
    mat1 <- matrix(c(1:3, 11:13), nrow = 3, ncol = 2)
    > mat1
    [,1] [,2]
    [1,] 1 11
    [2,] 2 12
    [3,] 3 13
    > str(mat1)
    int [1:3, 1:2] 1 2 3 11 12 13
    df1 <- data.frame(A = 1:3, B = 11:13)
    > df1
    A B
    1 1 11
    2 2 12
    3 3 13
    > str(df1)
    'data.frame': 3 obs. of 2 variables:
    $ A: int 1 2 3
    $ B: int 11 12 13
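
One practical difference between the two structures shows up once the columns have different types. The following is a sketch with made-up objects (`mat_mixed`, `df_mixed`); `stringsAsFactors = FALSE` is spelled out so the result does not depend on the R version.

```r
# A matrix holds a single type, so mixing numbers and strings coerces every cell
mat_mixed <- cbind(1:3, c("a", "b", "c"))

# A data.frame is a list of equal-length columns, so each column keeps its own type
df_mixed <- data.frame(A = 1:3, B = c("a", "b", "c"), stringsAsFactors = FALSE)

mat_type    <- typeof(mat_mixed)        # one type shared by all cells
col_classes <- sapply(df_mixed, class)  # one class per column
built_on    <- is.list(df_mixed)        # a data.frame is built on a list

print(mat_type)
print(col_classes)
print(built_on)
```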


  37. data.frame
    variables
    observation


  38. data.frame
    tibble
    read_csv()
    write_csv()
    Table Data
    Wide form Long form
    pivot_longer()
    Nested form
    pivot_wider()
    Plot
    group_nest() unnest()
    {ggplot2}
    {patchwork}
    Image Files
    ggsave()
    Data Processing


  39. > anscombe
    x1 x2 x3 x4 y1 y2 y3 y4
    1 10 10 10 8 8.04 9.14 7.46 6.58
    2 8 8 8 8 6.95 8.14 6.77 5.76
    3 13 13 13 8 7.58 8.74 12.74 7.71
    4 9 9 9 8 8.81 8.77 7.11 8.84
    5 11 11 11 8 8.33 9.26 7.81 8.47
    6 14 14 14 8 9.96 8.10 8.84 7.04
    7 6 6 6 8 7.24 6.13 6.08 5.25
    8 4 4 4 19 4.26 3.10 5.39 12.50
    9 12 12 12 8 10.84 9.13 8.15 5.56
    10 7 7 7 8 4.82 7.26 6.42 7.91
    11 5 5 5 8 5.68 4.74 5.73 6.89
    Wide form data


  40. > df
    tag x1 x2 x3 x4 y1 y2 y3 y4
    1 1 10 10 10 8 8.04 9.14 7.46 6.58
    2 2 8 8 8 8 6.95 8.14 6.77 5.76
    3 3 13 13 13 8 7.58 8.74 12.74 7.71
    4 4 9 9 9 8 8.81 8.77 7.11 8.84
    5 5 11 11 11 8 8.33 9.26 7.81 8.47
    6 6 14 14 14 8 9.96 8.10 8.84 7.04
    Wide form data
    df <-
    rownames_to_column(
    anscombe,
    var = "tag"
    )


  41. Wide form → Long form data
    df_long_1 <-
    pivot_longer(
    data = df,
    cols = !tag
    )
    df_long_2 <-
    pivot_longer(
    data = df,
    cols = !tag,
    names_to = c(".value", "key"),
    names_pattern = c("(.)(.)")
    )
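
To see what the two calls produce, the sketch below rebuilds `df` from the built-in `anscombe` data and inspects the shape of each result. It assumes the {tibble} and {tidyr} packages are installed.

```r
library(tibble)  # rownames_to_column()
library(tidyr)   # pivot_longer()

df <- rownames_to_column(anscombe, var = "tag")

# One row per (tag, original column): names go to `name`, cell values to `value`
df_long_1 <- pivot_longer(data = df, cols = !tag)

# Split each name such as "x1" into a value column ("x") and a key ("1")
df_long_2 <- pivot_longer(
  data = df,
  cols = !tag,
  names_to = c(".value", "key"),
  names_pattern = "(.)(.)"
)

print(dim(df_long_1))
print(names(df_long_1))
print(dim(df_long_2))
print(names(df_long_2))
```

The special name `".value"` tells `pivot_longer()` that the first captured group ("x" or "y") should become a column name rather than a cell value.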


  42. Long form → Wide form data
    pivot_wider(
    data = df_long_1,
    values_from = value,
    names_from = name
    )
    pivot_wider(
    data = df_long_2,
    values_from = c(x, y),
    names_from = tag
    )
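
If the goal is to get back to the original wide layout, one possible round trip is to spread `key` (rather than `tag`) into the column names. This is a sketch under that assumption, not the call shown on the slide; `names_sep = ""` is chosen here so that "x" and "1" rejoin as "x1".

```r
library(tibble)
library(tidyr)

df <- rownames_to_column(anscombe, var = "tag")
df_long_2 <- pivot_longer(df, cols = !tag,
                          names_to = c(".value", "key"),
                          names_pattern = "(.)(.)")

# Spread `key` back into the column names: "x" + "1" -> "x1", and so on
df_wide <- pivot_wider(
  data = df_long_2,
  values_from = c(x, y),
  names_from = key,
  names_sep = ""
)

print(names(df_wide))
print(dim(df_wide))
```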


  43. data.frame / tibble
    read_csv()
    write_csv()
    Table Data
    Wide form Long form
    pivot_longer()
    pivot_wider()
    Plot
    {ggplot2}
    Image Files
    ggsave()
    Data Processing


  44. read_csv()
    write_csv()
    Table Data
    Wide form Long form
    pivot_longer()
    pivot_wider()
    Plot
    {ggplot2}
    Image Files
    ggsave()
    Data Processing
    Long form
    Long form
    Long form
    Long form
    Long form
    Long form
    Long form
    Long form
    data.frame / tibble


  45. vignette("dplyr")


  46. It (dplyr) provides simple “verbs” to help
    you translate your thoughts into code.
    functions that correspond to the most
    common data manipulation tasks
    Introduction to dplyr
    https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
    verbs {dplyr}


  47. dplyr gives you “verbs” for
    translating your thoughts into code:
    a set of functions that carry out the
    ABCs of data manipulation simply.
    Introduction to dplyr
    https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
    verbs {dplyr}
    * quite a loose translation


  48. Grammar of data manipulation
    By constraining your options,
    it helps you think about your data
    manipulation challenges.
    Introduction to dplyr
    https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html


  49. By limiting your choices,
    it lets you think about the steps
    of data analysis more simply.
    (a very loose translation)
    Introduction to dplyr
    https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
    * truly a loose translation
    Grammar of data manipulation


  50. 1. mutate()
    2. filter()
    3. select()
    4. group_by()
    5. summarize()
    6. left_join()
    7. arrange()
    Data.frame manipulation


  51. 1. mutate()
    2. filter()
    3. select()
    4. group_by()
    5. summarize()
    6. left_join()
    7. arrange()
    Data.frame manipulation
    0. %>%


  52. Pipe algebra
    X %>% f          is equivalent to  f(X)
    X %>% f(y)       is equivalent to  f(X, y)
    X %>% f %>% g    is equivalent to  g(f(X))
    X %>% f(y, .)    is equivalent to  f(y, X)
    %>% {magrittr}
    “Reintroduction to dplyr (basics)” by yutannihilation
    https://speakerdeck.com/yutannihilation/dplyrzai-ru-men-ji-ben-bian
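
The four pipe equivalences can be checked directly. In this sketch `sqrt`, `round`, `sum`, and `paste` stand in for the abstract `f` and `g`; they are picked only for illustration, and {magrittr} is assumed to be installed.

```r
library(magrittr)  # provides %>%

X <- c(1, 4, 9)
y <- 2

# X %>% f          corresponds to f(X)
same_1 <- identical(X %>% sqrt, sqrt(X))

# X %>% f(y)       corresponds to f(X, y)
same_2 <- identical(X %>% round(y), round(X, y))

# X %>% f %>% g    corresponds to g(f(X))
same_3 <- identical(X %>% sqrt %>% sum, sum(sqrt(X)))

# X %>% f(y, .)    corresponds to f(y, X): the dot marks where X is placed
same_4 <- identical(X %>% paste(y, .), paste(y, X))

print(c(same_1, same_2, same_3, same_4))
```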






  53. lift
    take
    pour
    put
    Bring milk from the kitchen!



  54. lift
    Bring milk from the kitchen!
    lift(Robot, glass, table) -> Robot'
    take

    take(Robot', fridge, milk) -> Robot''


  55. Bring milk from the kitchen!
    Robot' <- lift(Robot, glass, table)
    Robot'' <- take(Robot', fridge, milk)
    Robot''' <- pour(Robot'', milk, glass)
    result <- put(Robot''', glass, table)
    result <- Robot %>%
    lift(glass, table) %>%
    take(fridge, milk) %>%
    pour(milk, glass) %>%
    put(glass, table)
    by using pipe,
    # ①
    # ②
    # ③
    # ④
    # ①
    # ②
    # ③
    # ④


  56. The tidyverse style guide
    https://style.tidyverse.org/syntax.html#object-names
    "There are only two hard things in Computer Science:
    cache invalidation and naming things"


  57. Bring milk from the kitchen!
    Robot' <- lift(Robot, glass, table)
    Robot'' <- take(Robot', fridge, milk)
    Robot''' <- pour(Robot'', milk, glass)
    result <- put(Robot''', glass, table)
    result <- Robot %>%
    lift(glass, table) %>%
    take(fridge, milk) %>%
    pour(milk, glass) %>%
    put(glass, table)
    by using pipe,
    # ①
    # ②
    # ③
    # ④
    # ①
    # ②
    # ③
    # ④


  58. Robot' <- lift(Robot, glass, table)
    Robot'' <- take(Robot', fridge, milk)
    Robot''' <- pour(Robot'', milk, glass)
    result <- put(Robot''', glass, table)
    result <- Robot %>%
    lift(glass, table) %>%
    take(fridge, milk) %>%
    pour(milk, glass) %>%
    put(glass, table)
    by using pipe,
    # ①
    # ②
    # ③
    # ④
    # ①
    # ②
    # ③
    # ④
    Thinking Reading
    Bring milk from the kitchen!


  59. Programming
    Write
    Run
    Read
    Think
    Write
    Run
    Read
    Think
    Communicate
    Share


  60. Pipe algebra
    X %>% f          is equivalent to  f(X)
    X %>% f(y)       is equivalent to  f(X, y)
    X %>% f %>% g    is equivalent to  g(f(X))
    X %>% f(y, .)    is equivalent to  f(y, X)
    %>% {magrittr}
    “Reintroduction to dplyr (basics)” by yutannihilation
    https://speakerdeck.com/yutannihilation/dplyrzai-ru-men-ji-ben-bian


  61. 1. mutate()
    2. filter()
    3. select()
    4. group_by()
    5. summarize()
    6. left_join()
    7. arrange()
    Data.frame manipulation
    0. %>%


  62. verbs {dplyr}
    mutate # add a column
    +
    mutate(dat, C = fun(A, B))


  63. verbs {dplyr}
    mutate # add a column
    +
    dat %>% mutate(C = fun(A, B))
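
A concrete version of the `mutate()` pattern might look like the sketch below. The table `dat` and the computation `A / B` are hypothetical stand-ins for the slide's `dat` and `fun(A, B)`.

```r
library(dplyr)

# Hypothetical stand-in for the slide's `dat`
dat <- data.frame(tag = 1:5,
                  A = c(2, 4, 6, 8, 10),
                  B = c(1, 1, 2, 2, 3))

# Add column C, computed element by element from A and B.
# mutate() returns a new table; `dat` itself is left unchanged.
dat_c <- dat %>% mutate(C = A / B)

print(dat_c)
print(names(dat))
```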


  64. verbs {dplyr}
    filter # narrow down rows
    dat %>% filter(tag %in% c(1, 3, 5))
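
Here is a runnable sketch of the same `filter()` call on a made-up table, plus a second call showing that several conditions separated by commas are combined with AND. The column `A` and its values are assumptions for illustration.

```r
library(dplyr)

# Hypothetical table with a `tag` column as on the slide
dat <- data.frame(tag = 1:5, A = c(2, 4, 6, 8, 10))

# Keep the rows whose tag is 1, 3, or 5
odd_rows <- dat %>% filter(tag %in% c(1, 3, 5))

# Comma-separated conditions must all hold for a row to be kept
big_odd <- dat %>% filter(tag %in% c(1, 3, 5), A > 4)

print(odd_rows)
print(big_odd)
```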


  65. Boolean operators (Boolean algebra)
    A == B    # equal to
    A != B    # not equal to
    A | B     # or
    A & B     # and
    A %in% B  # is A in B?
    George Boole
    1815 - 1864
    (portrait: Wikipedia)


  66. Boolean operators (Boolean algebra)
    > "a" != "b" # not equal to
    [1] TRUE
    > 1 %in% 10:100 # is A in B?
    [1] FALSE
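
These operators work element by element on vectors, which is what makes them useful inside `filter()`. A small sketch with made-up vectors `A` and `B`:

```r
A <- c(1, 2, 3, 4)
B <- c(1, 0, 3, 0)

is_equal  <- A == B          # element-wise equality
not_equal <- A != B          # element-wise inequality
both      <- A > 1 & B > 0   # TRUE only where both conditions hold
either    <- A > 3 | B > 0   # TRUE where at least one condition holds
in_set    <- A %in% c(2, 4)  # is each element of A found in the set?

print(is_equal)
print(not_equal)
print(both)
print(either)
print(in_set)
```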


  67. George Boole
    1815 - 1864
    Mathematician & Philosopher
    A Class-Room Introduction to Logic
    https://niyamaklogic.wordpress.com/category/laws-of-thoughts/


  68. verbs {dplyr}
    select # choose columns
    dat %>% select(tag, B)


  69. verbs {dplyr}
    select # choose columns
    dat %>% select("tag", "B")


  70. verbs {dplyr}
    select # choose columns
    dat %>% select("tag", "B")
    dat %>% select(tag, B)


  71. verbs {dplyr}
    # Select helper functions
    starts_with("s") ends_with("s")
    contains("se") matches("^.e")
    one_of(c("tag", "B"))
    everything()
    https://kazutan.github.io/blog/2017/04/dplyr-select-memo/
    “Notes on practical uses of dplyr::select” by kazutan
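
The helpers can be tried on a small table. In this sketch the table and its column names are invented to exercise each helper, and `all_of()` is used in place of `one_of()`, which current tidyselect documentation marks as superseded.

```r
library(dplyr)

# Hypothetical table; the column names exist only to exercise the helpers
dat <- data.frame(tag = 1:3, size_a = 4:6, size_b = 7:9,
                  B = c("x", "y", "z"))

by_prefix <- dat %>% select(starts_with("size")) %>% names()
by_suffix <- dat %>% select(ends_with("_b")) %>% names()
by_substr <- dat %>% select(contains("ize")) %>% names()
by_regex  <- dat %>% select(matches("^.a")) %>% names()        # second character is "a"
by_name   <- dat %>% select(all_of(c("tag", "B"))) %>% names() # names given as strings
reordered <- dat %>% select(B, everything()) %>% names()       # move B to the front

print(by_prefix)
print(by_suffix)
print(by_substr)
print(by_regex)
print(by_name)
print(reordered)
```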


  72. 1. mutate()
    2. filter()
    3. select()
    4. group_by()
    5. summarize()
    6. left_join()
    7. arrange()
    Data.frame manipulation
    0. %>%
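
The remaining verbs in the list (`group_by()`, `summarize()`, `left_join()`, `arrange()`) are not demonstrated on the slides, so here is a minimal sketch that chains them with the pipe. Both tables, `dat` and `labels`, are made up for illustration.

```r
library(dplyr)

# Hypothetical measurements and a lookup table, for illustration only
dat <- data.frame(tag = c("a", "a", "b", "b", "c"),
                  value = c(1, 3, 5, 7, 9))
labels <- data.frame(tag = c("a", "b"),
                     label = c("control", "treated"))

result <- dat %>%
  group_by(tag) %>%                     # one group per tag
  summarize(mean_value = mean(value),   # collapse each group to a single row
            n = n()) %>%
  left_join(labels, by = "tag") %>%     # keep every row; unmatched labels become NA
  arrange(desc(mean_value))             # sort by mean, largest first

print(result)
```

Tag "c" has no entry in `labels`, so its `label` is `NA` after the join; `left_join()` keeps the row rather than dropping it.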





  73. By limiting your choices,
    it lets you think about the steps
    of data analysis more simply.
    (a very loose translation)
    Introduction to dplyr
    https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
    * truly a loose translation
    Grammar of data manipulation


  74. Igor Stravinsky (И́горь Ф. Страви́нский)
    1882 - 1971
    The more constraints one imposes,
    the more one frees one's self of the
    chains that shackle the spirit.
    * The Japanese rendering on the slide is a fairly loose translation.


  75. import Tidy
    Transform
    Visualize
    Model
    Communicate
    Modified from “R for Data Science”, H. Wickham, 2017
    Data Science



  76. Programming
    Write
    Run
    Read
    Think
    Write
    Run
    Read
    Think
    Communicate
    Share


  77. Text Image
    Information
    Intention
    Data
    decode
    encode
    Data analysis
    feedback


  78. Text
    Image
    First, A. Next, B.
    Then C. Finally D.
    time
    Intention
    encode
    "Frozen" structure
    A B C D time
    value
    α
    β


  79. Mapping
    Apple
    (reality)
    Apple
    (information)
    mapping


  80. Apple
    Mapping
    Fruit
    Red
    Image
    Reality, information
    Channel
    mapping
    channel


  81. Data visualization
    Mapping
    [Diagram: symbol labels not recoverable]


  82. Data visualization
    Mapping
    [Diagram: symbol labels not recoverable]
    Aesthetic channels:
    x axis, y axis, color, fill,
    shape, linetype, alpha…


  83. Data visualization
    Mapping
    [Diagram: symbol labels not recoverable]
    Aesthetic channels:
    x axis, y axis, color, fill,
    shape, linetype, alpha…
    Plotting with ggplot2:
    ggplot(data = my_data) +
    aes(x = X, y = Y) +
    geom_point()


  84. Data visualization
    Reality
    Mapping (observation)
    Data
    Mapping (data visualization)
    Graph
    [Diagram: symbol labels not recoverable]
    data
    mapping
    aesthetic channels


  85. Your first ggplot2
    library(tidyverse)
    dat <-
    data.frame(tag = rep(c("a", "b"), each = 2),
    X = c(1, 3, 5, 7),
    Y = c(3, 9, 4, 2))
    ggplot() +
    geom_point(data = dat,
    mapping = aes(x = X, y = Y))


  86. Your first ggplot2


  87. Your first ggplot2
    library(tidyverse)
    dat <-
    data.frame(tag = rep(c("a", "b"), each = 2),
    X = c(1, 3, 5, 7),
    Y = c(3, 9, 4, 2))
    ggplot() +
    geom_point(data = dat,
    mapping = aes(x = X, y = Y))
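    Specify the data.frame
    Inside aes(), specify which variable goes with which
    channel, as aesthetic elements
    Declare the start of the plot; chain layers with the + sign
    Argument names of aes()
    Variable names in dat
    Use the geom_*() function
    that matches the type of graph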


  88. library(tidyverse)
    dat <-
    data.frame(tag = rep(c("a", "b"), each = 2),
    X = c(1, 3, 5, 7),
    Y = c(3, 9, 4, 2))
    ggplot() +
    geom_point(data = dat,
    mapping = aes(x = X, y = Y)) +
    geom_path(data = dat,
    mapping = aes(x = X, y = Y))
    Your second ggplot2


  89. Your second ggplot2


  90. Various ways to write ggplot2 code
    ggplot() +
    geom_point(data = dat,
    mapping = aes(x = X, y = Y)) +
    geom_path(data = dat,
    mapping = aes(x = X, y = Y))
    ggplot(data = dat,
    mapping = aes(x = X, y = Y)) +
    geom_point() +
    geom_path()
    ggplot(data = dat) +
    aes(x = X, y = Y) +
    geom_point() +
    geom_path()
    Shared settings can be given inside ggplot()
    and then omitted in the layers that follow
    The aes() call that carries the mapping
    can also be placed outside ggplot()


  91. Various ways to write ggplot2 code
    ggplot() +
    geom_point(data = dat,
    mapping = aes(x = X, y = Y, color = tag)) +
    geom_path(data = dat,
    mapping = aes(x = X, y = Y))
    ggplot(data = dat) +
    aes(x = X, y = Y) + # factor out only what is shared
    geom_point(mapping = aes(color = tag)) +
    geom_path()
    Specify the mapping for the point color


  92. Various ways to write ggplot2 code
    ggplot(data = dat) +
    aes(x = X, y = Y) +
    geom_point(aes(color = tag)) +
    geom_path()
    ggplot(data = dat) +
    aes(x = X, y = Y) +
    geom_path() +
    geom_point(aes(color = tag))
    Layers added later are drawn on top


  93. library(tidyverse)
    dat <-
    data.frame(tag = rep(c("a", "b"), each = 2),
    X = c(1, 3, 5, 7),
    Y = c(3, 9, 4, 2))
    g <-
    ggplot(data = dat) +
    aes(x = X, y = Y) +
    geom_path() +
    geom_point(mapping = aes(color = tag))
    Saving a ggplot2 image
    ggsave(filename = "fig/demo01.png",
    plot = g,
    width = 4, height = 3, dpi = 150)


  94. library(tidyverse)
    dat <-
    data.frame(tag = rep(c("a", "b"), each = 2),
    X = c(1, 3, 5, 7),
    Y = c(3, 9, 4, 2))
    g <-
    ggplot(data = dat) +
    aes(x = X, y = Y) +
    geom_path() +
    geom_point(mapping = aes(color = tag))
    Saving a ggplot2 image
    ggsave(filename = "fig/demo01.png",
    plot = g,
    width = 4, height = 3, dpi = 150)
    By default, the size is given in inches


  95. library(tidyverse)
    dat <-
    data.frame(tag = rep(c("a", "b"), each = 2),
    X = c(1, 3, 5, 7),
    Y = c(3, 9, 4, 2))
    g <-
    ggplot(data = dat) +
    aes(x = X, y = Y) +
    geom_path() +
    geom_point(mapping = aes(color = tag))
    Saving a ggplot2 image
    ggsave(filename = "fig/demo01.png",
    plot = g,
    width = 10, height = 7.5, dpi = 150,
    units = "cm") # "cm", "mm", or "in" can be specified
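
The slide's path assumes that a `fig/` folder already exists, which may not be the case on your machine. The sketch below therefore writes to a temporary file instead (`out_file` is a name introduced here) and then checks that the file was created. It assumes {ggplot2} is installed and a PNG graphics device is available.

```r
library(ggplot2)

dat <- data.frame(tag = rep(c("a", "b"), each = 2),
                  X = c(1, 3, 5, 7),
                  Y = c(3, 9, 4, 2))

g <- ggplot(data = dat) +
  aes(x = X, y = Y) +
  geom_path() +
  geom_point(mapping = aes(color = tag))

# Write to a temporary file so the sketch does not depend on a "fig/" folder
out_file <- tempfile(fileext = ".png")
ggsave(filename = out_file, plot = g,
       width = 10, height = 7.5, units = "cm", dpi = 150)

saved <- file.exists(out_file)
print(saved)
```

To save under `fig/` as on the slide, creating the folder first with `dir.create("fig", showWarnings = FALSE)` is one option.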


  96. The geom_*() function family
    cf. https://www.rstudio.com/resources/cheatsheets/


  97. Plotting multiple series
    > head(anscombe)
    x1 x2 x3 x4 y1 y2 y3 y4
    1 10 10 10 8 8.04 9.14 7.46 6.58
    2 8 8 8 8 6.95 8.14 6.77 5.76
    3 13 13 13 8 7.58 8.74 12.74 7.71
    4 9 9 9 8 8.81 8.77 7.11 8.84
    5 11 11 11 8 8.33 9.26 7.81 8.47
    6 14 14 14 8 9.96 8.10 8.84 7.04
    ggplot(data = anscombe) +
    geom_point(aes(x = x1, y = y1)) +
    geom_point(aes(x = x2, y = y2), color = "Red") +
    geom_point(aes(x = x3, y = y3), color = "Blue") +
    geom_point(aes(x = x4, y = y4), color = "Green")
    Pushing through with what we know so far gives this


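  98. Data visualization with ggplot2
    Reality
    Mapping (observation)
    Data
    Mapping (data visualization)
    Graph
    [Diagram: symbol labels not recoverable]
    raw data
    transformation
    data format suited to visualization
    mapping
    aesthetic channels
    One aesthetic channel of the figure
    corresponds to one variable of the data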


  99. > head(anscombe)
    x1 x2 x3 x4 y1 y2 y3 y4
    1 10 10 10 8 8.04 9.14 7.46 6.58
    2 8 8 8 8 6.95 8.14 6.77 5.76
    3 13 13 13 8 7.58 8.74 12.74 7.71
    4 9 9 9 8 8.81 8.77 7.11 8.84
    5 11 11 11 8 8.33 9.26 7.81 8.47
    6 14 14 14 8 9.96 8.10 8.84 7.04
    > head(anscombe_long)
    key x y
    1 1 10 8.04
    2 2 10 9.14
    3 3 10 7.46
    4 4 8 6.58
    5 1 8 6.95
    6 2 8 8.14
    ggplot(data = anscombe_long) +
    aes(x = x, y = y, color = key) +
    geom_point()
    Reshape so that each variable corresponds to
    an aesthetic channel (x axis, y axis, color)
    The visualization becomes clear and simple


  100. > head(anscombe)
    x1 x2 x3 x4 y1 y2 y3 y4
    1 10 10 10 8 8.04 9.14 7.46 6.58
    2 8 8 8 8 6.95 8.14 6.77 5.76
    3 13 13 13 8 7.58 8.74 12.74 7.71
    4 9 9 9 8 8.81 8.77 7.11 8.84
    5 11 11 11 8 8.33 9.26 7.81 8.47
    6 14 14 14 8 9.96 8.10 8.84 7.04
    > head(anscombe_long)
    key x y
    1 1 10 8.04
    2 2 10 9.14
    3 3 10 7.46
    4 4 8 6.58
    5 1 8 6.95
    6 2 8 8.14
    Reshape so that each variable corresponds to
    an aesthetic channel (x axis, y axis, color)
    anscombe_long <-
    pivot_longer(data = anscombe,
    cols = everything(),
    names_to = c(".value",
    "key"),
    names_pattern = "(.)(.)")
    Wide data
    Long data
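
Once the data are in long form, the dplyr verbs apply to all four series at once. The sketch below summarizes each series; the summary column names (`mean_x`, `mean_y`, `cor_xy`) are chosen here for illustration, and {tidyr} and {dplyr} are assumed to be installed.

```r
library(tidyr)
library(dplyr)

anscombe_long <- pivot_longer(data = anscombe,
                              cols = everything(),
                              names_to = c(".value", "key"),
                              names_pattern = "(.)(.)")

# With one row per observation, a per-series summary is a short pipeline
anscombe_summary <- anscombe_long %>%
  group_by(key) %>%
  summarize(n = n(),
            mean_x = mean(x),
            mean_y = mean(y),
            cor_xy = cor(x, y))

print(anscombe_summary)
```

The four series share almost identical summary statistics, which is why plotting them is the informative step.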


  101. ggplot(data = anscombe_long) +
    aes(x = x, y = y, color = key) +
    geom_point()
    ggplot(data = anscombe_long) +
    aes(x = x, y = y, color = key) +
    geom_point() +
    facet_wrap(facets = . ~ key, nrow = 1)
    Split the figure by level
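
Putting the pieces together, a self-contained sketch of the path from the built-in `anscombe` data to the faceted plot could look as follows. `vars(key)` is used here as an alternative spelling of the slide's `. ~ key`; {tidyr} and {ggplot2} are assumed to be installed.

```r
library(tidyr)
library(ggplot2)

# Reshape: one column per aesthetic channel (x, y) plus a key for the series
anscombe_long <- pivot_longer(data = anscombe,
                              cols = everything(),
                              names_to = c(".value", "key"),
                              names_pattern = "(.)(.)")

# One panel per level of `key`, laid out in a single row
g <- ggplot(data = anscombe_long) +
  aes(x = x, y = y, color = key) +
  geom_point() +
  facet_wrap(facets = vars(key), nrow = 1)

print(g)
```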


  102. import Tidy
    Transform
    Visualize
    Model
    Communicate
    Modified from “R for Data Science”, H. Wickham, 2017
    Data Science



  103. import Tidy
    Transform
    Visualize
    Model
    Communicate
    Modified from “R for Data Science”, H. Wickham, 2017
    preprocessing
    Data science
    Data
    Observation Hypothesis
    Narrative of data
    feedback
    Data processing


  104. data.frame / tibble
    read_csv()
    write_csv()
    Table Data
    Wide form Long form
    pivot_longer()
    pivot_wider()
    Plot
    {ggplot2}
    Image Files
    ggsave()
    Data Processing


  105. read_csv()
    write_csv()
    Table Data
    Wide form Long form
    pivot_longer()
    pivot_wider()
    Plot
    {ggplot2}
    Image Files
    ggsave()
    Data Processing
    Long form
    Long form
    Long form
    Long form
    Long form
    Long form
    Long form
    Long form
    data.frame / tibble


  106. It (dplyr) provides simple “verbs” to help
    you translate your thoughts into code.
    functions that correspond to the most
    common data manipulation tasks
    Introduction to dplyr
    https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
    verbs {dplyr}


  107. 1. mutate()
    2. filter()
    3. select()
    4. group_by()
    5. summarize()
    6. left_join()
    7. arrange()
    Data.frame manipulation


  108. import Tidy
    Transform
    Visualize
    Model
    Communicate
    Modified from “R for Data Science”, H. Wickham, 2017
    Data Science



  109. [Diagram: symbol labels not recoverable]
    data
    mapping
    aesthetic channels:
    x axis, y axis, color, fill,
    shape, linetype, alpha…
    ggplot2 package


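  110. Data visualization with ggplot2
    Reality
    Mapping (observation)
    Data
    Mapping (data visualization)
    Graph
    [Diagram: symbol labels not recoverable]
    raw data
    transformation
    data format suited to visualization
    mapping
    aesthetic channels
    One aesthetic channel of the figure
    corresponds to one variable of the data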
