Upgrade to Pro — share decks privately, control downloads, hide ads and more …

SappoRo.R_roundrobin

 SappoRo.R_roundrobin

第10回Sapporo.Rで喋った際のスライドです。

kilometer

March 18, 2023
Tweet

More Decks by kilometer

Other Decks in Programming

Transcript

  1. SappoRo.R #10
    @kilometer00
    2023.03.18
    らくらく総当たり組み合わせ

    View full-size slide

  2. Who!?
    Who?

    View full-size slide

  3. Who!?
    名前: 三村 @kilometer
    職業: ポスドク (こうがくはくし)
    専⾨: ⾏動神経科学(霊⻑類)
    脳イメージング
    医療システム⼯学
    R歴: ~ 10年ぐらい
    流⾏: アンキロサウルス

    View full-size slide

  4. 宣伝!!(書籍の翻訳に参加しました。)
    絶賛販売中!

    View full-size slide

  5. 宣伝2!!
    R⾔語の地域コミュニティ@東京です。
    定期的にR⾔語に関する勉強会を開催しています。
    次回は4⽉22⽇!!
    初⼼者特集回です!!

    View full-size slide

  6. 総当たり組み合わせ
    Round-robin
    そう あ あ

    View full-size slide

  7. dat_nest <-
    palmerpenguins::penguins %>%
    dplyr::group_nest(species)
    データを畳み込む
    > dat_nest
    # A tibble: 3 × 2
    species data
    >
    1 Adelie [152 × 7]
    2 Chinstrap [68 × 7]
    3 Gentoo [124 × 7]
    (息を吐くように)

    View full-size slide

  8. # A tibble: 9 × 4
    species.x species.y data.x data.y
    > >
    1 Adelie Adelie [152 × 7] [152 × 7]
    2 Chinstrap Adelie [68 × 7] [152 × 7]
    3 Gentoo Adelie [124 × 7] [152 × 7]
    4 Adelie Chinstrap [152 × 7] [68 × 7]
    5 Chinstrap Chinstrap [68 × 7] [68 × 7]
    6 Gentoo Chinstrap [124 × 7] [68 × 7]
    7 Adelie Gentoo [152 × 7] [124 × 7]
    8 Chinstrap Gentoo [68 × 7] [124 × 7]
    9 Gentoo Gentoo [124 × 7] [124 × 7]
    総当たり組み合わせ
    # A tibble: 3 × 4
    species.x species.y data.x data.y
    > >
    1 Adelie Chinstrap [152 × 7] [68 × 7]
    2 Adelie Gentoo [152 × 7] [124 × 7]
    3 Chinstrap Gentoo [68 × 7] [124 × 7]
    組み合わせ(combination)
    (round-robin)

    View full-size slide

  9. base::expand.grid()関数
    > dat_nest
    # A tibble: 3 × 2
    species data
    >
    1 Adelie [152 × 7]
    2 Chinstrap [68 × 7]
    3 Gentoo [124 × 7]
    dat_nest$species

    View full-size slide

  10. grid <-
    dat_nest$species %>%
    expand.grid(., .)
    base::expand.grid()関数
    > grid
    Var1 Var2
    1 Adelie Adelie
    2 Chinstrap Adelie
    3 Gentoo Adelie
    4 Adelie Chinstrap
    5 Chinstrap Chinstrap
    6 Gentoo Chinstrap
    7 Adelie Gentoo
    8 Chinstrap Gentoo
    9 Gentoo Gentoo

    View full-size slide

  11. dplyr::left_join()関数
    > grid
    Var1 Var2
    1 Adelie Adelie
    2 Chinstrap Adelie
    3 Gentoo Adelie
    4 Adelie Chinstrap
    5 Chinstrap Chinstrap
    6 Gentoo Chinstrap
    7 Adelie Gentoo
    8 Chinstrap Gentoo
    9 Gentoo Gentoo
    > dat_nest
    # A tibble: 3 × 2
    species data
    >
    1 Adelie [152 × 7]
    2 Chinstrap [68 × 7]
    3 Gentoo [124 × 7]
    ①対応づけて結合
    ②対応づけて結合

    View full-size slide

  12. dplyr::left_join()関数
    dat_rr <-
    grid %>%
    tibble::as_tibble() %>%
    dplyr::left_join(
    dat_nest %>%
    dplyr::rename(Var1 = "species"),
    by = "Var1"
    ) %>%
    dplyr::left_join(
    dat_nest %>%
    dplyr::rename(Var2 = "species"),
    by = "Var2"
    )

    View full-size slide

  13. dplyr::left_join()関数
    dat_rr <-
    grid %>%
    tibble::as_tibble() %>%
    dplyr::left_join(
    dat_nest %>%
    dplyr::rename(Var1 = "species"),
    by = "Var1"
    ) %>%
    dplyr::left_join(
    dat_nest %>%
    dplyr::rename(Var2 = "species"),
    by = "Var2"
    )


    View full-size slide

  14. > dat_rr
    # A tibble: 9 × 4
    Var1 Var2 data.x data.y
    >
    1 Adelie Adelie [152 × 7] [152 × 7]
    2 Chinstrap Adelie [68 × 7] [152 × 7]
    3 Gentoo Adelie [124 × 7] [152 × 7]
    4 Adelie Chinstrap [152 × 7] [68 × 7]
    5 Chinstrap Chinstrap [68 × 7] [68 × 7]
    6 Gentoo Chinstrap [124 × 7] [68 × 7]
    7 Adelie Gentoo [152 × 7] [124 × 7]
    8 Chinstrap Gentoo [68 × 7] [124 × 7]
    9 Gentoo Gentoo [124 × 7] [124 × 7]

    View full-size slide

  15. dplyr::rename()関数
    dat_rr_rename <-
    dat_rr %>%
    rename(species.x = Var1) %>%
    rename(species.y = Var2)
    > dat_rr_rename
    # A tibble: 9 × 4
    species.x species.y data.x data.y
    > >
    1 Adelie Adelie [152 × 7] [152 × 7]
    2 Chinstrap Adelie [68 × 7] [152 × 7]
    3 Gentoo Adelie [124 × 7] [152 × 7]
    4 Adelie Chinstrap [152 × 7] [68 × 7]
    5 Chinstrap Chinstrap [68 × 7] [68 × 7]
    6 Gentoo Chinstrap [124 × 7] [68 × 7]
    7 Adelie Gentoo [152 × 7] [124 × 7]
    8 Chinstrap Gentoo [68 × 7] [124 × 7]
    9 Gentoo Gentoo [124 × 7] [124 × 7]

    View full-size slide

  16. dplyr::rename()関数
    dat_rr_rename <-
    dat_rr %>%
    rename(species.x = Var1) %>%
    rename(species.y = Var2)
    key <- "species"
    x <- stringr::str_c(key, ".x")
    y <- stringr::str_c(key, ".y")
    dat_rr_rename <-
    dat_rr %>%
    rename(!!x := Var1) %>%
    rename(!!y := Var2)
    別解
    {rlang}パッケージの演算⼦

    View full-size slide

  17. # A tibble: 9 × 4
    species.x species.y data.x data.y
    > >
    1 Adelie Adelie [152 × 7] [152 × 7]
    2 Chinstrap Adelie [68 × 7] [152 × 7]
    3 Gentoo Adelie [124 × 7] [152 × 7]
    4 Adelie Chinstrap [152 × 7] [68 × 7]
    5 Chinstrap Chinstrap [68 × 7] [68 × 7]
    6 Gentoo Chinstrap [124 × 7] [68 × 7]
    7 Adelie Gentoo [152 × 7] [124 × 7]
    8 Chinstrap Gentoo [68 × 7] [124 × 7]
    9 Gentoo Gentoo [124 × 7] [124 × 7]
    総当たり組み合わせ
    # A tibble: 3 × 4
    species.x species.y data.x data.y
    > >
    1 Adelie Chinstrap [152 × 7] [68 × 7]
    2 Adelie Gentoo [152 × 7] [124 × 7]
    3 Chinstrap Gentoo [68 × 7] [124 × 7]
    組み合わせ(combination)
    (round-robin)

    View full-size slide

  18. grid <-
    dat_nest$species %>%
    expand.grid(., .) %>%
    subset(unclass(Var1) < unclass(Var2)) %>%
    tibble::as_tibble()
    base::subset()関数
    > grid
    # A tibble: 3 × 2
    Var1 Var2

    1 Adelie Chinstrap
    2 Adelie Gentoo
    3 Chinstrap Gentoo

    View full-size slide

  19. という変換を
    パッケージにしました。
    devtools::install_github(
    "kilometer0101/roundrobin"
    )
    (4回⼿打ちしたら⾯倒臭くなったので)

    View full-size slide

  20. roundrobin::roundrobin()関数
    # A tibble: 9 × 4
    species.x species.y data.x data.y
    > >
    1 Adelie Adelie [152 × 7] [152 × 7]
    2 Chinstrap Adelie [68 × 7] [152 × 7]
    3 Gentoo Adelie [124 × 7] [152 × 7]
    4 Adelie Chinstrap [152 × 7] [68 × 7]
    5 Chinstrap Chinstrap [68 × 7] [68 × 7]
    6 Gentoo Chinstrap [124 × 7] [68 × 7]
    7 Adelie Gentoo [152 × 7] [124 × 7]
    8 Chinstrap Gentoo [68 × 7] [124 × 7]
    9 Gentoo Gentoo [124 × 7] [124 × 7]
    library(roundrobin)
    palmerpenguins::penguins %>%
    roundrobin(key = "species")

    View full-size slide

  21. library(roundrobin)
    palmerpenguins::penguins %>%
    roundrobin(key = "species",
    combination = TRUE)
    roundrobin::roundrobin()関数
    # A tibble: 3 × 4
    species.x species.y data.x data.y
    > >
    1 Adelie Chinstrap [152 × 7] [68 × 7]
    2 Adelie Gentoo [152 × 7] [124 × 7]
    3 Chinstrap Gentoo [68 × 7] [124 × 7]

    View full-size slide

  22. 使ってみますか。

    View full-size slide

  23. library(tidyverse)
    library(palmerpenguins)
    library(roundrobin)
    dat <-
    palmerpenguins::penguins %>%
    na.omit() %>% # NA除去
    mutate_at(
    vars(c(contains("mm"), contains("g"))),
    ~ (. - mean(.)) / sd(.) # 標準化
    ) %>%
    select(species, contains("mm"), contains("g"))
    前処理

    View full-size slide

  24. > dat
    # A tibble: 333 × 5
    species bill_length_mm bill_depth_mm flipper_length_mm body_mass_g

    1 Adelie -0.895 0.780 -1.42 -0.568
    2 Adelie -0.822 0.119 -1.07 -0.506
    3 Adelie -0.675 0.424 -0.426 -1.19
    4 Adelie -1.33 1.08 -0.568 -0.940
    5 Adelie -0.858 1.74 -0.782 -0.692
    6 Adelie -0.931 0.323 -1.42 -0.723
    7 Adelie -0.876 1.24 -0.426 0.581
    8 Adelie -0.529 0.221 -1.35 -1.25
    9 Adelie -0.986 2.05 -0.711 -0.506
    10 Adelie -1.72 2.00 -0.212 0.240
    # … with 323 more rows
    # i Use `print(n = ...)` to see more rows
    前処理

    View full-size slide

  25. dat_long <-
    dat %>%
    rowid_to_column("id") %>%
    pivot_longer(
    cols = !species,
    names_to = "parameter",
    values_to = "value"
    ) %>%
    group_by(parameter) %>%
    ungroup()
    .y <-
    dat_long %>%
    ungroup() %>%
    group_by(species) %>%
    summarise(
    mean_id = mean(id),
    min_id = min(id)
    )
    dat_long %>%
    ggplot() +
    aes(parameter, id) +
    geom_tile(aes(fill = value)) +
    geom_hline(
    yintercept = max(dat_long$id)
    ) +
    geom_hline(data = .y,
    aes(yintercept = min_id)) +
    scale_y_continuous(
    breaks = .y$mean_id,
    labels = .y$species,
    expand = c(0, 0)) +
    theme(
    axis.title = element_blank(),
    axis.text.x = element_text(
    angle = 30, hjust = 1
    )
    )
    可視化コード
    (ちょちょいのちょい)

    View full-size slide

  26. > dat_rr
    # A tibble: 9 × 4
    Var1 Var2 data.x data.y
    > >
    1 Adelie Adelie [146 × 4] [146 × 4]
    2 Chinstrap Adelie [68 × 4] [146 × 4]
    3 Gentoo Adelie [119 × 4] [146 × 4]
    4 Adelie Chinstrap [146 × 4] [68 × 4]
    5 Chinstrap Chinstrap [68 × 4] [68 × 4]
    6 Gentoo Chinstrap [119 × 4] [68 × 4]
    7 Adelie Gentoo [146 × 4] [119 × 4]
    8 Chinstrap Gentoo [68 × 4] [119 × 4]
    9 Gentoo Gentoo [119 × 4] [119 × 4]
    総当たり組み合わせ
    dat_rr <-
    dat %>%
    roundrobin(key = "species",
    rename = FALSE)

    View full-size slide

  27. 例えばマハラノビス距離
    dat_rr_mahaD <-
    dat_rr %>%
    mutate(mahaD2 = map2(
    data.x, data.y, # yに対するxの距離
    ~ mahalanobis(.x, colMeans(.y), cov(.y))
    )) %>%
    mutate(Var2 = str_c("vs. ", Var2))
    > dat_rr_mahaD
    # A tibble: 9 × 5
    Var1 Var2 data.x data.y mahaD2
    > >
    1 Adelie vs. Adelie [146 × 4] [146 × 4]
    2 Chinstrap vs. Adelie [68 × 4] [146 × 4]
    3 Gentoo vs. Adelie [119 × 4] [146 × 4]
    4 Adelie vs. Chinstrap [146 × 4] [68 × 4]
    5 Chinstrap vs. Chinstrap [68 × 4] [68 × 4]
    6 Gentoo vs. Chinstrap [119 × 4] [68 × 4]
    7 Adelie vs. Gentoo [146 × 4] [119 × 4]
    8 Chinstrap vs. Gentoo [68 × 4] [119 × 4]
    9 Gentoo vs. Gentoo [119 × 4] [119 × 4]

    View full-size slide

  28. 例えばマハラノビス距離
    dat_rr_mahaD <-
    dat_rr %>%
    mutate(mahaD2 = map2(
    data.x, data.y, # yに対するxの距離
    ~ mahalanobis(.x, colMeans(.y), cov(.y))
    )) %>%
    mutate(Var2 = str_c("vs. ", Var2))
    dat_mahaD <-
    dat_rr_mahaD %>%
    select(!data.y) %>%
    unnest(everything())

    View full-size slide

  29. > dat_mahaD
    # A tibble: 999 × 7
    Var1 Var2 bill_length_mm bill_…¹ flipp…² body_…³ mahaD2

    1 Adelie vs. Adelie -0.895 0.780 -1.42 -0.568 2.84
    2 Adelie vs. Adelie -0.822 0.119 -1.07 -0.506 1.95
    3 Adelie vs. Adelie -0.675 0.424 -0.426 -1.19 4.26
    4 Adelie vs. Adelie -1.33 1.08 -0.568 -0.940 3.32
    5 Adelie vs. Adelie -0.858 1.74 -0.782 -0.692 5.57
    6 Adelie vs. Adelie -0.931 0.323 -1.42 -0.723 2.47
    7 Adelie vs. Adelie -0.876 1.24 -0.426 0.581 5.94
    8 Adelie vs. Adelie -0.529 0.221 -1.35 -1.25 5.27
    9 Adelie vs. Adelie -0.986 2.05 -0.711 -0.506 7.75
    10 Adelie vs. Adelie -1.72 2.00 -0.212 0.240 15.2
    # … with 989 more rows, and abbreviated variable names
    # ¹bill_depth_mm, ²flipper_length_mm, ³body_mass_g
    # ℹ Use `print(n = ...)` to see more rows
    例えばマハラノビス距離

    View full-size slide

  30. 例えばマハラノビス距離
    ggplot(dat_mahaD) +
    aes(mahaD2, color = Var1, fill = Var1) +
    geom_density(alpha = 0.5) +
    facet_wrap(~Var2)

    View full-size slide

  31. 総当たり組み合わせ
    Round-robin
    そう あ あ

    devtools::install_github(
    "kilometer0101/roundrobin"
    )

    View full-size slide