第10回Sapporo.Rで喋った際のスライドです。
SappoRo.R #10@kilometer002023.03.18らくらく総当たり組み合わせ
View Slide
Who!?Who?
Who!?名前: 三村 @kilometer職業: ポスドク (こうがくはくし)専⾨: ⾏動神経科学(霊⻑類)脳イメージング医療システム⼯学R歴: ~ 10年ぐらい流⾏: アンキロサウルス
宣伝!!(書籍の翻訳に参加しました。)絶賛販売中!
宣伝2!!R⾔語の地域コミュニティ@東京です。定期的にR⾔語に関する勉強会を開催しています。次回は4⽉22⽇!!初⼼者特集回です!!
総当たり組み合わせRound-robinそう あ あく
dat_nest palmerpenguins::penguins %>%dplyr::group_nest(species)データを畳み込む> dat_nest# A tibble: 3 × 2species data >1 Adelie [152 × 7]2 Chinstrap [68 × 7]3 Gentoo [124 × 7](息を吐くように)
# A tibble: 9 × 4species.x species.y data.x data.y > >1 Adelie Adelie [152 × 7] [152 × 7]2 Chinstrap Adelie [68 × 7] [152 × 7]3 Gentoo Adelie [124 × 7] [152 × 7]4 Adelie Chinstrap [152 × 7] [68 × 7]5 Chinstrap Chinstrap [68 × 7] [68 × 7]6 Gentoo Chinstrap [124 × 7] [68 × 7]7 Adelie Gentoo [152 × 7] [124 × 7]8 Chinstrap Gentoo [68 × 7] [124 × 7]9 Gentoo Gentoo [124 × 7] [124 × 7]総当たり組み合わせ# A tibble: 3 × 4species.x species.y data.x data.y > >1 Adelie Chinstrap [152 × 7] [68 × 7]2 Adelie Gentoo [152 × 7] [124 × 7]3 Chinstrap Gentoo [68 × 7] [124 × 7]組み合わせ(combination)(round-robin)
base::expand.grid()関数> dat_nest# A tibble: 3 × 2species data >1 Adelie [152 × 7]2 Chinstrap [68 × 7]3 Gentoo [124 × 7]dat_nest$species
grid dat_nest$species %>%expand.grid(., .)base::expand.grid()関数> gridVar1 Var21 Adelie Adelie2 Chinstrap Adelie3 Gentoo Adelie4 Adelie Chinstrap5 Chinstrap Chinstrap6 Gentoo Chinstrap7 Adelie Gentoo8 Chinstrap Gentoo9 Gentoo Gentoo
dplyr::left_join()関数> gridVar1 Var21 Adelie Adelie2 Chinstrap Adelie3 Gentoo Adelie4 Adelie Chinstrap5 Chinstrap Chinstrap6 Gentoo Chinstrap7 Adelie Gentoo8 Chinstrap Gentoo9 Gentoo Gentoo> dat_nest# A tibble: 3 × 2species data >1 Adelie [152 × 7]2 Chinstrap [68 × 7]3 Gentoo [124 × 7]①対応づけて結合②対応づけて結合
dplyr::left_join()関数dat_rr grid %>%tibble::as_tibble() %>%dplyr::left_join(dat_nest %>%dplyr::rename(Var1 = "species"),by = "Var1") %>%dplyr::left_join(dat_nest %>%dplyr::rename(Var2 = "species"),by = "Var2")
dplyr::left_join()関数dat_rr grid %>%tibble::as_tibble() %>%dplyr::left_join(dat_nest %>%dplyr::rename(Var1 = "species"),by = "Var1") %>%dplyr::left_join(dat_nest %>%dplyr::rename(Var2 = "species"),by = "Var2")①②
> dat_rr# A tibble: 9 × 4Var1 Var2 data.x data.y > 1 Adelie Adelie [152 × 7] [152 × 7]2 Chinstrap Adelie [68 × 7] [152 × 7]3 Gentoo Adelie [124 × 7] [152 × 7]4 Adelie Chinstrap [152 × 7] [68 × 7]5 Chinstrap Chinstrap [68 × 7] [68 × 7]6 Gentoo Chinstrap [124 × 7] [68 × 7]7 Adelie Gentoo [152 × 7] [124 × 7]8 Chinstrap Gentoo [68 × 7] [124 × 7]9 Gentoo Gentoo [124 × 7] [124 × 7]
dplyr::rename()関数dat_rr_rename dat_rr %>%rename(species.x = Var1) %>%rename(species.y = Var2)> dat_rr_rename# A tibble: 9 × 4species.x species.y data.x data.y > >1 Adelie Adelie [152 × 7] [152 × 7]2 Chinstrap Adelie [68 × 7] [152 × 7]3 Gentoo Adelie [124 × 7] [152 × 7]4 Adelie Chinstrap [152 × 7] [68 × 7]5 Chinstrap Chinstrap [68 × 7] [68 × 7]6 Gentoo Chinstrap [124 × 7] [68 × 7]7 Adelie Gentoo [152 × 7] [124 × 7]8 Chinstrap Gentoo [68 × 7] [124 × 7]9 Gentoo Gentoo [124 × 7] [124 × 7]
dplyr::rename()関数dat_rr_rename dat_rr %>%rename(species.x = Var1) %>%rename(species.y = Var2)key x y dat_rr_rename dat_rr %>%rename(!!x := Var1) %>%rename(!!y := Var2)別解{rlang}パッケージの演算⼦
grid dat_nest$species %>%expand.grid(., .) %>%subset(unclass(Var1) < unclass(Var2)) %>%tibble::as_tibble()base::subset()関数> grid# A tibble: 3 × 2Var1 Var2 1 Adelie Chinstrap2 Adelie Gentoo3 Chinstrap Gentoo
という変換をパッケージにしました。devtools::install_github("kilometer0101/roundrobin")(4回⼿打ちしたら⾯倒臭くなったので)
roundrobin::roundrobin()関数# A tibble: 9 × 4species.x species.y data.x data.y > >1 Adelie Adelie [152 × 7] [152 × 7]2 Chinstrap Adelie [68 × 7] [152 × 7]3 Gentoo Adelie [124 × 7] [152 × 7]4 Adelie Chinstrap [152 × 7] [68 × 7]5 Chinstrap Chinstrap [68 × 7] [68 × 7]6 Gentoo Chinstrap [124 × 7] [68 × 7]7 Adelie Gentoo [152 × 7] [124 × 7]8 Chinstrap Gentoo [68 × 7] [124 × 7]9 Gentoo Gentoo [124 × 7] [124 × 7]library(roundrobin)palmerpenguins::penguins %>%roundrobin(key = "species")
library(roundrobin)palmerpenguins::penguins %>%roundrobin(key = "species",combination = TRUE)roundrobin::roundrobin()関数# A tibble: 3 × 4species.x species.y data.x data.y > >1 Adelie Chinstrap [152 × 7] [68 × 7]2 Adelie Gentoo [152 × 7] [124 × 7]3 Chinstrap Gentoo [68 × 7] [124 × 7]
使ってみますか。
library(tidyverse)library(palmerpenguins)library(roundrobin)dat palmerpenguins::penguins %>%na.omit() %>% # NA除去mutate_at(vars(c(contains("mm"), contains("g"))),~ (. - mean(.)) / sd(.) # 標準化) %>%select(species, contains("mm"), contains("g"))前処理
> dat# A tibble: 333 × 5species bill_length_mm bill_depth_mm flipper_length_mm body_mass_g 1 Adelie -0.895 0.780 -1.42 -0.5682 Adelie -0.822 0.119 -1.07 -0.5063 Adelie -0.675 0.424 -0.426 -1.194 Adelie -1.33 1.08 -0.568 -0.9405 Adelie -0.858 1.74 -0.782 -0.6926 Adelie -0.931 0.323 -1.42 -0.7237 Adelie -0.876 1.24 -0.426 0.5818 Adelie -0.529 0.221 -1.35 -1.259 Adelie -0.986 2.05 -0.711 -0.50610 Adelie -1.72 2.00 -0.212 0.240# … with 323 more rows# i Use `print(n = ...)` to see more rows前処理
dat_long dat %>%rowid_to_column("id") %>%pivot_longer(cols = !species,names_to = "parameter",values_to = "value") %>%group_by(parameter) %>%ungroup().y dat_long %>%ungroup() %>%group_by(species) %>%summarise(mean_id = mean(id),min_id = min(id))dat_long %>%ggplot() +aes(parameter, id) +geom_tile(aes(fill = value)) +geom_hline(yintercept = max(dat_long$id)) +geom_hline(data = .y,aes(yintercept = min_id)) +scale_y_continuous(breaks = .y$mean_id,labels = .y$species,expand = c(0, 0)) +theme(axis.title = element_blank(),axis.text.x = element_text(angle = 30, hjust = 1))可視化コード(ちょちょいのちょい)
可視化
> dat_rr# A tibble: 9 × 4Var1 Var2 data.x data.y > >1 Adelie Adelie [146 × 4] [146 × 4]2 Chinstrap Adelie [68 × 4] [146 × 4]3 Gentoo Adelie [119 × 4] [146 × 4]4 Adelie Chinstrap [146 × 4] [68 × 4]5 Chinstrap Chinstrap [68 × 4] [68 × 4]6 Gentoo Chinstrap [119 × 4] [68 × 4]7 Adelie Gentoo [146 × 4] [119 × 4]8 Chinstrap Gentoo [68 × 4] [119 × 4]9 Gentoo Gentoo [119 × 4] [119 × 4]総当たり組み合わせdat_rr dat %>%roundrobin(key = "species",rename = FALSE)
例えばマハラノビス距離dat_rr_mahaD dat_rr %>%mutate(mahaD2 = map2(data.x, data.y, # yに対するxの距離~ mahalanobis(.x, colMeans(.y), cov(.y)))) %>%mutate(Var2 = str_c("vs. ", Var2))> dat_rr_mahaD# A tibble: 9 × 5Var1 Var2 data.x data.y mahaD2 > > 1 Adelie vs. Adelie [146 × 4] [146 × 4] 2 Chinstrap vs. Adelie [68 × 4] [146 × 4] 3 Gentoo vs. Adelie [119 × 4] [146 × 4] 4 Adelie vs. Chinstrap [146 × 4] [68 × 4] 5 Chinstrap vs. Chinstrap [68 × 4] [68 × 4] 6 Gentoo vs. Chinstrap [119 × 4] [68 × 4] 7 Adelie vs. Gentoo [146 × 4] [119 × 4] 8 Chinstrap vs. Gentoo [68 × 4] [119 × 4] 9 Gentoo vs. Gentoo [119 × 4] [119 × 4]
例えばマハラノビス距離dat_rr_mahaD dat_rr %>%mutate(mahaD2 = map2(data.x, data.y, # yに対するxの距離~ mahalanobis(.x, colMeans(.y), cov(.y)))) %>%mutate(Var2 = str_c("vs. ", Var2))dat_mahaD dat_rr_mahaD %>%select(!data.y) %>%unnest(everything())
> dat_mahaD# A tibble: 999 × 7Var1 Var2 bill_length_mm bill_…¹ flipp…² body_…³ mahaD2 1 Adelie vs. Adelie -0.895 0.780 -1.42 -0.568 2.842 Adelie vs. Adelie -0.822 0.119 -1.07 -0.506 1.953 Adelie vs. Adelie -0.675 0.424 -0.426 -1.19 4.264 Adelie vs. Adelie -1.33 1.08 -0.568 -0.940 3.325 Adelie vs. Adelie -0.858 1.74 -0.782 -0.692 5.576 Adelie vs. Adelie -0.931 0.323 -1.42 -0.723 2.477 Adelie vs. Adelie -0.876 1.24 -0.426 0.581 5.948 Adelie vs. Adelie -0.529 0.221 -1.35 -1.25 5.279 Adelie vs. Adelie -0.986 2.05 -0.711 -0.506 7.7510 Adelie vs. Adelie -1.72 2.00 -0.212 0.240 15.2# … with 989 more rows, and abbreviated variable names# ¹bill_depth_mm, ²flipper_length_mm, ³body_mass_g# ℹ Use `print(n = ...)` to see more rows例えばマハラノビス距離
例えばマハラノビス距離ggplot(dat_mahaD) +aes(mahaD2, color = Var1, fill = Var1) +geom_density(alpha = 0.5) +facet_wrap(~Var2)
総当たり組み合わせRound-robinそう あ あくdevtools::install_github("kilometer0101/roundrobin")
Enjoy!