Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
SappoRo.R_roundrobin
Search
kilometer
March 18, 2023
Programming
0
140
SappoRo.R_roundrobin
第10回Sapporo.Rで喋った際のスライドです。
kilometer
March 18, 2023
Tweet
Share
More Decks by kilometer
See All by kilometer
TokyoR#111_ANOVA
kilometer
2
870
TokyoR109.pdf
kilometer
1
460
TokyoR#108_NestedDataHandling
kilometer
0
800
TokyoR#107_R_GeoData
kilometer
0
420
TokyoR#104_DataProcessing
kilometer
1
690
TokyoR#103_DataProcessing
kilometer
0
880
TokyoR#102_RMarkdown
kilometer
1
640
TokyoR#101_RegressionAnalysis
kilometer
0
380
TokyoR#99_Divergence
kilometer
1
370
Other Decks in Programming
See All in Programming
Writing documentation can be fun with plugin system
okuramasafumi
0
120
SwiftUI Viewの責務分離
elmetal
PRO
2
250
法律の脱レガシーに学ぶフロントエンド刷新
oguemon
5
740
CI改善もDatadogとともに
taumu
0
180
第3回関東Kaggler会_AtCoderはKaggleの役に立つ
chettub
3
1.1k
クリーンアーキテクチャから見る依存の向きの大切さ
shimabox
4
880
Software Architecture
hschwentner
6
2.1k
PHPのバージョンアップ時にも役立ったAST
matsuo_atsushi
0
180
PHPカンファレンス名古屋2025 タスク分解の試行錯誤〜レビュー負荷を下げるために〜
soichi
1
590
負債になりにくいCSSをデザイナとつくるには?
fsubal
10
2.5k
color-scheme: light dark; を完全に理解する
uhyo
6
460
苦しいTiDBへの移行を乗り越えて快適な運用を目指す
leveragestech
0
810
Featured
See All Featured
The MySQL Ecosystem @ GitHub 2015
samlambert
250
12k
Making the Leap to Tech Lead
cromwellryan
133
9.1k
How to train your dragon (web standard)
notwaldorf
91
5.8k
The Pragmatic Product Professional
lauravandoore
32
6.4k
Writing Fast Ruby
sferik
628
61k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
10
1.3k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
160
15k
Designing for Performance
lara
604
68k
Music & Morning Musume
bryan
46
6.4k
How to Think Like a Performance Engineer
csswizardry
22
1.4k
Product Roadmaps are Hard
iamctodd
PRO
50
11k
How To Stay Up To Date on Web Technology
chriscoyier
790
250k
Transcript
SappoRo.R #10 @kilometer00 2023.03.18 らくらく総当たり組み合わせ
Who!? Who?
Who!? 名前: 三村 @kilometer 職業: ポスドク (こうがくはくし) 専⾨: ⾏動神経科学(霊⻑類) 脳イメージング
医療システム⼯学 R歴: ~ 10年ぐらい 流⾏: アンキロサウルス
宣伝!!(書籍の翻訳に参加しました。) 絶賛販売中!
宣伝2!! R⾔語の地域コミュニティ@東京です。 定期的にR⾔語に関する勉強会を開催しています。 次回は4⽉22⽇!! 初⼼者特集回です!!
総当たり組み合わせ Round-robin そう あ あ く
dat_nest <- palmerpenguins::penguins %>% dplyr::group_nest(species) データを畳み込む > dat_nest # A
tibble: 3 × 2 species data <fct> <list<tibble[,7]>> 1 Adelie [152 × 7] 2 Chinstrap [68 × 7] 3 Gentoo [124 × 7] (息を吐くように)
# A tibble: 9 × 4 species.x species.y data.x data.y
<fct> <fct> <list<tibble[,7]>> <list<tibble[,7]>> 1 Adelie Adelie [152 × 7] [152 × 7] 2 Chinstrap Adelie [68 × 7] [152 × 7] 3 Gentoo Adelie [124 × 7] [152 × 7] 4 Adelie Chinstrap [152 × 7] [68 × 7] 5 Chinstrap Chinstrap [68 × 7] [68 × 7] 6 Gentoo Chinstrap [124 × 7] [68 × 7] 7 Adelie Gentoo [152 × 7] [124 × 7] 8 Chinstrap Gentoo [68 × 7] [124 × 7] 9 Gentoo Gentoo [124 × 7] [124 × 7] 総当たり組み合わせ # A tibble: 3 × 4 species.x species.y data.x data.y <fct> <fct> <list<tibble[,7]>> <list<tibble[,7]>> 1 Adelie Chinstrap [152 × 7] [68 × 7] 2 Adelie Gentoo [152 × 7] [124 × 7] 3 Chinstrap Gentoo [68 × 7] [124 × 7] 組み合わせ(combination) (round-robin)
base::expand.grid()関数 > dat_nest # A tibble: 3 × 2 species
data <fct> <list<tibble[,7]>> 1 Adelie [152 × 7] 2 Chinstrap [68 × 7] 3 Gentoo [124 × 7] dat_nest$species
grid <- dat_nest$species %>% expand.grid(., .) base::expand.grid()関数 > grid Var1
Var2 1 Adelie Adelie 2 Chinstrap Adelie 3 Gentoo Adelie 4 Adelie Chinstrap 5 Chinstrap Chinstrap 6 Gentoo Chinstrap 7 Adelie Gentoo 8 Chinstrap Gentoo 9 Gentoo Gentoo
dplyr::left_join()関数 > grid Var1 Var2 1 Adelie Adelie 2 Chinstrap
Adelie 3 Gentoo Adelie 4 Adelie Chinstrap 5 Chinstrap Chinstrap 6 Gentoo Chinstrap 7 Adelie Gentoo 8 Chinstrap Gentoo 9 Gentoo Gentoo > dat_nest # A tibble: 3 × 2 species data <fct> <list<tibble[,7]>> 1 Adelie [152 × 7] 2 Chinstrap [68 × 7] 3 Gentoo [124 × 7] ①対応づけて結合 ②対応づけて結合
dplyr::left_join()関数 dat_rr <- grid %>% tibble::as_tibble() %>% dplyr::left_join( dat_nest %>%
dplyr::rename(Var1 = "species"), by = "Var1" ) %>% dplyr::left_join( dat_nest %>% dplyr::rename(Var2 = "species"), by = "Var2" )
dplyr::left_join()関数 dat_rr <- grid %>% tibble::as_tibble() %>% dplyr::left_join( dat_nest %>%
dplyr::rename(Var1 = "species"), by = "Var1" ) %>% dplyr::left_join( dat_nest %>% dplyr::rename(Var2 = "species"), by = "Var2" ) ① ②
> dat_rr # A tibble: 9 × 4 Var1 Var2
data.x data.y <fct> <fct> <list<tibble[,7]>> <list<tibble[,7]> 1 Adelie Adelie [152 × 7] [152 × 7] 2 Chinstrap Adelie [68 × 7] [152 × 7] 3 Gentoo Adelie [124 × 7] [152 × 7] 4 Adelie Chinstrap [152 × 7] [68 × 7] 5 Chinstrap Chinstrap [68 × 7] [68 × 7] 6 Gentoo Chinstrap [124 × 7] [68 × 7] 7 Adelie Gentoo [152 × 7] [124 × 7] 8 Chinstrap Gentoo [68 × 7] [124 × 7] 9 Gentoo Gentoo [124 × 7] [124 × 7]
dplyr::rename()関数 dat_rr_rename <- dat_rr %>% rename(species.x = Var1) %>% rename(species.y
= Var2) > dat_rr_rename # A tibble: 9 × 4 species.x species.y data.x data.y <fct> <fct> <list<tibble[,7]>> <list<tibble[,7]>> 1 Adelie Adelie [152 × 7] [152 × 7] 2 Chinstrap Adelie [68 × 7] [152 × 7] 3 Gentoo Adelie [124 × 7] [152 × 7] 4 Adelie Chinstrap [152 × 7] [68 × 7] 5 Chinstrap Chinstrap [68 × 7] [68 × 7] 6 Gentoo Chinstrap [124 × 7] [68 × 7] 7 Adelie Gentoo [152 × 7] [124 × 7] 8 Chinstrap Gentoo [68 × 7] [124 × 7] 9 Gentoo Gentoo [124 × 7] [124 × 7]
dplyr::rename()関数 dat_rr_rename <- dat_rr %>% rename(species.x = Var1) %>% rename(species.y
= Var2) key <- "species" x <- stringr::str_c(key, ".x") y <- stringr::str_c(key, ".y") dat_rr_rename <- dat_rr %>% rename(!!x := Var1) %>% rename(!!y := Var2) 別解 {rlang}パッケージの演算⼦
# A tibble: 9 × 4 species.x species.y data.x data.y
<fct> <fct> <list<tibble[,7]>> <list<tibble[,7]>> 1 Adelie Adelie [152 × 7] [152 × 7] 2 Chinstrap Adelie [68 × 7] [152 × 7] 3 Gentoo Adelie [124 × 7] [152 × 7] 4 Adelie Chinstrap [152 × 7] [68 × 7] 5 Chinstrap Chinstrap [68 × 7] [68 × 7] 6 Gentoo Chinstrap [124 × 7] [68 × 7] 7 Adelie Gentoo [152 × 7] [124 × 7] 8 Chinstrap Gentoo [68 × 7] [124 × 7] 9 Gentoo Gentoo [124 × 7] [124 × 7] 総当たり組み合わせ # A tibble: 3 × 4 species.x species.y data.x data.y <fct> <fct> <list<tibble[,7]>> <list<tibble[,7]>> 1 Adelie Chinstrap [152 × 7] [68 × 7] 2 Adelie Gentoo [152 × 7] [124 × 7] 3 Chinstrap Gentoo [68 × 7] [124 × 7] 組み合わせ(combination) (round-robin)
grid <- dat_nest$species %>% expand.grid(., .) %>% subset(unclass(Var1) < unclass(Var2))
%>% tibble::as_tibble() base::subset()関数 > grid # A tibble: 3 × 2 Var1 Var2 <fct> <fct> 1 Adelie Chinstrap 2 Adelie Gentoo 3 Chinstrap Gentoo
という変換を パッケージにしました。 devtools::install_github( "kilometer0101/roundrobin" ) (4回⼿打ちしたら⾯倒臭くなったので)
roundrobin::roundrobin()関数 # A tibble: 9 × 4 species.x species.y data.x
data.y <fct> <fct> <list<tibble[,7]>> <list<tibble[,7]>> 1 Adelie Adelie [152 × 7] [152 × 7] 2 Chinstrap Adelie [68 × 7] [152 × 7] 3 Gentoo Adelie [124 × 7] [152 × 7] 4 Adelie Chinstrap [152 × 7] [68 × 7] 5 Chinstrap Chinstrap [68 × 7] [68 × 7] 6 Gentoo Chinstrap [124 × 7] [68 × 7] 7 Adelie Gentoo [152 × 7] [124 × 7] 8 Chinstrap Gentoo [68 × 7] [124 × 7] 9 Gentoo Gentoo [124 × 7] [124 × 7] library(roundrobin) palmerpenguins::penguins %>% roundrobin(key = "species")
library(roundrobin) palmerpenguins::penguins %>% roundrobin(key = "species", combination = TRUE) roundrobin::roundrobin()関数
# A tibble: 3 × 4 species.x species.y data.x data.y <fct> <fct> <list<tibble[,7]>> <list<tibble[,7]>> 1 Adelie Chinstrap [152 × 7] [68 × 7] 2 Adelie Gentoo [152 × 7] [124 × 7] 3 Chinstrap Gentoo [68 × 7] [124 × 7]
使ってみますか。
library(tidyverse) library(palmerpenguins) library(roundrobin) dat <- palmerpenguins::penguins %>% na.omit() %>% #
NA除去 mutate_at( vars(c(contains("mm"), contains("g"))), ~ (. - mean(.)) / sd(.) # 標準化 ) %>% select(species, contains("mm"), contains("g")) 前処理
> dat # A tibble: 333 × 5 species bill_length_mm
bill_depth_mm flipper_length_mm body_mass_g <fct> <dbl> <dbl> <dbl> <dbl> 1 Adelie -0.895 0.780 -1.42 -0.568 2 Adelie -0.822 0.119 -1.07 -0.506 3 Adelie -0.675 0.424 -0.426 -1.19 4 Adelie -1.33 1.08 -0.568 -0.940 5 Adelie -0.858 1.74 -0.782 -0.692 6 Adelie -0.931 0.323 -1.42 -0.723 7 Adelie -0.876 1.24 -0.426 0.581 8 Adelie -0.529 0.221 -1.35 -1.25 9 Adelie -0.986 2.05 -0.711 -0.506 10 Adelie -1.72 2.00 -0.212 0.240 # … with 323 more rows # i Use `print(n = ...)` to see more rows 前処理
dat_long <- dat %>% rowid_to_column("id") %>% pivot_longer( cols = !species,
names_to = "parameter", values_to = "value" ) %>% group_by(parameter) %>% ungroup() .y <- dat_long %>% ungroup() %>% group_by(species) %>% summarise( mean_id = mean(id), min_id = min(id) ) dat_long %>% ggplot() + aes(parameter, id) + geom_tile(aes(fill = value)) + geom_hline( yintercept = max(dat_long$id) ) + geom_hline(data = .y, aes(yintercept = min_id)) + scale_y_continuous( breaks = .y$mean_id, labels = .y$species, expand = c(0, 0)) + theme( axis.title = element_blank(), axis.text.x = element_text( angle = 30, hjust = 1 ) ) 可視化コード (ちょちょいのちょい)
可視化
> dat_rr # A tibble: 9 × 4 Var1 Var2
data.x data.y <fct> <fct> <list<tibble[,4]>> <list<tibble[,4]>> 1 Adelie Adelie [146 × 4] [146 × 4] 2 Chinstrap Adelie [68 × 4] [146 × 4] 3 Gentoo Adelie [119 × 4] [146 × 4] 4 Adelie Chinstrap [146 × 4] [68 × 4] 5 Chinstrap Chinstrap [68 × 4] [68 × 4] 6 Gentoo Chinstrap [119 × 4] [68 × 4] 7 Adelie Gentoo [146 × 4] [119 × 4] 8 Chinstrap Gentoo [68 × 4] [119 × 4] 9 Gentoo Gentoo [119 × 4] [119 × 4] 総当たり組み合わせ dat_rr <- dat %>% roundrobin(key = "species", rename = FALSE)
例えばマハラノビス距離 dat_rr_mahaD <- dat_rr %>% mutate(mahaD2 = map2( data.x, data.y,
# yに対するxの距離 ~ mahalanobis(.x, colMeans(.y), cov(.y)) )) %>% mutate(Var2 = str_c("vs. ", Var2)) > dat_rr_mahaD # A tibble: 9 × 5 Var1 Var2 data.x data.y mahaD2 <fct> <chr> <list<tibble[,4]>> <list<tibble[,4]>> <list> 1 Adelie vs. Adelie [146 × 4] [146 × 4] <dbl [146]> 2 Chinstrap vs. Adelie [68 × 4] [146 × 4] <dbl [68]> 3 Gentoo vs. Adelie [119 × 4] [146 × 4] <dbl [119]> 4 Adelie vs. Chinstrap [146 × 4] [68 × 4] <dbl [146]> 5 Chinstrap vs. Chinstrap [68 × 4] [68 × 4] <dbl [68]> 6 Gentoo vs. Chinstrap [119 × 4] [68 × 4] <dbl [119]> 7 Adelie vs. Gentoo [146 × 4] [119 × 4] <dbl [146]> 8 Chinstrap vs. Gentoo [68 × 4] [119 × 4] <dbl [68]> 9 Gentoo vs. Gentoo [119 × 4] [119 × 4] <dbl [119]>
例えばマハラノビス距離 dat_rr_mahaD <- dat_rr %>% mutate(mahaD2 = map2( data.x, data.y,
# yに対するxの距離 ~ mahalanobis(.x, colMeans(.y), cov(.y)) )) %>% mutate(Var2 = str_c("vs. ", Var2)) dat_mahaD <- dat_rr_mahaD %>% select(!data.y) %>% unnest(everything())
> dat_mahaD # A tibble: 999 × 7 Var1 Var2
bill_length_mm bill_…¹ flipp…² body_…³ mahaD2 <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> 1 Adelie vs. Adelie -0.895 0.780 -1.42 -0.568 2.84 2 Adelie vs. Adelie -0.822 0.119 -1.07 -0.506 1.95 3 Adelie vs. Adelie -0.675 0.424 -0.426 -1.19 4.26 4 Adelie vs. Adelie -1.33 1.08 -0.568 -0.940 3.32 5 Adelie vs. Adelie -0.858 1.74 -0.782 -0.692 5.57 6 Adelie vs. Adelie -0.931 0.323 -1.42 -0.723 2.47 7 Adelie vs. Adelie -0.876 1.24 -0.426 0.581 5.94 8 Adelie vs. Adelie -0.529 0.221 -1.35 -1.25 5.27 9 Adelie vs. Adelie -0.986 2.05 -0.711 -0.506 7.75 10 Adelie vs. Adelie -1.72 2.00 -0.212 0.240 15.2 # … with 989 more rows, and abbreviated variable names # ¹bill_depth_mm, ²flipper_length_mm, ³body_mass_g # ℹ Use `print(n = ...)` to see more rows 例えばマハラノビス距離
例えばマハラノビス距離 ggplot(dat_mahaD) + aes(mahaD2, color = Var1, fill = Var1)
+ geom_density(alpha = 0.5) + facet_wrap(~Var2)
総当たり組み合わせ Round-robin そう あ あ く devtools::install_github( "kilometer0101/roundrobin" )
Enjoy!