Kenji Saito
November 28, 2024
組織とデータ分析/統計的仮説検定 / Organization, data analysis and statistical hypothesis testing
Slides used in sessions 1-2 of the winter 2024 course "Corporate Data Analysis" at the Graduate School of Business and Finance, Waseda University.
Transcript
Corporate data analysis — generated by Stable Diffusion XL v1.0
2024 1-2 (WBS) 2024 1-2 — 2024-12-02 – p.1/36
https://speakerdeck.com/ks91/collections/corporate-data-analysis-2024-winter
CSO (Chief Science Officer)
1993
2006: SFC, 24, P2P (Peer-to-Peer)
2011
2018
2019: VR
2021.9: VR
2022.3
2023: AI, VR & RPG
2023.5: “Don’t Be So Serious”
2023
2024: AI
2024: “ALOHA FROM HAWAII”
2024
2024: AI
Dropbox
Statistical hypothesis testing (B ... A ...)
1. One-sample test (binomial test)
2. Two-sample test (Wilcoxon-Mann-Whitney test)
R
[ ] (2022), R ... R ... R
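Schedule
Session 1, Dec 2: Organization and data analysis •
Session 2, Dec 2: Statistical hypothesis testing (B ... A ...) •
Session 3, Dec 9: Possible false inferences
Session 4, Dec 9: Means, variances, standard deviations and degrees of freedom
Session 5, Dec 16: Normal distribution and simple statistical theory
Session 6, Dec 16: t-distribution and confidence intervals
Session 7, Dec 23: Related 2-group t-test
Session 8, Dec 23: Independent 2-group t-test
Session 9, Jan 6: P-value and significant difference
Session 10, Jan 6: Analysis of variance
Session 11, Jan 20
Session 12, Jan 20
Session 13, Jan 27
Session 14, Jan 27: W-IOI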
(20 to 25)
1. (20) •
2. R (55) •
3. (32) •
4. (14) •
5. (Git) (22) •
6. (24) •
7. (1) (25) •
8. (2) (25) •
9. Tests using R (supplementary) (1): Welch test (17) •
10. Tests using R (supplementary) (2): Chi-squared test (21) •
11. R (1) (15) •
12. R (2) (19) •
13. GPT-4 (19) •
14. GPT-4 (29) •
15. LaTeX and Overleaf (40) •
8 (12/16) / (2) OK
... (20 × (14+1))
(2) (160); (10∼20); and/or; 1 (80); 1; Q & A; (30∼40); (30∼40)
Moodle (Q&A); Discord
A4, 2, 2 (Overleaf); LaTeX; PDF
R; (2008); R
⇒ (1) (2) (3) ⇒ ((2)) ⇒ AI
observation; sample; random variable; probability distribution; population; simple random sampling
(2, t) ... 2
Statistical hypothesis testing (B ... A ...)
1. One-sample test (binomial test)
2. Two-sample test (Wilcoxon-Mann-Whitney test)
One-sample test (binomial test)
$P(X = x) = {}_n C_x \cdot p^x \cdot (1 - p)^{n - x}$, $E[X] = np$
(1) Null hypothesis $H_0$
(2) Test statistic (here $x$)
(3) Distribution of the test statistic under $H_0$ (null distribution)
(4) Rejection region (significance level; 5% or 1%)
(5) Decision (whether to reject $H_0$)
B (p.47): RStudio
In R, ${}_n C_x$ is ‘choose(n,x)’.
For n = 18, x = 0: choose(18,0) × 0.5^0 × 0.5^18 = choose(18,0) × 0.5^18 ...
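As a small check of the formula, the sketch below evaluates the binomial probabilities with choose() for n = 18 and p = 0.5 and compares them with R's built-in dbinom(). The variable names (x, prob_manual, prob_builtin) are illustrative and do not come from the slides.

```r
n <- 18
p <- 0.5
x <- 0:n

# P(X = x) written out with choose(), as in the formula
prob_manual <- choose(n, x) * p^x * (1 - p)^(n - x)

# The same probabilities from R's built-in binomial density
prob_builtin <- dbinom(x, size = n, prob = p)

print(all.equal(prob_manual, prob_builtin))  # do the two agree?
print(sum(prob_manual))                      # probabilities should sum to 1
print(sum(x * prob_manual))                  # expected value, should equal n * p
```

The last line is a numerical check of E[X] = np.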
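R (B) (1/2): R code
    # prob and count are placeholder names; the original variable names were lost
    n <- 18      # number of trials
    p <- 0.5     # probability under H0
    prob <- c()  # empty vector for the probabilities
    # compute P(X = x) for x = 0, ..., n
    for (x in 0:n) {
      prob <- c(prob, choose(n, x) * p^x * (1 - p)^(n - x))
    }
    halfp <- 0   # cumulative probability of one tail (between 0 and 1)
R (B) (2/2): R code
    # add up the lower tail, starting from x = 0
    for (x in 0:n) {
      # stop before the tail would exceed 0.025
      if (halfp + prob[x + 1] > 0.025) {
        break
      }
      halfp <- halfp + prob[x + 1]
    }
    # colors: red for the x values in each tail (rep is used twice), black in between
    color <- rep(c("red"), x)
    color <- c(color, rep(c("black"), n + 1 - x * 2), color)
    count <- 0:n  # values of x
    # vertical-line plot (lwd sets the line width)
    plot(count, prob, type = "h", lwd = 3, col = color)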
[Figure: plot produced by the preceding R code. Horizontal axis: values of x (ticks at 0, 5, 10, 15); vertical axis: probability (0.00 to 0.15).]
Test in R
> binom.test(14, n=18, p=0.5)
Compare the p-value (P value; covered in session 9) in the output with 0.05.
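A minimal sketch of steps (3) to (5) for this example, assuming a two-sided test at the 5% level with 2.5% in each tail, as in the slides' code. It derives the rejection region from the null distribution and compares a hand-computed P value with the one reported by binom.test(). All variable names are illustrative.

```r
n <- 18
p <- 0.5
observed <- 14

# Null distribution: P(X = x) for x = 0, ..., n under H0
prob <- dbinom(0:n, size = n, prob = p)

# Largest x whose lower-tail probability stays within 0.025
lower_cut <- max(which(cumsum(prob) <= 0.025)) - 1
# The distribution is symmetric for p = 0.5, so the upper tail mirrors the lower one
upper_cut <- n - lower_cut

in_rejection_region <- observed <= lower_cut || observed >= upper_cut

# Two-sided P value by hand: both tails at least as extreme as the observed value
p_manual <- pbinom(n - observed, n, p) +
  pbinom(observed - 1, n, p, lower.tail = FALSE)

# P value reported by the built-in test
p_builtin <- binom.test(observed, n = n, p = p)$p.value

print(c(lower_cut = lower_cut, upper_cut = upper_cut))
print(in_rejection_region)
print(c(manual = p_manual, binom.test = p_builtin))
```

The mirror-image shortcut for the upper tail only works because p = 0.5 makes the null distribution symmetric; for other values of p the two tails would have to be handled separately.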
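Two-sample test (Wilcoxon-Mann-Whitney test)
WMW test: two groups, A and B
(2) Test statistic: $U$ (U value)
$U = \min\left(n_A n_B + \frac{1}{2} n_A (n_A + 1) - R_A,\; n_A n_B + \frac{1}{2} n_B (n_B + 1) - R_B\right)$
where $n_A$, $n_B$ are the sample sizes and $R_A$, $R_B$ are the rank sums of groups A and B.
(4) ((3) ...) $U_{0.05}$
(5) Compare $U$ with $U_{0.05}$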
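The U formula can be sketched as a small R function. This is an illustration under stated assumptions, not the slides' own code: the function name wmw_u and the toy data are made up, and ties are handled by rank()'s default average ranks.

```r
# U statistic of the WMW test, following the formula term by term
wmw_u <- function(a, b) {
  na <- length(a)
  nb <- length(b)
  ranks <- rank(c(a, b))               # ranks in the combined sample
  ra <- sum(ranks[seq_len(na)])        # rank sum of group A
  rb <- sum(ranks[na + seq_len(nb)])   # rank sum of group B
  min(na * nb + na * (na + 1) / 2 - ra,
      na * nb + nb * (nb + 1) / 2 - rb)
}

# Toy data (made up): group A ranks are 3, 1, 4; group B ranks are 2, 5, 6
u <- wmw_u(c(3, 1, 4), c(2, 5, 6))
print(u)
```

Here R_A = 8 and R_B = 13, so the two candidate values are 9 + 6 - 8 = 7 and 9 + 6 - 13 = 2, and U is the smaller one, 2.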
D (p.70): RStudio ...
R (D): GPT
ChatGPT (GPT-4), R ...
⇒ GPT-4 output (1/2 and 2/2)
    # function that returns the rank sum of each sample
    calculate_rank_sum <- function(sample1, sample2) {
      # combine the samples and record which group each value belongs to
      combined_samples <- c(sample1, sample2)
      sample_group <- c(rep("sample1", length(sample1)),
                        rep("sample2", length(sample2)))
      # rank the combined values
      ranks <- rank(combined_samples)
      # collect values, groups and ranks in a data frame
      df <- data.frame(value = combined_samples, group = sample_group, rank = ranks)
      # sum the ranks within each group
      rank_sum_sample1 <- sum(df[df$group == "sample1", "rank"])
      rank_sum_sample2 <- sum(df[df$group == "sample2", "rank"])
      return(list(sample1_rank_sum = rank_sum_sample1,
                  sample2_rank_sum = rank_sum_sample2))
    }
    # example samples
    sample1 <- c(3, 1, 4)
    sample2 <- c(2, 5, 6)
    # call the function
    calculate_rank_sum(sample1, sample2)
GPT ... GPT-4 ...
‘rank(...)’: RStudio, Help → Search R Help
⇒ GPT ... 3: (1) (GPT) (2) (GPT) (3)
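R (D) (1/2): R code
    # group_a and group_b are placeholder names; the original variable names were lost
    group_a <- c(4.6, 5.6, 3.2, 3.2, 3.7, 4.0, 5.0, 4.6)
    group_b <- c(4.6, 4.9, 7.1, 6.0, 5.2, 3.9, 5.3, 5.8)
    # combine the samples and record the group of each value
    combined_samples <- c(group_a, group_b)
    sample_group <- c(rep("A", length(group_a)), rep("B", length(group_b)))
    # rank the combined values
    ranks <- rank(combined_samples)
    # collect values, groups and ranks in a data frame
    df <- data.frame(value = combined_samples, group = sample_group, rank = ranks)
    # rank sum of each group
    ra <- sum(df[df$group == "A", "rank"])
    rb <- sum(df[df$group == "B", "rank"])
R (D) (2/2): R code
    # compute U
    na <- length(group_a)
    nb <- length(group_b)
    U <- min(na*nb + na / 2 * (na + 1) - ra, na*nb + nb / 2 * (nb + 1) - rb)
    print(paste("U =", U))  # paste joins its arguments into one string
    # data frame of the two samples
    sdf <- data.frame(group_a, group_b)
    # box plot (the original axis label was lost; "value" is a placeholder)
    boxplot(sdf, ylim = c(0, 8.0), ylab = "value")
Compare U with $U_{0.05}$.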
[Figure: box plots of the two groups produced by the preceding R code; vertical axis from 0 to 8.]
WMW test in R
> wilcox.test(x, y), where x and y are the two sample vectors
Compare the p-value (P value; covered in session 9) in the output with 0.05.
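A minimal sketch connecting the hand-computed U to wilcox.test(), using the two samples from the slides' R code. The names group_a and group_b are placeholders, since the original variable names were lost. It relies on the fact that the W statistic reported by wilcox.test() is the rank sum of the first sample minus n_A(n_A + 1)/2, so the smaller of W and n_A n_B - W should equal U.

```r
group_a <- c(4.6, 5.6, 3.2, 3.2, 3.7, 4.0, 5.0, 4.6)
group_b <- c(4.6, 4.9, 7.1, 6.0, 5.2, 3.9, 5.3, 5.8)
na <- length(group_a)
nb <- length(group_b)

# U by hand, from the rank sums of the combined sample
ranks <- rank(c(group_a, group_b))
ra <- sum(ranks[seq_len(na)])
rb <- sum(ranks[na + seq_len(nb)])
u_manual <- min(na * nb + na * (na + 1) / 2 - ra,
                na * nb + nb * (nb + 1) / 2 - rb)

# The data contain ties, so R warns that an exact P value cannot be
# computed and falls back to a normal approximation
result <- suppressWarnings(wilcox.test(group_a, group_b))
w <- unname(result$statistic)
u_from_w <- min(w, na * nb - w)

print(c(u_manual = u_manual, u_from_w = u_from_w))
print(result$p.value)
```

Suppressing the warning keeps the output short here; in an actual analysis it is worth reading, because it signals that the reported P value is approximate.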
1. (1) (2)
Deadline: December 5, 2024, 23:59 JST
Waseda Moodle (Q & A)