Kenji Saito
November 28, 2024
Organization, data analysis and statistical hypothesis testing
Slides used in sessions 1-2 of "Corporate Data Analysis" (Winter 2024) at the Graduate School of Business and Finance, Waseda University.
Transcript
Corporate data analysis — generated by Stable Diffusion XL v1.0
Corporate Data Analysis, Winter 2024, sessions 1-2 (WBS), 2024-12-02, p.1/36
https://speakerdeck.com/ks91/collections/corporate-data-analysis-2024-winter
CSO (Chief Science Officer)
1993 ... 2006 ... SFC ... 24 ... P2P (Peer-to-Peer) ... 2011 ... 2018 ... 2019 ... VR ... 2021.9 ... & VR ... 2022.3 ... 2023 ... AI ... VR&RPG ... 2023.5 "Don't Be So Serious" ... 2023 ... 2024 ... AI ... 2024 "ALOHA FROM HAWAII" ... 2024 ... 2024 ... AI
Dropbox
(B A)
One-sample test (binomial test)
Two-sample test (Wilcoxon-Mann-Whitney test)
R
(2022) R
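Session 1 (12/2) •
Session 2 (12/2): (B A) •
Session 3 (12/9)
Session 4 (12/9)
Session 5 (12/16)
Session 6 (12/16): t
Session 7 (12/23): 2 ( ) t
Session 8 (12/23): 2 ( ) t
Session 9 (1/6): P
Session 10 (1/6)
Session 11 (1/20)
Session 12 (1/20)
Session 13 (1/27)
Session 14 (1/27): W-IOI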
(20 to 25)
1 (20) •
2 R (55) •
3 (32) •
4 (14) •
5 (Git) (22) •
6 (24) •
7 (1) (25) •
8 (2) (25) •
9 R (1), Welch (17) •
10 R (2) (21) •
11 R (1) (15) •
12 R (2) (19) •
13 GPT-4 (19) •
14 GPT-4 (29) •
15 LaTeX, Overleaf (40) •
8 (12/16) / (2) OK
(20 × (14+1))
(2) (160) (10∼20) and/or 1 (80) 1 Q & A (30∼40) (30∼40)
Moodle (Q&A), Discord
A4, 2, 2 (Overleaf), LaTeX, PDF
R, (2008) R
⇒ (1) (2) (3) ⇒ ((2)) ⇒ AI
Terms: observation, sample, random variable, probability distribution, population, simple random sampling
(2, t, ...) 2 (...)
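The terms above can be illustrated with a short R sketch. This is only an illustration under stated assumptions: the population of 1000 numbered units, the sample size of 10 and the seed are hypothetical and do not come from the slides.

```r
# Hypothetical population: 1000 numbered units (not from the slides)
population <- 1:1000

# Fix the seed so the draw is reproducible (arbitrary choice)
set.seed(1)

# Simple random sampling: every unit has the same chance of selection,
# and sample() draws without replacement by default
smp <- sample(population, size = 10)

# Each element of smp is one observation; the sample mean is a statistic
sample_mean <- mean(smp)
print(smp)
print(sample_mean)
```

Rerunning without `set.seed()` gives a different sample each time, which is why a statistic computed from a sample is treated as a random variable.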
(B A)
One-sample test (binomial test)
Two-sample test (Wilcoxon-Mann-Whitney test)
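One-sample test (binomial test)
$P(X = x) = {}_{n}C_{x} \cdot p^{x} \cdot (1 - p)^{n-x}$, $E[X] = np$
(1) Set up the null hypothesis $H_0$
(2) Choose the test statistic (here, $x$)
(3) Obtain the distribution of the statistic under $H_0$ (null distribution)
(4) Set the rejection region (5% or 1%), whose probability is the significance level
(5) Decide (whether to reject $H_0$)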
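The five steps can be sketched in R as follows. This is a minimal sketch, not the lecture's own script: it assumes n = 18 and p = 0.5 under $H_0$, a two-sided 5% significance level, and an observed count of 14; these values are chosen for illustration, and `x_obs`, `alpha` and `region` are names introduced here.

```r
# (1) Null hypothesis H0: p = 0.5 (assumed for illustration)
n <- 18
p0 <- 0.5
alpha <- 0.05   # assumed two-sided significance level
x_obs <- 14     # assumed observed count

# (2) The test statistic is the count x itself

# (3) Null distribution: P(X = x) for x = 0, ..., n under H0
null_dist <- dbinom(0:n, size = n, prob = p0)

# (4) Rejection region: values whose tail probability stays within alpha / 2
lower_tail <- cumsum(null_dist)            # P(X <= x)
upper_tail <- rev(cumsum(rev(null_dist)))  # P(X >= x)
region <- (0:n)[lower_tail <= alpha / 2 | upper_tail <= alpha / 2]

# (5) Decision: reject H0 if the observed count falls in the region
reject <- x_obs %in% region
print(region)
print(reject)
```

With these assumed values the region is 0 to 4 together with 14 to 18, so an observed count of 14 leads to rejecting $H_0$.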
B (p.47), in RStudio
In R, ${}_{n}C_{x}$ is 'choose(n,x)'
For n = 18, x = 0: choose(18,0) × 0.5^0 × 0.5^18 = choose(18,0) × 0.5^18
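As a quick check of the calculation above, the hand-built formula can be compared with R's built-in binomial density. This is a sketch; the variable names `pmf_manual` and `pmf_builtin` are introduced here for illustration.

```r
n <- 18
p <- 0.5
x <- 0:n

# Probability of each x from the formula nCx * p^x * (1 - p)^(n - x)
pmf_manual <- choose(n, x) * p^x * (1 - p)^(n - x)

# The same probabilities from the built-in binomial density
pmf_builtin <- dbinom(x, size = n, prob = p)

# The two agree, the probabilities sum to 1, and the mean equals n * p
print(all.equal(pmf_manual, pmf_builtin))
print(sum(pmf_manual))
print(sum(x * pmf_manual))
```

The first element, `pmf_manual[1]`, is the x = 0 case worked out above, that is, 0.5^18.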
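B in R: R script
n <- 18      # number of trials
p <- 0.5     # probability under the null hypothesis
# `prob` and `count` stand in for the original variable names, which were lost in extraction
prob <- c()  # vector of probabilities (empty at first)
# loop x from 0 to n
for (x in 0:n) {
  # append P(X = x) to the vector
  prob <- c(prob, choose(n,x)*p^x*(1-p)^(n-x))
}
halfp <- 0   # accumulated probability of one tail of the rejection region
# loop x upward from 0 (lower tail)
for (x in 0:n) {
  # stop before the tail probability exceeds 0.025
  if (halfp + prob[x+1] > 0.025) {
    break
  }
  halfp <- halfp + prob[x+1]  # add P(X = x) to the tail
}
# colors: the x values in each tail are drawn in red
color <- rep(c("red"), x)  # rep repeats its first argument as many times as the second says
color <- c(color, rep(c("black"), n + 1 - x*2), color)
count <- 0:n  # x axis
# plot (lwd is the line width)
plot(count, prob, type="h", lwd=3, col=color)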
[Figure: probability (y axis, 0.00 to 0.15) against number of people (x axis, 0 to 18)]
Testing in R
> binom.test(14, n=18, p=0.5)
Read the p-value (P value; see session 9) in the output and compare it with 0.05.
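For reference, the p-value can also be pulled out of the result object instead of being read from the printed output. This is a sketch; the names `res`, `p_value` and `tail_sum` are introduced here. Because the null distribution is symmetric for p = 0.5, the two-sided p-value equals twice the upper-tail probability P(X >= 14).

```r
# Run the binomial test and keep the result object
res <- binom.test(14, n = 18, p = 0.5)

# The p-value is stored in the p.value element
p_value <- res$p.value

# Cross-check: twice the probability of 14 or more out of 18
tail_sum <- 2 * sum(dbinom(14:18, size = 18, prob = 0.5))

print(p_value)
print(tail_sum)
print(p_value < 0.05)
```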
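Two-sample test (Wilcoxon-Mann-Whitney test)
WMW test: two groups, A and B
(2) Test statistic U (U test):
$U = \min\left(n_A n_B + \frac{1}{2} n_A (n_A + 1) - R_A,\; n_A n_B + \frac{1}{2} n_B (n_B + 1) - R_B\right)$
where $n_A$, $n_B$ are the sample sizes and $R_A$, $R_B$ are the rank sums of the two groups
(4) (skipping (3)) Critical value $U_{0.05}$
(5) Compare $U$ with $U_{0.05}$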
D (p.70), in RStudio ...
D in R: asking GPT
ChatGPT (GPT-4) was asked for R code
⇒ GPT-4's answer:
# function that computes the rank sum of each sample
calculate_rank_sum <- function(sample1, sample2) {
  # combine the samples and record which group each value belongs to
  combined_samples <- c(sample1, sample2)
  sample_group <- c(rep("sample1", length(sample1)),
                    rep("sample2", length(sample2)))
  # rank all values together
  ranks <- rank(combined_samples)
  # collect values, groups and ranks in a data frame
  df <- data.frame(value = combined_samples, group = sample_group, rank = ranks)
  # sum the ranks within each group
  rank_sum_sample1 <- sum(df[df$group == "sample1", "rank"])
  rank_sum_sample2 <- sum(df[df$group == "sample2", "rank"])
  return(list(sample1_rank_sum = rank_sum_sample1,
              sample2_rank_sum = rank_sum_sample2))
}
# sample data
sample1 <- c(3, 1, 4)
sample2 <- c(2, 5, 6)
# call the function
calculate_rank_sum(sample1, sample2)
GPT ... GPT-4 ...
'rank(...)': RStudio Help → Search R Help
⇒ GPT
GPT, 3: (1) (GPT) (2) (GPT) (3)
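Alongside the help page, a tiny experiment shows what `rank()` does with tied values, which matters for rank sums. The input vector here is hypothetical and chosen only to contain a tie.

```r
# Hypothetical values with one tie (3.2 appears twice)
x <- c(3.2, 3.2, 4.6, 4.0)

# By default, tied values share the average of the ranks they occupy
r_avg <- rank(x)

# Other tie-handling rules can be requested with ties.method
r_min <- rank(x, ties.method = "min")

print(r_avg)  # 1.5 1.5 4.0 3.0
print(r_min)  # 1 1 4 3
```

With the default rule the ranks still add up to 1 + 2 + ... + n, so the rank sums of two groups together always total n(n + 1)/2.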
D in R: R script
# `a`, `b` and the labels "A", "B" stand in for the original names, which were lost in extraction
a <- c(4.6, 5.6, 3.2, 3.2, 3.7, 4.0, 5.0, 4.6)
b <- c(4.6, 4.9, 7.1, 6.0, 5.2, 3.9, 5.3, 5.8)
# combine the samples and record the group of each value
combined_samples <- c(a, b)
sample_group <- c(rep("A", length(a)), rep("B", length(b)))
# rank all values together
ranks <- rank(combined_samples)
# collect values, groups and ranks in a data frame
df <- data.frame(value = combined_samples, group = sample_group, rank = ranks)
# rank sum of each group
ra <- sum(df[df$group == "A", "rank"])
rb <- sum(df[df$group == "B", "rank"])
# compute U
na <- length(a)
nb <- length(b)
U <- min(na*nb + na / 2 * (na + 1) - ra, na*nb + nb / 2 * (nb + 1) - rb)
print(paste("U =", U))  # paste joins its arguments into one string
# data frame for plotting
sdf <- data.frame(A = a, B = b)
# box plot (the original axis label was lost in extraction)
boxplot(sdf, ylim=c(0, 8.0), ylab="value")
Compare U with U0.05.
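The same calculation can be wrapped in a small function and cross-checked against the statistic that `wilcox.test` reports. This is a sketch: `wmw_u` is a hypothetical helper name, and `a` and `b` are stand-in names for the two samples from the slide. `wilcox.test` reports W, the rank sum of the first sample minus n_a(n_a + 1)/2, so the smaller of W and n_a n_b - W should match U.

```r
# Hypothetical helper: U statistic from two samples
wmw_u <- function(a, b) {
  na <- length(a)
  nb <- length(b)
  ranks <- rank(c(a, b))               # joint ranks, ties averaged
  ra <- sum(ranks[seq_len(na)])        # rank sum of the first sample
  rb <- sum(ranks[na + seq_len(nb)])   # rank sum of the second sample
  min(na * nb + na * (na + 1) / 2 - ra,
      na * nb + nb * (nb + 1) / 2 - rb)
}

# Data from the slide (stand-in variable names)
a <- c(4.6, 5.6, 3.2, 3.2, 3.7, 4.0, 5.0, 4.6)
b <- c(4.6, 4.9, 7.1, 6.0, 5.2, 3.9, 5.3, 5.8)

u <- wmw_u(a, b)

# exact = FALSE avoids the warning about ties in the exact test
w <- unname(wilcox.test(a, b, exact = FALSE)$statistic)
u_from_w <- min(w, length(a) * length(b) - w)

print(u)
print(u_from_w)
```

For these data both routes give U = 12.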
[Figure: box plot of the two groups; y axis from 0 to 8]
WMW test in R
> wilcox.test(a, b)
(a and b are the two samples.) Read the p-value (P value; see session 9) in the output and compare it with 0.05.
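A sketch of the same test with the p-value extracted from the result object follows. The variable names `a`, `b`, `res` and `p_value` are stand-ins introduced here. Because the data contain ties, the default call warns that an exact p-value cannot be computed; passing `exact = FALSE` requests the normal approximation directly and avoids the warning.

```r
# Data from the earlier slide (stand-in variable names)
a <- c(4.6, 5.6, 3.2, 3.2, 3.7, 4.0, 5.0, 4.6)
b <- c(4.6, 4.9, 7.1, 6.0, 5.2, 3.9, 5.3, 5.8)

# Two-sided WMW test using the normal approximation
res <- wilcox.test(a, b, exact = FALSE)

# Extract the p-value and compare it with the 5% level
p_value <- res$p.value
print(p_value)
print(p_value < 0.05)
```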
1. (1) (2)
Due: December 5, 2024 (Thu), 23:59 JST
Waseda Moodle (Q & A)