
組織とデータ分析/統計的仮説検定 / Organization and Data Analysis, and Statistical Hypothesis Testing

Kenji Saito
November 30, 2023

Slides used in sessions 1-2 of "Corporate Data Analysis" (Winter 2023) at Waseda University's Graduate School of Business and Finance.


Transcript

  1. generated by Stable Diffusion XL v1.0
    2023
    1-2
    (WBS)
    2023 1-2 — 2023-11-30 – p.1/36


  2. https://speakerdeck.com/ks91/collections/corporate-data-analysis-2023-winter
    Discord . . .
    Discord

  3. ( )
    ( )
    ( )
    SFC ( )
    CSO (Chief Science Officer)
    1993 ( )
    2006 ( )
    SFC
    23 P2P (Peer-to-Peer)
    2011 ( )
    2018 2019
    VR 2021.9 & VR 2022.3
    2023 AI VR&RPG 2023.5 “Don’t Be So Serious”
    VOXEL 2023.7 DAZE 2023 In Maker Faire Tokyo 2023
    → ( )

  4. Dropbox
    Dropbox
    ( )

  5. (B A )
    1 ( )
    2 (Wilcoxon-Mann-Whitney )

  6. R

  7. [ ] , (2022)
    R
    R
    ( )
    R

  8. ( )
    1   11/30  •
    2   11/30  (B A ) •
    3   12/7
    4   12/7
    5   12/14
    6   12/14  t
    7   12/21  2 ( ) t
    8   12/21  2 ( ) t
    9   1/11   P
    10  1/11
    11  1/18
    12  1/18
    13  1/25
    14  1/25
    W-IOI

  9. ( 20 )
    1 •
    2 R •
    3 •
    4 •
    5
    6 ( )
    7 (1)
    8 (2)
    9 R ( ) (1)
    10 R ( ) (2)
    11 R ( ) (1)
    12 R ( ) (2)
    13 GPT-4
    14 GPT-4
    15 ( ) LaTeX Overleaf
    8 (12/14 ) / (2 ) OK
    /

  10. . . . . . . ( )
    ( 20 ×(14+1) )

  11. (2 )(160 )
    (10∼20 )
    ( )
    and/or
    1 (80 ) 1
    Q & A & (30∼40 )
    (30∼40 )

  12. Moodle ( Q&A ) ( )
    Discord ( )

    ( )

  13. ( )
    A4 2 2
    (Overleaf ) LaTeX PDF
    ( )

  14. + + [ ]
    R , (2008)
    R


  16. =

    (1) (2) (3)
    =
    ⇒ ( ) ( (2))
    =
    ⇒ ( )
    ( )
    AI

  17. (observation) (sample)
    (random variable) (probability distribution)
    (population)
    (simple random sampling)
    ( )( 2 t , , )
    2 ( , )

  18. (B A )
    1 ( )
    2 (Wilcoxon-Mann-Whitney )

  19. 1 ( )
    $P(X = x) = {}_n C_x \cdot p^x \cdot (1 - p)^{n - x}$
    $E[X] = np$
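    (1) Set up the null hypothesis H0
    (2) Choose a test statistic (here, x)
    (3) Derive the distribution of the test statistic under H0 (the null distribution)
    (4) Set the rejection region from the significance level (typically 5% or 1%)
    (5) Reject H0 if the observed test statistic falls in the rejection region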
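
    As a rough illustration of the formula and of step (3) above, the null distribution can be tabulated directly in R. This is only a sketch: n = 18 and p = 0.5 are the values of the worked example later in the deck, and the variable names are chosen here for illustration.

```r
# Null distribution of X ~ Binomial(n, p), computed from the formula.
n <- 18
p <- 0.5
x <- 0:n
prob <- choose(n, x) * p^x * (1 - p)^(n - x)

# The hand-written formula agrees with R's built-in dbinom().
stopifnot(isTRUE(all.equal(prob, dbinom(x, n, p))))

# The probabilities sum to 1 and the mean equals n * p.
total <- sum(prob)
mean_x <- sum(x * prob)
print(c(total = total, mean = mean_x, np = n * p))
```

    Checking against `dbinom()` is a cheap way to catch transcription mistakes in the formula.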

  20. B ( p.47)
    RStudio
    R
    ${}_n C_x$: ‘choose(n,x)’
    n = 18, x = 0 . . .
    $\mathrm{choose}(18,0) \times 0.5^{0} \times 0.5^{18} = \mathrm{choose}(18,0) \times 0.5^{18}$
    ( )

    ( ) 3
    :
    :
    :

  21. R ( B)(1/2) — R
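    n <- 18                 # number of trials
    p <- 0.5                # success probability under H0
    prob <- c()             # probabilities of the null distribution (filled in below)
    # Let x run from 0 to n
    for (x in 0:n) {
      # Append P(X = x) from the binomial formula
      prob <- c(prob, choose(n, x) * p^x * (1 - p)^(n - x))
    }
    halfp <- 0              # cumulative probability of one tail (between 0 and 1)

  22. R ( B)(2/2) — R
    # Let x run upward from 0 (the left tail)
    for (x in 0:n) {
      # Stop before the one-tail cumulative probability exceeds 0.025
      if (halfp + prob[x + 1] > 0.025) {
        break
      }
      halfp <- halfp + prob[x + 1]   # R vectors are 1-based, so P(X = x) is prob[x + 1]
    }
    # Colour both tails (the rejection region) red and the middle black
    color <- rep(c("red"), x)        # rep() repeats its first argument as many times as the second argument says
    color <- c(color, rep(c("black"), n + 1 - x * 2), color)
    count <- 0:n                     # values of x for the horizontal axis
    # Draw vertical lines with plot (lwd sets the line width)
    plot(count, prob, type = "h", lwd = 3, col = color)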
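
    The same rejection region can also be found without an explicit loop. The following is a minimal sketch, not the slide's own method: it assumes a two-sided test at the 5% level with p = 0.5 (so the two tails are mirror images), and `alpha`, `lower`, `upper` and `rejection` are names introduced here for illustration.

```r
n <- 18
p <- 0.5
alpha <- 0.05   # assumed two-sided significance level

# Cumulative probabilities P(X <= k) for k = 0, ..., n.
cum <- pbinom(0:n, n, p)

# Largest k whose lower-tail probability stays within alpha / 2.
# which() is 1-based while the counts start at 0, hence the "- 1".
lower <- max(which(cum <= alpha / 2)) - 1
upper <- n - lower   # mirror image, valid because p = 0.5

rejection <- c(0:lower, upper:n)
tail_prob <- pbinom(lower, n, p)   # probability of one tail of the region
print(rejection)
print(tail_prob)
```

    With these values the loop and the vectorised version should agree on where each tail ends, and the realised two-sided level (twice `tail_prob`) stays below the nominal 5%.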

  23. [Figure: plot of the null distribution; x-axis: number of people, y-axis: probability]

  24. R
    > binom.test(14, n=18, p=0.5)
    If the p-value (P value; see session 9) is less than 0.05, H0 is rejected.
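
    The p-value reported by `binom.test` can be reproduced by hand. This sketch assumes the two-sided test with p = 0.5, in which case the distribution is symmetric and the p-value is twice the upper-tail probability of the observed count; the variable names are illustrative.

```r
n <- 18
x_obs <- 14
p0 <- 0.5

# Two-sided p-value by hand: double the upper tail P(X >= x_obs).
p_manual <- 2 * sum(dbinom(x_obs:n, n, p0))

# The same quantity as reported by binom.test().
p_builtin <- binom.test(x_obs, n = n, p = p0)$p.value

print(c(manual = p_manual, builtin = p_builtin))
```

    For p other than 0.5 the distribution is not symmetric, and simple doubling no longer matches what `binom.test` computes.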


  25. 2 (Wilcoxon-Mann-Whitney )
    WMW ( )
    A B
    A B ( )
    (2) U (U )
    · $U = \min\left(n_A n_B + \frac{1}{2} n_A (n_A + 1) - R_A,\; n_A n_B + \frac{1}{2} n_B (n_B + 1) - R_B\right)$
    (4) ((3) ) U0.05
    (5) U U0.05
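
    A small worked example may make the U statistic concrete. The sketch below uses two toy samples of three values each (the same toy values that appear in the GPT-4 example later in the deck); `r_a` and `r_b` stand for the rank sums R_A and R_B, and all names are illustrative.

```r
a <- c(3, 1, 4)
b <- c(2, 5, 6)
n_a <- length(a)
n_b <- length(b)

# Rank the pooled values, then sum the ranks within each sample.
ranks <- rank(c(a, b))
r_a <- sum(ranks[seq_len(n_a)])          # ranks 3, 1, 4
r_b <- sum(ranks[n_a + seq_len(n_b)])    # ranks 2, 5, 6

# The two candidate values inside min(...) in the formula for U.
u_a <- n_a * n_b + n_a * (n_a + 1) / 2 - r_a
u_b <- n_a * n_b + n_b * (n_b + 1) / 2 - r_b
u <- min(u_a, u_b)
print(c(r_a = r_a, r_b = r_b, u_a = u_a, u_b = u_b, u = u))
```

    The two candidates always add up to n_A n_B (here 7 + 2 = 9), which is a handy check on the arithmetic.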

  26. D ( p.70)
    RStudio
    . . .

  27. R ( D)(1/2) — GPT
    ChatGPT (GPT-4)
    R ( )
    1
    ( )
    ⇒ GPT-4 (1/2)
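    # Function that computes the rank sum of each of two samples
    calculate_rank_sum <- function(sample1, sample2) {
      # Pool the two samples and record which sample each value came from
      combined_samples <- c(sample1, sample2)
      sample_group <- c(rep("sample1", length(sample1)), rep("sample2", length(sample2)))
      # Rank the pooled values (ties receive the average rank by default)
      ranks <- rank(combined_samples)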

  28. R ( D)(2/2) — GPT
    ⇒ GPT-4 (2/2)
      # Collect the values, group labels and ranks in a data frame
      df <- data.frame(value = combined_samples, group = sample_group, rank = ranks)
      # Sum the ranks within each group
      rank_sum_sample1 <- sum(df[df$group == "sample1", "rank"])
      rank_sum_sample2 <- sum(df[df$group == "sample2", "rank"])
      return(list(sample1_rank_sum = rank_sum_sample1, sample2_rank_sum = rank_sum_sample2))
    }
    # Example data
    sample1 <- c(3, 1, 4)
    sample2 <- c(2, 5, 6)
    # Compute the rank sums of the example data
    calculate_rank_sum(sample1, sample2)
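
    One inexpensive way to sanity-check generated code like this is to test an invariant it must satisfy: the two rank sums together must equal N(N + 1)/2, where N is the pooled sample size. The sketch below recomputes the rank sums independently with `tapply` rather than calling the generated function; the names `group`, `rank_sums` and `n_total` are introduced here for illustration.

```r
sample1 <- c(3, 1, 4)
sample2 <- c(2, 5, 6)

# Rank the pooled values and label each value with its sample.
ranks <- rank(c(sample1, sample2))
group <- rep(c("sample1", "sample2"),
             times = c(length(sample1), length(sample2)))

# Rank sum per sample, computed independently of the generated function.
rank_sums <- tapply(ranks, group, sum)

# Invariant: the rank sums must total N * (N + 1) / 2.
n_total <- length(ranks)
stopifnot(sum(rank_sums) == n_total * (n_total + 1) / 2)
print(rank_sums)
```

    If the generated function returns rank sums that differ from these, or that break the invariant, the code needs a closer look before its output is used.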

  29. GPT
    . . .
    GPT-4
    . . .
    ‘rank(. . .)’
    RStudio Help → Search R Help

    GPT
    GPT 3
    (1) (GPT )
    (2) (GPT )
    (3)

  30. R ( D)(1/2) — R
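    group_a <- c(4.6, 5.6, 3.2, 3.2, 3.7, 4.0, 5.0, 4.6)
    group_b <- c(4.6, 4.9, 7.1, 6.0, 5.2, 3.9, 5.3, 5.8)
    # Pool the two samples and record which group each value came from
    combined_samples <- c(group_a, group_b)
    sample_group <- c(rep("A", length(group_a)), rep("B", length(group_b)))
    # Rank the pooled values (ties receive the average rank)
    ranks <- rank(combined_samples)
    # Collect the values, group labels and ranks in a data frame
    df <- data.frame(value = combined_samples, group = sample_group, rank = ranks)
    # Rank sum of each group
    ra <- sum(df[df$group == "A", "rank"])
    rb <- sum(df[df$group == "B", "rank"])

  31. R ( D)(2/2) — R
    # Compute U
    na <- length(group_a)
    nb <- length(group_b)
    U <- min(na*nb + na / 2 * (na + 1) - ra, na*nb + nb / 2 * (nb + 1) - rb)
    print(paste("U =", U)) # paste joins its arguments into a single string
    # Put the two samples side by side in a data frame
    sdf <- data.frame(group_a, group_b)
    # Draw a box plot of both groups
    boxplot(sdf, ylim=c(0, 8.0), ylab="Annual income (unit: million yen)")
    Compare U with $U_{0.05}$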

  32. ⫧‶ ⫧‶࡛ࡣ࡞࠸
    0 2 4 6 8
    ᖺ཰ (༢఩:ⓒ୓෇)

  33. R WMW
    > wilcox.test( , )
    If the p-value (P value; see session 9) is less than 0.05, H0 is rejected.
    P
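
    The hand-computed U and the statistic W reported by `wilcox.test` are closely related, which the following sketch tries to show on the two samples from the worked example. The names `group_a` and `group_b` are stand-ins chosen here, since the original identifiers did not survive extraction, and `exact = FALSE` is an assumption made to request the normal approximation explicitly because the data contain ties.

```r
group_a <- c(4.6, 5.6, 3.2, 3.2, 3.7, 4.0, 5.0, 4.6)
group_b <- c(4.6, 4.9, 7.1, 6.0, 5.2, 3.9, 5.3, 5.8)
n_a <- length(group_a)
n_b <- length(group_b)

# Rank sum of the first sample in the pooled ranking.
ranks <- rank(c(group_a, group_b))
r_a <- sum(ranks[seq_len(n_a)])

# W as defined by wilcox.test(): rank sum minus its minimum possible value.
w_by_hand <- r_a - n_a * (n_a + 1) / 2

# U is the smaller of W and its complement n_a * n_b - W.
u <- min(w_by_hand, n_a * n_b - w_by_hand)

result <- wilcox.test(group_a, group_b, exact = FALSE)
print(c(W_by_hand = w_by_hand, W_reported = unname(result$statistic), U = u))
print(result$p.value)
```

    Because of the ties, an exact p-value is not available for these data, so the p-value printed here comes from the normal approximation with continuity correction.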



  35. 1.
    (1)
    (2)
    December 3, 2023 ( ) 23:59 JST ( )
    Waseda Moodle (Q & A )
