Statistics - We can do better!


Story Time

Programmers Need To Learn Statistics Or I Will Kill Them All
http://zedshaw.com/essays/programmer_stats.html

Averages


    > summary(a)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      16.75   26.82   29.63   29.91   32.60   43.58
    > summary(b)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     -15.48   16.90   31.07   30.12   43.42   80.86
    > sd(a)
    [1] 4.787892
    > sd(b)
    [1] 20.87684
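The data behind `a` and `b` is not shown in the slides, so the sketch below simulates two hypothetical samples with roughly the same mean (about 30) but very different spreads, assuming normally distributed values. It shows why the mean alone hides most of the story and why `sd()` belongs next to it.

```r
# Hypothetical data: NOT the a and b from the slides, only similar in shape.
set.seed(42)
a <- rnorm(100, mean = 30, sd = 5)   # tightly clustered around 30
b <- rnorm(100, mean = 30, sd = 20)  # same centre, much wider spread

summary(a)
summary(b)

# The means are close, but the standard deviations differ by roughly 4x.
c(mean_a = mean(a), mean_b = mean(b))
c(sd_a = sd(a), sd_b = sd(b))
```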

Power

“The Power of Ten Syndrome” - Zed A. Shaw

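Definition: Power is the probability of rejecting the null hypothesis when the alternative hypothesis is true. (OpenIntro Statistics, p. 195)
The chance that your experiment is right (Almost! More precisely: the chance that it detects a real difference when one exists.)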
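In symbols, writing H_0 for the null hypothesis, H_A for the alternative, alpha for the Type I error rate and beta for the Type II error rate (standard textbook notation, not taken from the slides), the definition reads:

```latex
\alpha = P(\text{reject } H_0 \mid H_0 \text{ true}), \qquad
\beta = P(\text{fail to reject } H_0 \mid H_A \text{ true}), \qquad
\text{power} = P(\text{reject } H_0 \mid H_A \text{ true}) = 1 - \beta
```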

power.t.test: Power calculations for one and two sample t tests
power.prop.test: Power Calculations for Two-Sample Test for Proportions

What’s a two-sample t-test?

Use the two-sample t-test to compare the means of two groups and decide whether the observed difference between them is statistically significant or could plausibly be explained by random chance.
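A minimal sketch of running a two-sample t-test with R's built-in `t.test()`. The two groups are simulated and hypothetical; `var.equal = TRUE` requests the classic pooled-variance test (R's default is Welch's test, which does not assume equal variances).

```r
set.seed(7)
group_a <- rnorm(40, mean = 10, sd = 2)   # hypothetical measurements, group A
group_b <- rnorm(40, mean = 13, sd = 2)   # hypothetical measurements, group B

result <- t.test(group_a, group_b, var.equal = TRUE)

result$estimate   # the two sample means
result$p.value    # expected to be small here: the simulated groups differ by 3 units
result$conf.int   # 95% confidence interval for mean(A) - mean(B)
```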

power.t.test: Compute power of test, or determine parameters to obtain target power.

    power.t.test(n = NULL, delta = NULL, sd = 1, sig.level = 0.05,
                 power = NULL,
                 type = c("two.sample", "one.sample", "paired"),
                 alternative = c("two.sided", "one.sided"),
                 strict = FALSE, tol = .Machine$double.eps^0.25)

Exactly one of the parameters n, delta, power, sd, and sig.level must be passed as NULL, and that parameter is determined from the others. Notice that the last two (sd and sig.level) have non-NULL defaults, so NULL must be explicitly passed if you want to compute them.
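The "exactly one NULL" rule means one function answers several planning questions. This sketch leaves a different parameter unset in each call; the numbers (delta = 0.5, sd = 1, power = 0.8) are arbitrary illustrations. Because `sd` defaults to 1, it has to be passed as `sd = NULL` explicitly to be solved for.

```r
# Solve for the per-group sample size needed for 80% power.
need_n <- power.t.test(delta = 0.5, sd = 1, power = 0.80)$n

# Solve for the smallest true difference detectable with 50 per group.
min_delta <- power.t.test(n = 50, sd = 1, power = 0.80)$delta

# sd defaults to 1, so NULL must be passed explicitly to solve for it.
max_sd <- power.t.test(n = 50, delta = 0.5, sd = NULL, power = 0.80)$sd

c(n = need_n, delta = min_delta, sd = max_sd)
```

Plugging any of the solved values back into `power.t.test` should return a power of about 0.80.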

WHHAAAAAAAAAAAATTTTTT!!!!!!!!!

WOW, 3% chance I’m right?

    > power.t.test(n = 50, delta = 0.01)

         Two-sample t test power calculation

                  n = 50
              delta = 0.01
                 sd = 1
          sig.level = 0.05
              power = 0.02803757
        alternative = two.sided

    NOTE: n is number in *each* group
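The 3% comes from asking for a difference of 0.01 while `sd` stays at its default of 1, i.e. a difference one hundredth the size of the noise. A quick sweep over larger, hypothetical values of `delta` (still 50 per group, sd = 1) shows power climbing as the difference grows relative to the standard deviation.

```r
deltas <- c(0.01, 0.1, 0.25, 0.5, 1)

# Power of a two-sided, two-sample t-test with n = 50 per group and sd = 1.
powers <- sapply(deltas, function(d) power.t.test(n = 50, delta = d, sd = 1)$power)

round(setNames(powers, deltas), 3)
```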

n = Sample size (number of observations per group)
delta = True difference in means
sd = Standard deviation of the measurements (assumed the same in both groups)
sig.level = Significance level (Type I error probability)
power = Power of test (1 minus Type II error probability)
type = Type of t-test: two-sample, one-sample, or paired
alternative = One- or two-sided test
strict = Use strict interpretation in the two-sided case (also count rejections in the direction opposite to the true effect)
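Two of these flags are easiest to see numerically. With an essentially zero difference (the tiny `delta` below is an arbitrary stand-in for "no real effect"), the default `strict = FALSE` counts only rejections in the direction of the true effect, so two-sided power bottoms out near `sig.level / 2`; `strict = TRUE` counts both tails and approaches `sig.level`. That floor is also why the n = 50, delta = 0.01, sd = 1 run lands just above 2.5%. The second pair compares two-sided and one-sided tests for a hypothetical delta = 0.5.

```r
tiny <- 1e-6   # stand-in for "no real difference"

loose  <- power.t.test(n = 50, delta = tiny, sd = 1, strict = FALSE)$power
strict <- power.t.test(n = 50, delta = tiny, sd = 1, strict = TRUE)$power

# A one-sided test puts all of sig.level in one tail, so it has more power
# for a difference in the hypothesised direction.
two_sided <- power.t.test(n = 50, delta = 0.5, sd = 1)$power
one_sided <- power.t.test(n = 50, delta = 0.5, sd = 1, alternative = "one.sided")$power

c(loose = loose, strict = strict, two_sided = two_sided, one_sided = one_sided)
```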

I DON’T KNOW WHAT ANY OF THAT MEANS

Let’s try something:

    > power.t.test(n = 50, delta = 0.01, sd = 0.01)

         Two-sample t test power calculation

                  n = 50
              delta = 0.01
                 sd = 0.01
          sig.level = 0.05
              power = 0.9986074
        alternative = two.sided

    NOTE: n is number in *each* group

99.86%!

NAILED IT!!!!

What is the standard deviation here?

70%, not bad:

    > power.t.test(n = 50, delta = 0.01, sd = 0.02)

         Two-sample t test power calculation

                  n = 50
              delta = 0.01
                 sd = 0.02
          sig.level = 0.05
              power = 0.6968888
        alternative = two.sided

    NOTE: n is number in *each* group
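Here `sd` is the spread of individual measurements within each group, and for the t-test what matters is the difference relative to that spread, `delta / sd`. The sketch checks that expressing the same ratio in different units leaves power unchanged, and that doubling `sd` cuts it sharply, assuming the default two-sample, two-sided test.

```r
# Same ratio delta / sd = 0.5, expressed in different units.
p_small_units <- power.t.test(n = 50, delta = 0.01, sd = 0.02)$power
p_unit_sd     <- power.t.test(n = 50, delta = 0.5,  sd = 1)$power

# Doubling sd (ratio 0.25) with the same delta.
p_noisier     <- power.t.test(n = 50, delta = 0.01, sd = 0.04)$power

c(p_small_units, p_unit_sd, p_noisier)
```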

105 samples per group were needed (104.928, rounded up):

    > power.t.test(power = 0.95, delta = 0.01, sd = 0.02)

         Two-sample t test power calculation

                  n = 104.928
              delta = 0.01
                 sd = 0.02
          sig.level = 0.05
              power = 0.95
        alternative = two.sided

    NOTE: n is number in *each* group
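`power.t.test` returns a fractional `n`, and you cannot collect 0.928 of a sample, so round up with `ceiling()` and confirm the rounded size still meets the target. The figure is per group, so a two-group experiment needs twice as many observations in total.

```r
plan <- power.t.test(power = 0.95, delta = 0.01, sd = 0.02)

n_per_group <- ceiling(plan$n)   # round up: a fractional sample is not possible
n_total     <- 2 * n_per_group   # n is per group for a two-sample test

# Power actually achieved with the rounded-up size (at least 0.95).
achieved <- power.t.test(n = n_per_group, delta = 0.01, sd = 0.02)$power

c(n_per_group = n_per_group, n_total = n_total, achieved = achieved)
```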

So what do we know?

Sig.level: Type I error probability (the chance of rejecting the null hypothesis when it is actually true)

Power: 1 minus the Type II error probability (a Type II error is failing to reject the null hypothesis when the alternative is true)
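Both definitions can be checked by simulation. The sketch assumes normally distributed data with the same spread in both groups and uses `t.test(..., var.equal = TRUE)`, the pooled-variance test that `power.t.test` models. With no real difference, the rejection rate should land near `sig.level`; with a real difference of `delta`, it should land near the power reported by `power.t.test`. Group size, effect size, and number of replications are arbitrary choices.

```r
set.seed(2024)
reps  <- 2000   # simulated experiments per scenario
n     <- 50     # observations per group
delta <- 0.5    # assumed true difference when the alternative holds
sigma <- 1      # assumed common standard deviation
alpha <- 0.05

# Run one pooled two-sample t-test on fresh normal data; TRUE if it rejects H0.
rejects <- function(true_diff) {
  x <- rnorm(n, mean = 0, sd = sigma)
  y <- rnorm(n, mean = true_diff, sd = sigma)
  t.test(x, y, var.equal = TRUE)$p.value < alpha
}

# No real difference: every rejection is a Type I error.
type1_rate <- mean(replicate(reps, rejects(0)))

# Real difference of size delta: the rejection rate estimates the power.
sim_power  <- mean(replicate(reps, rejects(delta)))
type2_rate <- 1 - sim_power

formula_power <- power.t.test(n = n, delta = delta, sd = sigma, sig.level = alpha)$power

c(type1 = type1_rate, power_simulated = sim_power,
  power_formula = formula_power, type2 = type2_rate)
```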

What assumptions did we make?
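One assumption in every call above is the value of `sd`: `power.t.test` takes it as known and treats both groups as sharing that spread, with roughly normal data. Because the required sample size grows roughly with `(sd / delta)^2`, a wrong guess for `sd` changes the answer a lot. The candidate values below are hypothetical.

```r
assumed_sd <- c(0.01, 0.02, 0.04)   # hypothetical guesses for the spread

# Per-group sample size for 95% power to detect delta = 0.01 under each guess.
needed_n <- sapply(assumed_sd, function(s)
  ceiling(power.t.test(power = 0.95, delta = 0.01, sd = s)$n))

setNames(needed_n, assumed_sd)
```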

What was power.prop.test?
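`power.prop.test` is the counterpart for comparing two proportions (for example, two conversion rates) rather than two means: instead of `delta` and `sd` you give the two proportions `p1` and `p2`, and as with `power.t.test` exactly one of `n`, `p1`, `p2`, `power`, and `sig.level` is left `NULL`. The rates below are hypothetical.

```r
# Hypothetical conversion rates: 10% today, hoping to reach 12%.
plan <- power.prop.test(p1 = 0.10, p2 = 0.12, power = 0.80)
ceiling(plan$n)   # observations needed in *each* group

# Power with a fixed budget of 1000 per group.
budget_power <- power.prop.test(n = 1000, p1 = 0.10, p2 = 0.12)$power
budget_power
```

Small differences between proportions need surprisingly large groups, for the same reason a small `delta` relative to `sd` does for the t-test.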

Confounding

“If you want to measure something, then don't measure other shit.” - Zed A. Shaw

Why does this matter?

http://www.statisticsdonewrong.com

Medical Trials and Power: http://www.statisticsdonewrong.com/zbibliography.html#citation-tsang-2009iw

If someone tells you “The result was significant with p<0.05, so there’s only a 1 in 20 chance it’s a fluke!”, please beat them over the head with a statistics textbook for me.
http://www.statisticsdonewrong.com/conclusion.html
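The quoted claim treats the 0.05 threshold as the chance that a significant result is a fluke, but that chance also depends on how many of the tested hypotheses are actually true (the base rate) and on the power of each test. The sketch computes the expected share of false positives among significant results, assuming purely for illustration that 10% of tested hypotheses are true and that every test has the power of the n = 50, delta = 0.01, sd = 0.02 design (about 70%).

```r
alpha      <- 0.05
base_rate  <- 0.10   # assumed share of tested hypotheses that are actually true
test_power <- power.t.test(n = 50, delta = 0.01, sd = 0.02)$power   # about 0.70

true_hits   <- base_rate * test_power     # real effects correctly detected
false_hits  <- (1 - base_rate) * alpha    # null effects that still cross p < 0.05
fluke_share <- false_hits / (true_hits + false_hits)

fluke_share   # roughly 0.39, far from 1 in 20
```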

When to retreat.

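Resources:
OpenIntro Statistics: http://www.openintro.org/stat/textbook.php
Statistics Done Wrong: http://www.statisticsdonewrong.com/
Introductory Statistics with R (Dalgaard)
Programmers Need To Learn Statistics Or I Will Kill Them All (Zed A. Shaw): http://zedshaw.com/essays/programmer_stats.html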
