Cardiff 12-5-2017
Daniel Lakens
May 12, 2017
Invited Colloquium on Designing Efficient and Informative Studies
Transcript
Designing Efficient and Informative Studies
Daniel Lakens (@Lakens), Eindhoven University of Technology
How do you determine the sample size for a new
study?
Small samples have large variation, more Type 2 errors, and
inaccurate estimates.
Schönbrodt & Perugini, 2013
Studies in psychology often have low power. Estimates average around
50%. Cohen, 1962; Fraley & Vazire, 2014
One reason for low power is that people use heuristics
to plan their sample size.
You need to justify the sample size of a study.
What goal do you want to achieve?
Goal according to JPSP:
Goal according to JESP:
Statistical power is the long-run probability of observing p <
α with N participants, assuming a specific effect size.
But:
1) You never know the true effect size, and the literature is biased.
2) If you expect a true effect of 0, power is 0.
[Figure: power (0% to 100%) as a function of sample size per condition (10 to 200) in an independent t-test, with curves for d = 0.3, 0.4, 0.5, 0.6, 0.7, and 0.8]
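Power curves like the ones plotted for d = 0.3 to d = 0.8 can be approximated with a few lines of code. The sketch below is an illustration rather than anything from the talk: `approx_power` is a hypothetical helper, and it uses a normal approximation to the independent t-test, so its values differ slightly from exact t-based power.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    Assumption: normal approximation to the independent t-test.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic if the true effect is d.
    expected_z = d * sqrt(n_per_group / 2)
    # Probability of landing beyond either critical value.
    return z.cdf(expected_z - z_crit) + z.cdf(-expected_z - z_crit)


# Power for a few sample sizes, assuming d = 0.5.
for n in (20, 64, 100):
    print(n, round(approx_power(0.5, n), 3))
```

For a fixed assumed effect size, power rises with the sample size per condition, which is the pattern the curves show.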
My department requires sample size justification before funding a study.
One justification the IRB accepts is 90% power.
What is ‘evidence’?
Evidence is always relative. You want a higher likelihood of
p<0.05 when H1 is true than when H0 is true.
High power leads to informative studies only when we control
alpha levels.
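One way to make "evidence is relative" concrete is the ratio of power to alpha: how much more likely p < α is when H1 is true than when H0 is true. The helper name `significance_ratio` is made up for this sketch, and the numbers are illustrative only.

```python
def significance_ratio(power, alpha):
    """How many times more likely p < alpha is under H1 than under H0."""
    return power / alpha


# Same power, different alpha levels.
print(significance_ratio(0.80, 0.05))  # strict alpha
print(significance_ratio(0.80, 0.20))  # lax alpha
```

With the same power, a laxer alpha level gives a smaller ratio, so a significant result tells you less.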
What we have been doing wrong: Using previous studies as
an effect size estimate
Distribution of η² for a medium effect size
A pilot study does not provide a meaningful effect size
estimate for planning subsequent studies. Leon, Davis, & Kraemer, 2011
Power analysis based on significant studies needs to be based on a truncated F distribution. Taylor & Muller, 1996
Note the large variability
You can also take variability into account (‘assurance’), for example by using safeguard power. Perugini, Gallucci, & Costantini, 2014
Effect sizes from the published literature are always smaller than
you expect, even when you take into account that effect sizes from the published literature are always smaller than you expect.
Plan for the change you would like to see in
the world. Ask yourself: What is your smallest effect size of interest?
Requires you to specify H1! That’s a good thing. What does your theory predict, or what do you care about if H0 is false?
If we don’t, science becomes unfalsifiable. We can never ‘accept
the null’.
But ‘I’m not interested in the size of the effect
– the presence of any effect supports my theory!’ Really?
Detecting d = 0.001 requires 42 million people.
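The slide does not say which design, power, or alpha level the 42 million figure assumes. As a rough check, the sketch below assumes an independent two-group comparison, 90% power, and a two-sided alpha of 0.05, and uses a normal approximation. `n_per_group` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, power=0.90, alpha=0.05):
    """Approximate sample size per group for a two-sided, two-group test.

    Assumptions (not stated on the slide): 90% power, alpha = 0.05,
    normal approximation to the t-test.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


total = 2 * n_per_group(0.001)
print(f"{total:,} participants in total")
```

Under these assumptions the total comes out at roughly 42 million, in line with the slide.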
You make implicit choices about which effects are too small
to matter all the time.
If you expect a ‘medium’ effect size and plan for
80% power, d<0.35 will never be significant.
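The d < 0.35 figure can be reproduced approximately. The sketch assumes that a ‘medium’ effect means d = 0.5, that the test is a two-sided independent comparison at alpha = 0.05, and that a normal approximation is close enough. All variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

z = NormalDist()
alpha, power, expected_d = 0.05, 0.80, 0.5  # assumed planning values

z_crit = z.inv_cdf(1 - alpha / 2)
# Sample size per group planned for the expected effect.
n = ceil(2 * ((z_crit + z.inv_cdf(power)) / expected_d) ** 2)
# Smallest observed d that can reach p < alpha with that sample size.
critical_d = z_crit * sqrt(2 / n)

print(n, round(critical_d, 2))
```

Any observed effect smaller than this critical value cannot be significant at the planned sample size, whatever the true effect is.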
If nothing else, the maximum sample you are willing to
collect determines your SESOI.
Now you can also reject effects as large as, or
larger than, your SESOI, using an equivalence test.
R package (“TOSTER”) & Excel
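For readers who want to see the logic of an equivalence test, here is a minimal sketch of the two one-sided tests (TOST) procedure from summary statistics. It is not the TOSTER API: `tost_two_groups` is a hypothetical function, the equivalence bounds are in raw units, and it uses a normal approximation instead of the t-distribution.

```python
from math import sqrt
from statistics import NormalDist


def tost_two_groups(m1, m2, sd1, sd2, n1, n2, low, high):
    """Return the TOST p-value for the difference between two group means.

    Assumption: normal approximation; low and high are the raw-unit
    equivalence bounds (for example minus and plus the SESOI).
    """
    z = NormalDist()
    diff = m1 - m2
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    p_lower = 1 - z.cdf((diff - low) / se)  # H0: difference <= lower bound
    p_upper = z.cdf((diff - high) / se)     # H0: difference >= upper bound
    # Equivalence is claimed only if both one-sided tests are significant.
    return max(p_lower, p_upper)


# Illustrative numbers: small observed difference, 200 per group, SESOI of 0.3.
p = tost_two_groups(0.05, 0.0, 1.0, 1.0, 200, 200, low=-0.3, high=0.3)
print(round(p, 4))
```

If the returned p-value is below alpha, effects as extreme as the bounds can be rejected.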
My prediction: Publishing a paper that says ‘p > 0.05, thus no effect’ will be difficult in 2019.
Extending your statistical tool kit with equivalence tests is an
easy way to improve your inferences. Lakens, 2017
However, when the true effect size is larger than the
SESOI, powering for it is inefficient (and possibly wasteful).
Social Sciences Replication Project
[Figure: power (0% to 100%) as a function of sample size per condition (10 to 200) in an independent t-test, with curves for d = 0.3, 0.4, 0.5, 0.6, 0.7, and 0.8]
When effect sizes are uncertain (=always), a better approach is
sequential analyses.
Optional stopping: Collecting data until p < 0.05 inflates the
Type 1 error.
A user of NHST could always obtain a significant result
through optional stopping. Wagenmakers, 2007
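A small simulation illustrates the inflation. The setup is an assumption for the sake of the sketch: a one-sample z-test with known SD = 1, a true effect of exactly 0, and an uncorrected look at the data after every 20 observations up to 100. `optional_stopping_rate` is a hypothetical name.

```python
import random
from math import sqrt
from statistics import NormalDist


def optional_stopping_rate(looks=(20, 40, 60, 80, 100), alpha=0.05,
                           sims=2000, seed=1):
    """Share of null-effect studies that hit p < alpha at any look."""
    rng = random.Random(seed)
    z = NormalDist()
    false_positives = 0
    for _sim in range(sims):
        total, n = 0.0, 0
        for look in looks:
            while n < look:
                total += rng.gauss(0.0, 1.0)  # H0 is true: the mean is 0
                n += 1
            z_stat = total / sqrt(n)  # z-test with known SD = 1
            p = 2 * (1 - z.cdf(abs(z_stat)))
            if p < alpha:
                false_positives += 1
                break  # stop collecting as soon as the result is significant
    return false_positives / sims


print("five uncorrected looks:", optional_stopping_rate())
print("single look at n = 100:", optional_stopping_rate(looks=(100,)))
```

With repeated uncorrected looks the false positive rate ends up well above the nominal 5%, while the single-look design stays close to it.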
Sequential analysis controls Type 1 error rates (e.g., Pocock correction).
Wald, 1945
Pocock boundary

Number of analyses    p-value threshold
2                     0.0294
3                     0.0221
4                     0.0182
5                     0.0158
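The thresholds in the Pocock table can be checked by simulation. The sketch below assumes equally spaced looks, a one-sample z-test with known SD = 1, and a true effect of 0. The function and constant names are made up for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

# Per-look p-value thresholds from the Pocock table (overall alpha = 0.05).
POCOCK_THRESHOLDS = {2: 0.0294, 3: 0.0221, 4: 0.0182, 5: 0.0158}


def sequential_type1_rate(n_looks, batch=20, sims=4000, seed=7):
    """Share of null-effect studies that cross the Pocock threshold at any look."""
    threshold = POCOCK_THRESHOLDS[n_looks]
    rng = random.Random(seed)
    z = NormalDist()
    rejections = 0
    for _sim in range(sims):
        total, n = 0.0, 0
        for _look in range(n_looks):
            for _obs in range(batch):
                total += rng.gauss(0.0, 1.0)  # H0 is true: the mean is 0
            n += batch
            p = 2 * (1 - z.cdf(abs(total / sqrt(n))))
            if p < threshold:
                rejections += 1
                break  # stop early once the corrected threshold is crossed
    return rejections / sims


for k in (2, 5):
    print(k, "looks:", sequential_type1_rate(k))
```

Under these assumptions the overall Type 1 error rate should stay near 5% regardless of the number of planned looks, up to simulation noise.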
You also correct alpha levels for equivalence tests (and can
calculate power for equivalence).
If you pre-register anyway, you can use one-sided tests (more
logical & more efficient)
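The efficiency gain from a one-sided test can be put in numbers. The sketch assumes d = 0.5, 80% power, alpha = 0.05, an independent two-group design, and a normal approximation. `n_per_group` is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, power=0.80, alpha=0.05, one_sided=False):
    """Approximate sample size per group (normal approximation)."""
    z = NormalDist()
    # A one-sided test puts all of alpha in one tail.
    z_alpha = z.inv_cdf(1 - alpha) if one_sided else z.inv_cdf(1 - alpha / 2)
    return ceil(2 * ((z_alpha + z.inv_cdf(power)) / d) ** 2)


two_sided = n_per_group(0.5)
one_sided = n_per_group(0.5, one_sided=True)
print(two_sided, one_sided)
```

For the same power, the one-sided test needs fewer participants per group, provided the direction of the effect was specified in advance.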
Use decision rules based on p-values or Bayes factors, but check their frequentist properties. Schönbrodt, Wagenmakers, Zehetleitner, & Perugini, 2015
The SESOI for the Higgs boson was not based on
feasibility, but theory.
If you think the current reproducibility crisis was bad, wait
till the theory crisis in psychology starts.
Thanks! @lakens https://www.coursera.org/learn/statistical-inferences