Type M errors in practice: A case study

- When low-powered studies show significant effects, these will be overestimates.
- Significant effects from low-powered studies will not be replicable.
- Seven experiments show that the effects reported in Levy & Keller (2013) are not replicable.
- Relying only on statistical significance leads to overconfident expectations of replicability.
- We make several suggestions for improving current practices.

Shravan Vasishth

June 28, 2018

Transcript

  1. Type M error in practice: A case study

    Daniela Mertzen, MSc, Linguistics, Universität Potsdam
    Dr. Lena Jäger, Computer Science, Universität Potsdam
    Prof. Andrew Gelman, Statistics, Columbia University
    Shravan Vasishth, Linguistics, Universität Potsdam, Germany
  2. Type M error in practice: A case study

    1. Power is quite low in reading research
    2. Low power leads to exaggerated estimates
    3. Published claims will not be replicable
    4. We demonstrate this with real data
  3. Type M error in practice: A case study

    Research area: Reading processes in cognitive psychology
  4. Low power leads to exaggerated estimates: Type M error (simulated data)

    True effect 15 ms, SD 100, n=20, power=0.10 (Gelman & Carlin, 2014)
    [Figure: Estimates (msec) plotted against Sample id]
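The exaggeration described on this slide can be reproduced with a short simulation. The sketch below uses the slide's values (true effect 15 ms, SD 100, n=20) but makes its own assumptions: a one-sample design, a two-sided z-test with the SD treated as known, and alpha = 0.05. The function name `simulate_type_m` and the number of simulated studies are illustrative, not taken from the slides.

```python
import math
import random
import statistics


def simulate_type_m(true_effect=15.0, sd=100.0, n=20, n_sims=20_000, seed=1):
    """Simulate many small studies and summarise the significant ones.

    Assumption: each study draws n observations from Normal(true_effect, sd)
    and tests the sample mean against zero with a two-sided z-test
    (sd known, alpha = 0.05).
    """
    rng = random.Random(seed)
    se = sd / math.sqrt(n)                      # standard error of the mean
    significant = []
    for _ in range(n_sims):
        sample = [rng.gauss(true_effect, sd) for _ in range(n)]
        estimate = statistics.fmean(sample)
        if abs(estimate) > 1.96 * se:           # "statistically significant"
            significant.append(estimate)

    # Power: share of simulated studies that reach significance.
    power = len(significant) / n_sims
    # Type M (exaggeration ratio): mean |estimate| among significant
    # studies, relative to the true effect.
    type_m = statistics.fmean(abs(e) for e in significant) / true_effect
    # Type S: share of significant estimates with the wrong sign.
    type_s = sum(e * true_effect < 0 for e in significant) / len(significant)
    return power, type_m, type_s


power, type_m, type_s = simulate_type_m()
print(f"power ~ {power:.2f}, exaggeration ratio ~ {type_m:.1f}, "
      f"wrong-sign share ~ {type_s:.2f}")
```

Under these assumptions only about one study in ten reaches significance, and the ones that do overestimate the 15 ms effect several times over, which is the pattern the slide's plot illustrates.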
  5. Low power leads to exaggerated estimates: Type M error (published data)

    Jäger, Engelmann & Vasishth, 2017
  6. A puzzle: Most psychologists are aware of the replication crisis, but few think they are affected

    Some frequent reactions:
    • “In our field, we always replicate our results.”
    • “My own sub-field doesn’t have problems.”
    • “We replicate, we just don’t publish the data.”
  7. “The first principle is that you must not fool yourself and you are the easiest person to fool.”

    Feynman
  8. We demonstrate Type M error in published data

    Seven replication attempts of Levy & Keller, 2013, using eye tracking and self-paced reading.
    The original eye tracking (reading) experiments:
    • 2x2 repeated measures factorial design: two main effects and one interaction
    • 28 subjects, 24 items, Latin square design
    • Reading time in milliseconds
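As a rough illustration of what a Latin square design means for 28 subjects, 24 items, and the four cells of a 2x2 design, the sketch below rotates conditions across subjects and items. The rotation rule `(subject + item) % 4` and the helper name `latin_square` are assumptions made for illustration; the slides do not say how the lists were actually constructed.

```python
from collections import Counter

N_SUBJECTS, N_ITEMS, N_CONDITIONS = 28, 24, 4   # 2x2 design -> 4 conditions


def latin_square(n_subjects, n_items, n_conditions):
    """Return {(subject, item): condition} using a rotating assignment.

    Hypothetical scheme: each subject sees every item exactly once, and
    the condition index shifts by one from item to item and from subject
    to subject.
    """
    return {
        (s, i): (s + i) % n_conditions
        for s in range(n_subjects)
        for i in range(n_items)
    }


design = latin_square(N_SUBJECTS, N_ITEMS, N_CONDITIONS)

# Each subject should see every condition equally often ...
per_subject = Counter((s, c) for (s, _), c in design.items())
# ... and each item should appear in every condition equally often.
per_item = Counter((i, c) for (_, i), c in design.items())

print(set(per_subject.values()))  # → {6}: 6 items per condition per subject
print(set(per_item.values()))     # → {7}: 7 subjects per condition per item
```

With this layout each subject contributes only 6 trials per condition, which gives a sense of how little data each cell mean rests on.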
  9. Four replication attempts

    • Two self-paced reading studies, two eye tracking
    • Prospective power for Levy and Keller experiments:
    [Full details in paper: bit.ly/TypeMError]
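  10. Hierarchical linear models in Stan

    i = 1, …, I subjects; j = 1, …, J items; n data points
    $\log \mathit{rt} = \underbrace{X\beta}_{\text{fixed effects}} + \underbrace{Z_u b_u}_{\text{subjects random effects}} + \underbrace{Z_w b_w}_{\text{items random effects}} + \varepsilon$
    $X_{n \times p} = \begin{pmatrix} 1 & -1 & -1 & +1 \\ 1 & +1 & +1 & +1 \\ \vdots & \vdots & \vdots & \vdots \end{pmatrix}, \quad \beta_{p \times 1} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}$
    $\beta_1$: Main Effect 1; $\beta_2$: Main Effect 2; $\beta_3$: Interaction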
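  11. Hierarchical linear models in Stan

    $\log \mathit{rt} = \underbrace{X\beta}_{\text{fixed effects}} + \underbrace{Z_u b_u}_{\text{subjects random effects}} + \underbrace{Z_w b_w}_{\text{items random effects}} + \varepsilon$
    $X_{n \times p} = \begin{pmatrix} 1 & -1 & -1 & +1 \\ 1 & +1 & +1 & +1 \\ \vdots & \vdots & \vdots & \vdots \end{pmatrix} = Z_u = Z_w$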
  12-21. Hierarchical linear models in Stan

    $\log \mathit{rt} = \underbrace{X\beta}_{\text{fixed effects}} + \underbrace{Z_u b_u}_{\text{subjects random effects}} + \underbrace{Z_w b_w}_{\text{items random effects}} + \varepsilon$
    $b_u \sim \mathrm{MVN}_4(0, \Sigma_u)$
    $b_w \sim \mathrm{MVN}_4(0, \Sigma_w)$
    $\varepsilon \sim \mathrm{Normal}(0, \sigma)$
    Priors:
    $\beta_0 \sim \mathrm{Normal}(0, 10)$
    $\beta_{1,2,3} \sim \mathrm{Normal}(0, 1)$
    $\sigma \sim \mathrm{Normal}_{+}(0, 1)$
    $\rho \sim \mathrm{LKJ}(\nu = 2)$
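To make the model's structure concrete, the following sketch generates one fake data set from it: fixed effects plus correlated subject and item adjustments to the intercept and all three slopes, plus residual noise on the log scale. It assumes NumPy is available. All parameter values, the equal-correlation covariance matrices, the Latin-square assignment, and the helper `random_effects` are hypothetical choices for illustration; none are estimates or code from the paper, and this simulates data rather than fitting the Stan model.

```python
import numpy as np

rng = np.random.default_rng(2018)
I, J = 28, 24                        # subjects and items, as on the slides

# Latin-square style assignment (an assumption): one condition per
# subject-item pair, coded with +/-1 sum contrasts for the 2x2 design.
subj, item = np.meshgrid(np.arange(I), np.arange(J), indexing="ij")
subj, item = subj.ravel(), item.ravel()
cond = (subj + item) % 4
f1 = np.where(cond // 2 == 0, -1.0, 1.0)     # factor 1 (Main Effect 1)
f2 = np.where(cond % 2 == 0, -1.0, 1.0)      # factor 2 (Main Effect 2)
X = np.column_stack([np.ones_like(f1), f1, f2, f1 * f2])   # n x 4

# Hypothetical parameter values on the log-ms scale.
beta = np.array([6.0, 0.02, 0.01, 0.01])
sigma = 0.4


def random_effects(n_groups, sds, rho):
    """Draw n_groups vectors from MVN_4(0, Sigma) with equal correlations."""
    corr = np.full((4, 4), rho)
    np.fill_diagonal(corr, 1.0)
    Sigma = np.outer(sds, sds) * corr
    return rng.multivariate_normal(np.zeros(4), Sigma, size=n_groups)


b_u = random_effects(I, sds=np.array([0.3, 0.05, 0.05, 0.05]), rho=0.2)
b_w = random_effects(J, sds=np.array([0.1, 0.03, 0.03, 0.03]), rho=0.2)

# Reading X = Z_u = Z_w as "same columns": each row's random-effect
# contribution is the dot product of its X row with the b vector of
# that row's subject (or item).
log_rt = (X @ beta
          + np.sum(X * b_u[subj], axis=1)
          + np.sum(X * b_w[item], axis=1)
          + rng.normal(0.0, sigma, size=len(subj)))
rt_ms = np.exp(log_rt)               # back-transform to milliseconds

print(X.shape, rt_ms.shape)
```

Simulating from the model in this way is also a convenient starting point for checking priors or estimating power before running an experiment.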
  22. Levy & Keller 2013 claimed an interaction across the two experiments but never checked it statistically
  23. Three replication attempts of the claimed interaction

    • Expt 5 (SPR): 28 participants, 24 items
    • Expt 6 (ET): 28 participants, 24 items
    • Expt 7 (ET): 100 participants, 24 items
  24. Type M error in practice: A case study

    Concluding remarks
    1. Expts with 268 subjects show not a single effect
    2. The published effects are Type M errors
    3. Many researchers still don’t understand this point
  25. Type M error in practice: A case study

    Concluding remarks
    1. Move focus away from significance
    2. Focus instead on estimation
    3. Run higher-precision studies
    4. Pre-register experiments
    5. Conduct direct replications