Making Quantitative Anthropology More Reliable, Responsible & Rewarding

Talk presented on 5 Feb 2019 at MPI-ETH in Halle, Germany

Richard McElreath

February 05, 2019

Transcript

  1. Making Quantitative Anthropology More Reliable, Responsible & Rewarding Richard McElreath

    Max Planck Institute for Evolutionary Anthropology Leipzig
  2. None
  3. Replication Failures

    “When I describe priming studies to audiences, the reaction is often disbelief . . . The idea you should focus on, however, is that disbelief is not an option. The results are not made up, nor are they statistical flukes. You have no choice but to accept that the major conclusions of these studies are true.” Daniel Kahneman, Thinking, Fast and Slow (page 57)
  4. Replication Failures: Disbelief remains an option

    Many Labs 3 (https://osf.io/ct89g/), Figures 1a and 1b: replication results organized by replication effect size, 1a for Cohen’s d estimates, 1b for partial eta-squared estimates. Triangles mark the effect sizes obtained in the original studies (the Elaboration Likelihood main effect is omitted because it was extremely large, partial eta-squared of .59); large circles mark the aggregate replication effect across all participants; error bars are 99% noncentral confidence intervals; small x’s mark per-site effects. Axis: standardized treatment difference. Effects shown: Stroop effect, metaphoric restructuring, availability heuristic, persistence, power & perspective, weight embodiment, warmth perceptions.
  5. Replication Failures

    Meyer et al. 2015. Disfluent Fonts Don’t Help People Solve Math Problems. [Figure 2: the effect of disfluent font on Cognitive Reflection Test scores, plotting each individual study’s sample size against its effect size; error bars are bootstrapped. Original study: N=40.]
  6. Replication Failures: Null Trials and Transparent Reporting

    Fig 1. Relative risk of showing benefit or harm of treatment by year of publication for large NHLBI trials on pharmaceutical and dietary supplement interventions. Positive trials are indicated by plus signs; trials showing harm by a diagonal line within a circle. Prior to 2000, when trials were not registered in ClinicalTrials.gov, there was substantial variability in outcomes. Following the requirement that trials preregister in ClinicalTrials.gov, the relative risk on primary outcomes showed considerably less variability around 1.0.
  7. Replication Failures: Null Trials and Transparent Reporting

    Kaplan & Irvin 2015. Likelihood of Null Effects [...] Increased over Time. (Same figure and caption as the previous slide.)
  8. Replication Failures

    “The case against science is straightforward: much of the scientific literature, perhaps half, may simply be untrue. Afflicted by studies with small sample sizes, tiny effects, invalid exploratory analyses, and flagrant conflicts of interest, together with an obsession for pursuing fashionable trends of dubious importance, science has taken a turn towards darkness.” Richard Horton, 11 April 2015 issue of The Lancet
  9. “There are many more elemental ‘discoveries’ later shown to be

    false than there are entries in the present table.”
  10. None
  11. Quantitative Anthropology

    • Quantitative anthropologist: an anthropologist who also does quantitative work
    • Quantitative anthropology is not psychology/economics/medicine
    • “Replication” makes less sense as a target
    • Willingness to publish negative results
    • Still serious problems: deficit of clear analysis plans, poor data management, poor reproducibility, data ethics in conflict with data openness, typically poor training in quantitative methods, causal salad
  12. Principles: Conflict & cooperation Reliability: Professional norms Responsibility: Science in

    context Reward: Positive incentives for diverse stakeholders
  13. Reliability: Replicable, Reproducible, Reasoned

    Open Science: preserve the rights of others to reach their own conclusions about our scholarship.
  14. Reliability

    Replicable: new investigation, similar inference (result generalizes)
    Reproducible: same investigation, repeatable inference (result exists)
    Reasoned: any investigation, justified inference (result might be knowledge)
  15. Reinhart & Rogoff “Growth in a Time of Debt” Reliability:

    Reproducibility
  16. Reinhart & Rogoff “Growth in a Time of Debt” Reliability:

    Reproducibility
  17. Reliability: Reproducibility

    • EARS: Evolutionary Anthropology Reproducibility Study
    • Protocol preregistration: https://github.com/rianaminocher/ears/
    The EARS Team: Bret Beheim (Principal Investigator), Riana Minocher (Student Principal Investigator), Claudia Bavero (Research Assistant), Silke Atmaca (Research Coordinator), Anne Büchner, Kristina Kunze, Leonie Ette, Anne Hellmund (Student Assistants)
  18. Reliability: Reproducibility: EARS • 560 empirical, quantitative studies of social

    learning • github.com/babeheim/ears-AAA-2018 • Attempt to reproduce: Data? Materials? Results?
  19. Reliability: Reproducibility: EARS = Total Sample 560 citations Empirical Social

    Learning
  20. Reliability: Reproducibility: EARS = Total Sample 560 citations Materials Received

    60 (10%) Empirical Social Learning
  21. Reliability: Reproducibility: EARS = Total Sample 560 citations Contacted 475

    (95%) Materials Received 60 (10%) Empirical Social Learning
  22. Reliability: Reproducibility: EARS = Total Sample 560 citations Contacted 475

    (95%) Reply Received 309 (65%) Materials Received 60 (10%) Empirical Social Learning
  23. Reliability: Reproducibility: EARS = Total Sample 560 citations Contacted 475

    (95%) Reply Received 309 (65%) Materials Received 60 (10%) + 87 (28%) 147 (26%) Empirical Social Learning
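The retrieval funnel on the EARS slides above can be sanity-checked with a few lines of arithmetic. This is a sketch using only the counts stated on the slides; the variable names are mine:

```python
# EARS retrieval funnel, using the counts stated on the slides
total = 560            # empirical social-learning studies sampled
contacted = 475        # studies whose authors could be contacted
replies = 309          # replies received
materials = 60 + 87    # materials received, initial batch plus later batch = 147

print(f"reply rate among contacted: {replies / contacted:.0%}")
print(f"materials recovered overall: {materials / total:.0%}")
```

These reproduce the 65% reply rate and the 26% overall recovery rate shown on the slide.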
  24. Reliability: Reproducibility: EARS

    • What can a reasonable researcher expect?
    • What could a reasonable scholar achieve?
  25. Reliability: Reproducibility: EARS

    • Minority of contacted researchers uncooperative:
    “there is no code, scripting or paper data still in existence...this does NOT mean that the results are in any way invalid. I do not agree with the focus of your project”
    “looking at studies done even just a few years ago (i.e. published 2015 or earlier) is completely counter-productive”
    “I don't see the point here really.”
  26. Reliability: Reproducibility: EARS

    • Majority of contacted researchers enthusiastic:
    “My initial thought was to say that this has been far too long – this study was conducted almost 20 years ago. But I had a quick look and amazingly I managed to find the data. Your project will be a good opportunity for me to put my folders in order and to make this material available.”
    “this sounds like a very worthwhile endeavor that is much needed in evolutionary anthropology”
    “I'm taking this occasion as a starting point for making my scripts more comprehensive, thank you.”
  27. Reliability: Reproducibility: EARS

    • Notes on points of failure
    • Many easy places to establish better norms
    “My god, that was 17 years ago. I have no idea where that data is.”
    “such studies often uses expertise from several people, and make multiple intermediate versions of the datasets. This is often done without really knowing what will end up in a paper and what will not, or even what the paper will be about”
    • Exploration great, but can’t pretend it is a hypothesis test: data-dependent hypotheses
  28. Reliability: Reproducibility: EARS

    • Next phase:
    • Does it reproduce?
    • Variation in analytical approaches
    • Development of community standards
  29. Reliability: Reproducibility

    • Professionalizing quant anth
    • Documented methods
    • Documented codings
    • Justified, algorithmic analysis
    • Error-correcting workflow
    • Professional standards
  30. Reliability: Reproducibility

    • At MPI-EVA:
    • Core data reliability unit (Data Tsarina: Dr Silke Atmaca)
    • Auditable conversion trails: raw notes to “data”
    • Persistent metadata: materials, data types, research context (metadata is meaning)
    • Machine-neutral data storage formats
    • Data security & persistence
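The “auditable conversion trail” and “machine-neutral storage” points above can be sketched in a few lines. The raw notes, field names, and tool label below are invented for illustration; the point is that cleaning is scripted and the raw input is checksummed so the trail can be verified:

```python
import csv
import hashlib
import io
import json

# Hypothetical raw field notes (invented data, messy on purpose)
raw = b"name; weight_kg\n Anna ;52.5\nBjorn;  61\n"

# Auditable trail: checksum the raw bytes so anyone can verify that
# the cleaned table was derived from exactly this input.
audit = {"raw_sha256": hashlib.sha256(raw).hexdigest(), "cleaner": "clean_v1"}

# Scripted cleaning: strip stray whitespace, coerce types.
rows = []
for line in raw.decode("utf-8").strip().splitlines()[1:]:
    name, weight = (field.strip() for field in line.split(";"))
    rows.append({"name": name, "weight_kg": float(weight)})

# Machine-neutral output: plain CSV plus a JSON audit record,
# readable by any tool decades from now.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["name", "weight_kg"])
writer.writeheader()
writer.writerows(rows)

print(json.dumps(audit))
print(out.getvalue())
```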
  31. https://bigthink.com/errors-we-live-by/ Reliability: Reasoned

  32. Reliability: Reasoned

    • Repeatable, reproducible results are not necessarily knowledge
    • A wrong model of the world can make accurate predictions
    • But may fail at intervention
    • Example: grandmothering
  33. Reliability: Reasoned • At MPI-EVA: Theory & analysis group Dr

    Anne Kandler Dr Laurel Fogarty Dr Cody Ross Dr Justin Yeh Han Tran Dr Daniel Redhead
  34. Responsibility

    • Responsibilities to participants, collaborators, public
    • Data protection, access & preservation
    • Participant & collaborator protection
    • Example: minor marriage
    • Flash point: the safest thing from the researcher’s perspective is not the best from the public’s perspective
    • Claim: huge benefits from limited openness
  35. Rewarding

    • Incentives for researchers, communities, public
    • Fieldwork not sustainable without incentives
    • Researchers: sharing data & credit is costly
    • Communities have a stake
    • Public suspicious of a self-regulated community
    • Regulatory capture
  36. Case Studies: Chocó, Pacific coast, Colombia

  37. Dr Jeremy Koster (left)

    Data collection, data processing, statistics. ONLINE SOON
  38. [Map: the 40 study sites, numbered 1–40]
  39. Goals & Sample

    • Goals: estimate skill development; how variable?; develop stats machinery
    • Sample: 40 sites, 1,821 individuals, 21,160 hunting trips, 23,747 harvests, uncountable headaches
    “Pooh?” said Piglet. “Yes, Piglet?” said Pooh. “27,417 parameters,” said Piglet. “Oh, bother,” said Pooh.
  40. Simple model, twice nested

    [Figure: ‘skill’ vs age (0–80) curves, small multiples for Site 1, Site 2, Site 3 … Site n, with panels for pooling within and pooling between]
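The “pooling within / pooling between” panels illustrate partial pooling. A toy version of the idea, with invented scores and assumed variance components (a sketch of the general principle, not the model in the talk): each site mean is shrunk toward the global mean, and sparsely sampled sites are shrunk hardest.

```python
import statistics

# Invented per-site scores; site C has only one observation.
sites = {"A": [4.0, 5.0, 6.0, 5.5], "B": [2.0, 3.0, 2.5, 2.8], "C": [9.0]}

grand = statistics.mean(x for xs in sites.values() for x in xs)
tau2, sigma2 = 1.0, 2.0  # assumed between-site and within-site variances

def partial_pool(xs):
    # Precision weighting: the weight on the raw site mean grows with n,
    # so small-n sites borrow more strength from the global mean.
    w = tau2 / (tau2 + sigma2 / len(xs))
    return w * statistics.mean(xs) + (1 - w) * grand

for site, xs in sites.items():
    print(site, round(statistics.mean(xs), 2), "->", round(partial_pool(xs), 2))
```

Site C, with a single observation, moves far toward the global mean; sites A and B, with four observations each, move much less. That is exactly what the nested-site figure shows.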
  41. Steps for Reliability

    • Pre-registered: model developed before data seen, theoretical justification, shared in public, designed for measurement problems
    • Samples: error-checking, scripted coding, scripted assembly, persistent links to raw
    • Analysis: developed in stages, tested on simulated data, checked for sensitivity
    • Documentation: everything bound together in an R package; this was more work than writing the paper
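“Tested on simulated data” is the key error-check on this slide: simulate from a model with known parameters, run the analysis, and confirm it recovers the truth before touching real data. A minimal sketch with an invented toy model and estimator:

```python
import random

random.seed(1)  # fixed seed so the check is reproducible

# 1. Simulate data from known ground truth (invented values).
true_mu, true_sigma, n = 10.0, 2.0, 10_000
data = [random.gauss(true_mu, true_sigma) for _ in range(n)]

# 2. Run the estimator under test (here, just the sample mean).
est_mu = sum(data) / n

# 3. Check recovery. A failure here means either the analysis code is
#    broken or the estimator cannot recover the parameter; simulated-data
#    tests catch both before any real data are analyzed.
assert abs(est_mu - true_mu) < 0.1, "failed to recover the true mean"
print(f"recovered mu = {est_mu:.2f}")
```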
  42. [Figure: site 16 ACH, 147 individuals (14,364 observations): estimated age curves for skill (peak age 37), success, harvest, and production (peak age 38)]
  43. [Same figure as slide 42]
  44. [Same figure as slide 42]
  45. [Same figure as slide 42]
  46. Skill (mean)

    [Figure: mean skill by age (0–80) for the global mean and each of the 40 sites; key per panel: peak age, site number, site code, individuals (observations); e.g. site 16 ACH: peak 37, 147 individuals (14,364 observations)]
  47. Momentum? Reliability: Professional norms Responsibility: Science in context Reward: Positive

    incentives for all stakeholders
  48. None