Making Quantitative Anthropology More Reliable, Responsible & Rewarding

Talk presented on 5 Feb 2019 at MPI-ETH in Halle, Germany

Richard McElreath

February 05, 2019

Transcript

  1. Making Quantitative Anthropology More Reliable, Responsible & Rewarding Richard McElreath

    Max Planck Institute for Evolutionary Anthropology Leipzig
  2. None
  3. Replication Failures

    “When I describe priming studies to audiences, the reaction is often disbelief . . . The idea you should focus on, however, is that disbelief is not an option. The results are not made up, nor are they statistical flukes. You have no choice but to accept that the major conclusions of these studies are true.” Daniel Kahneman, Thinking, Fast and Slow (page 57)
  4. Replication Failures: Disbelief remains an option

    Many Labs 3 (https://osf.io/ct89g/), Figures 1a and 1b: replication results organized by replication effect size, 1a for Cohen’s d estimates, 1b for partial eta-squared estimates. Triangles mark the effect sizes obtained in the original studies (the Elaboration Likelihood main effect is omitted because it was extremely large, partial eta-squared of .59); large circles mark the aggregate replication effect across all participants; error bars are 99% noncentral confidence intervals; small x’s mark per-site effects. Axis: standardized treatment difference. Effects shown: Stroop effect, metaphoric restructuring, availability heuristic, persistence, power & perspective, weight embodiment, warmth perceptions.
  5. Replication Failures

    Meyer et al. 2015. Disfluent Fonts Don’t Help People Solve Math Problems. [Figure 2: the effect of disfluent font on Cognitive Reflection Test scores, plotting each individual study’s sample size against its effect size; error bars are bootstrapped. Original study: N=40.]
  6. Replication Failures: Null Trials and Transparent Reporting

    Fig 1. Relative risk of showing benefit or harm of treatment by year of publication for large NHLBI trials on pharmaceutical and dietary supplement interventions. Positive trials are indicated by plus signs; trials showing harm by a diagonal line within a circle. Prior to 2000, when trials were not registered in ClinicalTrials.gov, there was substantial variability in outcomes. Following the requirement that trials preregister in ClinicalTrials.gov, the relative risk on primary outcomes showed considerably less variability around 1.0.
  7. Replication Failures: Null Trials and Transparent Reporting

    Kaplan & Irvin 2015. Likelihood of Null Effects [...] Increased over Time. (Same figure and caption as the previous slide.)
  8. Replication Failures

    “The case against science is straightforward: much of the scientific literature, perhaps half, may simply be untrue. Afflicted by studies with small sample sizes, tiny effects, invalid exploratory analyses, and flagrant conflicts of interest, together with an obsession for pursuing fashionable trends of dubious importance, science has taken a turn towards darkness.” Richard Horton, 11 April 2015 issue of The Lancet
  9. “There are many more elemental ‘discoveries’ later shown to be

    false than there are entries in the present table.”
  10. None
  11. Quantitative Anthropology

    • Quantitative anthropologist: an anthropologist who also does quantitative work
    • Quantitative anthropology is not psychology/economics/medicine
    • “Replication” makes less sense as a target
    • Willingness to publish negative results
    • Still serious problems: deficit of clear analysis plans, poor data management, poor reproducibility, data ethics in conflict with data openness, typically poor training in quantitative methods, causal salad
  12. Principles: Conflict & cooperation Reliability: Professional norms Responsibility: Science in

    context Reward: Positive incentives for diverse stakeholders
  13. Reliability: Replicable, Reproducible, Reasoned

    Open Science: preserve the rights of others to reach their own conclusions about our scholarship.
  14. Reliability

    Replicable: new investigation, similar inference (result generalizes)
    Reproducible: same investigation, repeatable inference (result exists)
    Reasoned: any investigation, justified inference (result might be knowledge)
  15. Reinhart & Rogoff “Growth in a Time of Debt” Reliability:

    Reproducibility
  16. Reinhart & Rogoff “Growth in a Time of Debt” Reliability:

    Reproducibility
  17. Reliability: Reproducibility

    • EARS: Evolutionary Anthropology Reproducibility Study
    • Protocol preregistration: https://github.com/rianaminocher/ears/
    The EARS Team: Bret Beheim (Principal Investigator), Riana Minocher (Student Principal Investigator), Claudia Bavero (Research Assistant), Silke Atmaca (Research Coordinator), Anne Büchner, Kristina Kunze, Leonie Ette, Anne Hellmund (Student Assistants)
  18. Reliability: Reproducibility: EARS • 560 empirical, quantitative studies of social

    learning • github.com/babeheim/ears-AAA-2018 • Attempt to reproduce: Data? Materials? Results?
  19. Reliability: Reproducibility: EARS = Total Sample 560 citations Empirical Social

    Learning
  20. Reliability: Reproducibility: EARS = Total Sample 560 citations Materials Received

    60 (10%) Empirical Social Learning
  21. Reliability: Reproducibility: EARS = Total Sample 560 citations Contacted 475

    (95%) Materials Received 60 (10%) Empirical Social Learning
  22. Reliability: Reproducibility: EARS = Total Sample 560 citations Contacted 475

    (95%) Reply Received 309 (65%) Materials Received 60 (10%) Empirical Social Learning
  23. Reliability: Reproducibility: EARS = Total Sample 560 citations Contacted 475

    (95%) Reply Received 309 (65%) Materials Received 60 (10%) + 87 (28%) 147 (26%) Empirical Social Learning
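The retrieval funnel on the EARS slides above can be sanity-checked with a few lines of arithmetic. This is a sketch using only the counts stated on the slides; the variable names are mine:

```python
# EARS retrieval funnel, using the counts stated on the slides
total = 560            # empirical social-learning studies sampled
contacted = 475        # studies whose authors could be contacted
replies = 309          # replies received
materials = 60 + 87    # materials received, initial batch plus later batch = 147

print(f"reply rate among contacted: {replies / contacted:.0%}")
print(f"materials recovered overall: {materials / total:.0%}")
```

These reproduce the 65% reply rate and the 26% overall recovery rate shown on the slide.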
  24. Reliability: Reproducibility: EARS

    • What can a reasonable researcher expect?
    • What could a reasonable scholar achieve?
  25. Reliability: Reproducibility: EARS

    • Minority of contacted researchers uncooperative:
    “there is no code, scripting or paper data still in existence...this does NOT mean that the results are in any way invalid. I do not agree with the focus of your project”
    “looking at studies done even just a few years ago (i.e. published 2015 or earlier) is completely counter-productive”
    “I don't see the point here really.”
  26. Reliability: Reproducibility: EARS

    • Majority of contacted researchers enthusiastic:
    “My initial thought was to say that this has been far too long – this study was conducted almost 20 years ago. But I had a quick look and amazingly I managed to find the data. Your project will be a good opportunity for me to put my folders in order and to make this material available.”
    “this sounds like a very worthwhile endeavor that is much needed in evolutionary anthropology”
    “I'm taking this occasion as a starting point for making my scripts more comprehensive, thank you.”
  27. Reliability: Reproducibility: EARS

    • Notes on points of failure
    • Many easy places to establish better norms
    “My god, that was 17 years ago. I have no idea where that data is.”
    “such studies often uses expertise from several people, and make multiple intermediate versions of the datasets. This is often done without really knowing what will end up in a paper and what will not, or even what the paper will be about”
    • Exploration great, but can’t pretend it is a hypothesis test: data-dependent hypotheses
  28. Reliability: Reproducibility: EARS

    • Next phase:
    • Does it reproduce?
    • Variation in analytical approaches
    • Development of community standards
  29. Reliability: Reproducibility

    • Professionalizing quant anth
    • Documented methods
    • Documented codings
    • Justified, algorithmic analysis
    • Error-correcting workflow
    • Professional standards
  30. Reliability: Reproducibility

    • At MPI-EVA:
    • Core data reliability unit (Data Tsarina: Dr Silke Atmaca)
    • Auditable conversion trails: raw notes to “data”
    • Persistent metadata: materials, data types, research context (metadata is meaning)
    • Machine-neutral data storage formats
    • Data security & persistence
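The “auditable conversion trail” and “machine-neutral storage” points above can be sketched in a few lines. The raw notes, field names, and tool label below are invented for illustration; the point is that cleaning is scripted and the raw input is checksummed so the trail can be verified:

```python
import csv
import hashlib
import io
import json

# Hypothetical raw field notes (invented data, messy on purpose)
raw = b"name; weight_kg\n Anna ;52.5\nBjorn;  61\n"

# Auditable trail: checksum the raw bytes so anyone can verify that
# the cleaned table was derived from exactly this input.
audit = {"raw_sha256": hashlib.sha256(raw).hexdigest(), "cleaner": "clean_v1"}

# Scripted cleaning: strip stray whitespace, coerce types.
rows = []
for line in raw.decode("utf-8").strip().splitlines()[1:]:
    name, weight = (field.strip() for field in line.split(";"))
    rows.append({"name": name, "weight_kg": float(weight)})

# Machine-neutral output: plain CSV plus a JSON audit record,
# readable by any tool decades from now.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["name", "weight_kg"])
writer.writeheader()
writer.writerows(rows)

print(json.dumps(audit))
print(out.getvalue())
```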
  31. https://bigthink.com/errors-we-live-by/ Reliability: Reasoned

  32. Reliability: Reasoned

    • Repeatable, reproducible results are not necessarily knowledge
    • A wrong model of the world can make accurate predictions
    • But may fail at intervention
    • Example: grandmothering
  33. Reliability: Reasoned • At MPI-EVA: Theory & analysis group Dr

    Anne Kandler Dr Laurel Fogarty Dr Cody Ross Dr Justin Yeh Han Tran Dr Daniel Redhead
  34. Responsibility

    • Responsibilities to participants, collaborators, public
    • Data protection, access & preservation
    • Participant & collaborator protection
    • Example: minor marriage
    • Flash point: the safest thing from the researcher’s perspective is not the best from the public’s perspective
    • Claim: huge benefits from limited openness
  35. Rewarding

    • Incentives for researchers, communities, public
    • Fieldwork not sustainable without incentives
    • Researchers: sharing data & credit is costly
    • Communities have a stake
    • Public suspicious of a self-regulated community
    • Regulatory capture
  36. Case Studies: Chocó, Pacific coast, Colombia

  37. Dr Jeremy Koster (left)

    Data collection, data processing, statistics. ONLINE SOON
  38. [Map: the 40 study sites, numbered 1–40]
  39. Goals & Sample

    • Goals: estimate skill development; how variable?; develop stats machinery
    • Sample: 40 sites, 1,821 individuals, 21,160 hunting trips, 23,747 harvests, uncountable headaches
    “Pooh?” said Piglet. “Yes, Piglet?” said Pooh. “27,417 parameters,” said Piglet. “Oh, bother,” said Pooh.
  40. Simple model, twice nested

    [Figure: ‘skill’ vs age (0–80) curves, small multiples for Site 1, Site 2, Site 3 … Site n, with panels for pooling within and pooling between]
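The “pooling within / pooling between” panels illustrate partial pooling. A toy version of the idea, with invented scores and assumed variance components (a sketch of the general principle, not the model in the talk): each site mean is shrunk toward the global mean, and sparsely sampled sites are shrunk hardest.

```python
import statistics

# Invented per-site scores; site C has only one observation.
sites = {"A": [4.0, 5.0, 6.0, 5.5], "B": [2.0, 3.0, 2.5, 2.8], "C": [9.0]}

grand = statistics.mean(x for xs in sites.values() for x in xs)
tau2, sigma2 = 1.0, 2.0  # assumed between-site and within-site variances

def partial_pool(xs):
    # Precision weighting: the weight on the raw site mean grows with n,
    # so small-n sites borrow more strength from the global mean.
    w = tau2 / (tau2 + sigma2 / len(xs))
    return w * statistics.mean(xs) + (1 - w) * grand

for site, xs in sites.items():
    print(site, round(statistics.mean(xs), 2), "->", round(partial_pool(xs), 2))
```

Site C, with a single observation, moves far toward the global mean; sites A and B, with four observations each, move much less. That is exactly what the nested-site figure shows.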
  41. Steps for Reliability

    • Pre-registered: model developed before data seen, theoretical justification, shared in public, designed for measurement problems
    • Samples: error-checking, scripted coding, scripted assembly, persistent links to raw
    • Analysis: developed in stages, tested on simulated data, checked for sensitivity
    • Documentation: everything bound together in an R package; this was more work than writing the paper
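“Tested on simulated data” is the key error-check on this slide: simulate from a model with known parameters, run the analysis, and confirm it recovers the truth before touching real data. A minimal sketch with an invented toy model and estimator:

```python
import random

random.seed(1)  # fixed seed so the check is reproducible

# 1. Simulate data from known ground truth (invented values).
true_mu, true_sigma, n = 10.0, 2.0, 10_000
data = [random.gauss(true_mu, true_sigma) for _ in range(n)]

# 2. Run the estimator under test (here, just the sample mean).
est_mu = sum(data) / n

# 3. Check recovery. A failure here means either the analysis code is
#    broken or the estimator cannot recover the parameter; simulated-data
#    tests catch both before any real data are analyzed.
assert abs(est_mu - true_mu) < 0.1, "failed to recover the true mean"
print(f"recovered mu = {est_mu:.2f}")
```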
  42. [Figure: site 16 ACH, 147 individuals (14,364 observations): estimated age curves for skill (peak age 37), success, harvest, and production (peak age 38)]
  43. [Same figure as slide 42]
  44. [Same figure as slide 42]
  45. [Same figure as slide 42]
  46. Skill (mean)

    [Figure: mean skill by age (0–80) for the global mean and each of the 40 sites; key per panel: peak age, site number, site code, individuals (observations); e.g. site 16 ACH: peak 37, 147 individuals (14,364 observations)]
  47. Momentum? Reliability: Professional norms Responsibility: Science in context Reward: Positive

    incentives for all stakeholders
  48. None