Slide 1

Slide 1 text

Making Quantitative Anthropology More Reliable, Responsible & Rewarding. Richard McElreath, Max Planck Institute for Evolutionary Anthropology, Leipzig.

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

Replication Failures. “When I describe priming studies to audiences, the reaction is often disbelief . . . The idea you should focus on, however, is that disbelief is not an option. The results are not made up, nor are they statistical flukes. You have no choice but to accept that the major conclusions of these studies are true.” Daniel Kahneman. Source: Thinking, Fast and Slow (page 57).

Slide 4

Slide 4 text

Replication Failures. Disbelief remains an option: Many Labs 3 (https://osf.io/ct89g/). [Figures 1a and 1b: Replication results organized by replication effect size, 1a for Cohen’s d estimates, 1b for partial eta-squared estimates. When available, the triangle indicates the effect size obtained in the original study (the Elaboration Likelihood main-effect estimate does not appear because it was extremely large, partial eta-squared of .59). Large circles represent the aggregate effect size obtained across all participants. Error bars represent 99% noncentral confidence intervals around the effects. Small x’s represent the effect sizes obtained within each site. Effects shown: Stroop effect, metaphoric restructuring, availability heuristic, persistence, standardized treatment difference, power & perspective, weight embodiment, warmth perceptions; original studies vs. replications.]

Slide 5

Slide 5 text

Replication Failures. Meyer et al. 2015. Disfluent Fonts Don’t Help People Solve Math Problems. [Figure 2: The effect of disfluent font on Cognitive Reflection Test scores, plotting each individual study’s sample size against its effect size. Error bars are bootstrapped. Original study (N=40).]

Slide 6

Slide 6 text

Replication Failures. Null Trials and Transparent Reporting. Fig 1: Relative risk of showing benefit or harm of treatment, by year of publication, for large NHLBI trials of pharmaceutical and dietary supplement interventions. Positive trials are indicated by plus signs; trials showing harm are indicated by a diagonal line within a circle. Prior to 2000, when trials were not registered in ClinicalTrials.gov, there was substantial variability in outcomes. Following the requirement that trials preregister in ClinicalTrials.gov, the relative risk on primary outcomes showed considerably less variability around 1.0.

Slide 7

Slide 7 text

Replication Failures. Null Trials and Transparent Reporting. Kaplan & Irvin 2015, “Likelihood of Null Effects [...] Increased over Time.” Fig 1: Relative risk of showing benefit or harm of treatment, by year of publication, for large NHLBI trials of pharmaceutical and dietary supplement interventions. Positive trials are indicated by plus signs; trials showing harm are indicated by a diagonal line within a circle. Prior to 2000, when trials were not registered in ClinicalTrials.gov, there was substantial variability in outcomes. Following the requirement that trials preregister in ClinicalTrials.gov, the relative risk on primary outcomes showed considerably less variability around 1.0.

Slide 8

Slide 8 text

Replication Failures. “The case against science is straightforward: much of the scientific literature, perhaps half, may simply be untrue. Afflicted by studies with small sample sizes, tiny effects, invalid exploratory analyses, and flagrant conflicts of interest, together with an obsession for pursuing fashionable trends of dubious importance, science has taken a turn towards darkness.” Richard Horton, 11 April 2015 issue of The Lancet.

Slide 9

Slide 9 text

“There are many more elemental ‘discoveries’ later shown to be false than there are entries in the present table.”

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

Quantitative Anthropology • Quantitative anthropologist: An anthropologist who also does quantitative work • Quantitative Anthropology not psychology/economics/medicine • “Replication” makes less sense as target • Willingness to publish negative results • Still serious problems: • Deficit of clear analysis plans, poor data management, poor reproducibility, data ethics in conflict with data openness, typically poor training in quantitative methods, causal salad

Slide 12

Slide 12 text

Principles: Conflict & cooperation Reliability: Professional norms Responsibility: Science in context Reward: Positive incentives for diverse stakeholders

Slide 13

Slide 13 text

Reliability: Replicable, Reproducible, Reasoned. Open Science: Preserve the rights of others to reach their own conclusions about our scholarship.

Slide 14

Slide 14 text

Reliability. Replicable: new investigation, similar inference (the result generalizes). Reproducible: same investigation, repeatable inference (the result exists). Reasoned: any investigation, justified inference (the result might be knowledge).

Slide 15

Slide 15 text

Reinhart & Rogoff “Growth in a Time of Debt” Reliability: Reproducibility

Slide 16

Slide 16 text

Reinhart & Rogoff “Growth in a Time of Debt” Reliability: Reproducibility

Slide 17

Slide 17 text

Reliability: Reproducibility • EARS: Evolutionary Anthropology Reproducibility Study • https://github.com/rianaminocher/ears/ • Protocol preregistration: https://github.com/rianaminocher/ears/ • The EARS Team: Bret Beheim, Principal Investigator; Riana Minocher, Student Principal Investigator; Claudia Bavero, Research Assistant; Silke Atmaca, Research Coordinator; Anne Büchner, Kristina Kunze, Leonie Ette, Anne Hellmund, Student Assistants.

Slide 18

Slide 18 text

Reliability: Reproducibility: EARS • 560 empirical, quantitative studies of social learning • github.com/babeheim/ears-AAA-2018 • Attempt to reproduce: Data? Materials? Results?

Slide 19

Slide 19 text

Reliability: Reproducibility: EARS = Total Sample 560 citations Empirical Social Learning

Slide 20

Slide 20 text

Reliability: Reproducibility: EARS = Total Sample 560 citations Materials Received 60 (10%) Empirical Social Learning

Slide 21

Slide 21 text

Reliability: Reproducibility: EARS = Total Sample 560 citations Contacted 475 (95%) Materials Received 60 (10%) Empirical Social Learning

Slide 22

Slide 22 text

Reliability: Reproducibility: EARS = Total Sample 560 citations Contacted 475 (95%) Reply Received 309 (65%) Materials Received 60 (10%) Empirical Social Learning

Slide 23

Slide 23 text

Reliability: Reproducibility: EARS = Total Sample 560 citations Contacted 475 (95%) Reply Received 309 (65%) Materials Received 60 (10%) + 87 (28%) 147 (26%) Empirical Social Learning

Slide 24

Slide 24 text

Reliability: Reproducibility: EARS What can a reasonable researcher expect? • What could a reasonable scholar achieve?

Slide 25

Slide 25 text

Reliability: Reproducibility: EARS • Minority of contacted researchers uncooperative: “there is no code, scripting or paper data still in existence...this does NOT mean that the results are in any way invalid. I do not agree with the focus of your project” • “looking at studies done even just a few years ago (i.e. published 2015 or earlier) is completely counter-productive” • “I don't see the point here really.”

Slide 26

Slide 26 text

Reliability: Reproducibility: EARS • Majority of contacted researchers enthusiastic: “My initial thought was to say that this has been far too long – this study was conducted almost 20 years ago. But I had a quick look and amazingly I managed to find the data. Your project will be a good opportunity for me to put my folders in order and to make this material available.” • “this sounds like a very worthwhile endeavor that is much needed in evolutionary anthropology” • “I'm taking this occasion as a starting point for making my scripts more comprehensive, thank you.”

Slide 27

Slide 27 text

Reliability: Reproducibility: EARS • Notes on points of failure • Many easy places to establish better norms • “My god, that was 17 years ago. I have no idea where that data is.” • “such studies often uses expertise from several people, and make multiple intermediate versions of the datasets. This is often done without really knowing what will end up in a paper and what will not, or even what the paper will be about” • Exploration is great, but it can't be passed off as a hypothesis test: data-dependent hypotheses.

Slide 28

Slide 28 text

Reliability: Reproducibility: EARS • Next phase: • Does it reproduce? • Variation in analytical approaches • Development of community standards

Slide 29

Slide 29 text

Reliability: Reproducibility • Professionalizing quant anth • Documented methods • Documented codings • Justified, algorithmic analysis • Error-correcting workflow • Professional standards
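To make “documented codings” and an “error-correcting workflow” concrete, here is a minimal R sketch of a scripted coding step with explicit validity checks. The file names, variables, and codes are hypothetical illustrations, not part of any actual project.

```r
# A scripted coding step with built-in error checks.
# File names, variables, and codes are hypothetical illustrations.

raw <- read.csv("interviews_raw.csv", stringsAsFactors = FALSE)

# Documented, algorithmic coding: recode free-text activity reports with an
# explicit rule rather than by hand
activity_codes <- c("hunting" = 1L, "fishing" = 2L, "farming" = 3L)
raw$activity_code <- activity_codes[trimws(tolower(raw$activity))]

# Error-correcting checks: fail loudly rather than silently dropping cases
stopifnot(
  !any(duplicated(raw$participant_id)),                  # no duplicated participants
  all(raw$age >= 0 & raw$age <= 110, na.rm = TRUE),      # ages in a plausible range
  !any(is.na(raw$activity_code) & !is.na(raw$activity))  # every response was coded
)

write.csv(raw, "interviews_coded.csv", row.names = FALSE)
```

The point of the stopifnot() checks is that a coding mistake halts the pipeline immediately instead of propagating silently into the analysis.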

Slide 30

Slide 30 text

Reliability: Reproducibility • At MPI-EVA: • Core data reliability unit (Data Tsarina: Dr Silke Atmaca) • Auditable conversion trails: Raw notes to “data” • Persistent metadata: Materials, data types, research context. Metadata is meaning • Machine-neutral data storage formats • Data security & persistence
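One way to read “auditable conversion trails” and “machine-neutral storage” is a conversion script that turns raw field notes into analysis data and writes a plain-text metadata sidecar recording provenance. The sketch below uses hypothetical paths and variable names; it is not MPI-EVA's actual pipeline.

```r
# An auditable conversion from raw field notes to analysis data, with a
# plain-text metadata sidecar. Paths and variable names are hypothetical.

raw_file  <- "fieldnotes/2017_season_raw.csv"
out_file  <- "data/harvests.csv"
meta_file <- "data/harvests_metadata.txt"

raw <- read.csv(raw_file, stringsAsFactors = FALSE)

# Keep the transformation explicit, so the trail from raw notes to "data"
# can be audited line by line
harvests <- data.frame(
  trip_id   = raw$trip,
  hunter_id = raw$hunter,
  kg_meat   = as.numeric(raw$weight_kg)
)

# Machine-neutral storage: plain CSV, no binary or proprietary formats
write.csv(harvests, out_file, row.names = FALSE)

# Persistent metadata: where the data came from, when, and what each variable means
writeLines(c(
  paste("source:", raw_file),
  paste("script:", "convert_harvests.R"),
  paste("date:  ", format(Sys.Date())),
  "trip_id:   unique identifier for each hunting trip",
  "hunter_id: anonymized hunter identifier",
  "kg_meat:   harvested meat in kilograms"
), meta_file)
```

Because both outputs are plain text, the trail from raw notes to “data” stays readable by any machine and any future researcher.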

Slide 31

Slide 31 text

https://bigthink.com/errors-we-live-by/ Reliability: Reasoned

Slide 32

Slide 32 text

Reliability: Reasoned • Repeatable, reproducible results are not necessarily knowledge • Wrong model of world can make accurate predictions • But may fail at intervention • Example: Grandmothering
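A toy simulation can make the middle two bullets concrete. It is not the grandmothering example itself, just a generic confounding setup with made-up variables: a model that treats a merely correlated variable as a predictor scores reasonably well on observational data, yet predicts the wrong thing when we intervene on that variable.

```r
# Toy simulation: a wrong causal model can predict well in observational data
# yet fail under intervention. All variables are hypothetical.
set.seed(1)
N <- 1e4
U <- rnorm(N)                 # unobserved common cause
X <- rnorm(N, mean = U)       # X is caused by U
Y <- rnorm(N, mean = U)       # Y is also caused by U; X has no effect on Y

m <- lm(Y ~ X)                # "wrong" model: uses X to predict Y
cor(predict(m), Y)^2          # decent predictive accuracy (about 0.25 here)

# Intervene: set X by hand, which breaks its dependence on U
X_do <- rnorm(N, mean = 2)    # force X to be high
Y_do <- rnorm(N, mean = U)    # Y is unchanged, because X never caused it
mean(predict(m, newdata = data.frame(X = X_do)))  # model expects Y to rise (~1)
mean(Y_do)                                        # but it stays near 0
```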

Slide 33

Slide 33 text

Reliability: Reasoned • At MPI-EVA: Theory & analysis group Dr Anne Kandler Dr Laurel Fogarty Dr Cody Ross Dr Justin Yeh Han Tran Dr Daniel Redhead

Slide 34

Slide 34 text

Responsibility • Responsibilities to Participants, Collaborators, Public • Data protection, access & preservation • Participant & collaborator protection • Example: Minor marriage • Flash point: The safest thing from the researcher's perspective is not the best from the public's perspective • Claim: Huge benefits from limited openness

Slide 35

Slide 35 text

Rewarding • Incentives for researchers, communities, public • Fieldwork not sustainable without incentives • Researchers: Sharing data & credit costly • Communities have a stake • Public suspicious of self-regulated community • Regulatory capture

Slide 36

Slide 36 text

Chocó, Pacific coast of Colombia. Case Studies.

Slide 37

Slide 37 text

Dr Jeremy Koster (left). Data collection, data processing, statistics. ONLINE SOON.

Slide 38

Slide 38 text

[Figure: the 40 study sites, numbered 1–40.]

Slide 39

Slide 39 text

Goals & Sample • Goals • Estimate skill development • How variable? • Develop stats machinery • Sample • 40 sites • 1,821 individuals • 21,160 hunting trips • 23,747 harvests • uncountable headaches “Pooh?” said Piglet. “Yes, Piglet?” said Pooh. “27,417 parameters,” said Piglet. “Oh, bother,” said Pooh.

Slide 40

Slide 40 text

Simple model, twice nested. [Figure: 'skill' as a function of age (0–80), with separate curves for individuals in Site 1, Site 2, Site 3, ..., Site n; pooling within sites and pooling between sites.]
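For readers who want the nesting idea in code, the sketch below is a deliberately simplified stand-in, not the analysis from the talk (which, per the later slides, treats 'skill' as a latent quantity behind hunting returns). It only illustrates partial pooling between sites and within sites across hunters, using simulated data and the lme4 package.

```r
# A toy "twice nested" partial-pooling model: hunters within sites.
# Simulated data and lme4 are stand-ins for the real latent-skill model.
library(lme4)

set.seed(2)
# toy data: 10 sites, 20 hunters per site, 5 trips per hunter
d <- expand.grid(site = factor(1:10), id = factor(1:20), trip = 1:5)
d$age <- runif(nrow(d), 15, 70)
site_eff   <- rnorm(10, 0, 0.3)           # between-site differences
hunter_eff <- rnorm(10 * 20, 0, 0.3)      # between-hunter differences
d$harvest <- exp(
  0.5 + 0.04 * d$age - 4e-4 * d$age^2 +   # hump-shaped age profile, peak near 50
  site_eff[as.integer(d$site)] +
  hunter_eff[as.integer(interaction(d$site, d$id))] +
  rnorm(nrow(d), 0, 0.5)
)

m <- lmer(
  log(harvest) ~ age + I(age^2) +   # average age profile of returns
    (1 | site) +                    # partial pooling between sites
    (1 | site:id),                  # partial pooling within sites, across hunters
  data = d
)
summary(m)
```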

Slide 41

Slide 41 text

Steps for Reliability • Pre-registered: Model developed before data seen, theoretical justification, shared in public, design for measurement problems • Samples: Error-checking, scripted coding, scripted assembly, persistent links to raw • Analysis: Developed in stages, tested on simulated data, checked for sensitivity • Documentation: Everything bound together in R package — this was more work than writing the paper
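“Tested on simulated data” has a simple recipe: simulate data from parameters you chose, fit the intended model, and check that the estimates recover those values. The R sketch below uses a deliberately simple Poisson harvest-count model with made-up parameters as a stand-in for the real analysis.

```r
# Parameter-recovery check: simulate with known values, refit, compare.
# The generative model is a deliberately simple stand-in.
set.seed(3)
true_b_age <- 0.05                    # chosen "true" effect of age on log rate
N     <- 500
age   <- runif(N, 15, 70)
trips <- 1 + rpois(N, 19)             # hunting trips per person (at least one)
harvests <- rpois(N, lambda = trips * exp(-1 + true_b_age * age))

fit <- glm(harvests ~ age, offset = log(trips), family = poisson)
coef(fit)["age"]                      # should land near the true value 0.05
confint.default(fit)["age", ]         # Wald interval should usually cover it
```

If the known value falls outside the interval more often than expected across repeated simulations, the model or its implementation has a problem worth finding before touching the real data.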

Slide 42

Slide 42 text

[Figure: age profiles (0–80) for site 16 ACH, 147 individuals (14,364 observations); panels for skill, success, harvest, and production.]

Slide 43

Slide 43 text

[Figure repeated from Slide 42.]

Slide 44

Slide 44 text

[Figure repeated from Slide 42.]

Slide 45

Slide 45 text

[Figure repeated from Slide 42.]

Slide 46

Slide 46 text

Skill (mean). [Figure: mean skill by age (0–80) for the global mean and each of the 40 sites; each panel labeled with site number, site code, number of individuals, and number of observations (key: ids (obs), peak, obs range), e.g. 16 ACH 147 (14364).]

Slide 47

Slide 47 text

Momentum? Reliability: Professional norms Responsibility: Science in context Reward: Positive incentives for all stakeholders

Slide 48

Slide 48 text

No content