Slide 1

Slide 1 text

Bad Data, Big Models & Statistical Methods for Studying Evolution
Richard McElreath
Max Planck Institute for Evolutionary Anthropology, Leipzig

Slide 2

Slide 2 text

Elephant embryo, 16 weeks

Slide 3

Slide 3 text

Sturnus vulgaris

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Homo sapiens

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

Darwin 1837

Slide 8

Slide 8 text

Francis Galton builds Bayesian conditional distributions (1873, 1889). See Stigler (2010): Darwin, Galton and the Statistical Enlightenment
Fig. 5. Depiction of the regression phenomenon, based on a figure from Galton (1889): (a) the expected landing position of pellets from a specific upper level compartment; (b) a point of origin of pellets landing in a specific lower level compartment.
Cropped text on slide: "There was a hint in the lecture that Galton was moving towards a mechanism [...] his thinking. He seemed to approach Mendelian genetics when he speculated [...]"

Slide 9

Slide 9 text

Rothamsted Experimental Station

Slide 10

Slide 10 text

Photo: Dr Martin Muller

Slide 11

Slide 11 text

Statistical Inference in Evolution & Ecology
• Fundamentally observational
• Data: Bad, scarce & derived
• Null models not unique
• Plausible models custom & hungry
• Result: Hi-D headaches, even with small data
• Need: Flexible, scalable tools

Slide 12

Slide 12 text

Outline
• The evolution of statistical methods for studying evolution
• Bad data in a big Stan model

Slide 13

Slide 13 text

James Crow, R.A.Fisher, Motoo Kimura in 1961

Slide 14

Slide 14 text

Kimura, M. 1983. The Neutral Theory of Molecular Evolution
Figure labels: Neutral theory; Data; Uniform mutation; Varying mutation

Slide 15

Slide 15 text

X → Y,Z Hypotheses Process models Statistical models

Slide 16

Slide 16 text

Diagram levels: Hypotheses; Process models; Statistical models
Hypotheses: H0 “Evolution is neutral”; H1 “Selection matters”

Slide 17

Slide 17 text

Diagram levels: Hypotheses; Process models; Statistical models
Hypotheses: H0 “Evolution is neutral”; H1 “Selection matters”
Process models: P0A Neutral, non-equilibrium; P0B Neutral, equilibrium; P1A Constant selection; P1B Fluctuating selection

Slide 18

Slide 18 text

Diagram levels: Hypotheses; Process models; Statistical models
Hypotheses: H0 “Evolution is neutral”; H1 “Selection matters”
Process models: P0A Neutral, non-equilibrium; P0B Neutral, equilibrium; P1A Constant selection; P1B Fluctuating selection
Statistical models: MI; MII; MIII

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

Cropped excerpts shown on slide:
"[...] nucleotide polymorphisms (SNPs) shared with Neandertals will thus reflect, at least in part, the time since Neandertals or their ancestors and modern humans or their ancestors last exchanged genes with each other."
"[...] genetic distance x (expected number of crossover reco[...] events per meiosis) apart, arose on the Neandertal lin[...] introgressed into modern humans at time tGF, the proba[...] these alleles have not been broken up by recombination [...]"

The Date of Interbreeding between Neandertals and Modern Humans
Sriram Sankararaman 1,2*, Nick Patterson 2, Heng Li 2, Svante Pääbo 3*, David Reich 1,2*
1 Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
2 Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
3 Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany

Abstract
Comparisons of DNA sequences between Neandertals and present-day humans have shown that Neandertals share more genetic variants with non-Africans than with Africans. This could be due to interbreeding between Neandertals and modern humans when the two groups met subsequent to the emergence of modern humans outside Africa. However, it could also be due to population structure that antedates the origin of Neandertal ancestors in Africa. We measure the extent of linkage disequilibrium (LD) in the genomes of present-day Europeans and find that the last gene flow from Neandertals (or their relatives) into Europeans likely occurred 37,000–86,000 years before the present (BP), and most likely 47,000–65,000 years ago. This supports the recent interbreeding hypothesis and suggests that interbreeding may have occurred when modern humans carrying Upper Paleolithic technologies encountered Neandertals as they expanded out of Africa.

Citation: Sankararaman S, Patterson N, Li H, Pääbo S, Reich D (2012) The Date of Interbreeding between Neandertals and Modern Humans. PLoS Genet 8(10): e1002947. doi:10.1371/journal.pgen.1002947
Editor: Joshua M. Akey, University of Washington, United States of America
Received December 15, 2011; Accepted July 27, 2012; Published October 4, 2012
Copyright: © 2012 Sankararaman et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by the Presidential Innovation Fund of the Max Planck Society, the Krekeler Foundation, and the National Science Foundation (HOMINID grant 1032255). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: [email protected] (SS); [email protected] (SP); [email protected] (DR)

Introduction
A much-debated question in human evolution is the relationship between modern humans and Neandertals. Modern humans appear in the African fossil record about 200,000 years ago. Neandertals appear in the European fossil record about 230,000 years ago [1] and disappear about 30,000 year ago. They lived in Europe and western Asia with a range that extended as far east as Siberia [2] and as far south as the middle East. The overlap of Neandertals and modern humans in space and time suggests the possibility of interbreeding. Evidence, both for [3] and against interbreeding [4], have been put forth based on the analysis of modern human DNA. Although mitochondrial DNA from multiple Neandertals has shown that Neandertals fall outside the range of modern human variation [5,6,7,8,9,10], low-levels of gene flow cannot be excluded [10,11,12]. Analysis of the draft sequence of the Neandertal genome revealed that the Neandertal genome shares more alleles with non-African than with sub-Saharan African genomes [13].

One hypothesis that could explain this observation is a history of gene flow from Neandertals into modern humans, presumably when they encountered each other in Europe and the Middle East [13] (Figure 1). An alternative hypothesis is that the findings are [...] substructure in Africa is a plausible alternative to the hypothesis of recent gene flow. Today, sub-Saharan Africans harbor deep lineages that are consistent with a highly-structured ancestral population [17,18,19,20,21,22,23,24,25,26,27]. Evidence for ancient structure in Africa has also been offered based on the substantial diversity in neurocranial geometry amongst early modern humans [28]. Thus, it is important to test formally whether substructure could explain the genetic evidence for Neandertals being more closely related to non-Africans than to Africans.

A direct way to distinguish the hypothesis of recent gene flow from the hypothesis of ancient substructure is to infer the date for when the ancestors of Neandertals and a modern non-African population last exchanged genes. In the recent gene flow scenario, the date is not expected to be much older than 100,000 years ago, corresponding to the time of the earliest documented modern humans outside of Africa [29]. In the ancient substructure scenario, the date of last common ancestry is expected to be at least 230,000 years ago, since Neandertals must have separated from modern humans by that time based on the Neandertal fossil record of Europe [1]. In present-day human populations, the extent of LD between two single nucleotide polymorphisms (SNPs) shared with Nean- [...]

2015
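The dating argument on this page rests on linkage disequilibrium decaying with genetic distance. The sketch below is a minimal illustration, assuming the common single-pulse approximation in which two introgressed alleles x Morgans apart stay together for t generations with probability about exp(-t x). The function names, the noise-free simulated LD values, and the 2000-generation figure are all hypothetical and are not taken from the slide or the paper.

```python
import math

def prob_unbroken(x_morgans, t_generations):
    """Approximate probability that no crossover has separated two loci
    x_morgans apart during t_generations (single-pulse assumption)."""
    return math.exp(-t_generations * x_morgans)

def estimate_t(distances, ld_values):
    """Estimate t as minus the least-squares slope of log(LD) on distance.
    A free intercept absorbs the unknown LD amplitude."""
    n = len(distances)
    logs = [math.log(v) for v in ld_values]
    mean_x = sum(distances) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, logs))
    sxx = sum((x - mean_x) ** 2 for x in distances)
    return -sxy / sxx

# Hypothetical, noise-free data: LD amplitude 0.3, admixture 2000 generations ago.
distances = [0.0005 * k for k in range(1, 11)]   # genetic distance in Morgans
true_t = 2000
ld = [0.3 * prob_unbroken(x, true_t) for x in distances]
t_hat = estimate_t(distances, ld)
```

With real data the LD values are noisy and the fit would need uncertainty estimates; this only shows why a faster decay implies an older date.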

Slide 21

Slide 21 text

Sankararaman et al. 2012. The Date of Interbreeding between Neandertals and Modern Humans. PLoS Genet 8(10): e1002947. (Same page as Slide 20.)

Slide 22

Slide 22 text

Street et al 2017. Coevolution of cultural intelligence, extended life history, sociality, & brain size in primates
Figure (panels A, B, C): data on social learning, absolute brain volume, group size, and longevity for 52 primate genera [...]
Traits shown: Social learning; Brain volume; Group size; Longevity
Genera: Allenopithecus, Cercopithecus, Erythrocebus, Cheirogaleus, Daubentonia, Eulemur, Hapalemur, Lemur, Lepilemur, Microcebus, Mirza, Propithecus, Varecia, Alouatta, Ateles, Lagothrix, Aotus, Callimico, Callithrix, Cebus, Leontopithecus, Saguinus, Saimiri, Arctocebus, Loris, Nycticebus, Perodicticus, Gorilla, Hylobates, Pan, Pongo, Symphalangus, Cacajao, Chiropotes, Pithecia, Cercocebus, Lophocebus, Macaca, Mandrillus, Papio, Theropithecus, Colobus, Nasalis, Presbytis, Pygathrix, Semnopithecus, Trachypithecus, Euoticus, Galago, Galagoides, Otolemur, Tarsius

Slide 23

Slide 23 text

Photo: Dr Martin Muller

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

González-Forero & Gardner 2018. Ecological and social drivers of human brain-size evolution
Figure (caption cropped): "[...] est-fitting scenarios across adult Homo, and resulting [...] life history for H. sapiens. [...] Pie [...] plots respectively show the challenge combination and shape [...]"
Axes: Age (years); Body weight (kg); Brain weight (kg)
Labels: H. habilis; brain; body; Growth; Reproduction; Childhood; Adolescence; Adulthood; 60% competitive; Adult fit: -0.04; Body (kg) and EQ: 32.0, 3.85

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

Figure: numbered labels 1 to 40 (no other text)

Slide 31

Slide 31 text

Dr Jeremy Koster (left) Data collection Data processing Statistics

Slide 32

Slide 32 text

From: Kim Hill and Keith Kintigh, “Can Anthropologists Distinguish Good and Poor Hunters? Implications for Hunting Hypotheses, Sharing Conventions, and Cultural Transmission” (Current Anthropology, vol. 50, no. 3, p. 369), Supplement A, Online Figures.
© 2009 by The Wenner-Gren Foundation for Anthropological Research. All rights reserved. DOI: 10.1086/597981
Figure A2. A, Photo taken in January 1978; B, photo taken in January 2009.

Slide 33

Slide 33 text

Life History of Production Skill
• Goals
  • Estimate skill development
  • Take variation seriously
  • Develop stats machinery
• Sample
  • 40 research sites
  • 1821 individuals
  • 21,160 foraging trips
  • 23,747 harvests
  • Uncountable headaches

Slide 34

Slide 34 text

Bad Data Headaches
• Missing values: age, duration, technology
• Uncertainty: age bins, age estimates
• Massive imbalance in sampling
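One way to handle an age known only to a bin is to average the likelihood over every age in that bin. The sketch below assumes a uniform prior over integer ages within the bin and a Gaussian observation model. The hump-shaped `skill` curve, its parameters, and all function names are hypothetical illustrations, not the model used in the talk.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def normal_logpdf(y, mu, sigma):
    """Log density of a Normal(mu, sigma) distribution at y."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def skill(age, peak=35.0, width=20.0):
    """Hypothetical hump-shaped skill curve, equal to 1 at the peak age."""
    return math.exp(-((age - peak) / width) ** 2)

def marginal_loglik(y, age_lo, age_hi, sigma=0.2):
    """Log-likelihood of outcome y when age is only known to lie in
    [age_lo, age_hi]: average the likelihood over a uniform prior on age."""
    ages = range(age_lo, age_hi + 1)
    log_prior = -math.log(len(ages))
    terms = [log_prior + normal_logpdf(y, skill(a), sigma) for a in ages]
    return log_sum_exp(terms)

# An observation from someone recorded only as "20 to 29 years old".
ll_binned = marginal_loglik(0.8, 20, 29)
```

Summing on the log scale avoids underflow when many small likelihood terms are combined, which matters once thousands of observations enter the same model.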

Slide 35

Slide 35 text

“Pooh?” said Piglet. “Yes, Piglet?” said Pooh. “27417 parameters,” said Piglet. “Oh, bother,” said Pooh.

Slide 36

Slide 36 text

No content

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

Scaffolds
• A month of modeling & fake data simulation (priors)
• Submit grant (wait for it...), get grant!
• Develop first Stan model (multilevel)
• Simulate missingness and uncertainty
• Develop second Stan model (imputation, marginalization)
• Analyze real sample
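The fake-data steps in this list can be sketched as a small simulator that generates foraging trips from known parameters and then hides some ages, so that a model can be checked against a known truth before it sees real data. Everything here is an assumption made for illustration: the success curve, the missingness rate, the field names, and the function name `simulate_trips`.

```python
import math
import random

def simulate_trips(n_individuals=50, trips_per_person=5, p_missing_age=0.3, seed=1):
    """Simulate trip outcomes with a hypothetical age-dependent success rate,
    then mask some ages completely at random."""
    rng = random.Random(seed)
    rows = []
    for person in range(n_individuals):
        age = rng.randint(10, 70)
        # Age is hidden for the whole individual, not per trip.
        age_obs = None if rng.random() < p_missing_age else age
        # Hypothetical hump-shaped success probability peaking at age 35.
        p_success = 0.1 + 0.6 * math.exp(-((age - 35) / 20) ** 2)
        for _ in range(trips_per_person):
            success = 1 if rng.random() < p_success else 0
            rows.append({"id": person, "age_true": age,
                         "age_obs": age_obs, "success": success})
    return rows

rows = simulate_trips()
n_missing = sum(1 for r in rows if r["age_obs"] is None)
```

Because `age_true` is kept alongside `age_obs`, imputed ages from a fitted model can be compared directly with the values that were hidden.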

Slide 40

Slide 40 text

What Happened

Slide 41

Slide 41 text

Site 16, ACH: 147 ids (14364 obs). x-axis ticks 0, 40, 80.
Panels: skill (peak 37); success (peak 38); harvest (peak 38); production (peak 38)
Map label: República del Paraguay / Tetã Paraguái

Slide 42

Slide 42 text

Site 16, ACH: 147 ids (14364 obs). x-axis ticks 0, 40, 80.
Panels: skill (peak 37); success (peak 38); harvest (peak 38); production (peak 38)

Slide 43

Slide 43 text

Site 16, ACH: 147 ids (14364 obs). x-axis ticks 0, 40, 80.
Panels: skill (peak 37); success (peak 38); harvest (peak 38); production (peak 38)

Slide 44

Slide 44 text

Site 16, ACH: 147 ids (14364 obs). x-axis ticks 0, 40, 80.
Panels: skill (peak 37); success (peak 38); harvest (peak 38); production (peak 38)

Slide 45

Slide 45 text

Skill (mean), one panel per site. x-axis ticks 0, 40, 80 in each site panel.
Global mean panel: x-axis ticks 0, 18, 40, 55, 80
Key panel: site number, site code, ids (obs), peak, obs range (example values: site 0, peak 26)

Site  Code  Peak  ids  obs
1     CRE   31    16   127
2     MYA   34    59   464
3     MYN   34    52   359
4     QUI   29    32   189
5     ECH   32    2    6
6     WAO   31    48   373
7     BAR   26    18   233
8     INU   26    15   29
9     MTS   18    69   1441
10    PIR   26    42   274
11    CLB   33    12   45
12    PME   23    23   172
13    TS1   31    29   127
14    TS2   31    37   139
15    TS3   37    168  793
16    ACH   37    147  14364
17    GB1   36    69   488
18    GB2   30    4    37
19    GB3   30    16   73
20    CN1   35    6    80
21    GB4   34    19   114
22    BK1   30    80   249
23    BK2   24    57   114
24    CN2   36    14   78
25    CN3   36    15   67
26    BFA   25    59   433
27    CN4   27    14   287
28    BIS   51    24   231
29    HEH   31    45   45
30    DLG   33    26   76
31    BTK   38    27   268
32    PN1   30    35   119
33    PN2   31    23   125
34    AGT   28    44   211
35    MRT   31    77   758
36    NUA   32    36   140
37    NIM   42    26   180
38    NEN   32    7    7
39    MAR   33    6    28
40    WOL   18    27   410
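The observation counts above range from 6 (ECH) to 14364 (ACH), which is the kind of imbalance a multilevel model absorbs through partial pooling. The sketch below uses the textbook normal-normal shrinkage formula to show how much weight a site's own data would get at each sample size. The values of `sigma` and `tau`, the site means, and the function names are hypothetical, and this is not the Stan model from the talk.

```python
def pooling_weight(n_obs, sigma=1.0, tau=0.5):
    """Weight on a site's own mean under a normal-normal model:
    data precision divided by data precision plus prior precision."""
    precision_data = n_obs / sigma ** 2
    precision_prior = 1.0 / tau ** 2
    return precision_data / (precision_data + precision_prior)

def partial_pool(site_mean, n_obs, global_mean, sigma=1.0, tau=0.5):
    """Shrink a site's mean toward the global mean according to its sample size."""
    w = pooling_weight(n_obs, sigma, tau)
    return w * site_mean + (1.0 - w) * global_mean

# Observation counts taken from the site panels; the means are made up.
w_ach = pooling_weight(14364)   # large site: estimate barely moves
w_ech = pooling_weight(6)       # tiny site: estimate leans on the global mean
ech_estimate = partial_pool(site_mean=2.0, n_obs=6, global_mean=1.0)
```

In a full multilevel model `sigma` and `tau` are estimated from the data rather than fixed, but the direction of the effect is the same: sparse sites borrow more from the rest.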

Slide 46

Slide 46 text

Bad Data in Search of Good Home
• Evolution is not your friend
  • Not experimental science
  • Null models useless
  • Data scarce, not simple
• Flexible, scalable, open tools
• Community crucial

Slide 47

Slide 47 text

No content