Bad Data, Big Models, & Stat Methods for Studying Evolution

Presented at StanCon 2018 Helsinki

Richard McElreath

August 30, 2018

Transcript

  1. Bad Data, Big Models & Statistical Methods for Studying Evolution

    Richard McElreath, Max Planck Institute for Evolutionary Anthropology, Leipzig
  2. Elephant embryo, 16 weeks

  3. Sturnus vulgaris

  4. None
  5. Homo sapiens

  6. None
  7. Darwin 1837

  8. Francis Galton builds Bayesian conditional distributions (1873, 1889)

    See Stigler (2010): Darwin, Galton and the Statistical Enlightenment. Figure caption (clipped on the slide): "Fig. 5. Depiction of the regression phenomenon, based on a figure from Galton (1889)", with panels showing (a) the expected landing position of pellets from a specific upper level compartment and (b) a point of origin of pellets landing in a specific lower level compartment. Text excerpt (clipped on the slide): "There was a hint in the lecture that Galton was moving towards a mechanism [in] his thinking. He seemed to approach Mendelian genetics when he speculated […]"
  9. Rothamsted Experimental Station

  10. Photo: Dr Martin Muller

  11. Statistical Inference in Evolution & Ecology

    • Fundamentally observational
    • Data: Bad, scarce & derived
    • Null models not unique
    • Plausible models custom & hungry
    • Result: Hi-D headaches, even with small data
    • Need: Flexible, scalable tools
  12. Outline

    • The evolution of statistical methods for studying evolution
    • Bad data in a big Stan model
  13. James Crow, R.A.Fisher, Motoo Kimura in 1961

  14. Kimura, M. 1983. The Neutral Theory of Molecular Evolution

    Figure labels: Neutral theory; Data; Uniform mutation; Varying mutation
  15. X → Y, Z

    Layers: Hypotheses; Process models; Statistical models
  16. Hypotheses: H0 “Evolution is neutral”; H1 “Selection matters”

    Layers: Hypotheses; Process models; Statistical models
  17. Hypotheses: H0 “Evolution is neutral”; H1 “Selection matters”

    Process models: P0A Neutral, non-equilibrium; P0B Neutral, equilibrium; P1A Constant selection; P1B Fluctuating selection. Layers: Hypotheses; Process models; Statistical models
  18. Hypotheses: H0 “Evolution is neutral”; H1 “Selection matters”

    Process models: P0A Neutral, non-equilibrium; P0B Neutral, equilibrium; P1A Constant selection; P1B Fluctuating selection. Statistical models: MI; MII; MIII. Layers: Hypotheses; Process models; Statistical models
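The three layers on slides 15 to 18 can be sketched as a pair of lookup tables. This is only an illustration of why a statistical model need not identify a single hypothesis. The labels come from the slides. The grouping of process models under hypotheses follows their P0/P1 prefixes. The links from process models to statistical models are hypothetical placeholders, since the transcript does not record the arrows drawn on the slide.

```python
# Labels are from the slides; the LINKS between layers are assumptions.
HYPOTHESIS_TO_PROCESS = {
    "H0": ["P0A", "P0B"],  # "Evolution is neutral"
    "H1": ["P1A", "P1B"],  # "Selection matters"
}

# Hypothetical: which statistical model each process model would predict.
PROCESS_TO_STATISTICAL = {
    "P0A": ["MII"],
    "P0B": ["MI"],
    "P1A": ["MIII"],
    "P1B": ["MII"],
}


def hypotheses_consistent_with(stat_model):
    """Return the sorted hypotheses that can produce `stat_model`
    through at least one process model."""
    found = set()
    for hypothesis, processes in HYPOTHESIS_TO_PROCESS.items():
        for process in processes:
            if stat_model in PROCESS_TO_STATISTICAL[process]:
                found.add(hypothesis)
    return sorted(found)


for model in ("MI", "MII", "MIII"):
    print(model, "<-", hypotheses_consistent_with(model))
```

Under these assumed links, MII is reachable from both H0 and H1, so supporting or rejecting MII alone would not discriminate between the two hypotheses.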
  19. None
  20. Sankararaman S, Patterson N, Li H, Pääbo S, Reich D (2012) The Date of Interbreeding between Neandertals and Modern Humans. PLoS Genet 8(10): e1002947. doi:10.1371/journal.pgen.1002947

    Highlighted excerpts (clipped on the slide): "[…] nucleotide polymorphisms (SNPs) shared with Neandertals will thus reflect, at least in part, the time since Neandertals or their ancestors and modern humans or their ancestors last exchanged genes with each other." "[…] genetic distance x (expected number of crossover reco[…] events per meiosis) apart, arose on the Neandertal lin[…] introgressed into modern humans at time tGF, the proba[…] these alleles have not been broken up by recombination […]"

    Authors: Sriram Sankararaman (1, 2), Nick Patterson (2), Heng Li (2), Svante Pääbo (3), David Reich (1, 2). Affiliations: 1 Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America; 2 Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America; 3 Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany. Editor: Joshua M. Akey, University of Washington, United States of America. Received December 15, 2011; Accepted July 27, 2012; Published October 4, 2012.

    Abstract: Comparisons of DNA sequences between Neandertals and present-day humans have shown that Neandertals share more genetic variants with non-Africans than with Africans. This could be due to interbreeding between Neandertals and modern humans when the two groups met subsequent to the emergence of modern humans outside Africa. However, it could also be due to population structure that antedates the origin of Neandertal ancestors in Africa. We measure the extent of linkage disequilibrium (LD) in the genomes of present-day Europeans and find that the last gene flow from Neandertals (or their relatives) into Europeans likely occurred 37,000–86,000 years before the present (BP), and most likely 47,000–65,000 years ago. This supports the recent interbreeding hypothesis and suggests that interbreeding may have occurred when modern humans carrying Upper Paleolithic technologies encountered Neandertals as they expanded out of Africa.

    Introduction: A much-debated question in human evolution is the relationship between modern humans and Neandertals. Modern humans appear in the African fossil record about 200,000 years ago. Neandertals appear in the European fossil record about 230,000 years ago [1] and disappear about 30,000 years ago. They lived in Europe and western Asia with a range that extended as far east as Siberia [2] and as far south as the Middle East. The overlap of Neandertals and modern humans in space and time suggests the possibility of interbreeding. Evidence, both for [3] and against interbreeding [4], have been put forth based on the analysis of modern human DNA. Although mitochondrial DNA from multiple Neandertals has shown that Neandertals fall outside the range of modern human variation [5,6,7,8,9,10], low-levels of gene flow cannot be excluded [10,11,12]. Analysis of the draft sequence of the Neandertal genome revealed that the Neandertal genome shares more alleles with non-African than with sub-Saharan African genomes [13]. One hypothesis that could explain this observation is a history of gene flow from Neandertals into modern humans, presumably when they encountered each other in Europe and the Middle East [13] (Figure 1). An alternative hypothesis is that the findings are […] substructure in Africa is a plausible alternative to the hypothesis of recent gene flow. Today, sub-Saharan Africans harbor deep lineages that are consistent with a highly-structured ancestral population [17,18,19,20,21,22,23,24,25,26,27]. Evidence for ancient structure in Africa has also been offered based on the substantial diversity in neurocranial geometry amongst early modern humans [28]. Thus, it is important to test formally whether substructure could explain the genetic evidence for Neandertals being more closely related to non-Africans than to Africans. A direct way to distinguish the hypothesis of recent gene flow from the hypothesis of ancient substructure is to infer the date for when the ancestors of Neandertals and a modern non-African population last exchanged genes. In the recent gene flow scenario, the date is not expected to be much older than 100,000 years ago, corresponding to the time of the earliest documented modern humans outside of Africa [29]. In the ancient substructure scenario, the date of last common ancestry is expected to be at least 230,000 years ago, since Neandertals must have separated from modern humans by that time based on the Neandertal fossil record of Europe [1]. In present-day human populations, the extent of LD between two single nucleotide polymorphisms (SNPs) shared with Nean[…]
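The clipped excerpt on this slide refers to the probability that two alleles at genetic distance x have not been separated by recombination since time tGF. A minimal sketch of that logic follows, assuming the standard approximation that this probability decays as exp(-t · x), with x in Morgans and t in generations. The function names, the amplitude parameter, the simulated distances, and the generation time are all illustrative assumptions, not values from the paper.

```python
import math


def expected_ld(x_morgans, t_generations, amplitude=1.0):
    """Expected LD at genetic distance x after t generations,
    assuming simple exponential decay exp(-t * x)."""
    return amplitude * math.exp(-t_generations * x_morgans)


def fit_generations(distances, ld_values):
    """Least-squares line through log(LD) versus distance.
    Under the exponential model the slope equals -t."""
    logs = [math.log(v) for v in ld_values]
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, logs))
    sxx = sum((x - mean_x) ** 2 for x in distances)
    return -sxy / sxx


# Fake, noise-free data generated from a known answer (all values assumed).
true_t = 2000.0
distances = [0.0001 * k for k in range(1, 11)]  # Morgans
ld = [expected_ld(x, true_t, amplitude=0.3) for x in distances]

t_hat = fit_generations(distances, ld)
years_per_generation = 29  # assumption for illustration only
print(round(t_hat), "generations, about", round(t_hat * years_per_generation), "years")
```

Real LD curves are noisy and carry uncertainty from the genetic map, so a published estimate would need a full statistical model; this sketch only shows why the decay rate carries information about the date.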
  21. Repeat of slide 20 (Sankararaman et al. 2012, The Date of Interbreeding between Neandertals and Modern Humans)
  22. Street et al 2017. Coevolution of cultural intelligence, extended life history, sociality, & brain size in primates

    Figure (panels A, B, C; caption clipped on the slide): "[…] data on social learning, absolute brain volume, group size, and longevity for 52 primate genera, usin[…]". Column labels: Social learning; Brain volume; Group size; Longevity. Genera shown: Allenopithecus, Cercopithecus, Erythrocebus, Cheirogaleus, Daubentonia, Eulemur, Hapalemur, Lemur, Lepilemur, Microcebus, Mirza, Propithecus, Varecia, Alouatta, Ateles, Lagothrix, Aotus, Callimico, Callithrix, Cebus, Leontopithecus, Saguinus, Saimiri, Arctocebus, Loris, Nycticebus, Perodicticus, Gorilla, Hylobates, Pan, Pongo, Symphalangus, Cacajao, Chiropotes, Pithecia, Cercocebus, Lophocebus, Macaca, Mandrillus, Papio, Theropithecus, Colobus, Nasalis, Presbytis, Pygathrix, Semnopithecus, Trachypithecus, Euoticus, Galago, Galagoides, Otolemur, Tarsius
  23. Photo: Dr Martin Muller

  24. None
  25. González-Forero & Gardner 2018. Ecological and social drivers of human brain-size evolution

    Figure (clipped on the slide), labeled H. habilis: brain weight (kg) and body weight (kg) against age (years). Stage labels: Growth; Reproduction; Childhood; Adolescence; Adulthood. Annotations: "Adult fit", "Body (kg)" and "EQ" with values –0.04, 32.0 and 3.85; "60% competitive". Caption fragments: "[…]est-fitting scenarios across adult Homo, and resulting […]d life history for H. sapiens. a, Best adult fitting scenarios across […] ([…]gure modified with permission from figure 8.1 of ref. 1). Pie […]d plots respectively show the challenge combination and shape […]"
  26. None
  27. None
  28. None
  29. None
  30. [Figure: points labeled 1 to 40]
  31. Dr Jeremy Koster (left) Data collection Data processing Statistics

  32. Kim Hill and Keith Kintigh, “Can Anthropologists Distinguish Good and Poor Hunters? Implications for Hunting Hypotheses, Sharing Conventions, and Cultural Transmission” (Current Anthropology, vol. 50, no. 3, p. 369)

    DOI: 10.1086/597981. © 2009 by The Wenner-Gren Foundation for Anthropological Research. All rights reserved. From Supplement A, Online Figures: Figure A2. A, Photo taken in January 1978; B, photo taken in January 2009.
  33. Life History of Production Skill

    • Goals
      • Estimate skill development
      • Take variation seriously
      • Develop stats machinery
    • Sample
      • 40 research sites
      • 1821 individuals
      • 21,160 foraging trips
      • 23,747 harvests
      • Uncountable headaches
  34. Bad Data Headaches

    • Missing values: age, duration, technology
    • Uncertainty: age bins, age estimates
    • Massive imbalance in sampling
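One common way to handle an age that is known only to a bin is to marginalize over it: average the likelihood across every age in the bin, weighted by a prior. The sketch below illustrates that idea in plain Python. It is not the model from the talk. The skill curve, its parameters, and the uniform prior within a bin are all hypothetical choices made for the example.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def success_prob(age, peak=35.0, width=20.0):
    """Hypothetical skill curve: success probability is highest at `peak`."""
    return 0.05 + 0.9 * math.exp(-((age - peak) / width) ** 2)


def log_lik_known_age(success, age):
    """Bernoulli log-likelihood of one trip outcome at a known age."""
    p = success_prob(age)
    return math.log(p if success else 1.0 - p)


def log_lik_age_bin(success, lo, hi):
    """Log-likelihood when age is only known to lie in [lo, hi] (whole years).
    Assumes a uniform prior over the ages in the bin."""
    ages = range(lo, hi + 1)
    log_prior = -math.log(len(ages))
    return log_sum_exp([log_prior + log_lik_known_age(success, a) for a in ages])


print(log_lik_known_age(True, 30))
print(log_lik_age_bin(True, 20, 40))
```

Working on the log scale with `log_sum_exp` avoids underflow when many small likelihood terms are combined, which matters once thousands of observations are summed.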
  35. “Pooh?” said Piglet. “Yes, Piglet?” said Pooh. “27417 parameters,” said

    Piglet. “Oh, bother,” said Pooh.
  36. None
  37. None
  38. None
  39. Scaffolds

    • A month of modeling & fake data simulation (priors)
    • Submit grant (wait for it...), get grant!
    • Develop first Stan model (multilevel)
    • Simulate missingness and uncertainty
    • Develop second Stan model (imputation, marginalization)
    • Analyze real sample
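The fake-data steps in this list can be illustrated with a small simulation: generate data from known parameters, knock out some values, and check whether an estimator recovers the truth. The sketch below is a deliberately simple stand-in, not the Stan models from the talk. The linear relationship, the parameter values, and the choice to drop incomplete rows are all assumptions for illustration.

```python
import random


def simulate_trips(n, intercept, slope, noise_sd, p_missing, seed=1):
    """Simulate (age, outcome) pairs from a known linear model, then hide
    each age independently with probability `p_missing`."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(10, 70)
        outcome = intercept + slope * age + rng.gauss(0, noise_sd)
        observed_age = None if rng.random() < p_missing else age
        rows.append((observed_age, outcome))
    return rows


def fit_slope_complete_cases(rows):
    """Least-squares slope using only rows whose age was observed."""
    complete = [(a, y) for a, y in rows if a is not None]
    n = len(complete)
    mean_a = sum(a for a, _ in complete) / n
    mean_y = sum(y for _, y in complete) / n
    sxy = sum((a - mean_a) * (y - mean_y) for a, y in complete)
    sxx = sum((a - mean_a) ** 2 for a, _ in complete)
    return sxy / sxx


true_slope = 0.05  # known because the data are simulated
rows = simulate_trips(n=2000, intercept=1.0, slope=true_slope,
                      noise_sd=1.0, p_missing=0.3)
n_missing = sum(1 for a, _ in rows if a is None)
slope_hat = fit_slope_complete_cases(rows)
print("missing ages:", n_missing, "estimated slope:", round(slope_hat, 3))
```

Dropping incomplete rows is only the simplest baseline, and it works here because ages go missing completely at random. When missingness depends on other variables, or when discarding rows wastes too much data, the imputation and marginalization approach named on the slide becomes the more appropriate tool.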
  40. What Happened

  41. ACH (site 16): 147 ids (14364 obs)

    Panels on a 0 to 80 axis: skill (peak 37), success (peak 38), harvest (peak 38), production (peak 38). Map label: República del Paraguay / Tetã Paraguái
  42. ACH (site 16): 147 ids (14364 obs)

    Panels on a 0 to 80 axis: skill (peak 37), success (peak 38), harvest (peak 38), production (peak 38)
  43. ACH (site 16): 147 ids (14364 obs)

    Panels on a 0 to 80 axis: skill (peak 37), success (peak 38), harvest (peak 38), production (peak 38)
  44. ACH (site 16): 147 ids (14364 obs)

    Panels on a 0 to 80 axis: skill (peak 37), success (peak 38), harvest (peak 38), production (peak 38)
  45. Skill (mean) by site

    One panel per site on a 0 to 80 axis, plus a Global mean panel (axis marked at 0, 18, 40, 55, 80). Panel key: ids (obs), peak, obs range.

    Site  Code  Peak  Ids  Obs
    1     CRE   31    16   127
    2     MYA   34    59   464
    3     MYN   34    52   359
    4     QUI   29    32   189
    5     ECH   32    2    6
    6     WAO   31    48   373
    7     BAR   26    18   233
    8     INU   26    15   29
    9     MTS   18    69   1441
    10    PIR   26    42   274
    11    CLB   33    12   45
    12    PME   23    23   172
    13    TS1   31    29   127
    14    TS2   31    37   139
    15    TS3   37    168  793
    16    ACH   37    147  14364
    17    GB1   36    69   488
    18    GB2   30    4    37
    19    GB3   30    16   73
    20    CN1   35    6    80
    21    GB4   34    19   114
    22    BK1   30    80   249
    23    BK2   24    57   114
    24    CN2   36    14   78
    25    CN3   36    15   67
    26    BFA   25    59   433
    27    CN4   27    14   287
    28    BIS   51    24   231
    29    HEH   31    45   45
    30    DLG   33    26   76
    31    BTK   38    27   268
    32    PN1   30    35   119
    33    PN2   31    23   125
    34    AGT   28    44   211
    35    MRT   31    77   758
    36    NUA   32    36   140
    37    NIM   42    26   180
    38    NEN   32    7    7
    39    MAR   33    6    28
    40    WOL   18    27   410
  46. Bad Data in Search of Good Home

    • Evolution is not your friend
    • Not experimental science
    • Null models useless
    • Data scarce, not simple
    • Flexible, scalable, open tools
    • Community crucial
  47. None