Elephants in the Room: Challenges in inferring demographic history

Ryan Gutenkunst University of Arizona Elephants in the Room Challenges
in inferring demographic history http://gutengroup.mcb.arizona.edu @RyanGutenkunst

Demographic history deMenocal (2016) Nature

Inferring demographic history (Metaphor suggested by Richard Durbin)

Inferring demographic history (Metaphor suggested by Richard Durbin) Multidimensional SNP
Data Gutenkunst (2009) PLoS Genet Jouganous (2017) Genetics

Data Gutenkunst (2009) PLoS Genet Jouganous (2017) Genetics Schiffels (2014) Nature Genet

Data Gutenkunst (2009) PLoS Genet Jouganous (2017) Genetics tracts. The results, plotted in Figure 6A, show no significant difference between the average recombination rate within long IBS tracts versus short ones. If recombination hotspots significantly reduced the frequency of long IBS tracts compared to what we would expect under the assumption of constant recombination rate, then the longest observed IBS tracts should span regions of lower-than-average recombination rate; conversely, if recombination hotspots significantly increased the frequency of short IBS tracts, we would expect to see short tracts concentrated in regions of higher-than-average recombination rate. We observed neither of these patterns and therefore made no special effort to correct for recombination rate variation. Li and Durbin made a similar decision with regard to the PSMC, which can accurately infer past population sizes from data with simulated recombination hotspots. To judge whether non-uniformity of the mutation rate was biasing the IBS tract spectrum, we computed the frequency of human/chimp fixed differences within IBS tracts of length L. We observed that short IBS tracts of v100 bp are concentrated in regions with elevated rates of human-chimp substitution, suggesting that mutation rate variation has a significant impact on this part of the IBS tract spectrum. IBS tracts shorter than 5 base pairs long are dispersed fairly evenly throughout the genome, but human-chimp fixed differences cover more than 10% of the sites they span (see Figure 6B) as opposed to 1% of the genome overall. In Hodgkinson, et al.’s study of cryptic human mutation rate variation, they estimated that the rate of coincidence between human and chimp polymorphisms could be explained by 0.1% of sites having a mutation rate that was 33 times the mutation rate at other sites [52]. We modified our method to reflect this correction when analyzing real human data, assuming that a uniformly distributed 0.1% of sites have a scaled mutation rate of h’~0:033, elevated above a baseline value of h~0:001. We also excluded IBS tracts shorter than 100 base pairs from all computed likelihood functions (see Methods for more detail). Human demography and the migration out of Africa conflicting models of human evolution that have been proposed in recent years. Two of these models were obtained from SFS data using the method LaLi of Gutenkunst, et al.; these models are identically parameterized but differ in specific parameter estimates, which were inferred from different datasets. One model was fit to Table 1. Inferring the parameters of a simple admixture scenario. ta (gens) ts (gens) f N True value: 400 2,000 0.05 10,000 Mean: 431 1,990 0.0505 9,806 Std dev: 51 41 0.00652 27 Bias: 31 210 0.0005 2194 Mean squared error: 3280 1781 4:27|10{5 3:84|104 True value: 200 2,000 0.05 10,000 Mean: 220 1,983 0.0499 10,003 Std dev: 28 39 0.00328 287 Bias: 20 217 20.0001 23 Mean squared error: 1184 1810 1:08|10{5 8:23|104 Using MS, we simulated 200 replicates of the admixture scenario depicted in Figure 2B. In 100 replicates, the gene flow occurred 400 generations ago, while in the other 100 replicates it occurred 200 generations ago. Our estimates of the four parameters ta,ts,f ,N are consistently close to the true values, showing that we are able distinguish the two histories by numerically optimizing the likelihood function. doi:10.1371/journal.pgen.1003521.t001 Figure 4. Frequencies of IBS tracts shared between the 1000 Genomes trio parental haplotypes. Each plot records the number of L-base IBS tracts observed per base pair of sequence alignment. The red spectrum records tract frequencies compiled from the entire alignment, while the blue spectra result from 100 repetitions of block bootstrap resampling. A slight upward concavity around 104 base pairs is the signature of the out of Africa bottleneck in Europeans. doi:10.1371/journal.pgen.1003521.g004 Inferring Demography from Shared Haplotype Lengths Harris (2013) PLoS Genet Schiffels (2014) Nature Genet

Data Gutenkunst (2009) PLoS Genet Jouganous (2017) Genetics tracts. The results, plotted in Figure 6A, show no significant difference between the average recombination rate within long IBS tracts versus short ones. If recombination hotspots significantly reduced the frequency of long IBS tracts compared to what we would expect under the assumption of constant recombination rate, then the longest observed IBS tracts should span regions of lower-than-average recombination rate; conversely, if recombination hotspots significantly increased the frequency of short IBS tracts, we would expect to see short tracts concentrated in regions of higher-than-average recombination rate. We observed neither of these patterns and therefore made no special effort to correct for recombination rate variation. Li and Durbin made a similar decision with regard to the PSMC, which can accurately infer past population sizes from data with simulated recombination hotspots. To judge whether non-uniformity of the mutation rate was biasing the IBS tract spectrum, we computed the frequency of human/chimp fixed differences within IBS tracts of length L. We observed that short IBS tracts of v100 bp are concentrated in regions with elevated rates of human-chimp substitution, suggesting that mutation rate variation has a significant impact on this part of the IBS tract spectrum. IBS tracts shorter than 5 base pairs long are dispersed fairly evenly throughout the genome, but human-chimp fixed differences cover more than 10% of the sites they span (see Figure 6B) as opposed to 1% of the genome overall. In Hodgkinson, et al.’s study of cryptic human mutation rate variation, they estimated that the rate of coincidence between human and chimp polymorphisms could be explained by 0.1% of sites having a mutation rate that was 33 times the mutation rate at other sites [52]. We modified our method to reflect this correction when analyzing real human data, assuming that a uniformly distributed 0.1% of sites have a scaled mutation rate of h’~0:033, elevated above a baseline value of h~0:001. We also excluded IBS tracts shorter than 100 base pairs from all computed likelihood functions (see Methods for more detail). Human demography and the migration out of Africa conflicting models of human evolution that have been proposed in recent years. Two of these models were obtained from SFS data using the method LaLi of Gutenkunst, et al.; these models are identically parameterized but differ in specific parameter estimates, which were inferred from different datasets. One model was fit to Table 1. Inferring the parameters of a simple admixture scenario. ta (gens) ts (gens) f N True value: 400 2,000 0.05 10,000 Mean: 431 1,990 0.0505 9,806 Std dev: 51 41 0.00652 27 Bias: 31 210 0.0005 2194 Mean squared error: 3280 1781 4:27|10{5 3:84|104 True value: 200 2,000 0.05 10,000 Mean: 220 1,983 0.0499 10,003 Std dev: 28 39 0.00328 287 Bias: 20 217 20.0001 23 Mean squared error: 1184 1810 1:08|10{5 8:23|104 Using MS, we simulated 200 replicates of the admixture scenario depicted in Figure 2B. In 100 replicates, the gene flow occurred 400 generations ago, while in the other 100 replicates it occurred 200 generations ago. Our estimates of the four parameters ta,ts,f ,N are consistently close to the true values, showing that we are able distinguish the two histories by numerically optimizing the likelihood function. doi:10.1371/journal.pgen.1003521.t001 Figure 4. Frequencies of IBS tracts shared between the 1000 Genomes trio parental haplotypes. Each plot records the number of L-base IBS tracts observed per base pair of sequence alignment. The red spectrum records tract frequencies compiled from the entire alignment, while the blue spectra result from 100 repetitions of block bootstrap resampling. A slight upward concavity around 104 base pairs is the signature of the out of Africa bottleneck in Europeans. doi:10.1371/journal.pgen.1003521.g004 Inferring Demography from Shared Haplotype Lengths Harris (2013) PLoS Genet Figure 2. We offer a fast algorithm for sorting m 283 function: 284 https://github.com/flag0010/pop_gen_cnn/blob/m 285 rep.tricks.py). 286 287 Introgression detection 288 To detect introgression, we simulated train 289 (https://github.com/geneva/msmove) from the sa 290 (2018) used to train the FILET classifier for detec 291 and D. sechellia. In total we produced 237,500 co 292 without no migration between species (No Intro 293 simulans into D. sechellia (sim→sech), and 12,500 wit 294 (sech→sim). We used fewer sech→sim examples becau 295 that the network could detect this class fairly ac 296 sampling of the other two more challenging classes 297 training and validation sets so that the training set i 298 Chromosomes 0 10 20 0 20 40 0 20 40 0 Segregating Figure 2: Example population genetic alignments unsorted alignment matrix (left) and this same matrix s (right) are shown. Each row represents one of twenty represents one of forty segregating sites. Derived and a respectively. CC-BY 4.0 Int It is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioR http://dx.doi.org/10. doi: bioRxiv preprint first posted online May. 31, 2018; Flagel (2018) bioRxiv Comparison with ABCtoolbox Although ABC is not well suited for our scenario of interest and deep learning is a complem tary method, we wanted to find a scenario where we could compare the performance of thes two methods. To this effect, we restricted the analysis to estimating (continuous) demograp parameters only. We used the popular ABCtoolbox [54], using the same training and testing datasets as for deep learning. For ABC, the training data represents the data simulated unde the prior distributions (uniform in our case), and each test dataset was compared with the training data separately. We retained 5% of the training datasets, and used half of these retai datasets for posterior density estimation. Overall, we used 75% of the datasets for training an 25% for testing. We tested two scenarios, one with the full set of summary statistics (345 total), and the other with a reduced set of summary statistics (100 total). For the reduced set of summary s tistics, we chose statistics which seemed to be informative: the number of segregating sites, Tajima’s D, the first 15 entries of site frequency spectrum, H1, and the distribution of distan Fig 5. A Venn diagram of most informative statistics for each output variable (N1 , N2 , N3 , and selection). For each variable, the top 25 statistics we chosen using permutation testing. The Venn diagram captures statistics common to each subset of output variables, with notable less informative statistic shown in the lower right. Close, mid, and far represent the genomic region where the statistic was calculated. The numbers after each colon refer to the position of the statistic within its distribution or order. For the SFS statistics, it is number of minor alleles. For each region, there are 50 SFS statistics, 16 B statistics (distribution between segregating sites), 30 IBS statistics, and 16 LD statistics. doi:10.1371/journal.pcbi.1004845.g005 Deep Learning for Population Genetic Infere PLOS Computational Biology | DOI:10.1371/journal.pcbi.1004845 March 28, 2016 13 Sheehan (2016) PLoS Comput Biol Schiffels (2014) Nature Genet

st-fit demographic models and observed and predicted frequency spectra for
African farmer (Yoruba) and Pygmy (Baka an Demography and selection in Afric Case study 1: African Pygmies Hsieh (2016) Genome Res st-fit demographic models and observed and predicted frequency spectra for African farmer (Yoruba) and Pygmy (Baka an continuous asymmetric gene flow model (Model-1) with the 10 free parameters labeled. (B) The single-pulse admixture m Demography and selection in Afric Continuous gene ﬂow Pulse gene ﬂow

African farmer (Yoruba) and Pygmy (Baka an Demography and selection in Afric Case study 1: African Pygmies Hsieh (2016) Genome Res st-fit demographic models and observed and predicted frequency spectra for African farmer (Yoruba) and Pygmy (Baka an continuous asymmetric gene flow model (Model-1) with the 10 free parameters labeled. (B) The single-pulse admixture m Demography and selection in Afric 155 kya 90 kya Continuous gene ﬂow Pulse gene ﬂow

African farmer (Yoruba) and Pygmy (Baka an Demography and selection in Afric Case study 1: African Pygmies Hsieh (2016) Genome Res st-fit demographic models and observed and predicted frequency spectra for African farmer (Yoruba) and Pygmy (Baka an continuous asymmetric gene flow model (Model-1) with the 10 free parameters labeled. (B) The single-pulse admixture m Demography and selection in Afric 155 kya 90 kya Log-likelihood ~ -6712 Log-likelihood ~ -7737 Continuous gene ﬂow Pulse gene ﬂow

African farmer (Yoruba) and Pygmy (Baka an Demography and selection in Afric Case study 1: African Pygmies Hsieh (2016) Genome Res st-fit demographic models and observed and predicted frequency spectra for African farmer (Yoruba) and Pygmy (Baka an continuous asymmetric gene flow model (Model-1) with the 10 free parameters labeled. (B) The single-pulse admixture m Demography and selection in Afric 155 kya 90 kya Log-likelihood ~ -6712 Log-likelihood ~ -7737 rred that Pygmies and farmers diverged ∼90 kya a).Thepulseofgeneflowisestimatedtohaveoc- centiMorgan (cM) apart. The two best-fitting models rem the same as using the whole data set, and the parameter estim emographic models and observed and predicted frequency spectra for African farmer (Yoruba) and Pygmy (Baka and Biaka) nuous asymmetric gene flow model (Model-1) with the 10 free parameters labeled. (B) The single-pulse admixture model (Mo ameters labeled. (C) The marginal spectra for each pair of populations. Row one is data, rows two (Model-1) and four (Model- ee and five are Anscombe residuals of model minus data for Model-1 and Model-2, respectively. graphic models and observed and predicted frequency spectra for African farmer (Yoruba) and Pygmy (Baka and Biaka s asymmetric gene flow model (Model-1) with the 10 free parameters labeled. (B) The single-pulse admixture model (M ters labeled. (C) The marginal spectra for each pair of populations. Row one is data, rows two (Model-1) and four (Mode Demography and selection in African Py Continuous gene ﬂow Pulse gene ﬂow

Hsieh (2016) Genome Res Case study 1: African Pygmies

Hsieh (2016) Genome Res Continuous gene ﬂow Pulse gene ﬂow
Data LD Decay Case study 1: African Pygmies

Hsieh (2016) Genome Res Continuous gene flow Pulse gene flow
Data LD Decay (D) (E) Data  Continuous gene flow Pulse gene flow MSMC Case study 1: African Pygmies

Case study 2: Out of Africa Harris (2013) PLoS Genet

Case study 2: Out of Africa Model inferred from allele
frequencies predicts IBS tracts poorly. Harris (2013) PLoS Genet

Case study 2: Out of Africa Model inferred from allele
frequencies predicts IBS tracts poorly. Model inferred from IBS tracts predicts allele frequencies poorly. Harris (2013) PLoS Genet

Case study 3: Out of Africa Comparison of Single Genome
and Allele Frequency Data Reveals Discordant Demographic Histories Annabel C. Beichman,* Tanya N. Phung,† and Kirk E. Lohmueller*,†,‡,1 *Department of Ecology and Evolutionary Biology, †Interdepartmental Program in Bioinformatics, and ‡Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California 90095 ORCID IDs: 0000-0002-6991-587X (A.C.B.); 0000-0003-2685-6977 (T.N.P.); 0000-0002-3874-369X (K.E.L.) ABSTRACT Inference of demographic history from genetic data is a primary goal of population genetics of model and nonmodel organisms. Whole genome-based approaches such as the pairwise/multiple sequentially Markovian coalescent methods use genomic data from one to four individuals to infer the demographic history of an entire population, while site frequency spectrum (SFS)-based methods use the distribution of allele frequencies in a sample to reconstruct the same historical events. Although both methods are extensively used in empirical studies and perform well on data simulated under simple models, there have been only limited comparisons of them in more complex and realistic settings. Here we use published demographic models based on data from three human populations (Yoruba, descendants of northwest-Europeans, and Han Chinese) as an empirical test case to study the behavior of both inference procedures. We find that several of the demographic histories inferred by the whole genome-based methods do not predict the genome-wide distribution of heterozygosity, nor do they predict the empirical SFS. However, using simulated data, we also find that the whole genome methods can reconstruct the complex demographic models inferred by SFS-based methods, suggesting that the discordant patterns of genetic variation are not attributable to a lack of statistical power, but may reflect unmodeled complexities in the underlying demography. More generally, our findings indicate that demographic inference from a KEYW pairw seq Ma coa site f spe popu gen demo infe nonm org Genomes Project Consortium 2015; Henn et al. 2 cently featured in three prominent articles that rec history using whole genome sequencing data from o (Malaspinas et al. 2016; Mallick et al. 2016; Pagani PSMC plots have also become a cornerstone nonmodel organisms lacking resources for the seque individuals, including archaic hominins (Meyer et a 2014), great apes (Prado-Martinez et al. 2013), wild pigs (Groenen et al. 2012; Bosse et al. 2014), canid published Early Online September 11, 2017. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/ licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Supplemental material is available online at www.g3journal.org/lookup/suppl/ doi:10.1534/g3.117.300259/-/DC1. 1Corresponding author: Department of Ecology and Evolutionary Biology, University of California, Los Angeles, 621 Charles E. Young Dr. South, Los Angeles, CA 90095- 1606. E-mail: [email protected] Volume 7 | Novem Genomes Project Consortium 2015; Henn et al. 2016), and were re- cently featured in three prominent articles that reconstructed human history using whole genome sequencing data from over 20 populations (Malaspinas et al. 2016; Mallick et al. 2016; Pagani et al. 2016). PSMC plots have also become a cornerstone of many studies of nonmodel organisms lacking resources for the sequencing of numerous individuals, including archaic hominins (Meyer et al. 2012; Prufer et al. 2014), great apes (Prado-Martinez et al. 2013), wild boars and domestic pigs (Groenen et al. 2012; Bosse et al. 2014), canids (Freedman et al. mber 1, 2017; the Creative mmons.org/ eproduction ookup/suppl/ gy, University es, CA 90095- Volume 7 | November 2017 | 3605

Model complexity

Model complexity “With four parameters I can ﬁt an elephant.
With ﬁve I can make him wiggle his trunk.” - John von Neumann

Model complexity “With four parameters I can fit an elephant.
With five I can make him wiggle his trunk.” - John von Neumann Drawing an elephant with four complex parameters Jürgen Mayer Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstr. 108, 01307 Dresden, Germany Khaled Khairy European Molecular Biology Laboratory, Meyerhofstraße. 1, 69117 Heidelberg, Germany Jonathon Howard Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstr. 108, 01307 Dresden, Germany ͑Received 20 August 2008; accepted 5 October 2009͒ We define four complex numbers representing the parameters needed to specify an elephantine shape. The real and imaginary parts of these complex numbers are the coefficients of a Fourier coordinate expansion, a powerful tool for reducing the data required to define shapes. © 2010 American Association of Physics Teachers. ͓DOI: 10.1119/1.3254017͔ A turning point in Freeman Dyson’s life occurred during a meeting in the Spring of 1953 when Enrico Fermi criticized the complexity of Dyson’s model by quoting Johnny von Neumann:1 “With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.” Since then it has become a well-known saying among physicists, but nobody has successfully implemented it. To parametrize an elephant, we note that its perimeter can be described as a set of points ͑x͑t͒,y͑t͒͒, where t is a parameter that can be interpreted as the elapsed time while going along the path of the contour. If the speed is uniform, t becomes the arc length. We expand x and y separately2 as a Fourier series ϱ trace out elliptical corrections analogous to Ptolemy’s epicycles.5 Visualization of the corresponding ellipses can be found at Ref. 6. We now use this tool to fit an elephant with four parameters. Wei7 tried this task in 1975 using a least-squares Fou- rier sine series but required about 30 terms. By analyzing the picture in Fig. 1͑a͒ and eliminating components with amplitudes less than 10% of the maximum amplitude, we obtained an approximate spectrum. The remaining amplitudes were Using this expansion of the x and y coordinates, we can analyze shapes by tracing the boundary and calculating the coefficients in the expansions ͑using standard methods from Fourier analysis͒. By truncating the expansion, the shape is smoothed. Truncation leads to a huge reduction in the information necessary to express a certain shape compared to a pixelated image, for example. Székely et al.3 used this approach to segment magnetic resonance imaging data. A similar approach was used to analyze the shapes of red blood cells,4 with a spherical harmonics expansion serving as a 3D generalization of the Fourier coordinate expansion. The coefficients represent the best fit to the given shape in the following sense. The k=0 component corresponds to the center of mass of the perimeter. The k=1 component corresponds to the best fit ellipse. The higher order components Fig. 1. ͑a͒ Outline of an elepha 648 Am. J. Phys. 78 ͑6͒, June 2010 http://aapt.org/ajp © 2010 American

Model complexity CHB YRI NAF N B NEU0 NAS0 rEU
rAS mAF-B mAF-EU mEU-AS (also mAF-AS) TAF TB TEU-AS CEU NA Gutenkunst (2009) PLoS Genetics fit significantly worse than the maximum likelihood model, with a composite log likelihood ratio of {11796. When we simulated data under the restricted model and inferred a full set of 14 parameters from the simulated data, these included a ghost admixture fraction of 0.01, the smallest fraction permitted by the optimization bounds. Given that models inferred from site frequency spectra do not fit the IBS tracts in human data, we simulated site frequency data under our inferred demographic model to see whether the reverse was true. The resulting spectrum had more population-private alleles than the NIEHS frequency spectrum previously analyzed by Gutenkunst, et al (see Section 4.2 of Text S1 and Supplemen- tary Figure S11). The discrepancy might result from biased population size estimates or from differences in the effects of errors on IBS tract and SFS data. Discussion IBS tracts shared between diverging populations contain a lot of information about split times and subsequent gene flow; we can distinguish not only between instantaneous isolation and isolation with subsequent migration, but between recent admixture events that occur at modestly different times. We can accurately estimate the times of simulated admixture events that occurred hundreds of generations ago, too old for migrant tracts to be identified as IBD with tracts from a foreign population. In addition, we can Figure 7. A history inferred from IBS sharing in Europeans and Yorubans. This is the simplest history we found to satisfactorily explain IBS tract sharing in the 1000 Genomes trio data. It includes ancient ancestral population size changes, an out-of-African bottleneck in Europeans, ghost admixture into Europe from an ancestral hominid, and a long period of gene flow between the diverging populations. doi:10.1371/journal.pgen.1003521.g007 Inferring Demography from Shared Haplotype Lengths PLOS Genetics | www.plosgenetics.org 9 June 2013 | Volume 9 | Issue 6 | e1003521 Harris (2013) PLoS Genetics

Model complexity CHB YRI NAF N B NEU0 NAS0 rEU
rAS mAF-B mAF-EU mEU-AS (also mAF-AS) TAF TB TEU-AS CEU NA Gutenkunst (2009) PLoS Genetics fit significantly worse than the maximum likelihood model, with a composite log likelihood ratio of {11796. When we simulated data under the restricted model and inferred a full set of 14 parameters from the simulated data, these included a ghost admixture fraction of 0.01, the smallest fraction permitted by the optimization bounds. Given that models inferred from site frequency spectra do not fit the IBS tracts in human data, we simulated site frequency data under our inferred demographic model to see whether the reverse was true. The resulting spectrum had more population-private alleles than the NIEHS frequency spectrum previously analyzed by Gutenkunst, et al (see Section 4.2 of Text S1 and Supplemen- tary Figure S11). The discrepancy might result from biased population size estimates or from differences in the effects of errors on IBS tract and SFS data. Discussion IBS tracts shared between diverging populations contain a lot of information about split times and subsequent gene flow; we can distinguish not only between instantaneous isolation and isolation with subsequent migration, but between recent admixture events that occur at modestly different times. We can accurately estimate the times of simulated admixture events that occurred hundreds of generations ago, too old for migrant tracts to be identified as IBD with tracts from a foreign population. In addition, we can Figure 7. A history inferred from IBS sharing in Europeans and Yorubans. This is the simplest history we found to satisfactorily explain IBS tract sharing in the 1000 Genomes trio data. It includes ancient ancestral population size changes, an out-of-African bottleneck in Europeans, ghost admixture into Europe from an ancestral hominid, and a long period of gene flow between the diverging populations. doi:10.1371/journal.pgen.1003521.g007 Inferring Demography from Shared Haplotype Lengths PLOS Genetics | www.plosgenetics.org 9 June 2013 | Volume 9 | Issue 6 | e1003521 Harris (2013) PLoS Genetics 5 terms 10 terms 20 terms 30 terms Wei (1975) Chemtech

When to stop adding model features?

• In (my) practice… Stop when new features no longer
qualitatively improve the residual plot or when they lead to poor optimization convergence. When to stop adding model features?

qualitatively improve the residual plot or when they lead to poor optimization convergence. • In theory… Stop when improvement in likelihood is not statistically significant, according to a likelihood ratio test.  (See Coffman (2016) Mol Biol Evol for LRTs with linked data.)    with an asymmetric size model, the distribution of log-likelihood differences was broader than expected (fig. 1F), leading the traditional test to erroneously favor the model with asymmetric sizes (fig. 1G). Using first-order moment matching, however, restored the expected distribution of log-likelihood differences (fig. 1F and supplementary text, Supplementary Material online), resulting in a well-controlled Type 1 error rate (fig. 1G). If the simpler model lies on the boundary of parameter space, such as comparing models with and without migration, the null distribution is more complex (supplementary materials and methods, Supplementary Material online), yet moment-matching still yielded a well-controlled error rate (fig. 1H). In many population genetics contexts, time equal to Wald and scor also be adjust RoyChoudhur that they p Supplementar To demons cal approache which models infer recent ge lihood, becaus each length in FIM underest A B E C F J K D FIG. 1. Adjusted composite-likelihood statistics compared with MLE on bootstrapped data and from MLE on bootstrapped data, in blue are from composite likelihood (GIM), and in red are assumption is incorrect when data are linked (/r 6¼ 0). (A, B) Inferred @a@i parameter standar population size change at a time T in the past. To vary the strength of linkage, the mutation ra was varied. Plotted are averages over 100 data sets per value of /r. (C, D) Coverage of 95% and (B). (E) Parameter standard deviations from Godambe and Fisher Information Matrices co and 13-parameter @a@i model of Gutenkunst et al. (2009). (F) For 100 symmetric migration dat (ÁLL) between asymmetric and symmetric migration @a@i models, before (red) and after distribution (black line). (G) Type I error rate versus significance level for LRT on simulations a (red) ÁLLs. (H) Type I error rate versus significance level for LRT between @a@i models of isol versus significance level for LRT between instantaneous growth and standard neutral @a@i mo by TRACTS for a model in which Europeans and African-Americans admixed T generations ag When to stop adding model features?

qualitatively improve the residual plot or when they lead to poor optimization convergence. • In theory… Stop when improvement in likelihood is not statistically significant, according to a likelihood ratio test.  (See Coffman (2016) Mol Biol Evol for LRTs with linked data.)    • But… With genome-scale data, adding parameters almost always yields a statistically significant improvement. with an asymmetric size model, the distribution of log-likelihood differences was broader than expected (fig. 1F), leading the traditional test to erroneously favor the model with asymmetric sizes (fig. 1G). Using first-order moment matching, however, restored the expected distribution of log-likelihood differences (fig. 1F and supplementary text, Supplementary Material online), resulting in a well-controlled Type 1 error rate (fig. 1G). If the simpler model lies on the boundary of parameter space, such as comparing models with and without migration, the null distribution is more complex (supplementary materials and methods, Supplementary Material online), yet moment-matching still yielded a well-controlled error rate (fig. 1H). In many population genetics contexts, time equal to Wald and scor also be adjust RoyChoudhur that they p Supplementar To demons cal approache which models infer recent ge lihood, becaus each length in FIM underest A B E C F J K D FIG. 1. Adjusted composite-likelihood statistics compared with MLE on bootstrapped data and from MLE on bootstrapped data, in blue are from composite likelihood (GIM), and in red are assumption is incorrect when data are linked (/r 6¼ 0). (A, B) Inferred @a@i parameter standar population size change at a time T in the past. To vary the strength of linkage, the mutation ra was varied. Plotted are averages over 100 data sets per value of /r. (C, D) Coverage of 95% and (B). (E) Parameter standard deviations from Godambe and Fisher Information Matrices co and 13-parameter @a@i model of Gutenkunst et al. (2009). (F) For 100 symmetric migration dat (ÁLL) between asymmetric and symmetric migration @a@i models, before (red) and after distribution (black line). (G) Type I error rate versus significance level for LRT on simulations a (red) ÁLLs. (H) Type I error rate versus significance level for LRT between @a@i models of isol versus significance level for LRT between instantaneous growth and standard neutral @a@i mo by TRACTS for a model in which Europeans and African-Americans admixed T generations ag When to stop adding model features?

qualitatively improve the residual plot or when they lead to poor optimization convergence. • In theory… Stop when improvement in likelihood is not statistically significant, according to a likelihood ratio test.  (See Coffman (2016) Mol Biol Evol for LRTs with linked data.)    • But… With genome-scale data, adding parameters almost always yields a statistically significant improvement. • Statistical noise is probably smaller than systematic biases. Fitting bias may drive model farther from truth. with an asymmetric size model, the distribution of log-likelihood differences was broader than expected (fig. 1F), leading the traditional test to erroneously favor the model with asymmetric sizes (fig. 1G). Using first-order moment matching, however, restored the expected distribution of log-likelihood differences (fig. 1F and supplementary text, Supplementary Material online), resulting in a well-controlled Type 1 error rate (fig. 1G). If the simpler model lies on the boundary of parameter space, such as comparing models with and without migration, the null distribution is more complex (supplementary materials and methods, Supplementary Material online), yet moment-matching still yielded a well-controlled error rate (fig. 1H). In many population genetics contexts, time equal to Wald and scor also be adjust RoyChoudhur that they p Supplementar To demons cal approache which models infer recent ge lihood, becaus each length in FIM underest A B E C F J K D FIG. 1. Adjusted composite-likelihood statistics compared with MLE on bootstrapped data and from MLE on bootstrapped data, in blue are from composite likelihood (GIM), and in red are assumption is incorrect when data are linked (/r 6¼ 0). (A, B) Inferred @a@i parameter standar population size change at a time T in the past. To vary the strength of linkage, the mutation ra was varied. Plotted are averages over 100 data sets per value of /r. (C, D) Coverage of 95% and (B). (E) Parameter standard deviations from Godambe and Fisher Information Matrices co and 13-parameter @a@i model of Gutenkunst et al. (2009). (F) For 100 symmetric migration dat (ÁLL) between asymmetric and symmetric migration @a@i models, before (red) and after distribution (black line). (G) Type I error rate versus significance level for LRT on simulations a (red) ÁLLs. (H) Type I error rate versus significance level for LRT between @a@i models of isol versus significance level for LRT between instantaneous growth and standard neutral @a@i mo by TRACTS for a model in which Europeans and African-Americans admixed T generations ag When to stop adding model features?

Potential systematic biases • SNP calling/ascertainment • Ancestral state misidentiﬁcation
• Phasing • Gene conversion • Polyploidy • Population structure • Background selection • Hitchhiking • •

Taming the elephant Note: Traditional techniques for taming elephants are
cruel.  I do not endorse their use on elephants, students, or postdocs.

Taming the elephant

Taming the elephant • Developers do test their methods on
simulated data.

simulated data. • But… we have no standards to test against, so we simulate our own data.

simulated data. • But… we have no standards to test against, so we simulate our own data. • Our simulations likely only include a few potential sources of bias - mostly ones our method is robust to.

simulated data. • But… we have no standards to test against, so we simulate our own data. • Our simulations likely only include a few potential sources of bias - mostly ones our method is robust to. • We know the truth our inference is aiming for.

Solution: Hidden truth

Solution: Competition

Scientiﬁc Challenges

Scientiﬁc Challenges • Critical Assessment of protein Structure Prediction  Since
1994, 13 competitions, 100s of competitors

Scientiﬁc Challenges • Critical Assessment of protein Structure Prediction  Since
1994, 13 competitions, 100s of competitors • Dialogue for Reverse Engineering Assessments and Methods

Challenge design What scenarios should be simulated?

Challenge design How should accuracy be measured?

Challenge design How to encourage broad participation?

Summary • Inferring demographic history is hard. • The relative
strengths of existing methods are unclear. • Developing complex models requires judgement and may be prone to overﬁtting. • Testing on blinded data may inform the community. • Testing as a competition may stimulate the community.

Elephants in the Room: Challenges in inferring ...

Elephants in the Room: Challenges in inferring demographic history

Other Decks in Science

Featured

Transcript