Sequence Capture Overview, Biological Applications, &c.

Slide presentation supporting sequence capture workshop at Smithsonian MSC, January 16-20, 2012.

Brant Faircloth

January 29, 2012

Transcript

  1. The problem value of MPS. You / The Machine = 7 days of 8 plates per day containing 5 multiplexed loci. Illumina HiSeq 2000, 100 PE run, 100X depth for each amplicon.
  2. The value of MPS: 1.22 years of 95 samples per plate, 5 multiplexed loci per plate, 8 plates per day, 7 days per week.
  3. MPS Costs. [Figure comparing 3730xl, 454 (Ti), and HiSeq 2000. Millions of reads per run: 1,000.0000, 26.0000, 0.0001. Cost per megabase: $0.10, $15.00, $1,500.00.] Source: Glenn 2011 Mol. Ecol. Res.
  4. Harnessing the Power
      • Sequence capture (today’s talk)
      • Genome sequencing
      • Amplicon generation and sequencing
      • RAD-tag and RAD-tag-like approaches
      • MIPS, EPIC, etc.
  5. Outline
      • Library Prep and Sequence Tagging
      • Sequence Capture
      • Uses of Seqcap [Overview]
      • Uses of Seqcap [Specific Approaches]
      • Computational Issues
  6. Adapter: a synthetic DNA sequence for binding DNA to the sequencing substrate, which differs by platform.
  7. Sequence Tagging: CAGCAA, CAGCAT, CCGTAG. Source: Binladen et al. 2007 PLoS One; Meyer et al. 2007 NAR; Hamady et al. 2008 Nature Methods.
  8. Problem with many tags: C GCAA, CCCGTAG. Insertion; Deletion; Substitutions ✓. Source: Adey et al. 2010 Genome Biology; Faircloth and Glenn 2011 Unpublished.
  9. Edit tag distance: TATGCG vs. CGAGTT = 5.
      Required Edit Distance = 2 * (Errors) + 1
      Correctable Errors = (Edit Distance - 1) / 2
  10. Numbers of tags. [Table not recoverable from the extracted text.] Source: Faircloth and Glenn 2011 Unpublished.
  11. The problem value of MPS. You / The Machine = 7 days of 8 plates per day containing 5 multiplexed loci. Illumina HiSeq 2000, 100 PE run, 100X depth for each amplicon.
  12. Phylogen(omics|etics)
      • Want many loci
      • Meet sequencing output
      • Give representative sample of genome
      • Want alignable loci
      • Want “universal” loci
  13. PCR vs. Capture. [Figure: Log(targets), axis from 1 to 1,000,000, for PCR, Biotinylated Oligos, MySelect 25k, SureSelect 55k, MySelect 200k, and SureSelect Exome 450k.]
  14. UCE Discovery
      Ultraconserved Elements in the Human Genome. Gill Bejerano, Michael Pheasant, Igor Makunin, Stuart Stephen, W. James Kent, John S. Mattick, David Haussler.
      There are 481 segments longer than 200 base pairs (bp) that are absolutely conserved (100% identity with no insertions or deletions) between orthologous regions of the human, rat, and mouse genomes. Nearly all of these segments are also conserved in the chicken and dog genomes, with an average of 95 and 99% identity, respectively. Many are also significantly conserved in fish. These ultraconserved elements of the human genome are most often located either overlapping exons in genes involved in RNA processing or in introns or nearby genes involved in the regulation of transcription and development. Along with more than 5000 sequences of over 100 bp that are absolutely conserved among the three sequenced mammals, these represent a class of genetic elements whose functions and evolutionary origins are yet to be determined, but which are more highly conserved between these species than are proteins and appear to be essential for the ontogeny of mammals and other vertebrates.
      Source: Bejerano et al. 2004, Science
  15. Many vertebrate alignments
      28-Way vertebrate alignment and conservation track in the UCSC Genome Browser. Webb Miller, Kate Rosenbloom, Ross C. Hardison, Minmei Hou, James Taylor, Brian Raney, Richard Burhans, David C. King, Robert Baertsch, Daniel Blankenberg, Sergei L. Kosakovsky Pond, Anton Nekrutenko, Belinda Giardine, Robert S. Harris, Svitlana Tyekucheva, Mark Diekhans, Thomas H. Pringle, William J. Murphy, Arthur Lesk, George M. Weinstock, Kerstin Lindblad-Toh, Richard A. Gibbs, Eric S. Lander, Adam Siepel, David Haussler, and W. James Kent.
      This article describes a set of alignments of 28 vertebrate genome sequences that is provided by the UCSC Genome Browser. The alignments can be viewed on the Human Genome Browser (March 2006 assembly) at http://genome.ucsc.edu, downloaded in bulk by anonymous FTP from http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz28way, or analyzed with the Galaxy server at http://g2.bx.psu.edu. This article illustrates the power of this resource for exploring vertebrate and mammalian evolution, using three examples. First, we present several vignettes involving insertions and deletions within protein-coding regions, including a look at some human-specific indels.
      Source: Miller et al. 2007, Genome Research
  16. Assemble reads → contigs: Loc1 Loc2 Loc3 Loc4 Loc5 Loc6 Loc7. “Throw out” reads not matching expected locus.
  17. Not limited to UCEs
      • Target SNPs
      • Target genes
      • Target exons
      • Target exome
      • Target gene regulators
      • etc.
  18. The problem value of MPS. You / The Machine. Illumina HiSeq 2000, 100 PE run, 100X depth for each amplicon.
  19. Design probes: can be “easy” if a company does it, or can require skill to slice and design.
  20. Prepare large analyses: Loc1 Loc2 Loc3 Loc4 Loc5 Loc6 Loc7; Loc1 Loc2 Loc3 Loc4. Custom software to build NEXUS. Custom software to generate species, gene, and bootstrap trees.
  21. More goodies...
      Faircloth et al. Syst Biol doi: 10.1093/sysbio/sys004 pmid: 22232343
      McCormack et al. Genome Res doi: 10.1101/gr.125864.111 pmid: 22207614
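
The edit-distance rule on slide 9 can be sketched in a few lines of Python. This is a minimal illustration, not the software used for the talk: it assumes "edit distance" means Levenshtein distance (insertions, deletions, and substitutions each cost 1), and it assumes the "/ 2" in the slide's formula rounds down for even distances. The function names are hypothetical.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two tags (assumed metric)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def required_edit_distance(errors: int) -> int:
    """Slide 9: Required Edit Distance = 2 * (Errors) + 1."""
    return 2 * errors + 1


def correctable_errors(distance: int) -> int:
    """Slide 9: Correctable Errors = (Edit Distance - 1) / 2, rounded down (assumption)."""
    return (distance - 1) // 2


def min_pairwise_distance(tags) -> int:
    """Smallest edit distance between any two tags in a set."""
    return min(edit_distance(a, b) for a, b in combinations(tags, 2))


print(edit_distance("TATGCG", "CGAGTT"))   # 5, as on slide 9
print(correctable_errors(5))               # 2
print(min_pairwise_distance(["CAGCAA", "CAGCAT", "CCGTAG"]))  # 1
```

With the three tags from slide 7, the minimum pairwise distance is 1, so by the slide 9 formula no errors are correctable within that set. The pair on slide 9, at distance 5, would tolerate two.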