- original fragments
- Fragment arrival rate is consistent with a Poisson process
- Read-pair distance and orientation are preserved, etc…
- Many possible paths
- Use a variety of heuristics to find a good path
- Output only the high-confidence paths through the graph (contigs)

Short read assembly is hard

- Hard to assemble, easier to validate
- Run a bunch and pick the best!

Mark Chaisson

Automated ensemble assembly and validation of microbial genomes. Koren et al. (2014) BMC Bioinformatics
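The Poisson assumption for fragment arrival noted above can be made concrete. If reads start uniformly at random along the genome, the number of reads covering a given base is approximately Poisson with mean equal to the sequencing depth, so the expected fraction of uncovered bases is about e^(-depth) (the classic Lander-Waterman model). The sketch below is illustrative only: the function names and the example genome size are assumptions, not taken from the slides.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events for a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def coverage_depth(num_reads, read_length, genome_length):
    """Average number of reads covering each base (depth of coverage)."""
    return num_reads * read_length / genome_length


def fraction_uncovered(depth):
    """Expected fraction of bases covered by zero reads under the Poisson model."""
    return poisson_pmf(0, depth)


if __name__ == "__main__":
    # Hypothetical example: 250,000 reads of 100 bp over a 5 Mbp genome.
    depth = coverage_depth(250_000, 100, 5_000_000)
    print(f"depth = {depth:.1f}x")
    print(f"expected uncovered fraction = {fraction_uncovered(depth):.4f}")
```

Under this model, gaps shrink exponentially with depth, which is one reason the sampling assumption matters before any graph is built.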
Why is assembly hard?

- randomly generated sequence is trivial to assemble
- Repeats shorter than the read length don’t matter
- “It”
- “It was”
- “It was the best”
- “It was the best of times”
- “With his hands in his pockets”

[Figure labels: >1,000 SSR 320 TE 2 SegDup 1 Unique 3 Meta]
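The growing quotation in the list above illustrates why read length matters: a short "read" matches many places in the text, while a read longer than the repeat matches only one. A minimal sketch of that idea follows. The sample text is an assumption (the opening clauses of the novel the phrases quote), and the helper names are hypothetical.

```python
def count_occurrences(text, phrase):
    """Count case-insensitive, possibly overlapping occurrences of phrase in text."""
    text, phrase = text.lower(), phrase.lower()
    count = 0
    start = text.find(phrase)
    while start != -1:
        count += 1
        start = text.find(phrase, start + 1)
    return count


# Assumed sample "genome": a short excerpt containing a repeated phrase.
TEXT = (
    "It was the best of times, it was the worst of times, "
    "it was the age of wisdom, it was the age of foolishness"
)

# "Reads" of increasing length, taken from the slide.
READS = ["It", "It was", "It was the best", "It was the best of times"]


def placements(text, reads):
    """Map each read to the number of places it could be placed in the text."""
    return {read: count_occurrences(text, read) for read in reads}


if __name__ == "__main__":
    for read, n in placements(TEXT, READS).items():
        print(f"{read!r}: {n} possible placement(s)")
```

In this excerpt the two shortest reads are ambiguous, while the reads that extend past the repeated phrase "it was the" have a single placement.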
Long read assembly

- reads. Koren et al. (2012) Nature Biotechnology
- Reducing assembly complexity of microbial genomes with single-molecule sequencing. Koren et al. (2013) Genome Biology
- Assembling Large Genomes with Single-Molecule Sequencing and Locality Sensitive Hashing. Berlin et al. (2015) Nature Biotechnology

With Canu, 25x of PacBio P6C4 achieves:

- > 90% of bacteria assemble without gaps
- > QV40 (99.99%) consensus accuracy
- < 15 minutes of compute
- < $1,000 total cost
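The "QV40 (99.99%)" figure above follows from the standard Phred scale, where QV = -10 · log10(error rate). The short sketch below shows the conversion. The function names and the 5 Mbp genome size are illustrative assumptions, not values from the slides.

```python
import math


def qv_to_accuracy(qv):
    """Convert a Phred-scaled quality value to per-base accuracy."""
    return 1.0 - 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Convert a per-base error rate to a Phred-scaled quality value."""
    return -10 * math.log10(error_rate)


def expected_errors(genome_length, qv):
    """Expected number of consensus errors in a genome at the given QV."""
    return genome_length * 10 ** (-qv / 10)


if __name__ == "__main__":
    print(f"QV40 accuracy: {qv_to_accuracy(40):.4%}")
    # Hypothetical 5 Mbp bacterial genome.
    print(f"expected errors at QV40: {expected_errors(5_000_000, 40):.0f}")
```

At QV40 the consensus is expected to contain roughly one error per 10,000 bases.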
Drosophila melanogaster

Carvalho et al. (2015)

Finally, we emphasize the utility of PacBio technology in dealing with difficult genomic regions: as was the case with the Mst77Y region, [MHAP] produced a seemingly error-free assembly of the FDY region, something that has eluded us for years of hard work.

Here we describe flagrante delicto Y (FDY), a very young gene that shows how Y-linked genes were acquired. FDY originated 2 million years ago from a duplication of a contiguous autosomal segment of 11 kb containing five genes that inserted into the Y chromosome. Four of these autosome-to-Y gene copies became inactivated (“pseudogenes”), lost part of their sequences, and most likely will disappear in the next few million years. FDY, originally a female-biased gene, acquired testis expression and remained functional.
- fails due to low interactions
- Bionano fails due to missing restriction sites
- Improved algorithms to combine technologies
- Assembler output
- Assembling populations
- Graph based assembly and formats
- Long read polishing and phasing
MHAP: Konstantin Berlin

Goat: Derek M. Bickhart, Adam M. Phillippy, Timothy P.L. Smith, Shawn T. Sullivan, Ivan Liachko, Joshua N. Burton, Maitreya J. Dunham, Jay Shendure, Alex R. Hastie, Brian L. Sayre, Heather J. Huson, George E. Liu, Benjamin D. Rosen, Steven G. Schroeder, Curtis P. VanTassell, Tad S. Sonstegard

Dermatofibrosarcoma: Alejandro Gutierrez, Sarah Morton, Mike Schatz, Maria Nattestad, Fritz Sedlazeck

A. gambiae: Andrew Hall, Philippos-Aris Papathanos, Atashi Sharma, Changde Cheng, Omar Akbari, Lauren Assour, Nicholas Bergman, Alessia Cagnetti, Andrea Crisanti, Tania Dottorini, Elisa Fiorentini, Roberto Galizi, Jonathan Hnath, Xiaofang Jiang, Tony Nolan, Diana Radune, Maria Sharakhova, Aaron Steele, Vladimir A. Timoshevskiy, Nikolai Windbichler, Simo Zhangl, Matthew W. Hahn, Scott J. Emrich, Igor V. Sharakhov, Zhijian Tu, Nora J. Besansky

D. melanogaster: Casey Bergman, Sue Celniker, Jason Chin, Jane Landolin

NHGRI

Postdocs wanted!

Genome Informatics Section

- Assembly
- Structural variation
- Infectious disease
- Undiagnosed disease

http://www.genome.gov/27563366
/MarBL
under the terms of the United States Copyright Act. It was written as part of the authors' official duties for the United States Government and thus cannot be copyrighted. This presentation is freely available to the public for use without a copyright notice. Restrictions cannot be placed on its present or future use. Although all reasonable efforts have been taken to ensure the accuracy and reliability of the presentation and associated data, the National Human Genome Research Institute (NHGRI), National Institutes of Health (NIH) and the U.S. Government do not and cannot warrant the performance or results that may be obtained based on this presentation or data. NHGRI, NIH and the U.S. Government disclaim all warranties as to performance, merchantability or fitness for any particular purpose. Please cite the authors in any work or product based on this material.