PAG XXIV: From Sequencing to Chromosomes: New de novo Assembly and Scaffolding Methods Improve the Goat Reference Genome

From sequencing to chromosomes: new de novo assembly and scaffolding methods improve the goat reference genome
Sergey Koren, @sergekoren
Genome Informatics Section, NHGRI

Long read assembly

•  Hybrid error correction and de novo assembly of single-molecule sequencing reads. Koren et al. (2012) Nature Biotechnology
•  Reducing assembly complexity of microbial genomes with single-molecule sequencing. Koren et al. (2013) Genome Biology
•  Assembling Large Genomes with Single-Molecule Sequencing and Locality Sensitive Hashing. Berlin et al. (2015) Nature Biotechnology

With Canu, 25x of PacBio P6C4 achieves:
•  > 90% of bacteria assemble without gaps
•  > QV40 (99.99%) consensus accuracy
•  < 15 minutes of compute
•  < $1,000 total cost
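The QV40 (99.99%) figure above follows from the standard Phred scale, where QV = -10 * log10(error rate). A minimal sketch of that conversion is shown here; the function names are illustrative and are not taken from Canu or any other tool named in the slides.

```python
import math


def phred_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled quality value to per-base accuracy."""
    error_rate = 10 ** (-qv / 10.0)
    return 1.0 - error_rate


def accuracy_to_phred(accuracy: float) -> float:
    """Convert per-base accuracy (strictly below 1.0) to a Phred-scaled quality value."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    # QV40 corresponds to one expected error per 10,000 consensus bases.
    print(f"QV40 accuracy: {phred_to_accuracy(40):.4%}")
    print(f"99.99% accuracy as QV: {accuracy_to_phred(0.9999):.1f}")
```

On this scale every 10 QV points is a tenfold drop in error rate, so QV30 is one error per 1,000 bases and QV40 is one per 10,000.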

Human genome assembly solved?
[Figure: CHM1 Canu assembly, chromosomes 1-22 and X]

Contigs ≠ Genome ≠

Goat PacBio vs RefV2

                  PacBio        RefV2
Contigs
  Total bp        2.63 Gbp      2.72 Gbp
  Count           3,096         173,141
  Max (bp)        35,623,478    679,126
  N50 (bp)        4,473,169     73,533
Scaffolds
  Total bp        2.63 Gbp      2.69 Gbp
  Count           3,096         30
  Max (bp)        35,623,478    161,917,960
  N50 (bp)        4,473,169     103,731,018
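The N50 values compared above use the usual definition: the length L such that sequences of length L or longer contain at least half of the assembled bases. A small sketch of that calculation follows; the function name and the toy lengths are illustrative and are not the goat assembly data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    total assembly size.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must contain at least one positive value")


if __name__ == "__main__":
    toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]  # toy lengths, 32 bases in total
    print(n50(toy_contigs))
```

Because N50 depends only on the length distribution, it measures contiguity and not correctness: joining sequences that do not belong together also raises it.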

Annotation comparison

•  CHIR 1.0 annotation LiftOver split mappings
•  Example split gene
•  Comparative mapping with sheep (O. aries) exons

                  Exclusive PacBio   Exclusive BGI   Shared
Total Mappings    9,534              564             225,365
Unmapped          564                9,534           9,140
Split Exons       592                1,930           1,528

[Figure labels: CHIR 1.0, Sheep, Antelope, Cow]

Irys® Workflow- Overview
Irys Workflow for Genome Mapping

•  Isolate High Molecular Weight DNA
•  Label Specific Sequences Across the Entire Genome
•  Transfer Labeled DNA into Cartridge for Scanning
•  Load, Linearize & Image Labeled DNA in Repeated Cycling to Scan Whole Genome
•  High Throughput, High Resolution Imaging Gives Contiguous Molecules up to Mb Length
•  Algorithms Convert Images into Molecules
•  Assembly Algorithms Align Molecules de novo for Constructing Consensus Genome Maps
•  Cross-Mapping Across Multiple Samples or to a Reference

Applications:
•  Automated SV Detection
•  Gap Sizing
•  Genome Finishing

[Figure labels: Customer Sample, Insertion]
© 2015 BioNano Genomics

Goat PacBio+BioNano vs RefV2

                  PacBio        PacBio + BioNano   RefV2
Contigs
  Total bp        2.63 Gbp      2.62 Gbp           2.72 Gbp
  Count           3,096         2,349              173,141
  Max (bp)        35,623,478    57,511,119         679,126
  N50 (bp)        4,473,169     12,102,227         73,533
Scaffolds
  Total bp        2.63 Gbp      2.62 Gbp           2.69 Gbp
  Count           3,096         2,084              30
  Max (bp)        35,623,478    66,727,870         161,917,960
  N50 (bp)        4,473,169     14,265,070         103,731,018

[Figure: PacBio+BioNano to reference; panels chr1, chr8, chrX, chr28]

Hi-C can be used for 3D modeling and scaffolding of genome assemblies

•  3D model of genome: Duan, et al. Nature, 2010
•  Genome scaffolding: Burton, et al. Nature Biotech, 2013
•  Haplotype phasing: Selvaraj, et al. Nature Biotech, 2013

Protocol: Crosslink, Fragment, Proximity Ligation, Sequence Junctions

[Figure: chromosome-by-chromosome contact map, chromosomes 1-9 on both axes. From I. Liachko]
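Hi-C scaffolding relies on read pairs linking loci on the same chromosome far more often than loci on different chromosomes. The toy sketch below illustrates only that grouping idea: it counts read pairs whose two ends map to different contigs, then joins contigs connected by at least `min_links` pairs. It is not the method used for the goat assembly or by any published scaffolder. The function names, the threshold, and the input format (one contig name per read end) are assumptions made for illustration.

```python
from collections import Counter


def count_links(read_pairs):
    """Count Hi-C read pairs whose two ends map to different contigs."""
    links = Counter()
    for contig_a, contig_b in read_pairs:
        if contig_a != contig_b:
            links[tuple(sorted((contig_a, contig_b)))] += 1
    return links


def group_contigs(links, min_links):
    """Group contigs joined by at least min_links pairs (single linkage)."""
    parent = {}

    def find(contig):
        # Union-find lookup with path halving.
        parent.setdefault(contig, contig)
        while parent[contig] != contig:
            parent[contig] = parent[parent[contig]]
            contig = parent[contig]
        return contig

    for (contig_a, contig_b), n_pairs in links.items():
        root_a, root_b = find(contig_a), find(contig_b)
        if n_pairs >= min_links and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for contig in list(parent):
        groups.setdefault(find(contig), set()).add(contig)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Hypothetical mappings: each tuple names the contigs hit by one read pair.
    pairs = [("c1", "c2")] * 5 + [("c2", "c3")] * 4 + [("c3", "c4")] + [("c4", "c5")] * 6
    print(group_contigs(count_links(pairs), min_links=3))
```

A fixed raw-count threshold is the simplest possible choice. Longer contigs attract more links by chance, so a practical tool would need to normalize the counts, and it would also have to order and orient contigs within each group, which this sketch does not attempt.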

Goat PacBio+BioNano vs RefV2

                  PacBio        PacBio + BioNano   PacBio + BioNano + HiC   RefV2
Contigs
  Total bp        2.63 Gbp      2.62 Gbp           2.63 Gbp                 2.72 Gbp
  Count           3,096         2,349              1,522                    173,141
  Max (bp)        35,623,478    57,511,119         66,489,255               679,126
  N50 (bp)        4,473,169     12,102,227         23,340,314               73,533
Scaffolds
  Total bp        2.63 Gbp      2.62 Gbp           2.63 Gbp                 2.69 Gbp
  Count           3,096         2,084              525                      30
  Max (bp)        35,623,478    66,727,870         157,517,791              161,917,960
  N50 (bp)        4,473,169     14,265,070         91,787,174               103,731,018

[Figure: PacBio+BioNano+HiC to reference; panels chr1, chr8, chrX, chr28]

[Figure: PacBio+BioNano+HiC+curation; panels chr1, chr8, chrX, chr28]

Acknowledgements

Canu
•  Adam Phillippy
•  Brian Walenz

Goat Project
•  Derek M. Bickhart
•  Adam M Phillippy
•  Timothy P.L. Smith
•  Shawn T. Sullivan
•  Ivan Liachko
•  Joshua N. Burton
•  Maitreya J. Dunham
•  Jay Shendure
•  Alex R. Hastie
•  Brian L. Sayre
•  Heather J Huson
•  George E. Liu
•  Benjamin D. Rosen
•  Steven G. Schroeder
•  Curtis P. VanTassell
•  Tad S. Sonstegard

NHGRI
•  Postdocs wanted!
•  Genome Informatics Section
•  Assembly
•  Structural variation
•  Infectious disease
•  Undiagnosed disease
•  http://www.genome.gov/27563366
/MarBL

PUBLIC DOMAIN NOTICE This presentation is "United States Government Work"
under the terms of the United States Copyright Act. It was written as part of the authors' official duties for the United States Government and thus cannot be copyrighted. This presentation is freely available to the public for use without a copyright notice. Restrictions cannot be placed on its present or future use. Although all reasonable efforts have been taken to ensure the accuracy and reliability of the presentation and associated data, the National Human Genome Research Institute (NHGRI), National Institutes of Health (NIH) and the U.S. Government do not and cannot warrant the performance or results that may be obtained based on this presentation or data. NHGRI, NIH and the U.S. Government disclaim all warranties as to performance, merchantability or fitness for any particular purpose. Please cite the authors in any work or product based on this material.
