Using PacBio Circular Consensus Sequencing (CCS) for Highly Accurate Assemblies

GenomeArk
January 16, 2019

Gene Myers
Chair of Systems Biology, MPI for Cell Biology and Genetics
Dresden, DE

Transcript

  1. Using PacBio Circular Consensus Sequencing (CCS) for Highly Accurate Assemblies

    Gene Myers, Chair of Systems Biology, MPI for Cell Biology and Genetics, Dresden, DE (MPI-CBG, CSBD)
  2. PB only or PB + 1 would be a significant savings

    HQ Genomics. Assume 1Gbp genome.
    2019/2020: 1.2K EU; 3.5K EU
    Scaffolding technologies: 50X Illumina in 10X read clouds; Bionano restriction maps; 50X Illumina in Hi-C read pairs
    60X PacBio long reads; 10K EU; 5K EU; 2017
    PB + Bionano: 6 Bats Project: 10Mbp contig N50, 100Mbp scaffold N50
    But favor PB + Hi-C
  3. HQ Genomes Tomorrow: At least 2 ways to improve

    • Longer or more accurate reads (CCS)
    • Better algorithms
    ✦ Scrubbing to remove artifacts
    ✦ Repeat/haplotype separation based on heterogeneity
    ✦ Repeat detection and modeling
    HiFi CCS protocol: ~3x loss in throughput and cost over raw, but each insert wrapped ~8x ⟹ ~0.2% error rate
    Which is better?
    - 15Kbp reads at 99.8%
    - 50Kbp reads at 90%
    - some combination of both?
  4. String Graph: The “Reality”

    10% error, 30% alignment threshold ⟹ 10%-repeats entangle
    .5% error, 2% alignment threshold ⟹ only <1%-repeats entangle
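
One way to read the two lines above is as simple arithmetic: if each read carries error rate e, two reads of the same locus differ by roughly 2e, so whatever remains of the alignment threshold is divergence that repeat copies can hide in. The sketch below is only an illustration of that reading. The function name and the "threshold minus twice the error" model are assumptions, not something stated on the slide.

```python
def entangling_divergence(read_error: float, align_threshold: float) -> float:
    """Repeat divergence that still passes the alignment threshold.

    Assumed model (not from the slide): two reads with independent
    per-read error `read_error` differ by about 2 * read_error even when
    they come from the same locus, so the rest of the threshold is slack
    that diverged repeat copies can slip through.
    """
    pairwise_noise = 2.0 * read_error
    return max(0.0, align_threshold - pairwise_noise)


if __name__ == "__main__":
    # The two settings quoted on the slide: (per-read error, threshold).
    for label, err, thr in [("raw", 0.10, 0.30), ("CCS", 0.005, 0.02)]:
        slack = entangling_divergence(err, thr)
        print(f"{label}: repeats diverged by up to about {slack:.1%} entangle")
```

Under this assumed model the two settings give about 10% and about 1%, which matches the figures quoted on the slide.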
  5. Solving Repeats

    Small: spanning reads suffice. All the power of long reads has thus far been due to this ⟹ longer is better
    Large: microhet’s could get you through (should be easy(er)). This has not been done. Requires ability to id. microhet’s ⟹ more accurate is better
  6. Perfect PB reads with Scrubbing

    Reads: chimers, adaptamers, low Q dropouts, 90% average
    Reads: no chimers, no adaptamers, no low Q dropouts, 99.999% uniformly, haplotype phased
    Scrubber, Long Read Assembler. Solves: artifacts, haplotypes, low copy repeats (≤ 5)
    Task is easier, but still necessary: 99.8% average; .5% of reads are chimers; .02% of reads have no adaptamer; 15% of reads have a low Q “panel”: 100bp with 5% or more error
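
The low Q “panel” criterion above (100bp with 5% or more error) can be illustrated with a sliding-window scan. This is a minimal sketch, not the scrubber from the talk: it assumes a hypothetical per-base error flag for each read (1 = erroneous base), which a real pipeline would have to estimate, and the function name `low_quality_panels` is invented for the example.

```python
from typing import List, Tuple


def low_quality_panels(
    errors: List[int], window: int = 100, min_rate: float = 0.05
) -> List[Tuple[int, int]]:
    """Return merged half-open intervals [start, end) covered by windows
    of `window` bases whose error fraction is at least `min_rate`.

    `errors[i]` is a hypothetical 0/1 flag marking base i as erroneous.
    """
    panels: List[Tuple[int, int]] = []
    if len(errors) < window:
        return panels
    count = sum(errors[:window])  # errors in the first window
    for start in range(len(errors) - window + 1):
        if start > 0:
            # Slide by one base: add the entering base, drop the leaving one.
            count += errors[start + window - 1] - errors[start - 1]
        if count / window >= min_rate:
            if panels and start <= panels[-1][1]:
                # Overlaps or touches the previous panel: extend it.
                panels[-1] = (panels[-1][0], start + window)
            else:
                panels.append((start, start + window))
    return panels


if __name__ == "__main__":
    read = [0] * 1000
    for i in range(400, 405):  # five clustered errors
        read[i] = 1
    print(low_quality_panels(read))
```

Merging the flagged windows reports one interval per low-quality stretch, which is the region a scrubbing step would need to trim or patch.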
  7. Daligner: Compute Time Reductions

    Switch from 14-mers to 40-mers. Take 1 out of every 10 mers (at random)
    • 99.999% sensitivity (alignment between ≧1000bp with .5% error in each read) (R. Durbin)
    • Can use 1Gbp blocks vs. 1/4Gbp (and uses ≦ 8Gbp memory)
    • 90 CPU hrs for 30X HG002 (vs 2000+ for raw reads) (on this laptop)
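
The sensitivity claim above can be explored with a toy Monte Carlo model. The sketch below is not Daligner's code and makes several assumptions that are not on the slide: errors are independent per base, .5% error in each read is treated as roughly 1% mismatch per overlap column, and the 1-in-10 sampling picks the same k-mers in both reads (for example by hashing the k-mer), so a shared error-free 40-mer seeds the alignment whenever it is sampled. All names and defaults are hypothetical.

```python
import random


def seed_hit_probability(
    overlap: int = 1000,
    pair_error: float = 0.01,
    k: int = 40,
    sample_rate: float = 0.1,
    trials: int = 2000,
    seed: int = 0,
) -> float:
    """Estimate the chance that an overlap contains at least one sampled,
    error-free k-mer under the toy model described in the lead-in."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        clean_run = 0  # length of the current error-free stretch
        found = False
        for _ in range(overlap):
            if rng.random() < pair_error:
                clean_run = 0  # an error breaks every k-mer covering it
            else:
                clean_run += 1
                # An error-free k-mer ends at this column; it acts as a
                # seed only if it is among the sampled k-mers.
                if clean_run >= k and rng.random() < sample_rate:
                    found = True
                    break
        hits += found
    return hits / trials


if __name__ == "__main__":
    print(seed_hit_probability())
```

A run with a few thousand trials cannot resolve a figure like 99.999%, so the estimate is only a sanity check that sparse sampling of long k-mers still leaves many candidate seeds per 1000bp overlap at this error rate.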
  8. Accelerates assembly compute time

    HiFi reads likely to improve diploid assembly
    Likely to be quite effective at haplotype phasing / repeat separation
    Better CCS algorithms are needed
  9. Acknowledgements

    PacBio: Paul Peluso, James Drake, Kevin Corcoran, Jonas Korlach, Mike Hunkapiller
    Dresden-Concept Genome Center: CRTD TU-D: Andreas Dahl; MPI-CBG: Sylke Winkler, Martin Pippel, German Tischler