HiFi Reads for Comprehensive Variant Detection

This presentation was created for the 2020 SMRTLeiden Virtual conference.

William Rowell

May 12, 2020
Transcript

  1. HiFi Reads for Comprehensive Variant Detection

    William Rowell, Staff Scientist, Bioinformatics Applications, PacBio (@nothingclever) #SMRTLeiden
    For Research Use Only. Not for use in diagnostic procedures. © Copyright 2020 by Pacific Biosciences of California, Inc. All rights reserved.
  2. NEW PARADIGM OF ACCURATE, LONG READ DNA SEQUENCING

    Wenger, A. M., et al. (2019). Accurate circular consensus sequencing improves variant detection and assembly of a human genome. Nature Biotechnology. Published: 12 August 2019. Altmetric score: in the 98th percentile of the 254,341 tracked articles of a similar age in all journals.
  3. TYPES OF GENOMIC VARIATION

    SMRT Sequencing provides comprehensive detection of all variant types.
  4. VARIATION IN A HUMAN GENOME

    [Figure: variation in a human genome vs GRCh38 by type (1 bp SNVs, 1-49 bp indels, ≥50 bp structural variants), detected with PacBio HiFi reads vs short reads; scale marks at 3 Mb, 5 Mb, and 10 Mb.]
  5. VARIATION IN A HUMAN GENOME

    [Figure: 1 bp SNVs, 1-49 bp indels, and ≥50 bp structural variants vs GRCh38, detected with PacBio HiFi reads vs short reads.]
    Short reads miss ~80% of SVs, typically long insertions or variants in difficult-to-map repetitive regions. Increasing short-read coverage does not improve this.
  6. PACBIO LONG READS SPAN STRUCTURAL VARIANTS

    [Figure: PacBio HiFi reads, separated into haplotype 1 and haplotype 2, span a 1,733 bp deletion in a repetitive region; with short reads, the deletion is not detected.]
  7. VARIATION IN A HUMAN GENOME

    [Figure: 1 bp SNVs, 1-49 bp indels, and ≥50 bp structural variants vs GRCh38, detected with short reads vs PacBio HiFi reads.]
    Short reads miss small variants in difficult-to-map regions of the human genome. PacBio high-accuracy long reads improve mappability and increase variant detection in these regions.
  8. HiFi READS IMPROVE MAPPABILITY IN HUMAN GENOME

    This impacts many medically relevant genes.
  9. HiFi READS IMPROVE MAPPABILITY IN HUMAN GENOME

    Sources: Wenger, A. M., et al. (2019). Accurate circular consensus sequencing improves variant detection and assembly of a human genome. Nature Biotechnology. Gene list originally from Mandelker, D., et al. Navigating highly homologous genes in a molecular diagnostic setting: a resource for clinical next-generation sequencing. Genet Med.
  10. RECOMMENDED VARIANT DETECTION WORKFLOWS

    - SVs
      - "Structural Variant Calling" application in SMRT Link
      - or map with pbmm2 and call with pbsv from the command line
    - SNVs and small indels
      - map with pbmm2
      - call with Google DeepVariant
      - optional phasing with WhatsHap
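The command-line route above can be sketched as a small Bash script. This is a minimal sketch, not PacBio's official pipeline: it assumes pbmm2, pbsv, Singularity, and WhatsHap are installed and on PATH, the helper names run and hifi_variant_workflow and all output file names are hypothetical, and flags should be checked against the --help output of your installed versions. Setting DRY_RUN=1 prints each command instead of running it.

```shell
#!/usr/bin/env bash
# Minimal sketch of the recommended HiFi variant-detection workflow.
# Assumptions: pbmm2, pbsv, singularity and whatshap are on PATH; helper
# names and output file names are hypothetical. DRY_RUN=1 prints commands only.
set -euo pipefail

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: hifi_variant_workflow REF.fasta HIFI_READS.bam SAMPLE
hifi_variant_workflow() {
  local ref="$1" reads="$2" sample="$3"
  local bam="${sample}.aligned.bam"

  # Map HiFi reads with pbmm2 using the CCS preset; --sort gives sorted output.
  run pbmm2 align "$ref" "$reads" "$bam" --preset CCS --sort --sample "$sample"

  # SVs: collect signatures of structural variation, then call with the HiFi preset.
  run pbsv discover "$bam" "${sample}.svsig.gz"
  run pbsv call --hifi "$ref" "${sample}.svsig.gz" "${sample}.pbsv.vcf"

  # SNVs and small indels: DeepVariant 0.10.0 with the PacBio model.
  run singularity exec --bind "$PWD" docker://google/deepvariant:0.10.0 \
    /opt/deepvariant/bin/run_deepvariant --model_type PACBIO \
    --ref "$ref" --reads "$bam" \
    --output_vcf "${sample}.deepvariant.vcf.gz" --num_shards "$(nproc)"

  # Optional: read-backed phasing of the small-variant calls with WhatsHap.
  run whatshap phase --reference "$ref" \
    -o "${sample}.deepvariant.phased.vcf" \
    "${sample}.deepvariant.vcf.gz" "$bam"
}
```

For example, DRY_RUN=1 hifi_variant_workflow reference.fasta movie.hifi_reads.bam HG002 prints the five commands in order; drop DRY_RUN to execute them. DeepVariant expects an indexed reference (.fai) and an indexed BAM (.bai) next to the inputs.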
  11. PACBIO STRUCTURAL VARIANT CALLING (PBSV)

    - Identifies signatures of structural variation
    - Calls variants and assigns genotypes
    - Recent updates:
      - Improved sensitivity for large insertions and deletions
      - Calls duplications and copy number variation
      - Simplified parameters with the --hifi preset, which reports variants supported by as few as one read, provided at least 10% of reads support them; equivalent to "-A 1 -O 1 -S 0 -P 10"
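Per the slide, --hifi is shorthand for four calling thresholds. A minimal sketch of both spellings follows; the helper names are hypothetical, the file names are placeholders, and the meaning given for each short flag in the comments is an assumption based on pbsv's help text rather than anything stated on the slide, so confirm it with pbsv call --help for your version.

```shell
#!/usr/bin/env bash
# Minimal sketch: the pbsv --hifi preset vs. its explicit equivalent
# ("-A 1 -O 1 -S 0 -P 10", per the slide). Helper names are hypothetical.
set -euo pipefail

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then echo "$*"; else "$@"; fi
}

# Usage: pbsv_call_preset REF.fasta IN.svsig.gz OUT.vcf
pbsv_call_preset() {
  run pbsv call --hifi "$1" "$2" "$3"
}

# Same thresholds spelled out. Flag meanings are an assumption based on
# pbsv's help text; confirm with `pbsv call --help` for your version.
#   -A 1   minimum supporting reads across all samples
#   -O 1   minimum supporting reads in one sample
#   -S 0   minimum supporting reads per strand across all samples
#   -P 10  minimum percentage of reads supporting the call in one sample
pbsv_call_explicit() {
  run pbsv call -A 1 -O 1 -S 0 -P 10 "$1" "$2" "$3"
}
```

The preset form is easier to keep in sync with future pbsv releases; the explicit form is useful when one threshold needs to be tuned while keeping the others at the HiFi values.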
  12. GOOGLE DEEPVARIANT

    - Variant calling pipeline powered by a deep neural network
    - Fast and inexpensive
    - Runs from binaries as well as Docker or Singularity images
    - PacBio model trained on HiFi reads from the Sequel and Sequel II Systems with median read quality >99.9%
    - Model is updated regularly to support PacBio chemistry and software updates

    Poplin, R., et al. (2018). A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol 36, 983-987.
  13-15. UPDATES TO DEEPVARIANT PACBIO MODEL

  16. RUN DEEPVARIANT EASILY WITH DOCKER OR SINGULARITY

        singularity exec --bind "$PWD" \
          docker://google/deepvariant:0.10.0 \
          /opt/deepvariant/bin/run_deepvariant \
          --model_type PACBIO \
          --ref ./reference.fasta \
          --reads ./aligned.ccs.bam \
          --output_vcf ./output.vcf.gz \
          --num_shards "$(nproc)"

    Example suitable for amplicon analysis.
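Slide 12 notes that DeepVariant also runs from Docker images. Assuming Docker is installed and the inputs sit in the current directory, an equivalent Docker invocation of the Singularity command above might look like the following. The helper name deepvariant_docker is hypothetical, and mounting the working directory at the same path inside the container is one common choice, not a requirement.

```shell
#!/usr/bin/env bash
# Minimal sketch: DeepVariant 0.10.0 (PacBio model) via Docker instead of
# Singularity. Assumes Docker is installed and inputs are in the current
# directory, with reference.fasta.fai and aligned.ccs.bam.bai alongside.
set -euo pipefail

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper; file names match the Singularity example.
deepvariant_docker() {
  # Mount the working directory at the same path inside the container and
  # start there, so the relative ./ paths resolve the same way.
  run docker run --rm -v "$PWD":"$PWD" -w "$PWD" \
    google/deepvariant:0.10.0 \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type PACBIO \
    --ref ./reference.fasta \
    --reads ./aligned.ccs.bam \
    --output_vcf ./output.vcf.gz \
    --num_shards "$(nproc)"
}
```

Singularity binds the current directory and can pull docker:// images directly, which suits shared HPC systems where Docker is often unavailable; the Docker form suits workstations and cloud instances.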
  17. NIST GENOME IN A BOTTLE (GIAB) BENCHMARK

    Consortium dedicated to authoritative characterization of benchmark human genomes: https://www.nist.gov/programs-projects/genome-bottle
    Samples: HG002, HG003, HG004. Reference: doi:10.1101/664623
    Benchmark (or "high-confidence") variant calls and regions:
    • Structural variants: currently available for HG002 on GRCh37
    • Small variants in more difficult regions: currently available for HG002 on GRCh37 and GRCh38
  18. GENOME IN A BOTTLE BENCHMARK AND COVERAGE

    Wenger, Peluso, et al. (2019). Nature Biotechnology, published 12 August 2019. https://www.nature.com/articles/s41587-019-0217-9
    [Figure panels for HG002, HG003, and HG004 plotted against HiFi fold coverage, with 15-fold HiFi coverage marked.]
  19. VARIANT DETECTION BENCHMARKING (HG002)

    Each cell: recall | precision (%)

    HiFi coverage   SNVs            Indels          SVs
    15-fold         99.44 | 99.69   95.41 | 96.57   97.41 | 94.48
    30-fold         99.97 | 99.87   98.78 | 98.90   98.00 | 95.29

    SNV and indel calls are from DeepVariant 0.10.0, evaluated against the GIAB v3.3.2 small variant benchmark using hap.py. SV calls are from pbsv 2.2.2, evaluated against the GIAB v0.6 SV benchmark using Truvari.
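The table reports recall and precision separately. If a single summary number is useful, the F1 score (their harmonic mean, F1 = 2PR / (P + R)) can be computed from the published pairs. F1 does not appear on the slide; this is a convenience summary, and the helper name f1_table is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: F1 = 2PR / (P + R) from the recall | precision pairs in the
# HG002 benchmarking table. F1 itself is not reported on the slide.
set -euo pipefail

# Hypothetical helper. Input columns: coverage, variant type, recall %, precision %.
f1_table() {
  awk '{ r = $3; p = $4; printf "%s %s %.2f\n", $1, $2, 2 * p * r / (p + r) }' <<'EOF'
15-fold SNVs 99.44 99.69
15-fold Indels 95.41 96.57
15-fold SVs 97.41 94.48
30-fold SNVs 99.97 99.87
30-fold Indels 98.78 98.90
30-fold SVs 98.00 95.29
EOF
}
```

Running f1_table gives, for example, F1 of 99.56% for 15-fold SNVs and 96.63% for 30-fold SVs.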
  20. HIFI DATA ADDS NEW VARIATION TO GIAB BENCHMARKS

    - HiFi datasets for 7 GIAB samples are being used to improve SV and small variant benchmarks.
    - The upcoming small variant benchmark release v4.1 for HG002 will add:
      - ~6% more reference bases
      - ~300,000 SNVs
      - ~50,000 indels
    - Benchmark updates for other samples will follow.
    - HiFi datasets are included in the precisionFDA Truth V2 Challenge, which focuses on difficult-to-map regions.

    Samples: HG002, HG003, HG004
  21. COMPREHENSIVE VARIANT DETECTION WITH HIFI READS

    - HiFi = mappability of long reads + base quality of short reads
    - Structural variants: SMRT Link or pbmm2 + pbsv
      - Added support for duplications and copy number variations
    - Small variants: DeepVariant
      - Added support for amplified fragments
    - 15-fold coverage is recommended for most discovery applications.

    Datasets for the Ashkenazi trio (15 kb and 20 kb libraries) are deposited in SRA: HG002 (PRJNA586863), HG003 (PRJNA626365), HG004 (PRJNA626366).
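Fold coverage is total HiFi bases divided by genome size, so the 15-fold recommendation translates directly into a data yield. A minimal sketch, assuming a human genome size of about 3.1 Gb (override GENOME_GB for other genomes); the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: fold coverage = total HiFi bases / genome size.
# Assumption: human genome size of ~3.1 Gb; override GENOME_GB for other genomes.
set -euo pipefail

GENOME_GB="${GENOME_GB:-3.1}"

# Hypothetical helper: Gb of HiFi data needed for a target fold coverage.
bases_for_coverage() {
  awk -v fold="$1" -v g="$GENOME_GB" 'BEGIN { printf "%.1f\n", fold * g }'
}

# Hypothetical helper: fold coverage achieved by a given HiFi yield in Gb.
coverage_from_bases() {
  awk -v gb="$1" -v g="$GENOME_GB" 'BEGIN { printf "%.1f\n", gb / g }'
}
```

With the default genome size, bases_for_coverage 15 prints 46.5, i.e., roughly 46.5 Gb of HiFi data for 15-fold coverage, and bases_for_coverage 30 prints 93.0.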
  22. ACKNOWLEDGMENTS

    Small variant detection: Andrew Carroll, Pi-Chuan Chang, Richard Hall, Alexey Kolesnikov, Maria Nattestad, Aaron Wenger; Justin Zook, Justin Wagner, and the Genome in a Bottle Consortium
    Structural variant detection: Armin Töpfer, Aaron Wenger; Justin Zook, Nate Olson, and the Genome in a Bottle Consortium
  23. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. Pacific Biosciences does not sell a kit for carrying out the overall No-Amp Targeted Sequencing method. Use of these No-Amp methods may require rights to third-party owned intellectual property. BluePippin and SageELF are trademarks of Sage Science. NGS-go and NGSengine are trademarks of GenDx. FEMTO Pulse and Fragment Analyzer are trademarks of Agilent Technologies Inc. All other trademarks are the sole property of their respective owners. www.pacb.com