HiFi Reads for Comprehensive Variant Detection

This presentation was created for the 2020 SMRTLeiden Virtual conference.

William Rowell

May 12, 2020

Transcript

  1. For Research Use Only. Not for use in diagnostic procedures. © Copyright 2020 by Pacific Biosciences of California, Inc. All rights reserved.
    HiFi Reads for Comprehensive Variant Detection
    William Rowell, Staff Scientist, Bioinformatics Applications, PacBio
    @nothingclever
    #SMRTLeiden

  2. NEW PARADIGM OF ACCURATE, LONG READ DNA SEQUENCING
    Wenger, A. M., et al. (2019). Accurate circular consensus sequencing improves variant detection and assembly of a human genome. Nature Biotechnology.
    Published: 12 August 2019
    Article metrics (Altmetric score): the article is in the 98th percentile of the 254,341 tracked articles of a similar age in all journals.

  3. TYPES OF GENOMIC VARIATION
    SMRT Sequencing provides comprehensive detection of all variant types.

  4. VARIATION IN A HUMAN GENOME
    [Figure: variation in a human genome vs GRCh38 by variant type, PacBio HiFi reads vs short reads. SNVs (1 bp): 5 Mb; indels (1-49 bp): 3 Mb; structural variants (≥50 bp): 10 Mb.]

  5. VARIATION IN A HUMAN GENOME
    [Figure: as on slide 4, variation in a human genome vs GRCh38 by variant type, PacBio HiFi reads vs short reads.]
    Short reads miss ~80% of SVs, typically long insertion events or variants in difficult-to-map repetitive regions. This is not improved by increasing the coverage.

  6. PACBIO LONG READS SPAN STRUCTURAL VARIANTS
    [Figure: read alignments across a 1,733 bp deletion, with a Repeats track. PacBio HiFi reads, shown as Haplotype 1 and Haplotype 2, reveal the 1,733 bp deletion; with short reads the deletion is not detected.]

  7. VARIATION IN A HUMAN GENOME
    [Figure: as on slide 4, variation in a human genome vs GRCh38 by variant type, PacBio HiFi reads vs short reads.]
    Short reads: small variants missed in difficult-to-map regions of the human genome.
    PacBio high accuracy long reads improve mappability and increase variant detection in these regions.

  8. HiFi READS IMPROVE MAPPABILITY IN HUMAN GENOME
    This impacts many medically-relevant genes

  9. HiFi READS IMPROVE MAPPABILITY IN HUMAN GENOME
    Wenger, A. M., et al. (2019). Accurate circular consensus sequencing improves variant detection and assembly of a human genome. Nature Biotechnology.
    List originally from Mandelker, D. et al. Navigating highly homologous genes in a molecular diagnostic setting: a resource for clinical next-generation sequencing. Genet Med.

  10. RECOMMENDED VARIANT DETECTION WORKFLOWS
    -SVs
      -“Structural Variant Calling” application in SMRT Link
      -or map with pbmm2 and call with pbsv from command line
    -SNVs and small indels
      -map with pbmm2
      -Google DeepVariant
      -Optional phasing with WhatsHap
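
The two command-line workflows on this slide can be sketched roughly as below. This is a dry run that only prints the commands, so it can be read without the tools installed. All file names and the sample name are placeholders, the `--preset CCS`, `--sort`, and `--sample` options for pbmm2 and the `whatshap phase` argument order are assumptions to check against each tool's `--help`, and `--num_shards 8` is an arbitrary thread count.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the recommended workflows; prints commands, runs nothing.
set -eu

REF="reference.fasta"     # placeholder reference FASTA
READS="sample.ccs.bam"    # placeholder HiFi (CCS) reads
SAMPLE="sample"           # placeholder sample name

# Shared first step: map HiFi reads to the reference with pbmm2.
map_cmds() {
  echo "pbmm2 align $REF $READS $SAMPLE.aligned.bam --sort --preset CCS --sample $SAMPLE"
}

# Structural variants: find SV signatures, then call with the --hifi preset.
sv_cmds() {
  echo "pbsv discover $SAMPLE.aligned.bam $SAMPLE.svsig.gz"
  echo "pbsv call --hifi $REF $SAMPLE.svsig.gz $SAMPLE.sv.vcf"
}

# SNVs and small indels: DeepVariant (PACBIO model), then optional phasing.
small_variant_cmds() {
  echo "run_deepvariant --model_type PACBIO --ref $REF --reads $SAMPLE.aligned.bam --output_vcf $SAMPLE.vcf.gz --num_shards 8"
  echo "whatshap phase --reference $REF --output $SAMPLE.phased.vcf $SAMPLE.vcf.gz $SAMPLE.aligned.bam"
}

map_cmds
sv_cmds
small_variant_cmds
```

Printing the commands keeps the sketch inspectable; once the names are adjusted, the output could be piped into a shell to run the steps in order.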

  11. PACBIO STRUCTURAL VARIANT CALLING (PBSV)
    -Identifies signatures of structural variation
    -Calls variants and assigns genotypes
    -Recent updates:
      -improved sensitivity for large insertions and deletions
      -call duplications and copy number variation
      -simplified parameters with --hifi preset
        -report variants seen in a single read with at least 10% read support
        -equivalent to “-A 1 -O 1 -S 0 -P 10”
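
To make the read-support rule concrete, here is a small illustration of "a single read with at least 10% read support". It is not pbsv's implementation. It assumes that `-A 1 -O 1` correspond to a minimum of one supporting read and `-P 10` to a minimum of 10% of reads, as the slide's wording suggests; the per-strand setting (`-S 0`) is not modeled, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only: the read-support rule described on the slide,
# not pbsv's actual filtering code.
set -eu

MIN_READS=1      # assumed meaning of -A 1 / -O 1: one supporting read is enough
MIN_PERCENT=10   # assumed meaning of -P 10: at least 10% of reads support the call

# passes_support SUPPORTING_READS TOTAL_READS
# Exit status 0 if the call would be kept under the assumed rule, 1 otherwise.
passes_support() {
  support=$1
  total=$2
  [ "$total" -gt 0 ] || return 1
  [ "$support" -ge "$MIN_READS" ] || return 1
  # Compare support/total with MIN_PERCENT/100 using integers only.
  [ $((support * 100)) -ge $((MIN_PERCENT * total)) ]
}

for pair in "1 10" "1 15" "2 15"; do
  set -- $pair
  if passes_support "$1" "$2"; then verdict=kept; else verdict=filtered; fi
  echo "$1 of $2 reads: $verdict"
done
```

Under these assumptions a single supporting read is enough at 10-fold depth, while at 15-fold depth two supporting reads are needed to reach 10%.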

  12. GOOGLE DEEPVARIANT
    -Variant calling pipeline powered by a deep neural network
    -Fast and inexpensive
    -Run from binaries as well as Docker or Singularity images
    -PacBio model trained on HiFi reads from Sequel and Sequel II Systems with median read quality >99.9%
    -Model is updated regularly to support PacBio chemistry and software updates
    Poplin, R. E. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol 36, 983 (2018).

  13. UPDATES TO DEEPVARIANT PACBIO MODEL

  14. UPDATES TO DEEPVARIANT PACBIO MODEL

  15. UPDATES TO DEEPVARIANT PACBIO MODEL

  16. RUN DEEPVARIANT EASILY WITH DOCKER OR SINGULARITY
    Example suitable for amplicon analysis.
    singularity exec --bind "$PWD" \
      docker://google/deepvariant:0.10.0 \
      /opt/deepvariant/bin/run_deepvariant \
      --model_type PACBIO \
      --ref ./reference.fasta \
      --reads ./aligned.ccs.bam \
      --output_vcf ./output.vcf.gz \
      --num_shards "$(nproc)"
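
The slide shows only the Singularity form. A Docker invocation of the same image might look like the sketch below, which prints the command rather than running it. The `/data` mount point, the helper function name, and the fixed thread count are assumptions for illustration; the image tag, binary path, and `run_deepvariant` options are taken from the slide.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print a Docker command mirroring the Singularity example.
set -eu

DV_VERSION="0.10.0"   # DeepVariant version shown on the slide
THREADS=8             # placeholder; the slide uses $(nproc)

# deepvariant_docker_cmd WORKDIR
# WORKDIR is mounted at /data (an assumed mount point) inside the container.
deepvariant_docker_cmd() {
  workdir=$1
  echo "docker run -v $workdir:/data google/deepvariant:$DV_VERSION" \
       "/opt/deepvariant/bin/run_deepvariant" \
       "--model_type PACBIO" \
       "--ref /data/reference.fasta" \
       "--reads /data/aligned.ccs.bam" \
       "--output_vcf /data/output.vcf.gz" \
       "--num_shards $THREADS"
}

deepvariant_docker_cmd "$PWD"
```

Unlike the Singularity example, paths are given relative to the mount point because Docker does not see the host working directory unless it is mounted.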

  17. NIST GENOME IN A BOTTLE (GIAB) BENCHMARK
    Consortium dedicated to authoritative characterization of benchmark human genomes
    https://www.nist.gov/programs-projects/genome-bottle
    doi:10.1101/664623
    Samples shown: HG002, HG003, HG004
    Benchmark (or "High-confidence") variant calls and regions
    • Structural variants: Currently available for HG002 on GRCh37
    • Small variants in more difficult regions: Currently available for HG002 on GRCh37 and GRCh38

  18. GENOME IN A BOTTLE BENCHMARK AND COVERAGE
    Wenger, Peluso, et al. (2019) https://www.nature.com/articles/s41587-019-0217-9
    Article | Published: 12 August 2019
    [Figure: four panels plotted against HiFi fold coverage, with 15-fold HiFi coverage marked. Samples shown: HG002, HG003, HG004.]

  19. VARIANT DETECTION BENCHMARKING (HG002)
    Values are Recall | Precision (%)
    HiFi Coverage   SNVs            Indels          SVs
    15-fold         99.44 | 99.69   95.41 | 96.57   97.41 | 94.48
    30-fold         99.97 | 99.87   98.78 | 98.90   98.00 | 95.29
    SNV and indel calls are from DeepVariant 0.10.0 and evaluated against the GIAB v3.3.2 small variant benchmark using Hap.py.
    SV calls are from pbsv 2.2.2 and evaluated against the GIAB v0.6 SV benchmark using Truvari.
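
Recall and precision are often combined into a single F1 score, the harmonic mean F1 = 2 × recall × precision / (recall + precision). The slide does not report F1; the sketch below only shows how it could be derived from the table's values, with a hypothetical helper named `f1`.

```shell
#!/usr/bin/env bash
# Sketch: derive F1 from recall and precision (both in percent).
set -eu

# f1 RECALL PRECISION -> harmonic mean, printed with two decimals
f1() {
  LC_ALL=C awk -v r="$1" -v p="$2" 'BEGIN { printf "%.2f\n", 2 * r * p / (r + p) }'
}

# Recall | precision pairs from the HG002 table (15-fold row).
echo "15-fold SNVs   F1: $(f1 99.44 99.69)"
echo "15-fold indels F1: $(f1 95.41 96.57)"
echo "15-fold SVs    F1: $(f1 97.41 94.48)"
```

The same function can be applied to the 30-fold row to compare coverage levels on one number per variant class.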

  20. HIFI DATA ADDS NEW VARIATION TO GIAB BENCHMARKS
    -HiFi datasets for 7 GIAB samples are being used to improve SV and small variant benchmarks.
    -Upcoming small variant benchmark release v4.1 for HG002 will add:
      -~6% reference bases
      -~300,000 SNVs
      -~50,000 indels
    -Benchmark updates for other samples will follow.
    -HiFi datasets are included in the precisionFDA Truth V2 Challenge, which focuses on difficult-to-map regions.
    Samples shown: HG002, HG003, HG004

  21. COMPREHENSIVE VARIANT DETECTION WITH HIFI READS
    -HiFi = mappability of long reads + base quality of short reads
    -Structural variants: SMRT Link or pbmm2 + pbsv
      -Added support for duplications and copy number variations
    -Small variants: DeepVariant
      -Added support for amplified fragments
    -Recommend 15-fold coverage for most discovery applications.
    Datasets for the Ashkenazi trio (15 kb and 20 kb libraries) are deposited on SRA:
    HG002 (PRJNA586863), HG003 (PRJNA626365), HG004 (PRJNA626366)

  22. ACKNOWLEDGMENTS
    Small variant detection
    Andrew Carroll, Pi-Chuan Chang, Richard Hall, Alexey Kolesnikov, Maria Nattestad, Aaron Wenger, Justin Zook, Justin Wagner, and the Genome in a Bottle Consortium
    Structural variant detection
    Armin Töpfer, Aaron Wenger, Justin Zook, Nate Olson, and the Genome in a Bottle Consortium

  23. For Research Use Only. Not for use in diagnostic procedures. © Copyright 2020 by Pacific Biosciences of California, Inc. All rights reserved. Pacific Biosciences, the
    Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. Pacific Biosciences does not sell a kit for carrying out the
    overall No-Amp Targeted Sequencing method. Use of these No-Amp methods may require rights to third-party owned intellectual property. BluePippin and SageELF are
    trademarks of Sage Science. NGS-go and NGSengine are trademarks of GenDx. FEMTO Pulse and Fragment Analyzer are trademarks of Agilent Technologies Inc.
    All other trademarks are the sole property of their respective owners.
    www.pacb.com
