Comprehensive Variant Detection with PacBio HiFi Reads

William Rowell
September 17, 2020

This presentation was for a UC system webinar.

Transcript

  1. Comprehensive Variant Detection with PacBio HiFi Reads
    William Rowell, Staff Scientist, Bioinformatics Applications, PacBio @nothingclever
    For Research Use Only. Not for use in diagnostic procedures. © Copyright 2020 by Pacific Biosciences of California, Inc. All rights reserved.

  2. HIFI READS ARE LONG (15-20 KB) AND ACCURATE (≥99%)

  3. TYPES OF GENOMIC VARIATION
    SMRT Sequencing provides comprehensive detection of all variant types.

  4. HIFI READS PROVIDE A COMPREHENSIVE VIEW OF VARIATION IN THE HUMAN GENOME
    [Figure: HiFi reads vs. short reads against GRCh38, by variant class: SNVs (1 bp), indels (1-49 bp), and structural variants (SVs, ≥50 bp). Values shown: 5 Mb, 3 Mb, 10 Mb.]

  5. HIFI READS PROVIDE A COMPREHENSIVE VIEW OF VARIATION IN THE HUMAN GENOME
    [Figure: HiFi reads vs. short reads against GRCh38, by variant class: SNVs (1 bp), indels (1-49 bp), and structural variants (SVs, ≥50 bp). Values shown: 5 Mb, 3 Mb, 10 Mb. Highlighted: SNVs and indels in difficult regions.]

  6. HIFI READS PROVIDE A COMPREHENSIVE VIEW OF VARIATION IN THE HUMAN GENOME
    [Figure: short-read vs. HiFi read alignments at STRC.]
    STRC is a congenital deafness gene that requires long reads to cover all exons.

  7. HIFI READS IMPROVE MAPPABILITY IN MANY MEDICALLY RELEVANT GENES
    % problem exons resolved (number of genes): genes
    100% (152): ABCC6, ABCD1, ACAN, ACSM2B, AKR1C2, ALG1, ANKRD11, BCR, CATSPER2,
    CD177, CEL, CES1, CFH, CFHR1, CFHR3, CFHR4, CGB, CHEK2, CISD2, CLCNKA,
    CLCNKB, CORO1A, COX10, CRYBB2, CSH1, CYP11B1, CYP11B2, CYP21A2,
    CYP2A6, CYP2D6, CYP2F1, CYP4A22, DDX11, DHRS4L1, DIS3L2, DND1, DPY19L2,
    DUOX2, ESRRA, F8, FAM120A, FAM205A, FANCD2, FCGR1A, FCGR2A, FCGR3A,
    FCGR3B, FLG, FLNC, FOXD4, FOXO3, FUT3, GBA, GFRA2, GON4L, GRM5, GSTM1,
    GYPA, GYPB, GYPE, HBA1, HBA2, HBG1, HBG2, HP, HS6ST1, IDS, IFT122, IKBKG,
    IL9R, KIR2DL1, KIR2DL3, KMT2C, KRT17, KRT6A, KRT6B, KRT6C, KRT81, KRT86,
    LEFTY2, LPA, MST1, MUC5B, MYH6, MYH7, NEB, NLGN4X, NLGN4Y, NOS2,
    NOTCH2, NXF5, OPN1LW, OR2T5, OR51A2, PCDH11X, PCDHB4, PGAM1, PHC1,
    PIK3CA, PKD1, PLA2G10, PLEKHM1, PLG, PMS2, PRB1, PRDM9, PROS1, RAB40AL,
    RALGAPA1, RANBP2, RHCE, RHD, RHPN2, ROCK1, SAA1, SDHA, SDHC, SFTPA1,
    SFTPA2, SIGLEC14, SLC6A8, SMG1, SPATA31C1, SPTLC1, SRGAP2, SSX7, STAT5B,
    STK19, STRC, SULT1A1, SUZ12, TBX20, TCEB3C, TLR1, TLR6, TMEM231, TNXB,
    TRIOBP, TRPA1, TTN, TUBA1A, TUBB2B, UGT1A5, UGT2B15, UGT2B17, UNC93B1,
    VCY, VWF, WDR72, ZNF419, ZNF592, ZNF674
    [75%, 100%) (11): ANAPC1, C4A, C4B, CHRNA7, CR1, DUX4, FCGR2B, HYDIN, OTOA, PDPK1, TMLHE
    [50%, 75%) (7): ADAMTSL2, CDY2A, DAZ1, GTF2I, NAIP, OCLN, RPS17
    [25%, 50%) (5): DAZ2, DAZ3, KIR3DL1, OPN1MW, PPIP5K1
    (0%, 25%) (2): NCF1, RBMY1A1
    0% (16): BPY2, CCL3L1, CCL4L1, CDY1, CFC1, CFC1B, GTF2IRD2, HSFY1, MRC1, OR4F5,
    PRY, PRY2, SMN1, SMN2, TSPY1, XKRY

  8. GOOGLE DEEPVARIANT IS A HIGHLY ACCURATE SMALL VARIANT CALLER FOR HIFI READS
    Poplin, R. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol 36, 983-987 (2018).
    - Variant calling pipeline powered by a deep neural network
    - Fast and inexpensive
    - Runs from binaries as well as Docker or Singularity containers
    - PacBio model trained on HiFi reads from Sequel and Sequel II Systems with median read quality >99.9%
    - Model is updated regularly to support PacBio chemistry and software updates

  9. RUN DEEPVARIANT EASILY WITH DOCKER OR SINGULARITY
    Example suitable for amplicon analysis.
    singularity exec \
    docker://google/deepvariant:1.0.0 \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type PACBIO \
    --ref ./reference.fasta \
    --reads ./aligned.ccs.bam \
    --output_vcf ./output.vcf.gz \
    --num_shards $(nproc)
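The slide title also mentions Docker. A roughly equivalent Docker invocation might look like the sketch below. It assumes the reference (with its index) and the aligned BAM (with its index) sit in the current directory, which is bind-mounted at /data; that mount point is chosen here for illustration. The wrapper function `deepvariant_docker` and the `DRY_RUN` switch are hypothetical conveniences and are not part of DeepVariant.

```shell
# Sketch: run DeepVariant 1.0.0 through Docker instead of Singularity.
# Assumption: inputs are in the current directory, mounted at /data.
# deepvariant_docker and DRY_RUN are illustrative names only.
deepvariant_docker() {
  # Build the command as the function's positional parameters.
  set -- docker run --rm \
    -v "$(pwd)":/data \
    google/deepvariant:1.0.0 \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type PACBIO \
    --ref /data/reference.fasta \
    --reads /data/aligned.ccs.bam \
    --output_vcf /data/output.vcf.gz \
    --num_shards "$(nproc)"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"      # print the command instead of running it
  else
    "$@"           # run the container
  fi
}
```

Calling `DRY_RUN=1 deepvariant_docker` prints the command for inspection; calling `deepvariant_docker` alone runs it.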

  10. PRECISION & RECALL
    Precision: percentage of calls that are correct = TP/(TP+FP)
    Recall: percentage of truth that is called = TP/(TP+FN)
    Metric          Abbreviation  Benchmark (“truth”) variants  Variant calls
    True Positive   TP            ✓                             ✓
    False Positive  FP            -                             ✓
    False Negative  FN            ✓                             -
    [Figure: Venn diagram of benchmark variants and variant calls. The overlap is TP, benchmark-only variants are FN, and call-only variants are FP.]
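As a worked example of the two formulas, the sketch below computes precision and recall as percentages from TP, FP, and FN counts. The function name `precision_recall` and the counts are made up for illustration, and the function assumes both denominators are nonzero.

```shell
# precision_recall TP FP FN -> prints "precision recall" as percentages.
# Assumes TP+FP > 0 and TP+FN > 0 (no guard against division by zero).
precision_recall() {
  awk -v tp="$1" -v fp="$2" -v fn="$3" 'BEGIN {
    printf "%.2f %.2f\n", 100 * tp / (tp + fp), 100 * tp / (tp + fn)
  }'
}

# Hypothetical counts: 90 true positives, 10 false positives, 30 false negatives.
precision_recall 90 10 30   # prints "90.00 75.00"
```

With these counts, 90 of 100 calls are correct (precision 90%) and 90 of 120 benchmark variants are called (recall 75%).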

  11. PRECISIONFDA TRUTH CHALLENGE V2
    https://precision.fda.gov/challenges/10/view/results
                                       HG002  HG003    HG004
    35× Illumina NovaSeq               ✓      ✓        ✓
    35× HiFi, PacBio Sequel II System  ✓      ✓        ✓
    60× ONT PromethION                 ✓      ✓        ✓
    V4 Benchmark                       ✓      Blinded  Blinded

  12. https://www.pacb.com/blog/precisionfda-challenge/
    64 entries: 24 Illumina, 20 Multi, 17 HiFi, 3 ONT

  13. Top 12 entries and 25 of top 26 use PacBio HiFi reads
    [Figure: accuracy (F1, %) of precisionFDA entries by technology (Illumina, Multi, HiFi, ONT), with HiFi DeepVariant, Illumina DeepVariant, Illumina GATK, and ONT DeepVariant labeled. Axis ticks: 90, 97, 99, 99.7, 99.9.]

  14. TYPES OF GENOMIC VARIATION
    SMRT Sequencing provides comprehensive detection of all variant types.

  15. TYPES OF GENOMIC VARIATION
    SMRT Sequencing provides comprehensive detection of all variant types.

  16. HIFI READS PROVIDE A COMPREHENSIVE VIEW OF VARIATION IN THE HUMAN GENOME
    [Figure: HiFi reads vs. short reads against GRCh38, by variant class: SNVs (1 bp), indels (1-49 bp), and structural variants (SVs, ≥50 bp). Values shown: 5 Mb, 3 Mb, 10 Mb. Highlighted: long indels and SVs genome-wide.]

  17. HIFI READS SPAN STRUCTURAL VARIANTS
    [Figure: HiFi and short-read alignments for Haplotype 1 and Haplotype 2 over a region with repeats. The HiFi reads show a 1,733 bp deletion; with short reads the deletion is not detected.]

  18. CALL STRUCTURAL VARIANTS FROM HIFI READS WITH PBSV
    HiFi reads → pbmm2 → pbsv discover → pbsv call → variant calls (vcf)
    OR
    HiFi reads → SMRT Link Mapping → SMRT Link Structural Variant Calling → variant calls (vcf)
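The command-line path in this workflow can be sketched as three steps: align with pbmm2, collect SV signatures with pbsv discover, then call variants with pbsv call. This is a minimal sketch, not a tuned pipeline. The file names are placeholders, the function names `call_svs` and `run` and the `DRY_RUN` switch are hypothetical, and the `--preset CCS` and `--ccs` options reflect the tool versions of this era; newer releases may prefer differently named options, so check `pbmm2 align --help` and `pbsv call --help`.

```shell
# run: execute a command, or just print it when DRY_RUN=1 (illustrative helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# call_svs REF.fasta HIFI_READS.bam SAMPLE_NAME (hypothetical wrapper)
call_svs() {
  ref=$1; reads=$2; sample=$3
  # 1. Align HiFi reads; --sample sets the sample name in the BAM read groups.
  run pbmm2 align "$ref" "$reads" "$sample.aligned.bam" \
    --preset CCS --sort --sample "$sample" &&
  # 2. Scan the alignments for structural variant signatures.
  run pbsv discover "$sample.aligned.bam" "$sample.svsig.gz" &&
  # 3. Call structural variants from the signatures.
  run pbsv call --ccs "$ref" "$sample.svsig.gz" "$sample.pbsv.vcf"
}
```

For example, `DRY_RUN=1 call_svs reference.fasta hifi_reads.bam HG002` prints the three commands without running them.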

  19. PBSV CALLS LARGE INDELS, INVERSIONS, TRANSLOCATIONS
    [Figure: example variant calls; labeled sizes 3.4 kb and 1.2 kb.]

  20. PBSV CALLS LARGE INDELS, INVERSIONS, TRANSLOCATIONS

  21. HIFI PBSV PERFORMANCE AGAINST BENCHMARK
    [Figure: Structural variants with pbsv. Precision (HiFi) and Recall (HiFi) plotted against fold coverage (x-axis 0 to 30, y-axis 40% to 100%).]

  22. VARIANT DETECTION BENCHMARKING (HG002)
    Recall | Precision (%)
    HiFi Coverage  SNVs           Indels         SVs
    15-fold        99.53 | 99.89  95.16 | 96.23  97.41 | 94.48
    30-fold        99.89 | 99.95  98.90 | 98.99  98.00 | 95.29
    SNV and indel calls are from DeepVariant 1.0.0 and evaluated against the GIAB v4.2 small variant benchmark using hap.py.
    SV calls are from pbsv 2.2.2 and evaluated against the GIAB v0.6 SV benchmark using Truvari.
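The precisionFDA slide reports accuracy as F1, while this table reports recall and precision separately. F1 is the harmonic mean of the two, 2PR/(P+R), so the table values can be converted with a small helper. The function name `f1` is chosen here for illustration; the inputs are the 30-fold recall and precision values from the table.

```shell
# f1 RECALL PRECISION -> harmonic mean, in the same units as the inputs (here %).
f1() {
  awk -v r="$1" -v p="$2" 'BEGIN { printf "%.2f\n", 2 * p * r / (p + r) }'
}

# Recall and precision at 30-fold coverage, taken from the table.
f1 99.89 99.95   # SNVs
f1 98.90 98.99   # indels
f1 98.00 95.29   # SVs
```

Because the harmonic mean is pulled toward the smaller of the two values, a caller cannot score well on F1 by trading away precision for recall or the reverse.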

  23. PATHOGENIC VARIANTS DETECTED WITH HIFI READS
    Hiatt SM, Lawlor JMJ, et al. (2020). Long-read sequencing for the diagnosis of neurodevelopmental disorders. bioRxiv, doi:10.1101/2020.07.02.185447
    Figure 1. Proband 6 has a de novo insertion resulting in duplication of exon 3 of CDKL5

  24. COMPREHENSIVE VARIANT DETECTION WITH HIFI READS
    - HiFi = mappability of long reads + base quality of short reads
    - HiFi + DeepVariant yield the most accurate small variant calls currently available with a single technology.
    - HiFi + pbsv yield highly accurate structural variant calls, including inversions, translocations, and copy number variants.
    - Recommend 15-fold coverage for most discovery applications.
    Datasets for the Ashkenazi trio (15 kb and 20 kb libraries) are deposited on SRA:
    HG002 (PRJNA586863), HG003 (PRJNA626365), HG004 (PRJNA626366)

  25. For Research Use Only. Not for use in diagnostic procedures. © Copyright 2020 by Pacific Biosciences of California, Inc. All rights reserved. Pacific Biosciences, the
    Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. Pacific Biosciences does not sell a kit for carrying out the
    overall No-Amp Targeted Sequencing method. Use of these No-Amp methods may require rights to third-party owned intellectual property. FEMTO Pulse and Fragment
    Analyzer are trademarks of Agilent Technologies Inc.
    All other trademarks are the sole property of their respective owners.
    www.pacb.com
