Slide 1

Slide 1 text

Comprehensive Variant Detection with PacBio HiFi Reads

William Rowell, Staff Scientist, Bioinformatics Applications, PacBio
@nothingclever

For Research Use Only. Not for use in diagnostic procedures. © Copyright 2020 by Pacific Biosciences of California, Inc. All rights reserved.

Slide 2

Slide 2 text

HIFI READS ARE LONG (15-20 KB) AND ACCURATE (99%)

Slide 3

Slide 3 text

TYPES OF GENOMIC VARIATION

SMRT Sequencing provides comprehensive detection of all variant types.

Slide 4

Slide 4 text

HIFI READS PROVIDE A COMPREHENSIVE VIEW OF VARIATION IN THE HUMAN GENOME

Figure: variation vs GRCh38 detected with HiFi reads and with short reads. Variant classes: 1 bp SNVs, 1-49 bp indels, ≥50 bp structural variants (SVs). Figure labels: 5 Mb, 3 Mb, 10 Mb.

Slide 5

Slide 5 text

HIFI READS PROVIDE A COMPREHENSIVE VIEW OF VARIATION IN THE HUMAN GENOME

Figure: variation vs GRCh38 detected with HiFi reads and with short reads. Variant classes: 1 bp SNVs, 1-49 bp indels, ≥50 bp structural variants (SVs). Figure labels: 5 Mb, 3 Mb, 10 Mb. Highlighted: SNVs and indels in difficult regions.

Slide 6

Slide 6 text

HIFI READS PROVIDE A COMPREHENSIVE VIEW OF VARIATION IN THE HUMAN GENOME

Figure: short-read and HiFi-read coverage of STRC.

STRC is a congenital deafness gene that requires long reads to cover all exons.

Slide 7

Slide 7 text

HIFI READS IMPROVE MAPPABILITY IN MANY MEDICALLY-RELEVANT GENES

% problem exons resolved: 100% (152 genes)
ABCC6, ABCD1, ACAN, ACSM2B, AKR1C2, ALG1, ANKRD11, BCR, CATSPER2, CD177, CEL, CES1, CFH, CFHR1, CFHR3, CFHR4, CGB, CHEK2, CISD2, CLCNKA, CLCNKB, CORO1A, COX10, CRYBB2, CSH1, CYP11B1, CYP11B2, CYP21A2, CYP2A6, CYP2D6, CYP2F1, CYP4A22, DDX11, DHRS4L1, DIS3L2, DND1, DPY19L2, DUOX2, ESRRA, F8, FAM120A, FAM205A, FANCD2, FCGR1A, FCGR2A, FCGR3A, FCGR3B, FLG, FLNC, FOXD4, FOXO3, FUT3, GBA, GFRA2, GON4L, GRM5, GSTM1, GYPA, GYPB, GYPE, HBA1, HBA2, HBG1, HBG2, HP, HS6ST1, IDS, IFT122, IKBKG, IL9R, KIR2DL1, KIR2DL3, KMT2C, KRT17, KRT6A, KRT6B, KRT6C, KRT81, KRT86, LEFTY2, LPA, MST1, MUC5B, MYH6, MYH7, NEB, NLGN4X, NLGN4Y, NOS2, NOTCH2, NXF5, OPN1LW, OR2T5, OR51A2, PCDH11X, PCDHB4, PGAM1, PHC1, PIK3CA, PKD1, PLA2G10, PLEKHM1, PLG, PMS2, PRB1, PRDM9, PROS1, RAB40AL, RALGAPA1, RANBP2, RHCE, RHD, RHPN2, ROCK1, SAA1, SDHA, SDHC, SFTPA1, SFTPA2, SIGLEC14, SLC6A8, SMG1, SPATA31C1, SPTLC1, SRGAP2, SSX7, STAT5B, STK19, STRC, SULT1A1, SUZ12, TBX20, TCEB3C, TLR1, TLR6, TMEM231, TNXB, TRIOBP, TRPA1, TTN, TUBA1A, TUBB2B, UGT1A5, UGT2B15, UGT2B17, UNC93B1, VCY, VWF, WDR72, ZNF419, ZNF592, ZNF674

% problem exons resolved: [75%, 100%) (11 genes)
ANAPC1, C4A, C4B, CHRNA7, CR1, DUX4, FCGR2B, HYDIN, OTOA, PDPK1, TMLHE

% problem exons resolved: [50%, 75%) (7 genes)
ADAMTSL2, CDY2A, DAZ1, GTF2I, NAIP, OCLN, RPS17

% problem exons resolved: [25%, 50%) (5 genes)
DAZ2, DAZ3, KIR3DL1, OPN1MW, PPIP5K1

% problem exons resolved: (0%, 25%) (2 genes)
NCF1, RBMY1A1

% problem exons resolved: 0% (16 genes)
BPY2, CCL3L1, CCL4L1, CDY1, CFC1, CFC1B, GTF2IRD2, HSFY1, MRC1, OR4F5, PRY, PRY2, SMN1, SMN2, TSPY1, XKRY
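The per-bin gene counts shown on this slide (152, 11, 7, 5, 2, 16) can be turned into shares of the whole gene set with a few lines of arithmetic. The sketch below is illustrative only: the counts come from the slide, while the names `GENES_PER_BIN` and `bin_fractions` are made up for this example.

```python
# Gene counts per "% problem exons resolved" bin, as shown on the slide.
GENES_PER_BIN = {
    "100%": 152,
    "[75%, 100%)": 11,
    "[50%, 75%)": 7,
    "[25%, 50%)": 5,
    "(0%, 25%)": 2,
    "0%": 16,
}


def bin_fractions(counts):
    """Return each bin's share of all genes as a fraction between 0 and 1."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    total = sum(GENES_PER_BIN.values())
    for label, frac in bin_fractions(GENES_PER_BIN).items():
        n = GENES_PER_BIN[label]
        print(f"{label:>12}: {n:>3} of {total} genes ({frac:.1%})")
```

The percentages printed here are derived from the slide's counts and are not figures reported in the talk.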

Slide 8

Slide 8 text

GOOGLE DEEPVARIANT IS A HIGHLY ACCURATE SMALL VARIANT CALLER FOR HIFI READS

- Variant calling pipeline powered by a deep neural network
- Fast and inexpensive
- Run from binaries as well as Docker or Singularity containers
- PacBio model trained on HiFi reads from Sequel and Sequel II Systems with median read quality >99.9%
- Model is updated regularly to support PacBio chemistry and software updates

Poplin, R. E. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol 36, 983-987 (2018).

Slide 9

Slide 9 text

RUN DEEPVARIANT EASILY WITH DOCKER OR SINGULARITY

Example suitable for amplicon analysis.

    singularity exec \
      docker://google/deepvariant:1.0.0 \
      /opt/deepvariant/bin/run_deepvariant \
      --model_type PACBIO \
      --ref ./reference.fasta \
      --reads ./aligned.ccs.bam \
      --output_vcf ./output.vcf.gz \
      --num_shards $(nproc)

Slide 10

Slide 10 text

PRECISION & RECALL

Precision: percentage of calls that are correct = TP/(TP+FP)
Recall: percentage of truth that is called = TP/(TP+FN)

Metric           Abbreviation   Benchmark   Variant calls
True Positive    TP             ✓           ✓
False Positive   FP             -           ✓
False Negative   FN             ✓           -

Figure: Venn diagram of benchmark (“truth”) variants and variant calls. The overlap is TP, benchmark only is FN, and variant calls only is FP.
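The definitions above can be sketched with plain set operations, mirroring the overlap between variant calls and benchmark variants. This is a simplified illustration: variants are compared by exact match on chromosome, position, and alleles, whereas dedicated benchmarking tools do more careful matching. The `Variant` type, the function names, and the toy variants are all hypothetical.

```python
from typing import NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Toy variant record; exact equality stands in for real variant matching."""
    chrom: str
    pos: int
    ref: str
    alt: str


def confusion_counts(calls: Set[Variant], truth: Set[Variant]) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) for a call set compared with a benchmark set."""
    tp = len(calls & truth)   # in both calls and benchmark
    fp = len(calls - truth)   # called, but not in the benchmark
    fn = len(truth - calls)   # in the benchmark, but not called
    return tp, fp, fn


def precision(tp: int, fp: int) -> float:
    """Fraction of calls that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of benchmark variants that are called: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


if __name__ == "__main__":
    # Made-up variants for illustration only.
    truth = {
        Variant("chr1", 100, "A", "G"),
        Variant("chr1", 250, "C", "T"),
        Variant("chr2", 40, "G", "GA"),
        Variant("chr2", 900, "TC", "T"),
        Variant("chr3", 10, "A", "C"),
    }
    calls = {
        Variant("chr1", 100, "A", "G"),
        Variant("chr1", 250, "C", "T"),
        Variant("chr2", 40, "G", "GA"),
        Variant("chr4", 77, "T", "A"),
    }
    tp, fp, fn = confusion_counts(calls, truth)
    print(f"TP={tp} FP={fp} FN={fn}")
    print(f"precision={precision(tp, fp):.2f} recall={recall(tp, fn):.2f}")
```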

Slide 11

Slide 11 text

PRECISIONFDA TRUTH CHALLENGE V2

https://precision.fda.gov/challenges/10/view/results

                                    HG002   HG003     HG004
35× Illumina NovaSeq                ✓       ✓         ✓
35× HiFi, PacBio Sequel II System   ✓       ✓         ✓
60× ONT PromethION                  ✓       ✓         ✓
V4 Benchmark                        ✓       Blinded   Blinded

Slide 12

Slide 12 text

64 entries by technology: 24 Illumina, 20 Multi, 17 HiFi, 3 ONT

https://www.pacb.com/blog/precisionfda-challenge/

Slide 13

Slide 13 text

Top 12 entries and 25 of top 26 use PacBio HiFi reads

Figure: accuracy, F1 (%), of precisionFDA entries by technology (Illumina, Multi, HiFi, ONT). Axis ticks: 90, 97, 99, 99.7, 99.9. Labeled entries: HiFi DeepVariant, Illumina DeepVariant, Illumina GATK, ONT DeepVariant.

Slide 15

Slide 15 text

TYPES OF GENOMIC VARIATION

SMRT Sequencing provides comprehensive detection of all variant types.

Slide 17

Slide 17 text

HIFI READS PROVIDE A COMPREHENSIVE VIEW OF VARIATION IN THE HUMAN GENOME

Figure: variation vs GRCh38 detected with HiFi reads and with short reads. Variant classes: 1 bp SNVs, 1-49 bp indels, ≥50 bp structural variants (SVs). Figure labels: 5 Mb, 3 Mb, 10 Mb. Highlighted: long indels and SVs genome-wide.

Slide 18

Slide 18 text

HIFI READS SPAN STRUCTURAL VARIANTS

Figure: alignments of HiFi reads (haplotype 1 and haplotype 2) and short reads across a repeat region. HiFi reads: 1,733 bp deletion. Short reads: deletion not detected.

Slide 19

Slide 19 text

CALL STRUCTURAL VARIANTS FROM HIFI READS WITH PBSV

HiFi reads -> pbmm2 -> pbsv discover -> pbsv call -> variant calls (vcf)

OR

HiFi reads -> SMRT Link Mapping -> SMRT Link Structural Variant Calling -> variant calls (vcf)
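The command-line route in the workflow above (pbmm2, then pbsv discover, then pbsv call) can be sketched as a small helper that only builds the three command lines and does not run them. File names, the `pbsv_commands` helper, and the thread count are hypothetical. The flags shown (`--preset CCS`, `--sort`, `-j`, `--ccs`) are assumptions based on pbmm2 and pbsv releases from around the time of this talk, so check each tool's `--help` for the version you have installed.

```python
import shlex


def pbsv_commands(reference, hifi_reads, prefix, threads=8):
    """Build (but do not run) the align -> discover -> call command lines."""
    aligned = f"{prefix}.aligned.bam"
    svsig = f"{prefix}.svsig.gz"
    vcf = f"{prefix}.pbsv.vcf"
    return [
        # 1. Align HiFi reads to the reference and sort the output.
        ["pbmm2", "align", reference, hifi_reads, aligned,
         "--preset", "CCS", "--sort", "-j", str(threads)],
        # 2. Collect structural variant signatures from the alignments.
        ["pbsv", "discover", aligned, svsig],
        # 3. Call structural variants from the signatures.
        ["pbsv", "call", "--ccs", "-j", str(threads), reference, svsig, vcf],
    ]


if __name__ == "__main__":
    # Hypothetical input names; prints the commands for inspection.
    for cmd in pbsv_commands("reference.fasta", "hifi_reads.bam", "HG002"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the commands first makes it easy to review them before running each one, for example with `subprocess.run(cmd, check=True)`.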

Slide 20

Slide 20 text

PBSV CALLS LARGE INDELS, INVERSIONS, TRANSLOCATIONS

Figure: example pbsv calls, with labeled sizes of 3.4 kb and 1.2 kb.

Slide 21

Slide 21 text

PBSV CALLS LARGE INDELS, INVERSIONS, TRANSLOCATIONS

Slide 22

Slide 22 text

HIFI PBSV PERFORMANCE AGAINST BENCHMARK

Figure: structural variants with pbsv. Precision (HiFi) and recall (HiFi), plotted from 40% to 100%, versus fold coverage from 0 to 30.

Slide 23

Slide 23 text

VARIANT DETECTION BENCHMARKING (HG002)

Recall | Precision (%)

HiFi Coverage   SNVs            Indels          SVs
15-fold         99.53 | 99.89   95.16 | 96.23   97.41 | 94.48
30-fold         99.89 | 99.95   98.90 | 98.99   98.00 | 95.29

SNV and indel calls are from DeepVariant 1.0.0 and evaluated against the GIAB v4.2 small variant benchmark using hap.py. SV calls are from pbsv 2.2.2 and evaluated against the GIAB v0.6 SV benchmark using Truvari.
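Recall and precision are often combined into a single F1 score, the harmonic mean of the two, which is the accuracy metric used for the precisionFDA entries earlier in the deck. The sketch below applies that formula to the recall and precision values in the table above. The resulting F1 values are derived here for illustration and are not figures reported in the talk; the names `BENCHMARK` and `f1_score` are made up for this example.

```python
# (recall %, precision %) for HG002, copied from the benchmarking table.
BENCHMARK = {
    ("15-fold", "SNVs"): (99.53, 99.89),
    ("15-fold", "Indels"): (95.16, 96.23),
    ("15-fold", "SVs"): (97.41, 94.48),
    ("30-fold", "SNVs"): (99.89, 99.95),
    ("30-fold", "Indels"): (98.90, 98.99),
    ("30-fold", "SVs"): (98.00, 95.29),
}


def f1_score(recall, precision):
    """Harmonic mean of recall and precision, in the same units as the inputs."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


if __name__ == "__main__":
    for (coverage, variant_type), (rec, prec) in BENCHMARK.items():
        f1 = f1_score(rec, prec)
        print(f"{coverage:>8} {variant_type:<7} "
              f"recall={rec:.2f} precision={prec:.2f} F1={f1:.2f}")
```

Because the harmonic mean is pulled toward the lower of its two inputs, a caller cannot reach a high F1 by trading away either recall or precision.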

Slide 24

Slide 24 text

PATHOGENIC VARIANTS DETECTED WITH HIFI READS

Figure 1. Proband 6 has a de novo insertion resulting in duplication of exon 3 of CDKL5

Hiatt SM, Lawlor JMJ, et al. (2020). Long-read sequencing for the diagnosis of neurodevelopmental disorders. bioRxiv, doi:10.1101/2020.07.02.185447

Slide 25

Slide 25 text

COMPREHENSIVE VARIANT DETECTION WITH HIFI READS

- HiFi = mappability of long reads + base quality of short reads
- HiFi + DeepVariant yield most accurate small variant calls currently available with a single technology.
- HiFi + pbsv yield highly accurate structural variant calls, including inversions, translocations, and copy number variants.
- Recommend 15-fold coverage for most discovery applications.

Datasets for the Ashkenazi trio (15 kb and 20 kb libraries) are deposited on SRA:
HG002 (PRJNA586863)
HG003 (PRJNA626365)
HG004 (PRJNA626366)

Slide 26

Slide 26 text

Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. Pacific Biosciences does not sell a kit for carrying out the overall No-Amp Targeted Sequencing method. Use of these No-Amp methods may require rights to third-party owned intellectual property. FEMTO Pulse and Fragment Analyzer are trademarks of Agilent Technologies Inc. All other trademarks are the sole property of their respective owners. www.pacb.com