Slide 1

Slide 1 text

HiFi Reads for Comprehensive Variant Detection
William Rowell, Staff Scientist, Bioinformatics Applications, PacBio
@nothingclever #SMRTLeiden
For Research Use Only. Not for use in diagnostic procedures. © Copyright 2020 by Pacific Biosciences of California, Inc. All rights reserved.

Slide 2

Slide 2 text

NEW PARADIGM OF ACCURATE, LONG-READ DNA SEQUENCING
Wenger, A. M., et al. (2019). Accurate circular consensus sequencing improves variant detection and assembly of a human genome. Nature Biotechnology. Published: 12 August 2019.
Altmetric score: the article is in the 98th percentile of the 254,341 tracked articles of a similar age in all journals.

Slide 3

Slide 3 text

TYPES OF GENOMIC VARIATION
SMRT Sequencing provides comprehensive detection of all variant types.

Slide 4

Slide 4 text

VARIATION IN A HUMAN GENOME
[Figure: variation vs GRCh38 detected with PacBio HiFi reads and with short reads, by variant class: SNVs (1 bp), indels (1-49 bp), and structural variants (≥50 bp). Scale labels: 5 Mb, 3 Mb, 10 Mb.]

Slide 5

Slide 5 text

VARIATION IN A HUMAN GENOME
[Figure: variation vs GRCh38 detected with PacBio HiFi reads and with short reads, by variant class: SNVs (1 bp), indels (1-49 bp), and structural variants (≥50 bp). Scale labels: 5 Mb, 3 Mb, 10 Mb.]
Short reads miss ~80% of SVs, typically long insertions or variants in difficult-to-map repetitive regions. Increasing short-read coverage does not recover them.

Slide 6

Slide 6 text

PACBIO LONG READS SPAN STRUCTURAL VARIANTS
[Figure: a 1,733 bp deletion in a repetitive region, shown for Haplotype 1 and Haplotype 2. Individual PacBio HiFi reads span the deletion and each shows the 1,733 bp event; with short reads, the deletion is not detected.]
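A read that spans a deletion can carry the whole event inside a single alignment. As a rough illustration only (this is not how pbsv works internally), the hypothetical `report_cigar_deletions` function below assumes SAM text input, such as `samtools view` output, scans each CIGAR string, and reports deletions of at least a given length with their 1-based reference start.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report large deletions encoded directly in SAM CIGAR strings.
# Input: SAM records on stdin (header lines are skipped).
# Output (tab-separated): read name, chromosome, 1-based first deleted base, length.
report_cigar_deletions() {
  local min_len="${1:-50}"   # 50 bp matches the SV size cutoff used in these slides
  awk -v min="$min_len" 'BEGIN { OFS = "\t" }
    /^@/ { next }             # skip SAM header lines
    $6 == "*" { next }        # unmapped record: no CIGAR to scan
    {
      pos = $4 + 0            # leftmost 1-based reference position
      cigar = $6
      # Consume one CIGAR operation (length + op character) per iteration.
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        len = substr(cigar, 1, RLENGTH - 1) + 0
        op = substr(cigar, RLENGTH, 1)
        if (op == "D" && len >= min + 0) print $1, $3, pos, len
        # Only M, D, N, = and X consume reference bases.
        if (op ~ /[MDN=X]/) pos += len
        cigar = substr(cigar, RLENGTH + 1)
      }
    }'
}
```

For example, `samtools view aligned.bam chr1 | report_cigar_deletions 50` lists deletions of 50 bp or more on chr1 (file name hypothetical). Aligners may instead split very large events into supplementary alignments, which this sketch does not handle.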

Slide 7

Slide 7 text

VARIATION IN A HUMAN GENOME
[Figure: variation vs GRCh38 detected with PacBio HiFi reads and with short reads, by variant class: SNVs (1 bp), indels (1-49 bp), and structural variants (≥50 bp). Scale labels: 5 Mb, 3 Mb, 10 Mb.]
Short reads miss small variants in difficult-to-map regions of the human genome. PacBio high-accuracy long reads improve mappability and increase variant detection in these regions.

Slide 8

Slide 8 text

HiFi READS IMPROVE MAPPABILITY IN THE HUMAN GENOME
This affects many medically relevant genes.
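One rough way to quantify mappability over a region of interest, offered as an assumption rather than the method used in the cited work, is the fraction of positions covered by at least one read above a mapping-quality cutoff. `samtools depth -a -Q` reports per-position depth counting only reads at or above that cutoff; the hypothetical `mappable_fraction` function summarizes its output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize `samtools depth -a` output (chrom, pos, depth)
# as the fraction of positions with depth > 0. Prints NA for empty input.
mappable_fraction() {
  awk '{ total++; if ($3 + 0 > 0) covered++ }
       END {
         if (total == 0) { print "NA"; exit }
         printf "%.4f\n", covered / total
       }'
}
```

For example, `samtools depth -a -Q 20 -r "$REGION" aligned.bam | mappable_fraction`, where `REGION` (such as a gene interval), the BAM name, and the cutoff of 20 are placeholders. Running it on HiFi and short-read alignments of the same region gives a side-by-side comparison.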

Slide 9

Slide 9 text

HiFi READS IMPROVE MAPPABILITY IN THE HUMAN GENOME
Wenger, A. M., et al. (2019). Accurate circular consensus sequencing improves variant detection and assembly of a human genome. Nature Biotechnology.
List originally from Mandelker, D., et al. Navigating highly homologous genes in a molecular diagnostic setting: a resource for clinical next-generation sequencing. Genet Med.

Slide 10

Slide 10 text

RECOMMENDED VARIANT DETECTION WORKFLOWS
- SVs
  - “Structural Variant Calling” application in SMRT Link
  - or map with pbmm2 and call with pbsv from the command line
- SNVs and small indels
  - map with pbmm2
  - call with Google DeepVariant
  - optional phasing with WhatsHap
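The command-line route can be sketched as a short shell script. This is a minimal dry-run sketch, not an official PacBio pipeline: the file names, sample name, and thread count are hypothetical; the pbsv `--hifi` preset and the DeepVariant 0.10.0 image come from these slides; and the script assumes the reference and the sorted BAM are indexed. By default it only prints each command.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the command-line workflow; file names are hypothetical.
# Commands are printed; set DRY_RUN=0 to execute them (tools must be installed).
run_step() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

hifi_variant_workflow() {
  local ref="${REF:-reference.fasta}"     # assumed indexed (samtools faidx)
  local reads="${READS:-movie.ccs.bam}"   # HiFi (CCS) reads
  local sample="${SAMPLE:-sample1}"
  local threads="${THREADS:-8}"

  # Map HiFi reads with the CCS preset; sorted output for downstream callers.
  run_step pbmm2 align --preset CCS --sort -j "$threads" --sample "$sample" \
    "$ref" "$reads" aligned.bam || return

  # SVs: collect SV signatures, then call and genotype with the HiFi preset.
  run_step pbsv discover aligned.bam "$sample.svsig.gz" || return
  run_step pbsv call --hifi "$ref" "$sample.svsig.gz" "$sample.pbsv.vcf" || return

  # SNVs and small indels: DeepVariant with the PACBIO model.
  run_step singularity exec --bind "$PWD" docker://google/deepvariant:0.10.0 \
    /opt/deepvariant/bin/run_deepvariant --model_type PACBIO --ref "$ref" \
    --reads aligned.bam --output_vcf "$sample.deepvariant.vcf.gz" \
    --num_shards "$threads" || return

  # Optional: phase small variants with WhatsHap.
  run_step whatshap phase --reference "$ref" -o "$sample.phased.vcf" \
    "$sample.deepvariant.vcf.gz" aligned.bam
}
```

Calling `hifi_variant_workflow` prints the five commands; `DRY_RUN=0 REF=... READS=... hifi_variant_workflow` would run them.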

Slide 11

Slide 11 text

PACBIO STRUCTURAL VARIANT CALLING (PBSV)
- Identifies signatures of structural variation
- Calls variants and assigns genotypes
- Recent updates:
  - improved sensitivity for large insertions and deletions
  - calls duplications and copy number variation
  - simplified parameters with the --hifi preset
    - reports variants seen in a single read with at least 10% read support
    - equivalent to "-A 1 -O 1 -S 0 -P 10"
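To make the `--hifi` thresholds concrete, the hypothetical `passes_hifi_support` function applies a simplified, single-sample reading of them: at least one supporting read and at least 10% of the reads at the locus supporting the variant, with no per-strand requirement (`-S 0`). It illustrates the thresholds under those assumptions and is not pbsv's actual calling logic.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the --hifi support thresholds
# (-A 1 -O 1 -S 0 -P 10), simplified to a single sample:
#   alt_reads    reads supporting the SV
#   total_reads  reads covering the locus
# Returns 0 (pass) when alt_reads >= min_reads and the supporting
# percentage is >= min_perc; strand is ignored because -S is 0.
passes_hifi_support() {
  local alt_reads="$1" total_reads="$2"
  local min_reads="${3:-1}" min_perc="${4:-10}"
  awk -v a="$alt_reads" -v t="$total_reads" -v r="$min_reads" -v p="$min_perc" \
    'BEGIN { a += 0; t += 0; r += 0; p += 0
             exit !(t > 0 && a >= r && 100 * a / t >= p) }'
}
```

Under this reading, 1 supporting read out of 10 passes (10%), while 1 out of 11 does not. Per the slide, `pbsv call --hifi` is equivalent to passing `-A 1 -O 1 -S 0 -P 10` explicitly.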

Slide 12

Slide 12 text

GOOGLE DEEPVARIANT
- Variant calling pipeline powered by a deep neural network
- Fast and inexpensive
- Runs from binaries or from Docker or Singularity images
- PacBio model trained on HiFi reads from the Sequel and Sequel II Systems with median read quality >99.9%
- Model is updated regularly to support PacBio chemistry and software updates
Poplin, R. E. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol 36, 983-987 (2018).
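Read accuracy is often quoted on the Phred scale, QV = -10 log10(1 - accuracy), so a median read quality above 99.9% corresponds to a QV above 30. The hypothetical helper below does the conversion.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a read accuracy (fraction, 0 < acc < 1)
# to a Phred-scaled quality value, QV = -10 * log10(1 - acc).
phred_from_accuracy() {
  awk -v acc="$1" 'BEGIN {
    acc += 0
    if (acc <= 0 || acc >= 1) { print "NA"; exit 1 }
    printf "%.1f\n", -10 * log(1 - acc) / log(10)   # awk log() is the natural log
  }'
}
```

`phred_from_accuracy 0.999` prints `30.0`, and `phred_from_accuracy 0.99` prints `20.0`.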

Slide 13

Slide 13 text

UPDATES TO DEEPVARIANT PACBIO MODEL

Slide 14

Slide 14 text

UPDATES TO DEEPVARIANT PACBIO MODEL

Slide 15

Slide 15 text

UPDATES TO DEEPVARIANT PACBIO MODEL

Slide 16

Slide 16 text

RUN DEEPVARIANT EASILY WITH DOCKER OR SINGULARITY

singularity exec --bind "$PWD" \
  docker://google/deepvariant:0.10.0 \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type PACBIO \
  --ref ./reference.fasta \
  --reads ./aligned.ccs.bam \
  --output_vcf ./output.vcf.gz \
  --num_shards "$(nproc)"

Example suitable for amplicon analysis.
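The slide shows the Singularity form. A Docker equivalent, sketched here assuming Docker is installed and the inputs sit in the current directory, mounts that directory into the container and runs the same `run_deepvariant` entry point. The hypothetical `deepvariant_docker` function prints the command and runs it only when `RUN=1`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: Docker version of the Singularity example.
# The current directory is mounted at /data and used as the working directory,
# so the relative input paths below resolve inside the container.
deepvariant_docker() {
  local cmd=(
    docker run --rm -v "$PWD:/data" -w /data
    google/deepvariant:0.10.0
    /opt/deepvariant/bin/run_deepvariant
    --model_type PACBIO
    --ref ./reference.fasta
    --reads ./aligned.ccs.bam
    --output_vcf ./output.vcf.gz
    --num_shards "$(nproc)"
  )
  printf '%s\n' "${cmd[*]}"   # show the command that would run
  if [ "${RUN:-0}" = 1 ]; then "${cmd[@]}"; fi
}
```

Building the command as an array keeps each argument intact, so a `$PWD` containing spaces is still passed to `-v` as one value.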

Slide 17

Slide 17 text

NIST GENOME IN A BOTTLE (GIAB) BENCHMARK
Consortium dedicated to authoritative characterization of benchmark human genomes: https://www.nist.gov/programs-projects/genome-bottle
HG002, HG003, HG004 (doi:10.1101/664623)
Benchmark (or "high-confidence") variant calls and regions:
- Structural variants: currently available for HG002 on GRCh37
- Small variants in more difficult regions: currently available for HG002 on GRCh37 and GRCh38
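Benchmark calls are only evaluated inside the benchmark regions, which GIAB distributes as BED files. A common pitfall is the coordinate convention: BED is 0-based and half-open, while VCF positions are 1-based. The hypothetical `filter_vcf_to_bed` function below is a minimal sketch that keeps VCF records inside BED regions using that rule.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep VCF records whose POS lies inside a BED region,
# e.g. a GIAB benchmark ("high-confidence") BED. Header lines are kept.
# BED intervals are 0-based, half-open [start, end); VCF POS is 1-based,
# so a record is inside when start < POS <= end. Only POS is checked,
# not the full span of the REF allele.
filter_vcf_to_bed() {
  local bed="$1" vcf="$2"
  awk '
    FILENAME == ARGV[1] {                    # first file: load BED intervals
      if ($0 ~ /^(#|track|browser)/) next
      n[$1]++
      s[$1, n[$1]] = $2 + 0
      e[$1, n[$1]] = $3 + 0
      next
    }
    /^#/ { print; next }                     # VCF header lines
    {
      pos = $2 + 0
      for (i = 1; i <= n[$1] + 0; i++)
        if (pos > s[$1, i] && pos <= e[$1, i]) { print; break }
    }' "$bed" "$vcf"
}
```

The linear scan is fine for small examples; for whole-genome files, an indexed tool such as `bcftools view -R benchmark.bed` is the practical choice.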

Slide 18

Slide 18 text

GENOME IN A BOTTLE BENCHMARK AND COVERAGE
[Figure: four panels plotted against HiFi fold coverage, with 15-fold HiFi coverage marked; samples HG002, HG003, HG004.]
Wenger, Peluso, et al. (2019). https://www.nature.com/articles/s41587-019-0217-9. Published: 12 August 2019.

Slide 19

Slide 19 text

VARIANT DETECTION BENCHMARKING (HG002)
Values: recall | precision (%)

HiFi coverage   SNVs            Indels          SVs
15-fold         99.44 | 99.69   95.41 | 96.57   97.41 | 94.48
30-fold         99.97 | 99.87   98.78 | 98.90   98.00 | 95.29

SNV and indel calls are from DeepVariant 0.10.0 and were evaluated against the GIAB v3.3.2 small variant benchmark using hap.py. SV calls are from pbsv 2.2.2 and were evaluated against the GIAB v0.6 SV benchmark using Truvari.
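The table reports recall and precision separately. A common single summary is the F1 score, their harmonic mean, F1 = 2PR / (P + R). The hypothetical `f1_score` helper computes it from the percentages in the table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: F1 score (harmonic mean of recall and precision).
# Inputs and output are percentages, as in the benchmarking table.
f1_score() {
  awk -v r="$1" -v p="$2" 'BEGIN {
    r += 0; p += 0
    if (r + p == 0) { print "0.00"; exit }
    printf "%.2f\n", 2 * r * p / (r + p)
  }'
}
```

For example, `f1_score 99.44 99.69` (15-fold SNVs) prints `99.56`, and `f1_score 97.41 94.48` (15-fold SVs) prints `95.92`.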

Slide 20

Slide 20 text

HIFI DATA ADDS NEW VARIATION TO GIAB BENCHMARKS
- HiFi datasets for 7 GIAB samples are being used to improve SV and small variant benchmarks.
- The upcoming small variant benchmark release v4.1 for HG002 will add:
  - ~6% reference bases
  - ~300,000 SNVs
  - ~50,000 indels
- Benchmark updates for other samples will follow.
- HiFi datasets are included in the precisionFDA Truth Challenge V2, which focuses on difficult-to-map regions.

Slide 21

Slide 21 text

COMPREHENSIVE VARIANT DETECTION WITH HIFI READS
- HiFi = mappability of long reads + base quality of short reads
- Structural variants: SMRT Link or pbmm2 + pbsv
  - Added support for duplications and copy number variation
- Small variants: DeepVariant
  - Added support for amplified fragments
- 15-fold coverage is recommended for most discovery applications.
Datasets for the Ashkenazi trio (15 kb and 20 kb libraries) are deposited in SRA:
- HG002 (PRJNA586863)
- HG003 (PRJNA626365)
- HG004 (PRJNA626366)
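The 15-fold recommendation translates into a sequencing yield by simple arithmetic: required HiFi bases = fold coverage × genome size, and the approximate read count is that yield divided by the mean read length. The genome size (~3.1 Gb for human) and the 15 kb mean read length in the example are assumptions for illustration; library size and mean HiFi read length are not the same thing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: back-of-envelope HiFi yield for a target fold coverage.
#   bases needed = coverage * genome_bp
#   reads needed = bases needed / mean_read_len
hifi_requirements() {
  local coverage="$1" genome_bp="$2" mean_read_len="$3"
  awk -v c="$coverage" -v g="$genome_bp" -v l="$mean_read_len" 'BEGIN {
    bases = c * g
    printf "%.1f Gb\t%.0f reads\n", bases / 1e9, bases / l
  }'
}
```

`hifi_requirements 15 3100000000 15000` prints `46.5 Gb` and `3100000 reads` (tab-separated); doubling the coverage to 30-fold doubles both numbers.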

Slide 22

Slide 22 text

ACKNOWLEDGMENTS
Small variant detection: Andrew Carroll, Pi-Chuan Chang, Richard Hall, Alexey Kolesnikov, Maria Nattestad, Aaron Wenger; Justin Zook, Justin Wagner, and the Genome in a Bottle Consortium
Structural variant detection: Armin Töpfer, Aaron Wenger; Justin Zook, Nate Olson, and the Genome in a Bottle Consortium

Slide 23

Slide 23 text

Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. Pacific Biosciences does not sell a kit for carrying out the overall No-Amp Targeted Sequencing method. Use of these No-Amp methods may require rights to third-party owned intellectual property. BluePippin and SageELF are trademarks of Sage Science. NGS-go and NGSengine are trademarks of GenDx. FEMTO Pulse and Fragment Analyzer are trademarks of Agilent Technologies Inc. All other trademarks are the sole property of their respective owners. www.pacb.com