Slide 1

Slide 1 text

Comprehensive detection of variants in rare disease research with PacBio HiFi reads
William Rowell, Staff Scientist, Pacific Biosciences

For Research Use Only. Not for use in diagnostic procedures. © Copyright 2021 by Pacific Biosciences of California, Inc. All rights reserved.

Slide 2

Slide 2 text

MORE COMPLETE VARIANT DETECTION YIELDS MORE INSIGHTS

- Karyotyping: chromosomal abnormalities; ~5% explanation rate (Phelan, Proc. of Greenwood Genetics Center 1996)
- Microarrays: copy-number variants >50 kb; ~10% explanation rate (De Vries, AJHG 2008)
- Short-read sequencing, exome: SNVs & indels, some large exonic variants; ~30% explanation rate (De Ligt, NEJM 2012)
- Short-read sequencing, genome: SNVs, indels, some large variants; ~40% explanation rate (Gilissen, Nature 2014)

Slide 3

Slide 3 text

MORE COMPLETE VARIANT DETECTION YIELDS MORE INSIGHTS

- Karyotyping: chromosomal abnormalities; ~5% explanation rate (Phelan, Proc. of Greenwood Genetics Center 1996)
- Microarrays: copy-number variants >50 kb; ~10% explanation rate (De Vries, AJHG 2008)
- Short-read sequencing, exome: SNVs & indels, some large exonic variants; ~30% explanation rate (De Ligt, NEJM 2012)
- Short-read sequencing, genome: SNVs, indels, some large variants; ~40% explanation rate (Gilissen, Nature 2014)

What is missing?
- structural variants
- difficult-to-map regions
- repeat expansions
- phasing

Slide 4

Slide 4 text

HIFI READ = LONG & ACCURATE

[Figure: two histograms of % reads. One panel shows read length in kb (0 to 40); the other shows read quality in Phred (20 to 60, corresponding to 99% to 99.9999% accuracy).]

Slide 5

Slide 5 text

[Figure: HiFi reads and short reads shown over an 18 kb region at CYP2D6, with Genes and Segdups tracks. Human, HG002.]

HiFi and short reads from Genome in a Bottle: https://jimb.stanford.edu/giab

Slide 6

Slide 6 text

MORE COMPLETE VARIANT DETECTION YIELDS MORE INSIGHTS

- Karyotyping: chromosomal abnormalities; ~5% explanation rate (Phelan, Proc. of Greenwood Genetics Center 1996)
- Microarrays: copy-number variants >50 kb; ~10% explanation rate (De Vries, AJHG 2008)
- Short-read sequencing, exome: SNVs & indels, some large exonic variants; ~30% explanation rate (De Ligt, NEJM 2012)
- Short-read sequencing, genome: SNVs, indels, some large variants; ~40% explanation rate (Gilissen, Nature 2014)
- Long-read sequencing, HiFi genome: SNVs, indels, SVs, CNVs, phasing, translocations, inversions, repeat expansions; explanation rate ?

Slide 7

Slide 7 text

LONG-READ SEQUENCING IN A RARE DISEASE COHORT

Emily Farrow, Tomi Pastinen, Neil Miller

80 singletons with prior short-read WGS

Steps shown: Sequel IIe System, HiFi reads, Alignment, Variant calling, Interpretation

Slide 8

Slide 8 text

DATA ANALYSIS WORKFLOW

HiFi reads
- Alignment: pbmm2
- Variant calling: DeepVariant (SNV, indel), WhatsHap (phasing), pbsv (SV), tandem-genotypes (STR)
- De novo assembly: hifiasm (complex rearrangements)
- Candidate variants (SNVs, indels, SVs): bcftools, slivar, svpack
- Visualization & interpretation: IGV

Slide 9

Slide 9 text

WORKFLOW IMPLEMENTATION

- open-source workflow and tools
- dependencies managed by conda and singularity
- designed with HPC/cloud job scheduling and scaling in mind
- Snakemake implementation from PacBio: https://github.com/PacificBiosciences/pb-human-wgs-workflow-snakemake
- WDL implementation adapted by Microsoft Genomics: https://github.com/PacificBiosciences/pb-human-wgs-workflow-wdl

Slide 10

Slide 10 text

SINGLE-NUCLEOTIDE VARIANTS AND INDELS

[Figure: Concordance to Infinium Global Screening Microarray for HiFi WGS and short-read WGS. Axes: Sensitivity, %; Specificity, %.]

Small variants per sample

Type     Median sample
SNV      4,064,900
indel    931,879

QC Metric        Value
SNV ts/tv        2.0
SNV het/hom      1.5
indel het/hom    2.0
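The QC metrics on this slide are simple ratios over a call set. The sketch below shows one way such ratios could be computed; it is not the workflow's own code. The function names and the toy input are assumptions made for illustration, and the toy values are not data from the study.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def small_variant_qc(snvs, zygosities):
    """Compute ts/tv and het/hom ratios.

    snvs: iterable of (ref, alt) single-base substitutions.
    zygosities: iterable of "het" / "hom" labels, one per call.
    Raises ZeroDivisionError if there are no transversions or no hom calls.
    """
    snvs = list(snvs)
    ts = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    tv = len(snvs) - ts
    counts = Counter(zygosities)
    return {"ts_tv": ts / tv, "het_hom": counts["het"] / counts["hom"]}


# Toy input, invented for illustration only.
toy_snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
toy_zygosities = ["het", "het", "het", "hom", "hom", "het"]
qc = small_variant_qc(toy_snvs, toy_zygosities)
print(qc)
```

In practice these values would be derived from a VCF rather than from in-memory tuples; the point here is only the definition of each ratio.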

Slide 11

Slide 11 text

STRUCTURAL VARIANTS

Structural variants per sample

Type             Short-read WGS    HiFi WGS
Deletion         4,374             9,174
Duplication      488               442
Insertion        4,844             12,437
Inversion        -                 94
Translocation    1,823             162
Total            11,529            22,309

[Figure: bar chart of structural variants per sample (0 to 30,000) for short-read WGS and HiFi WGS, by type.]
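As a quick arithmetic check, the per-type counts on this slide can be summed to confirm the totals and to compare the two technologies. This is a minimal sketch; treating the "-" entry for short-read inversions as zero is an assumption.

```python
# Per-sample structural variant counts copied from the slide:
# type -> (short-read WGS, HiFi WGS). None marks the "-" entry.
sv_counts = {
    "Deletion": (4374, 9174),
    "Duplication": (488, 442),
    "Insertion": (4844, 12437),
    "Inversion": (None, 94),
    "Translocation": (1823, 162),
}

# Assumption: a missing short-read count contributes zero to the total.
short_total = sum(short or 0 for short, _ in sv_counts.values())
hifi_total = sum(hifi for _, hifi in sv_counts.values())

print(f"short-read total: {short_total:,}")
print(f"HiFi total:       {hifi_total:,}")
print(f"HiFi / short-read: {hifi_total / short_total:.2f}")

for sv_type, (short, hifi) in sv_counts.items():
    if short:
        print(f"{sv_type}: HiFi / short-read = {hifi / short:.2f}")
```

The sums reproduce the totals shown on the slide (11,529 and 22,309).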

Slide 12

Slide 12 text

PRIORITIZING CANDIDATE VARIANTS

40 control samples

Filter                   Small variants    Structural variants
Variants                 4,996,779         21,737
Rare variants            15,559            244
Coding, rare variants    139               12
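The funnel on this slide applies two successive filters: rarity, then coding impact. The sketch below illustrates that two-step shape only. The `Variant` fields, the `prioritize` function, and the carrier threshold are hypothetical and are not taken from slivar, svpack, or the study's actual criteria.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    control_carriers: int  # hypothetical: control samples carrying the variant
    is_coding: bool        # hypothetical: annotation flags it as coding


def prioritize(variants, max_control_carriers=0):
    """Return (rare, coding_rare) lists after two successive filters."""
    rare = [v for v in variants if v.control_carriers <= max_control_carriers]
    coding_rare = [v for v in rare if v.is_coding]
    return rare, coding_rare


# Toy input, invented for illustration only.
toy_variants = [
    Variant("v1", control_carriers=0, is_coding=True),
    Variant("v2", control_carriers=0, is_coding=False),
    Variant("v3", control_carriers=5, is_coding=True),
]
rare, coding_rare = prioritize(toy_variants)
print([v.name for v in rare], [v.name for v in coding_rare])
```

Applying the rarity filter first keeps the more expensive annotation-based step on a much smaller set, which matches the order of the counts on the slide.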

Slide 13

Slide 13 text

PATHOGENIC SNV IN [GC]-RICH FIRST EXON

Pediatric female cmh002060-01
- Lissencephaly
- ADHD
- Mild intellectual disability

Gene: CEP85L
Variant: chr6:118,651,267 C>A, ENST00000368491 (CEP85L) start loss

[Figure: 30× HiFi reads and 50× short reads at the CEP85L start loss, chr6:118,649,500 to 118,652,500.]

Slide 14

Slide 14 text

PHASING VARIANTS IN RECESSIVE DISEASE GENE

NPC1 compound heterozygous loss-of-function

Pediatric female cmh001610-01
- Failure to thrive
- High-frequency hearing impairment
- Hepatosplenomegaly
- Hepatic fibrosis
- Cholestasis
- Thrombocytopenia

[Figure: HiFi reads at gene NPC1, separated into Allele 1 and Allele 2; 10.5 kb.]
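Compound heterozygosity requires the two variants to sit on opposite alleles (in trans), which is what phasing establishes. The sketch below assumes phased VCF-style genotypes (`0|1`, `1|0`) and a phase-set identifier per call, as in the VCF `GT` and `PS` fields; the function names are hypothetical and this is not WhatsHap's API.

```python
def alt_haplotype(gt: str):
    """Return 0 or 1 for the haplotype carrying the ALT allele of a phased
    heterozygous genotype, or None if the call is unphased or not 0/1 het."""
    alleles = gt.split("|")
    if len(alleles) != 2 or sorted(alleles) != ["0", "1"]:
        return None
    return alleles.index("1")


def phase_relationship(gt_a: str, ps_a, gt_b: str, ps_b) -> str:
    """Classify two heterozygous calls as 'cis', 'trans', or 'unknown'.

    Calls in different phase sets cannot be compared, so they are 'unknown'.
    """
    hap_a, hap_b = alt_haplotype(gt_a), alt_haplotype(gt_b)
    if hap_a is None or hap_b is None or ps_a != ps_b:
        return "unknown"
    return "cis" if hap_a == hap_b else "trans"


# Toy genotypes, invented for illustration only.
print(phase_relationship("0|1", 100, "1|0", 100))  # opposite haplotypes
print(phase_relationship("0|1", 100, "0|1", 100))  # same haplotype
print(phase_relationship("0|1", 100, "1|0", 200))  # different phase sets
```

Only the "trans" result supports a compound heterozygous interpretation; "unknown" means the reads did not link the two sites into one phase block.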

Slide 15

Slide 15 text

CANDIDATE HETEROZYGOUS INVERSION VARIANT

Pediatric male 5001-01
- Growth delay
- Ptosis
- Anomalous tracheal cartilage
- Tracheobronchomalacia
- Respiratory insufficiency
- Multifocal atrial tachycardia
- Omphalocele
- Diaphragmatic eventration
- Finger syndactyly

HYLS1 exonic inversion

[Figure: gene track with HiFi reads for allele 1 and allele 2; 407 bp.]

Slide 16

Slide 16 text

REPEAT EXPANSION IN EXTENDED FAMILY

cmh001541-04
- Dystonia
- Seizures
- Ataxia

Repeat expansion intronic to STARD7

[Figure: HiFi reads for allele 1 and allele 2; 407 bp; 1,049 bp repeat expansion (A1-9 T)199.]

Slide 17

Slide 17 text

MORE COMPLETE VARIANT DETECTION YIELDS MORE INSIGHTS

- Karyotyping: chromosomal abnormalities; ~5% explanation rate (Phelan, Proc. of Greenwood Genetics Center 1996)
- Microarrays: copy-number variants >50 kb; ~10% explanation rate (De Vries, AJHG 2008)
- Short-read sequencing, exome: SNVs & indels, some large exonic variants; ~30% explanation rate (De Ligt, NEJM 2012)
- Short-read sequencing, genome: SNVs, indels, some large variants; ~40% explanation rate (Gilissen, Nature 2014)
- Long-read sequencing, HiFi genome: SNVs, indels, SVs, CNVs, phasing, translocations, inversions, repeat expansions; up to 67% explanation rate (Collaborations, presentations, and publications to date)

Slide 18

Slide 18 text

SUMMARY – HIFI WGS RARE DISEASE STUDY

HiFi WGS identifies “all” variants called with short-read WGS plus tens of thousands of additional SNVs, indels, and SVs per genome.

Candidate variants found in 30 of 80 samples from:
• SNVs and indels in GC-rich regions and difficult-to-map regions
• Structural variants
• Phasing

Future work: long-read population control databases, improved variant interpretation tools.

Slide 19

Slide 19 text

ACKNOWLEDGEMENTS

Children’s Mercy Kansas City
- Tomi Pastinen
- Emily Farrow
- Neil Miller
- Isabelle Thiffault

PacBio
- Aaron Wenger
- Shreyasee Chakraborty
- Christine Lambert
- Primo Baybayan

Microsoft Genomics
- Roberto Lleras
- Matthew McLoughlin
- Benjamin Moskowitz

Slide 20

Slide 20 text

ACKNOWLEDGEMENTS

The Open-Source Bioinformatics Community

- https://github.com/snakemake/snakemake
- https://docs.conda.io/en/latest/
- https://sylabs.io/singularity/
- https://github.com/arq5x/bedtools2
- https://github.com/lh3/seqtk
- https://github.com/gmarcais/Jellyfish
- https://github.com/brentp/mosdepth
- https://github.com/amwenger/svpack
- https://github.com/google/deepvariant
- https://github.com/dnanexus-rnd/GLnexus
- https://github.com/whatshap/whatshap
- https://github.com/brentp/slivar
- https://gitlab.com/mcfrith/last
- https://github.com/mcfrith/tandem-genotypes
- https://github.com/chhylp123/hifiasm
- https://github.com/lh3/gfatools
- https://github.com/lh3/calN50
- https://github.com/lh3/minimap2
- https://github.com/lh3/htsbox
- https://github.com/samtools/htslib
- https://github.com/samtools/samtools
- https://github.com/samtools/bcftools

Slide 21

Slide 21 text

Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. Pacific Biosciences does not sell a kit for carrying out the overall No-Amp Targeted Sequencing method. Use of these No-Amp methods may require rights to third-party owned intellectual property. FEMTO Pulse and Fragment Analyzer are trademarks of Agilent Technologies Inc. All other trademarks are the sole property of their respective owners. www.pacb.com