MORE COMPLETE VARIANT DETECTION YIELDS MORE INSIGHTS
Method              Variants detected                           Explanation rate   Reference
Karyotyping         Chromosomal abnormalities                   ~5%                Phelan, Proc. of Greenwood Genetics Center 1996
Microarrays         Copy-number variants >50 kb                 ~10%               De Vries, AJHG 2008
Short-read exome    SNVs & indels, some large exonic variants   ~30%               De Ligt, NEJM 2012
Short-read genome   SNVs, indels, some large variants           ~40%               Gilissen, Nature 2014
What is missing?
-structural variants
-difficult-to-map regions
-repeat expansions
-phasing

HIFI READ = LONG & ACCURATE
[Figure: distribution of HiFi read length (0 to 40 kb) and read quality (Phred 20 to 60, i.e. 99% to 99.9999% accuracy), each as % of reads]

HiFi and short reads from Genome in a Bottle (human, HG002)
[Figure: HiFi reads and short reads over an 18 kb region at CYP2D6, shown with Segdups and Genes tracks]

MORE COMPLETE VARIANT DETECTION YIELDS MORE INSIGHTS
Method              Variants detected                                                              Explanation rate   Reference
Karyotyping         Chromosomal abnormalities                                                      ~5%                Phelan, Proc. of Greenwood Genetics Center 1996
Microarrays         Copy-number variants >50 kb                                                    ~10%               De Vries, AJHG 2008
Short-read exome    SNVs & indels, some large exonic variants                                      ~30%               De Ligt, NEJM 2012
Short-read genome   SNVs, indels, some large variants                                              ~40%               Gilissen, Nature 2014
HiFi genome         SNVs, indels, SVs, CNVs, phasing, translocations, inversions, repeat expansions   ?

LONG-READ SEQUENCING IN A RARE DISEASE COHORT
Emily Farrow, Tomi Pastinen, Neil Miller
80 singletons with prior short-read WGS
Sequel IIe System -> HiFi reads -> Alignment -> Variant calling -> Interpretation

DATA ANALYSIS WORKFLOW
HiFi reads
-Alignment: pbmm2
-Variant calling: DeepVariant (SNV, indel), WhatsHap (phasing), pbsv (SV), tandem-genotypes (STR)
-De novo assembly: hifiasm (complex rearrangements)
SNVs, indels, SVs
-Visualization & interpretation: bcftools, slivar, svpack, IGV
Candidate variants
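The tool chain on this slide can be read as a dependency graph. The sketch below is only an illustration of that reading: the edges between tools are assumptions inferred from the slide layout, not a documented specification of the workflow, and no tool is actually invoked.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the steps it is assumed to depend on.
# Edges are inferred from the slide layout and are NOT taken from the
# actual Snakemake/WDL implementation.
WORKFLOW = {
    "pbmm2": {"HiFi reads"},
    "hifiasm": {"HiFi reads"},
    "DeepVariant": {"pbmm2"},
    "WhatsHap": {"pbmm2", "DeepVariant"},
    "pbsv": {"pbmm2"},
    "tandem-genotypes": {"pbmm2"},
    "bcftools": {"WhatsHap"},
    "slivar": {"bcftools"},
    "svpack": {"pbsv"},
    "IGV": {"slivar", "svpack"},
}


def execution_order(workflow):
    """Return one valid order in which the steps could run."""
    return list(TopologicalSorter(workflow).static_order())


if __name__ == "__main__":
    for step in execution_order(WORKFLOW):
        print(step)
```

A workflow manager such as Snakemake resolves this kind of ordering automatically from rule inputs and outputs; the explicit graph here only makes the assumed ordering visible.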

WORKFLOW IMPLEMENTATION
-open-source workflow and tools
-dependencies managed by conda and singularity
-designed with HPC/cloud job scheduling and scaling in mind
-Snakemake implementation from PacBio
-WDL implementation adapted by Microsoft Genomics

SINGLE-NUCLEOTIDE VARIANTS AND INDELS
Concordance to Infinium Global Screening Microarray
[Figure: sensitivity (%) and specificity (%) for HiFi WGS and short-read WGS; axis ranges 96.5 to 99.0 and 98.8 to 99.3]
Small variants per sample
Type     Median sample
SNV      4,064,900
indel    931,879
QC metric        Value
SNV ts/tv        2.0
SNV het/hom      1.5
indel het/hom    2.0
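For reference, the QC metrics on this slide are simple ratios. The sketch below shows how ts/tv and het/hom could be computed from a list of calls; the input representation (tuples of ref/alt bases and of allele indices) is an assumption made for illustration and is not the format the workflow uses.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T)
# substitutions; every other single-base change is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(snvs):
    """snvs: list of (ref, alt) single-base pairs. Returns transitions / transversions."""
    ts = sum(1 for ref, alt in snvs if (ref, alt) in TRANSITIONS)
    tv = sum(1 for ref, alt in snvs if ref != alt and (ref, alt) not in TRANSITIONS)
    return ts / tv if tv else float("nan")


def het_hom_ratio(genotypes):
    """genotypes: list of (allele1, allele2) indices, 0 = reference allele.

    Returns heterozygous calls / homozygous non-reference calls.
    """
    het = sum(1 for a, b in genotypes if a != b)
    hom_alt = sum(1 for a, b in genotypes if a == b and a != 0)
    return het / hom_alt if hom_alt else float("nan")


if __name__ == "__main__":
    print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))
    print(het_hom_ratio([(0, 1), (0, 1), (1, 0), (1, 1), (1, 1)]))
```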

STRUCTURAL VARIANTS
Structural variants per sample
Type             Short-read WGS   HiFi WGS
Deletion         4,374            9,174
Duplication      488              442
Insertion        4,844            12,437
Inversion        -                94
Translocation    1,823            162
Total            11,529           22,309
[Figure: bar chart of structural variants per sample (0 to 30,000) by type for short-read WGS and HiFi WGS]
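The per-type counts can be checked against the totals and compared between technologies with a few lines of arithmetic. The numbers below are copied from the slide; treating the "-" entry for short-read inversions as zero is an assumption made only for the sum.

```python
# (short-read WGS, HiFi WGS) structural variants per sample, from the slide.
# None stands for the "-" entry and is counted as 0 in the totals (assumption).
SV_COUNTS = {
    "Deletion": (4374, 9174),
    "Duplication": (488, 442),
    "Insertion": (4844, 12437),
    "Inversion": (None, 94),
    "Translocation": (1823, 162),
}


def totals(counts):
    """Sum each column, counting missing entries as zero."""
    short = sum(s or 0 for s, _ in counts.values())
    hifi = sum(h or 0 for _, h in counts.values())
    return short, hifi


if __name__ == "__main__":
    short_total, hifi_total = totals(SV_COUNTS)
    print(short_total, hifi_total)                 # column totals
    print(round(hifi_total / short_total, 2))      # HiFi calls per short-read call
    for sv_type, (short, hifi) in SV_COUNTS.items():
        if short:
            print(sv_type, round(hifi / short, 2))  # per-type ratio
```

The column sums reproduce the totals on the slide (11,529 and 22,309), and the overall ratio is a little under 2. Insertions and deletions account for the increase, while HiFi WGS reports far fewer translocation calls.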

PRIORITIZING CANDIDATE VARIANTS
40 control samples
                      Variants     Rare variants   Coding, rare variants
Small variants        4,996,779    15,559          139
Structural variants   21,737       244             12
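The funnel on this slide applies two successive filters: rarity, then coding impact. A minimal sketch of that logic follows. The record fields (`control_carriers`, `is_coding`) and the threshold of zero carriers among controls are hypothetical choices for illustration; the slide does not state the actual rarity cutoff, and the real workflow uses slivar and svpack for this step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record, for illustration only."""
    name: str
    control_carriers: int  # number of control samples carrying the variant
    is_coding: bool        # True if the variant affects coding sequence


def prioritize(variants, max_control_carriers=0):
    """Return (rare, coding_rare) lists, mirroring the two funnel stages.

    max_control_carriers is an assumed threshold, not taken from the slide.
    """
    rare = [v for v in variants if v.control_carriers <= max_control_carriers]
    coding_rare = [v for v in rare if v.is_coding]
    return rare, coding_rare


if __name__ == "__main__":
    calls = [
        Variant("common_intronic", control_carriers=25, is_coding=False),
        Variant("common_coding", control_carriers=12, is_coding=True),
        Variant("rare_intronic", control_carriers=0, is_coding=False),
        Variant("rare_coding", control_carriers=0, is_coding=True),
    ]
    rare, coding_rare = prioritize(calls)
    print(len(calls), len(rare), len(coding_rare))
```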

PATHOGENIC SNV IN GC-RICH FIRST EXON
Pediatric female cmh002060-01
-Lissencephaly
-ADHD
-Mild intellectual disability
Gene: CEP85L
Variant: chr6:118,651,267 C>A, ENST00000368491 (CEP85L) start loss
[Figure: 30× HiFi reads and 50× short reads at the CEP85L start-loss site, chr6:118,649,500 to 118,652,500]

PHASING VARIANTS IN RECESSIVE DISEASE GENE
Pediatric female cmh001610-01
-Failure to thrive
-High-frequency hearing impairment
-Hepatosplenomegaly
-Hepatic fibrosis
-Cholestasis
-Thrombocytopenia
Gene: NPC1, compound heterozygous loss-of-function
[Figure: HiFi reads separated into allele 1 and allele 2 across 10.5 kb of NPC1]
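Whether two heterozygous variants in a recessive gene are compound heterozygous depends on whether they sit on opposite haplotypes. The sketch below shows that check on VCF-style phased genotypes (`0|1`, `1|0`) with a phase-set identifier, as written in the VCF `GT` and `PS` fields. The function name and its inputs are illustrative assumptions, not part of the workflow.

```python
def in_trans(gt_a, gt_b, ps_a, ps_b):
    """Decide whether two heterozygous variants lie on opposite haplotypes.

    gt_a, gt_b: VCF GT strings such as "0|1" (phased) or "0/1" (unphased).
    ps_a, ps_b: phase-set identifiers (VCF PS field), or None if absent.
    Returns True (in trans), False (in cis), or None if undetermined.
    """
    # Phase is only comparable inside the same phase set.
    if ps_a is None or ps_b is None or ps_a != ps_b:
        return None
    # Both genotypes must be phased.
    if "|" not in gt_a or "|" not in gt_b:
        return None
    alleles_a = gt_a.split("|")
    alleles_b = gt_b.split("|")
    # Only simple biallelic heterozygous calls are handled in this sketch.
    if sorted(alleles_a) != ["0", "1"] or sorted(alleles_b) != ["0", "1"]:
        return None
    # Opposite allele order means the alternate alleles are on different haplotypes.
    return alleles_a != alleles_b


if __name__ == "__main__":
    print(in_trans("0|1", "1|0", 100, 100))  # opposite haplotypes
    print(in_trans("0|1", "0|1", 100, 100))  # same haplotype
    print(in_trans("0|1", "1|0", 100, 200))  # different phase sets
```

Read-backed phasing can only answer this question when one phase block spans both variants, which is more likely with long reads than with short reads.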

CANDIDATE HETEROZYGOUS INVERSION VARIANT
Pediatric male 5001-01
-Growth delay
-Ptosis
-Anomalous tracheal cartilage
-Tracheobronchomalacia
-Respiratory insufficiency
-Multifocal atrial tachycardia
-Omphalocele
-Diaphragmatic eventration
-Finger syndactyly
Gene: HYLS1, exonic inversion
[Figure: HiFi reads for allele 1 and allele 2 at the HYLS1 exonic inversion; label: 407 bp]

REPEAT EXPANSION IN EXTENDED FAMILY
cmh001541-04
-Dystonia
-Seizures
-Ataxia
Repeat expansion intronic to STARD7
[Figure: HiFi reads for allele 1 and allele 2; labels: 407 bp, 1,049 bp repeat expansion (A1-9T)199]
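The repeat on this slide is written as (A1-9T)199, i.e. 199 consecutive units that each consist of one to nine A bases followed by a T. As an illustration of that notation only, the sketch below counts such units in a read sequence with a regular expression. It is not how tandem-genotypes works, and the example sequence is made up.

```python
import re

# One repeat unit: 1 to 9 A bases followed by a single T.
UNIT = re.compile(r"A{1,9}T")
# A tract: one or more units back to back.
TRACT = re.compile(r"(?:A{1,9}T)+")


def longest_repeat_tract(seq):
    """Return (unit_count, length_bp) of the tract with the most units in seq."""
    best = (0, 0)
    for match in TRACT.finditer(seq.upper()):
        tract = match.group()
        units = len(UNIT.findall(tract))
        if units > best[0]:
            best = (units, len(tract))
    return best


if __name__ == "__main__":
    # Made-up sequence: flanking GG and CC around AAAT + AT + AAAAAT.
    print(longest_repeat_tract("GGAAATATAAAAATCC"))
```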

MORE COMPLETE VARIANT DETECTION YIELDS MORE INSIGHTS
Method              Variants detected                                                              Explanation rate   Reference
Karyotyping         Chromosomal abnormalities                                                      ~5%                Phelan, Proc. of Greenwood Genetics Center 1996
Microarrays         Copy-number variants >50 kb                                                    ~10%               De Vries, AJHG 2008
Short-read exome    SNVs & indels, some large exonic variants                                      ~30%               De Ligt, NEJM 2012
Short-read genome   SNVs, indels, some large variants                                              ~40%               Gilissen, Nature 2014
HiFi genome         SNVs, indels, SVs, CNVs, phasing, translocations, inversions, repeat expansions   up to 67%       Collaborations, presentations, and publications to date

SUMMARY – HIFI WGS RARE DISEASE STUDY
HiFi WGS identifies “all” variants called with short-read WGS plus tens of thousands additional SNVs, indels, and SVs per genome.
Candidate variants found in 30 of 80 samples from:
• SNVs and indels in GC-rich regions and difficult-to-map regions
• Structural variants
• Phasing
Future work: long-read population control databases, improved variant interpretation tools.

