Comprehensive detection of variants in rare disease research with PacBio HiFi reads

William Rowell
October 06, 2021

Presented at European Human Genetics Conference 2021 on August 29th, 2021

Transcript

  1. Comprehensive detection of variants in rare disease research with PacBio HiFi reads

    William Rowell, Staff Scientist, Pacific Biosciences

    For Research Use Only. Not for use in diagnostic procedures. © Copyright 2021 by Pacific Biosciences of California, Inc. All rights reserved.
  2. MORE COMPLETE VARIANT DETECTION YIELDS MORE INSIGHTS

    - Karyotyping: chromosomal abnormalities; ~5% explanation rate (Phelan, Proc. of Greenwood Genetics Center 1996)
    - Microarrays: copy-number variants >50 kb; ~10% explanation rate (De Vries, AJHG 2008)
    - Short-read sequencing, exome: SNVs & indels, some large exonic variants; ~30% explanation rate (De Ligt, NEJM 2012)
    - Short-read sequencing, genome: SNVs, indels, some large variants; ~40% explanation rate (Gilissen, Nature 2014)
  3. MORE COMPLETE VARIANT DETECTION YIELDS MORE INSIGHTS

    - Karyotyping: chromosomal abnormalities; ~5% explanation rate (Phelan, Proc. of Greenwood Genetics Center 1996)
    - Microarrays: copy-number variants >50 kb; ~10% explanation rate (De Vries, AJHG 2008)
    - Short-read sequencing, exome: SNVs & indels, some large exonic variants; ~30% explanation rate (De Ligt, NEJM 2012)
    - Short-read sequencing, genome: SNVs, indels, some large variants; ~40% explanation rate (Gilissen, Nature 2014)

    What is missing? Structural variants, difficult-to-map regions, repeat expansions, phasing.
  4. HIFI READ = LONG & ACCURATE

    [Figure: distributions of % reads by read length (0 to 40 kb) and by read quality (Phred 20 to 60, corresponding to 99% to 99.9999%).]
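
The quality axis on this slide pairs Phred scores with percent accuracy. As a minimal sketch of that conversion, assuming the standard Phred definition Q = -10 log10(error probability), the pairing can be reproduced as follows. The function names are illustrative and do not come from the deck.

```python
import math


def phred_to_accuracy(q: float) -> float:
    """Return the probability that a call is correct for Phred score q."""
    return 1.0 - 10.0 ** (-q / 10.0)


def accuracy_to_phred(accuracy: float) -> float:
    """Return the Phred score for a given accuracy (0 <= accuracy < 1)."""
    return -10.0 * math.log10(1.0 - accuracy)


# Tick values taken from the slide's quality axis.
for q in (20, 30, 40, 50, 60):
    print(f"Q{q}: {phred_to_accuracy(q) * 100:.4f}% accuracy")
```

Each step of 10 on the Phred scale divides the error probability by ten, which is why the percent axis gains one more 9 per tick.
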
  5. HiFi and short reads from Genome in a Bottle: https://jimb.stanford.edu/giab

    [Figure: read alignments for Human, HG002 at CYP2D6. Tracks: Genes, Segdups, HiFi reads, Short reads. Scale label: 18 kb.]
  6. MORE COMPLETE VARIANT DETECTION YIELDS MORE INSIGHTS

    - Karyotyping: chromosomal abnormalities; ~5% explanation rate (Phelan, Proc. of Greenwood Genetics Center 1996)
    - Microarrays: copy-number variants >50 kb; ~10% explanation rate (De Vries, AJHG 2008)
    - Short-read sequencing, exome: SNVs & indels, some large exonic variants; ~30% explanation rate (De Ligt, NEJM 2012)
    - Short-read sequencing, genome: SNVs, indels, some large variants; ~40% explanation rate (Gilissen, Nature 2014)
    - Long-read sequencing, HiFi genome: SNVs, indels, SVs, CNVs, phasing, translocations, inversions, repeat expansions; explanation rate: ?
  7. LONG-READ SEQUENCING IN A RARE DISEASE COHORT

    Emily Farrow, Tomi Pastinen, Neil Miller

    80 singletons with prior short-read WGS

    Sequel IIe System -> HiFi reads -> Alignment -> Variant calling -> Interpretation
  8. DATA ANALYSIS WORKFLOW

    - Input: HiFi reads
    - Alignment: pbmm2
    - Variant calling: DeepVariant (SNV, indel), WhatsHap (phasing), pbsv (SV), tandem-genotypes (STR)
    - De novo assembly: hifiasm (complex rearrangements)
    - Candidate variants (SNVs, indels, SVs): bcftools, slivar, svpack
    - Visualization & interpretation: IGV
  9. WORKFLOW IMPLEMENTATION

    - open-source workflow and tools
    - dependencies managed by conda and singularity
    - designed with HPC/cloud job scheduling and scaling in mind
    - Snakemake implementation from PacBio: https://github.com/PacificBiosciences/pb-human-wgs-workflow-snakemake
    - WDL implementation adapted by Microsoft Genomics: https://github.com/PacificBiosciences/pb-human-wgs-workflow-wdl
  10. SINGLE-NUCLEOTIDE VARIANTS AND INDELS

    [Figure: concordance to Infinium Global Screening Microarray for HiFi WGS and short-read WGS. Axes: sensitivity, % (96.5 to 99.0) and specificity, % (98.8 to 99.3).]

    Small variants per sample (median sample):
    - SNV: 4,064,900
    - indel: 931,879

    QC metrics:
    - SNV ts/tv: 2.0
    - SNV het/hom: 1.5
    - indel het/hom: 2.0
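
The QC metrics on this slide are simple ratios over a call set. The sketch below shows one common way to compute them, assuming ts/tv means transitions divided by transversions and het/hom means heterozygous calls divided by homozygous-alternate calls. The helper names and the toy calls are made up for illustration and are not data from the study.

```python
# Transitions are purine<->purine or pyrimidine<->pyrimidine substitutions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(snvs):
    """snvs: list of (ref, alt) single-base pairs. Returns transitions / transversions."""
    ts = sum(1 for ref, alt in snvs if (ref, alt) in TRANSITIONS)
    tv = len(snvs) - ts
    return ts / tv if tv else float("inf")


def het_hom_ratio(genotypes):
    """genotypes: list of (allele1, allele2) integer pairs, 0 = reference allele."""
    het = sum(1 for a, b in genotypes if a != b)
    hom_alt = sum(1 for a, b in genotypes if a == b and a != 0)
    return het / hom_alt if hom_alt else float("inf")


# Toy data (hypothetical): 4 transitions, 2 transversions; 3 het, 2 hom-alt calls.
toy_snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
toy_genotypes = [(0, 1), (0, 1), (1, 0), (1, 1), (1, 1)]

print(ts_tv_ratio(toy_snvs))         # 2.0
print(het_hom_ratio(toy_genotypes))  # 1.5
```
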
  11. STRUCTURAL VARIANTS

    [Figure: bar chart of structural variants per sample by type for short-read WGS and HiFi WGS; axis 0 to 30,000.]

    Structural variants per sample

    Type             Short-read WGS    HiFi WGS
    Deletion                  4,374       9,174
    Duplication                 488         442
    Insertion                 4,844      12,437
    Inversion                     -          94
    Translocation             1,823         162
    Total                    11,529      22,309
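
As a quick arithmetic check on the table above, the per-type counts can be summed and compared with the reported totals. This is only a sketch over the numbers transcribed from the slide; it assumes the "-" for short-read inversions means none were reported.

```python
# Per-sample structural variant counts from the slide: (short-read WGS, HiFi WGS).
# Assumption: "-" for short-read inversions is treated as 0.
sv_counts = {
    "Deletion": (4374, 9174),
    "Duplication": (488, 442),
    "Insertion": (4844, 12437),
    "Inversion": (0, 94),
    "Translocation": (1823, 162),
}

short_total = sum(short for short, _ in sv_counts.values())
hifi_total = sum(hifi for _, hifi in sv_counts.values())

print(short_total, hifi_total)  # 11529 22309
for sv_type, (short, hifi) in sv_counts.items():
    print(f"{sv_type}: HiFi minus short-read = {hifi - short:+d}")
```

Both sums match the Total row, so the HiFi call set is roughly 1.9 times the size of the short-read one, with the gain concentrated in deletions and insertions.
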
  12. PRIORITIZING CANDIDATE VARIANTS

    40 control samples

                           Variants    Rare variants    Coding, rare variants
    Small variants        4,996,779           15,559                      139
    Structural variants      21,737              244                       12
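
The funnel on this slide narrows all variants to rare variants and then to coding, rare variants. The sketch below illustrates that two-step filter on toy records. The `Variant` fields, the `prioritize` helper, and the rarity threshold are hypothetical; the deck does not state the exact criteria, and the actual workflow lists bcftools, slivar, and svpack for this step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    control_carriers: int  # hypothetical: control samples carrying the variant
    is_coding: bool        # hypothetical: variant overlaps coding sequence


def prioritize(variants, max_control_carriers=0):
    """Return (rare, coding_rare) lists, mirroring the funnel's two filter steps."""
    rare = [v for v in variants if v.control_carriers <= max_control_carriers]
    coding_rare = [v for v in rare if v.is_coding]
    return rare, coding_rare


# Toy call set (made up for illustration).
toy_variants = [
    Variant("var1", control_carriers=12, is_coding=True),
    Variant("var2", control_carriers=0, is_coding=False),
    Variant("var3", control_carriers=0, is_coding=True),
    Variant("var4", control_carriers=3, is_coding=False),
]

rare, coding_rare = prioritize(toy_variants)
print(len(toy_variants), len(rare), len(coding_rare))  # 4 2 1
```
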
  13. PATHOGENIC SNV IN [GC]-RICH FIRST EXON

    Pediatric female, cmh002060-01

    - Lissencephaly
    - ADHD
    - Mild intellectual disability

    Variant: chr6:118,651,267 C>A, ENST00000368491 (CEP85L) start loss

    [Figure: chr6:118,649,500 to 118,652,500 at gene CEP85L, showing the start loss with 30× HiFi reads and 50× short reads.]
  14. PHASING VARIANTS IN RECESSIVE DISEASE GENE

    NPC1 compound heterozygous loss-of-function

    Pediatric female, cmh001610-01

    - Failure to thrive
    - High-frequency hearing impairment
    - Hepatosplenomegaly
    - Hepatic fibrosis
    - Cholestasis
    - Thrombocytopenia

    [Figure: gene NPC1 with HiFi reads split into Allele 1 and Allele 2. Scale label: 10.5 kb.]
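
Calling two variants compound heterozygous requires showing that they sit on opposite haplotypes. The sketch below assumes VCF-style phased genotypes ("0|1" or "1|0") with a phase set identifier, which is the kind of output a read-based phaser such as WhatsHap writes. The function names are illustrative, and the logic handles only simple biallelic heterozygous calls.

```python
def alt_haplotype(gt):
    """Return 0 or 1 for the haplotype carrying the alternate allele, or None
    if the genotype is unphased or not a simple phased heterozygous call."""
    return {"1|0": 0, "0|1": 1}.get(gt)


def phase_relation(gt_a, ps_a, gt_b, ps_b):
    """Classify two variants as 'cis', 'trans', or 'unknown'.

    gt_a, gt_b: genotype strings; ps_a, ps_b: phase set identifiers.
    Phase is only comparable when both variants share a phase set.
    """
    hap_a, hap_b = alt_haplotype(gt_a), alt_haplotype(gt_b)
    if hap_a is None or hap_b is None or ps_a != ps_b:
        return "unknown"
    return "cis" if hap_a == hap_b else "trans"


# Toy genotypes (made up): two het variants in one phase set on opposite haplotypes.
print(phase_relation("0|1", 100, "1|0", 100))  # trans
print(phase_relation("0|1", 100, "0|1", 100))  # cis
print(phase_relation("0/1", 100, "1|0", 100))  # unknown
```

Two loss-of-function variants in trans leave no intact copy of the gene, which is the compound heterozygous configuration named on the slide.
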
  15. CANDIDATE HETEROZYGOUS INVERSION VARIANT

    Pediatric male, 5001-01

    - Growth delay
    - Ptosis
    - Anomalous tracheal cartilage
    - Tracheobronchomalacia
    - Respiratory insufficiency
    - Multifocal atrial tachycardia
    - Omphalocele
    - Diaphragmatic eventration
    - Finger syndactyly

    [Figure: HYLS1 exonic inversion. Tracks: Gene, HiFi reads allele 1, HiFi reads allele 2. Scale label: 407 bp.]
  16. REPEAT EXPANSION IN EXTENDED FAMILY

    cmh001541-04

    - Dystonia
    - Seizures
    - Ataxia

    Repeat expansion intronic to STARD7

    [Figure: HiFi reads allele 1 and HiFi reads allele 2. Labels: 407 bp; 1,049 bp repeat expansion (A1-9 T)199.]
  17. MORE COMPLETE VARIANT DETECTION YIELDS MORE INSIGHTS

    - Karyotyping: chromosomal abnormalities; ~5% explanation rate (Phelan, Proc. of Greenwood Genetics Center 1996)
    - Microarrays: copy-number variants >50 kb; ~10% explanation rate (De Vries, AJHG 2008)
    - Short-read sequencing, exome: SNVs & indels, some large exonic variants; ~30% explanation rate (De Ligt, NEJM 2012)
    - Short-read sequencing, genome: SNVs, indels, some large variants; ~40% explanation rate (Gilissen, Nature 2014)
    - Long-read sequencing, HiFi genome: SNVs, indels, SVs, CNVs, phasing, translocations, inversions, repeat expansions; up to 67% explanation rate (collaborations, presentations, and publications to date)
  18. SUMMARY – HIFI WGS RARE DISEASE STUDY

    HiFi WGS identifies “all” variants called with short-read WGS plus tens of thousands additional SNVs, indels, and SVs per genome.

    Candidate variants found in 30 of 80 samples from:
    • SNVs and indels in GC-rich regions and difficult-to-map regions
    • Structural variants
    • Phasing

    Future work: long-read population control databases, improved variant interpretation tools.
  19. ACKNOWLEDGEMENTS

    Children’s Mercy Kansas City: Tomi Pastinen, Emily Farrow, Neil Miller, Isabelle Thiffault

    PacBio: Aaron Wenger, Shreyasee Chakraborty, Christine Lambert, Primo Baybayan

    Microsoft Genomics: Roberto Lleras, Matthew McLoughlin, Benjamin Moskowitz
  20. ACKNOWLEDGEMENTS

    The Open-Source Bioinformatics Community

    - https://github.com/snakemake/snakemake
    - https://docs.conda.io/en/latest/
    - https://sylabs.io/singularity/
    - https://github.com/arq5x/bedtools2
    - https://github.com/lh3/seqtk
    - https://github.com/gmarcais/Jellyfish
    - https://github.com/brentp/mosdepth
    - https://github.com/amwenger/svpack
    - https://github.com/google/deepvariant
    - https://github.com/dnanexus-rnd/GLnexus
    - https://github.com/whatshap/whatshap
    - https://github.com/brentp/slivar
    - https://gitlab.com/mcfrith/last
    - https://github.com/mcfrith/tandem-genotypes
    - https://github.com/chhylp123/hifiasm
    - https://github.com/lh3/gfatools
    - https://github.com/lh3/calN50
    - https://github.com/lh3/minimap2
    - https://github.com/lh3/htsbox
    - https://github.com/samtools/htslib
    - https://github.com/samtools/samtools
    - https://github.com/samtools/bcftools
  21. For Research Use Only. Not for use in diagnostic procedures.

    © Copyright 2021 by Pacific Biosciences of California, Inc. All rights reserved. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. Pacific Biosciences does not sell a kit for carrying out the overall No-Amp Targeted Sequencing method. Use of these No-Amp methods may require rights to third-party owned intellectual property. FEMTO Pulse and Fragment Analyzer are trademarks of Agilent Technologies Inc. All other trademarks are the sole property of their respective owners. www.pacb.com