Comprehensive detection of variants in rare disease research with PacBio HiFi reads

William Rowell
October 06, 2021

Presented at European Human Genetics Conference 2021 on August 29th, 2021

Transcript

  1. For Research Use Only. Not for use in diagnostic procedures. © Copyright 2021 by Pacific Biosciences of California, Inc. All rights reserved.
    Comprehensive detection of variants in rare disease research with PacBio HiFi reads
    William Rowell, Staff Scientist, Pacific Biosciences

  2. MORE COMPLETE VARIANT DETECTION YIELDS MORE INSIGHTS
    Karyotyping: chromosomal abnormalities; ~5% explanation rate (Phelan, Proc. of Greenwood Genetics Center 1996)
    Microarrays: copy-number variants >50 kb; ~10% explanation rate (De Vries, AJHG 2008)
    Short-read sequencing, exome: SNVs & indels, some large exonic variants; ~30% explanation rate (De Ligt, NEJM 2012)
    Short-read sequencing, genome: SNVs, indels, some large variants; ~40% explanation rate (Gilissen, Nature 2014)

  3. MORE COMPLETE VARIANT DETECTION YIELDS MORE INSIGHTS
    Karyotyping: chromosomal abnormalities; ~5% explanation rate (Phelan, Proc. of Greenwood Genetics Center 1996)
    Microarrays: copy-number variants >50 kb; ~10% explanation rate (De Vries, AJHG 2008)
    Short-read sequencing, exome: SNVs & indels, some large exonic variants; ~30% explanation rate (De Ligt, NEJM 2012)
    Short-read sequencing, genome: SNVs, indels, some large variants; ~40% explanation rate (Gilissen, Nature 2014)
    What is missing?
    structural variants
    difficult-to-map regions
    repeat expansions
    phasing

  4. HIFI READ = LONG & ACCURATE
    [Figure: distribution of read length (kb, axis 0 to 40) against % reads]
    [Figure: distribution of read quality (Phred, axis 20 to 60, equivalent to 99% to 99.9999%) against % reads]
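
The two scales on the read-quality axis of this slide (Phred 20 to 60 and 99% to 99.9999%) are related by the standard Phred definition, Q = -10 · log10(error probability). The following is a minimal sketch of that conversion; the function names are illustrative and are not taken from any PacBio tool.

```python
import math


def phred_to_accuracy(q: float) -> float:
    """Convert a Phred quality score Q to the expected accuracy (0 to 1)."""
    error_probability = 10 ** (-q / 10)
    return 1.0 - error_probability


def accuracy_to_phred(accuracy: float) -> float:
    """Convert an accuracy strictly below 1.0 to a Phred quality score."""
    return -10 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    # The tick marks shown on the slide's quality axis.
    for q in (20, 30, 40, 50, 60):
        print(f"Q{q}: {phred_to_accuracy(q) * 100:.4f}% accurate")
```

Each step of 10 on the Phred scale corresponds to a tenfold drop in the expected error rate, which is why Q20 lines up with 99% and Q60 with 99.9999%.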

  5. HiFi and short reads from Genome in a Bottle: https://jimb.stanford.edu/giab
    [Figure: HiFi reads and short reads for human sample HG002 at CYP2D6, with Segdups and Genes tracks; scale label 18 kb]

  6. MORE COMPLETE VARIANT DETECTION YIELDS MORE INSIGHTS
    Karyotyping: chromosomal abnormalities; ~5% explanation rate (Phelan, Proc. of Greenwood Genetics Center 1996)
    Microarrays: copy-number variants >50 kb; ~10% explanation rate (De Vries, AJHG 2008)
    Short-read sequencing, exome: SNVs & indels, some large exonic variants; ~30% explanation rate (De Ligt, NEJM 2012)
    Short-read sequencing, genome: SNVs, indels, some large variants; ~40% explanation rate (Gilissen, Nature 2014)
    Long-read sequencing, HiFi genome: SNVs, indels, SVs, CNVs, phasing, translocations, inversions, repeat expansions; explanation rate ?

  7. LONG-READ SEQUENCING IN A RARE DISEASE COHORT
    Emily Farrow, Tomi Pastinen, Neil Miller
    80 singletons with prior short-read WGS
    Sequel IIe System
    HiFi reads
    Alignment
    Variant calling
    Interpretation

  8. DATA ANALYSIS WORKFLOW
    HiFi reads
    Alignment: pbmm2
    Variant calling: DeepVariant (SNV, indel), WhatsHap (phasing), pbsv (SV), tandem-genotypes (STR)
    De novo assembly: hifiasm (complex rearrangements)
    SNVs, Indels, SVs
    Candidate Variants: bcftools, slivar, svpack
    Visualization & Interpretation: IGV
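
The stages and tools on this slide can be written down as a small dependency graph. This is a minimal sketch only: the tool names come from the slide, but the stage identifiers and the dependency edges are an assumed reading of the diagram, and no real command lines are implied.

```python
from graphlib import TopologicalSorter

# stage -> (tools named on the slide, upstream stages).
# The upstream edges are an assumption about how the slide's boxes connect.
WORKFLOW = {
    "hifi_reads": ([], []),
    "alignment": (["pbmm2"], ["hifi_reads"]),
    "de_novo_assembly": (["hifiasm"], ["hifi_reads"]),
    "variant_calling": (
        ["DeepVariant", "WhatsHap", "pbsv", "tandem-genotypes"],
        ["alignment"],
    ),
    "candidate_variants": (["bcftools", "slivar", "svpack"], ["variant_calling"]),
    "visualization_interpretation": (
        ["IGV"],
        ["candidate_variants", "de_novo_assembly"],
    ),
}


def execution_order(workflow):
    """Return the stages in an order that respects every dependency."""
    graph = {stage: set(deps) for stage, (_tools, deps) in workflow.items()}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for stage in execution_order(WORKFLOW):
        tools = ", ".join(WORKFLOW[stage][0]) or "(input)"
        print(f"{stage}: {tools}")
```

A workflow manager such as Snakemake resolves the same kind of ordering from declared inputs and outputs rather than from an explicit table like this one.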

  9. WORKFLOW IMPLEMENTATION
    - open-source workflow and tools
    - dependencies managed by conda and singularity
    - designed with HPC/cloud job scheduling and scaling in mind
    - Snakemake implementation from PacBio
      https://github.com/PacificBiosciences/pb-human-wgs-workflow-snakemake
    - WDL implementation adapted by Microsoft Genomics
      https://github.com/PacificBiosciences/pb-human-wgs-workflow-wdl

  10. SINGLE-NUCLEOTIDE VARIANTS AND INDELS
    [Figure: Concordance to Infinium Global Screening Microarray; axes Sensitivity, % and Specificity, %; series HiFi WGS and Short-read WGS]
    Small variants per sample
    Type     Median sample
    SNV      4,064,900
    indel    931,879
    QC Metric        Value
    SNV ts/tv        2.0
    SNV het/hom      1.5
    indel het/hom    2.0
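
The QC metrics on this slide are simple ratios. The sketch below shows how ts/tv (transitions over transversions) and het/hom (heterozygous over homozygous-alternate calls) could be computed from toy data; the input representation and function names are assumptions made for illustration, not the workflow's actual code.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """A transition swaps purine for purine or pyrimidine for pyrimidine."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(snvs):
    """snvs: iterable of (ref, alt) single-base pairs with ref != alt."""
    snvs = list(snvs)
    ts = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    tv = sum(1 for ref, alt in snvs if ref != alt and not is_transition(ref, alt))
    return ts / tv  # raises ZeroDivisionError if there are no transversions


def het_hom_ratio(genotypes):
    """genotypes: iterable of (allele1, allele2) integer pairs, 0 = reference."""
    genotypes = list(genotypes)
    het = sum(1 for a, b in genotypes if a != b)
    hom_alt = sum(1 for a, b in genotypes if a == b and a != 0)
    return het / hom_alt  # raises ZeroDivisionError if there are no hom-alt calls


if __name__ == "__main__":
    toy_snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
    toy_gts = [(0, 1), (0, 1), (0, 1), (1, 1), (1, 1)]
    print(ts_tv_ratio(toy_snvs), het_hom_ratio(toy_gts))
```

The toy inputs are chosen so that the ratios come out at 2.0 and 1.5, the same values the slide reports for SNV ts/tv and SNV het/hom.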

  11. STRUCTURAL VARIANTS
    [Figure: bar chart of structural variants per sample (axis 0 to 30,000) by type, Short-read WGS and HiFi WGS]
    Structural variants per sample
    Type             Short-read WGS    HiFi WGS
    Deletion         4,374             9,174
    Duplication      488               442
    Insertion        4,844             12,437
    Inversion        -                 94
    Translocation    1,823             162
    Total            11,529            22,309
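
As a quick arithmetic check on the table, the per-type counts can be summed and compared between the two technologies. This is a small sketch that uses only the numbers on the slide; treating the "-" entry as zero in the total is an assumption.

```python
# type -> (Short-read WGS, HiFi WGS); None stands for the "-" entry on the slide.
SV_COUNTS = {
    "Deletion": (4374, 9174),
    "Duplication": (488, 442),
    "Insertion": (4844, 12437),
    "Inversion": (None, 94),
    "Translocation": (1823, 162),
}


def totals(counts):
    """Sum each column, counting a missing entry as zero (assumption)."""
    short_total = sum(short or 0 for short, _hifi in counts.values())
    hifi_total = sum(hifi or 0 for _short, hifi in counts.values())
    return short_total, hifi_total


def fold_change(counts):
    """HiFi count divided by short-read count, where the latter is available."""
    return {name: hifi / short for name, (short, hifi) in counts.items() if short}


if __name__ == "__main__":
    print(totals(SV_COUNTS))
    for name, ratio in fold_change(SV_COUNTS).items():
        print(f"{name}: {ratio:.2f}x")
```

The column sums reproduce the slide's totals of 11,529 and 22,309, and the largest gains are in insertions (about 2.6x) and deletions (about 2.1x).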

  12. PRIORITIZING CANDIDATE VARIANTS
    40 control samples
                             Small variants    Structural variants
    Variants                 4,996,779         21,737
    Rare variants            15,559            244
    Coding, rare variants    139               12
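
The funnel on this slide (all variants, then rare variants, then coding rare variants) can be sketched as two successive filters. Everything below is hypothetical scaffolding: the `Variant` fields, the definition of "rare" as a maximum number of carriers among control samples, and the default threshold are assumptions, since the slide does not state the criteria used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    variant_id: str
    control_carriers: int  # control samples carrying the variant (hypothetical field)
    is_coding: bool        # overlaps coding sequence (hypothetical field)


def prioritize(variants, max_control_carriers=0):
    """Apply a rarity filter, then a coding filter, and report counts per step."""
    variants = list(variants)
    rare = [v for v in variants if v.control_carriers <= max_control_carriers]
    coding_rare = [v for v in rare if v.is_coding]
    counts = {
        "variants": len(variants),
        "rare": len(rare),
        "coding_rare": len(coding_rare),
    }
    return counts, coding_rare


if __name__ == "__main__":
    toy = [
        Variant("v1", 0, True),
        Variant("v2", 0, False),
        Variant("v3", 5, True),
    ]
    counts, kept = prioritize(toy)
    print(counts, [v.variant_id for v in kept])
```

In the workflow described in this deck, filtering of this kind is done with tools such as slivar and svpack rather than hand-written code.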

  13. PATHOGENIC SNV IN [GC]-RICH FIRST EXON
    Pediatric female cmh002060-01
    Lissencephaly
    ADHD
    Mild intellectual disability
    chr6:118,651,267 C>A
    ENST00000368491 (CEP85L) start loss
    [Figure: 30× HiFi reads and 50× short reads at gene CEP85L, chr6:118,649,500 to 118,652,500, with the CEP85L start loss labeled]

  14. PHASING VARIANTS IN RECESSIVE DISEASE GENE
    NPC1 compound heterozygous loss-of-function
    Pediatric female cmh001610-01
    Failure to thrive
    High-frequency hearing impairment
    Hepatosplenomegaly
    Hepatic fibrosis
    Cholestasis
    Thrombocytopenia
    [Figure: HiFi reads at gene NPC1, grouped into Allele 1 and Allele 2; scale label 10.5 kb]
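
Calling a pair of variants compound heterozygous requires knowing that they sit on different alleles (in trans). The following is a minimal sketch of that check using VCF-style phased genotypes such as `0|1` together with a phase set (PS) identifier; it assumes biallelic heterozygous calls, and the function names are illustrative rather than part of WhatsHap or any other tool.

```python
def parse_phased_gt(gt: str):
    """Return (hap1_allele, hap2_allele) for a phased genotype like '0|1', else None."""
    if "|" not in gt:
        return None  # unphased genotypes such as '0/1' carry no haplotype order
    parts = gt.split("|")
    if len(parts) != 2 or not all(p.isdigit() for p in parts):
        return None
    return int(parts[0]), int(parts[1])


def phase_relationship(gt1: str, ps1, gt2: str, ps2) -> str:
    """Classify two heterozygous variants as 'cis', 'trans', or 'unknown'."""
    h1, h2 = parse_phased_gt(gt1), parse_phased_gt(gt2)
    if h1 is None or h2 is None:
        return "unknown"
    if ps1 is None or ps1 != ps2:
        return "unknown"  # haplotype order is only comparable within one phase set
    for hap in (h1, h2):
        if (hap[0] == 0) == (hap[1] == 0):
            return "unknown"  # not a simple ref/alt heterozygous call
    alt_hap_1 = 0 if h1[0] != 0 else 1
    alt_hap_2 = 0 if h2[0] != 0 else 1
    return "cis" if alt_hap_1 == alt_hap_2 else "trans"


if __name__ == "__main__":
    print(phase_relationship("0|1", 100, "1|0", 100))
```

Two loss-of-function variants in the same recessive gene that come back as `trans` are consistent with compound heterozygosity; `cis` means both are on one allele, leaving the other copy intact.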

  15. CANDIDATE HETEROZYGOUS INVERSION VARIANT
    Pediatric male 5001-01
    Growth delay
    Ptosis
    Anomalous tracheal cartilage
    Tracheobronchomalacia
    Respiratory insufficiency
    Multifocal atrial tachycardia
    Omphalocele
    Diaphragmatic eventration
    Finger syndactyly
    HYLS1 exonic inversion
    [Figure: Gene track with HiFi reads for allele 1 and allele 2; label 407 bp]

  16. REPEAT EXPANSION IN EXTENDED FAMILY
    cmh001541-04
    Dystonia
    Seizures
    Ataxia
    Repeat expansion intronic to STARD7
    [Figure: HiFi reads for allele 1 and allele 2; label 407 bp]
    1,049 bp repeat expansion (A₁₋₉T)₁₉₉
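
Reading the slide's notation as 199 repeat units, each made of one to nine A bases followed by a T, the 1,049 bp expansion averages about 5.3 bp per unit (1,049 / 199). The sketch below shows one way such a tract could be located and counted in a read sequence with a regular expression. It is an illustration under that assumed reading, not the method used by tandem-genotypes.

```python
import re

UNIT = re.compile(r"A{1,9}T")          # one repeat unit: 1 to 9 A bases, then T
TRACT = re.compile(r"(?:A{1,9}T)+")    # one or more consecutive units


def longest_repeat_tract(sequence: str):
    """Return (start, length_bp, unit_count) for the longest tract, or None."""
    best = None
    for match in TRACT.finditer(sequence.upper()):
        if best is None or len(match.group()) > len(best.group()):
            best = match
    if best is None:
        return None
    tract = best.group()
    return best.start(), len(tract), len(UNIT.findall(tract))


if __name__ == "__main__":
    toy_read = "GGC" + "AAAAT" * 3 + "AT" + "CCG"
    print(longest_repeat_tract(toy_read))
```

Real reads contain sequencing errors and interruptions within the repeat, so a production tool needs an alignment-based approach rather than an exact pattern match.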

  17. MORE COMPLETE VARIANT DETECTION YIELDS MORE INSIGHTS
    Karyotyping: chromosomal abnormalities; ~5% explanation rate (Phelan, Proc. of Greenwood Genetics Center 1996)
    Microarrays: copy-number variants >50 kb; ~10% explanation rate (De Vries, AJHG 2008)
    Short-read sequencing, exome: SNVs & indels, some large exonic variants; ~30% explanation rate (De Ligt, NEJM 2012)
    Short-read sequencing, genome: SNVs, indels, some large variants; ~40% explanation rate (Gilissen, Nature 2014)
    Long-read sequencing, HiFi genome: SNVs, indels, SVs, CNVs, phasing, translocations, inversions, repeat expansions; up to 67% explanation rate (collaborations, presentations, and publications to date)

  18. SUMMARY – HIFI WGS RARE DISEASE STUDY
    HiFi WGS identifies “all” variants called with short-read WGS plus tens of thousands of additional SNVs, indels, and SVs per genome.
    Candidate variants found in 30 of 80 samples from:
    • SNVs and indels in GC-rich regions and difficult-to-map regions
    • Structural variants
    • Phasing
    Future work: long-read population control databases, improved variant interpretation tools.

  19. ACKNOWLEDGEMENTS
    Children’s Mercy Kansas City
    Tomi Pastinen
    Emily Farrow
    Neil Miller
    Isabelle Thiffault
    PacBio
    Aaron Wenger
    Shreyasee Chakraborty
    Christine Lambert
    Primo Baybayan
    Microsoft Genomics
    Roberto Lleras
    Matthew McLoughlin
    Benjamin Moskowitz

  20. ACKNOWLEDGEMENTS
    The Open-Source Bioinformatics Community
    https://github.com/snakemake/snakemake
    https://docs.conda.io/en/latest/
    https://sylabs.io/singularity/
    https://github.com/arq5x/bedtools2
    https://github.com/lh3/seqtk
    https://github.com/gmarcais/Jellyfish
    https://github.com/brentp/mosdepth
    https://github.com/amwenger/svpack
    https://github.com/google/deepvariant
    https://github.com/dnanexus-rnd/GLnexus
    https://github.com/whatshap/whatshap
    https://github.com/brentp/slivar
    https://gitlab.com/mcfrith/last
    https://github.com/mcfrith/tandem-genotypes
    https://github.com/chhylp123/hifiasm
    https://github.com/lh3/gfatools
    https://github.com/lh3/calN50
    https://github.com/lh3/minimap2
    https://github.com/lh3/htsbox
    https://github.com/samtools/htslib
    https://github.com/samtools/samtools
    https://github.com/samtools/bcftools
