MORE COMPLETE VARIANT DETECTION YIELDS MORE INSIGHTS
Technology | Variants detected | Explanation rate | Reference
Karyotyping | Chromosomal abnormalities | ~5% | Phelan, Proc. of Greenwood Genetics Center 1996
Microarrays | Copy-number variants >50 kb | ~10% | De Vries, AJHG 2008
Short-read sequencing, exome | SNVs & indels, some large exonic variants | ~30% | De Ligt, NEJM 2012
Short-read sequencing, genome | SNVs, indels, some large variants | ~40% | Gilissen, Nature 2014
What is missing?
- structural variants
- difficult-to-map regions
- repeat expansions
- phasing
LONG-READ SEQUENCING IN A RARE DISEASE COHORT
Emily Farrow, Tomi Pastinen, Neil Miller
80 singletons with prior short-read WGS
Workflow stages shown: HiFi reads (Sequel IIe System), alignment, variant calling, interpretation
WORKFLOW IMPLEMENTATION
- open-source workflow and tools
- dependencies managed by conda and singularity
- designed with HPC/cloud job scheduling and scaling in mind
- Snakemake implementation from PacBio: https://github.com/PacificBiosciences/pb-human-wgs-workflow-snakemake
- WDL implementation adapted by Microsoft Genomics: https://github.com/PacificBiosciences/pb-human-wgs-workflow-wdl
MORE COMPLETE VARIANT DETECTION YIELDS MORE INSIGHTS
Technology | Variants detected | Explanation rate | Reference
Karyotyping | Chromosomal abnormalities | ~5% | Phelan, Proc. of Greenwood Genetics Center 1996
Microarrays | Copy-number variants >50 kb | ~10% | De Vries, AJHG 2008
Short-read sequencing, exome | SNVs & indels, some large exonic variants | ~30% | De Ligt, NEJM 2012
Short-read sequencing, genome | SNVs, indels, some large variants | ~40% | Gilissen, Nature 2014
Long-read sequencing, HiFi genome | SNVs, indels, SVs, CNVs, phasing, translocations, inversions, repeat expansions | up to 67% | Collaborations, presentations, and publications to date
SUMMARY – HIFI WGS RARE DISEASE STUDY
HiFi WGS identifies “all” variants called with short-read WGS, plus tens of thousands of additional SNVs, indels, and SVs per genome.
Candidate variants were found in 30 of 80 samples, from:
• SNVs and indels in GC-rich and difficult-to-map regions
• Structural variants
• Phasing
Future work: long-read population control databases and improved variant interpretation tools.
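As a quick check on the cohort figure above, the share of samples with a candidate variant can be computed directly from the two counts on the slide (30 and 80). This is a minimal sketch: the function name `candidate_rate` is hypothetical, and the result describes only this cohort of samples with prior short-read WGS. It is not meant to reproduce any other percentage in the slides.

```python
def candidate_rate(samples_with_candidates: int, total_samples: int) -> float:
    """Return the fraction of samples in which a candidate variant was found."""
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    if not 0 <= samples_with_candidates <= total_samples:
        raise ValueError("samples_with_candidates must be between 0 and total_samples")
    return samples_with_candidates / total_samples


# Counts taken from the summary slide: 30 of 80 samples.
rate = candidate_rate(30, 80)
print(f"Candidate variants in {rate:.1%} of samples")  # 37.5%
```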
ACKNOWLEDGEMENTS
Children’s Mercy Kansas City: Tomi Pastinen, Emily Farrow, Neil Miller, Isabelle Thiffault
PacBio: Aaron Wenger, Shreyasee Chakraborty, Christine Lambert, Primo Baybayan
Microsoft Genomics: Roberto Lleras, Matthew McLoughlin, Benjamin Moskowitz