NEW PARADIGM OF ACCURATE, LONG READ DNA SEQUENCING
Wenger, A. M., et al. (2019). Accurate circular consensus sequencing improves variant detection and assembly of a human genome. Nature Biotechnology.
Article Metrics:
Altmetric score*
* Article is in the 98th percentile of the 254,341 tracked articles of a similar age in all journals.
Published: 12 August 2019
Slide 3
Slide 3 text
TYPES OF GENOMIC VARIATION
SMRT Sequencing provides comprehensive detection of all variant types.
Slide 4
Slide 4 text
VARIATION IN A HUMAN GENOME
Variant type         Size      Total size
SNVs                 1 bp      5 Mb
indels               1-49 bp   3 Mb
structural variants  ≥50 bp    10 Mb
Compared: PacBio HiFi reads and Short reads (vs GRCh38)
Slide 5
Slide 5 text
VARIATION IN A HUMAN GENOME
Short reads miss ~80% of SVs, typically long insertion events or variants in difficult-to-map repetitive regions. This is not improved by increasing the coverage.
Slide 6
Slide 6 text
PACBIO LONG READS SPAN STRUCTURAL VARIANTS
[Figure: read alignments over a 1,733 bp deletion. Tracks: PacBio HiFi reads (Haplotype 1, Haplotype 2), Short reads, Repeats. Labels: "1,733 bp deletion" on the HiFi reads; "deletion not detected" on the short reads.]
Slide 7
Slide 7 text
VARIATION IN A HUMAN GENOME
Short reads: small variants missed in difficult-to-map regions of the human genome.
PacBio HiFi reads: PacBio high accuracy long reads improve mappability and increase variant detection in these regions.
Slide 8
Slide 8 text
HiFi READS IMPROVE MAPPABILITY IN HUMAN GENOME
This impacts many medically-relevant genes
Slide 9
Slide 9 text
HiFi READS IMPROVE MAPPABILITY IN HUMAN GENOME
Wenger, A. M., et al. (2019). Accurate circular consensus sequencing improves variant detection and assembly of a human genome. Nature Biotechnology.
List originally from Mandelker, D. et al. Navigating highly homologous genes in a molecular diagnostic setting: a resource for clinical next-generation sequencing. Genet Med.
Slide 10
Slide 10 text
RECOMMENDED VARIANT DETECTION WORKFLOWS
-SVs
  -“Structural Variant Calling” application in SMRT Link
  -or map with pbmm2 and call with pbsv from command line
-SNVs and small indels
  -map with pbmm2
  -Google DeepVariant
  -Optional phasing with WhatsHap
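The command-line route for structural variants can be sketched as follows. This is a minimal sketch, not a definitive pipeline: all file names are placeholders that do not come from the slides, and the `run` helper is a hypothetical wrapper that only prints each command unless `RUN=1` is set, so the block can be tried on a machine without pbmm2 or pbsv installed.

```shell
#!/bin/sh
set -eu

# Placeholder file names (assumptions, not from the slides).
REF=reference.fasta
READS=sample.ccs.bam
ALIGNED=aligned.ccs.bam
SVSIG=sample.svsig.gz
VCF=sample.sv.vcf

# Print each command; execute it only when RUN=1.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

sv_workflow() {
  # 1. Map HiFi reads to the reference and sort the alignments.
  run pbmm2 align "$REF" "$READS" "$ALIGNED" --preset CCS --sort
  # 2. Collect structural variant signatures from the alignments.
  run pbsv discover "$ALIGNED" "$SVSIG"
  # 3. Call variants and assign genotypes using the HiFi preset.
  run pbsv call --hifi "$REF" "$SVSIG" "$VCF"
}

sv_workflow
```

Running the script as-is prints the three commands; setting `RUN=1` executes them in order. The same pbmm2 alignment can then feed small variant calling.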
Slide 11
Slide 11 text
PACBIO STRUCTURAL VARIANT CALLING (PBSV)
-Identifies signatures of structural variation
-Calls variants and assigns genotypes
-Recent updates:
  -improved sensitivity for large insertions and deletions
  -call duplications and copy number variation
  -simplified parameters with --hifi preset
    -report variants seen in a single read with at least 10% read support
    -equivalent to “-A 1 -O 1 -S 0 -P 10”
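The read-support rule behind the preset can be illustrated with a little arithmetic. The sketch below is a simplified illustration of the stated thresholds (at least one supporting read and at least 10% of reads), not pbsv's actual implementation; the function names `has_hifi_support` and `describe` are hypothetical.

```shell
#!/bin/sh
set -eu

# Simplified illustration of the thresholds (not pbsv's real logic):
# keep a call with >= 1 supporting read and >= 10% of reads supporting it.
has_hifi_support() {
  support=$1   # reads supporting the variant
  depth=$2     # total reads at the site
  [ "$support" -ge 1 ] && [ $((support * 100)) -ge $((depth * 10)) ]
}

# Print whether a site with the given support would be reported.
describe() {
  if has_hifi_support "$1" "$2"; then
    echo "$1/$2 reads: reported"
  else
    echo "$1/$2 reads: filtered"
  fi
}

describe 1 10   # exactly 10% support
describe 1 15   # about 6.7% support
describe 3 30   # exactly 10% support
```

With these inputs a single supporting read is enough at 10-fold depth, but not at 15-fold depth, where it falls below the 10% threshold.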
Slide 12
Slide 12 text
GOOGLE DEEPVARIANT
-Variant calling pipeline powered by deep neural network
-Fast and inexpensive
-Run from binaries as well as Docker or Singularity images
-PacBio model trained on HiFi reads from Sequel and Sequel II Systems with median read quality >99.9%
-Model is updated regularly to support PacBio Chemistry and Software updates
Poplin, R. E. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol 36, 983–987 (2018).
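The read quality figure can be restated on the Phred scale, where Q = -10 log10(1 - accuracy). The sketch below is only a worked conversion; the function name `phred_from_accuracy` is hypothetical and the 99% case is added for comparison.

```shell
#!/bin/sh
set -eu

# Convert read accuracy (in percent) to a Phred-scaled quality value:
# Q = -10 * log10(1 - accuracy). awk's log() is the natural logarithm.
phred_from_accuracy() {
  awk -v acc="$1" 'BEGIN { printf "Q%.0f\n", -10 * log(1 - acc / 100) / log(10) }'
}

phred_from_accuracy 99.9   # accuracy quoted for the HiFi training reads
phred_from_accuracy 99     # lower accuracy, for comparison
```

A median read quality above 99.9% therefore corresponds to reads above Q30.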
Slide 13
Slide 13 text
UPDATES TO DEEPVARIANT PACBIO MODEL
Slide 14
Slide 14 text
UPDATES TO DEEPVARIANT PACBIO MODEL
Slide 15
Slide 15 text
UPDATES TO DEEPVARIANT PACBIO MODEL
Slide 16
Slide 16 text
RUN DEEPVARIANT EASILY WITH DOCKER OR SINGULARITY
singularity exec --bind "$PWD" \
docker://google/deepvariant:0.10.0 \
/opt/deepvariant/bin/run_deepvariant \
--model_type PACBIO \
--ref ./reference.fasta \
--reads ./aligned.ccs.bam \
--output_vcf ./output.vcf.gz \
--num_shards "$(nproc)"
Example suitable for amplicon analysis.
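The optional WhatsHap phasing step from the recommended workflow could follow the DeepVariant run. This is a minimal sketch, assuming the input file names from the command above; the output name `phased.vcf` is an assumption, and the `run` helper is a hypothetical wrapper that only prints the command unless `RUN=1` is set.

```shell
#!/bin/sh
set -eu

REF=./reference.fasta      # same reference as the DeepVariant run
BAM=./aligned.ccs.bam      # same alignments (must be indexed)
VCF=./output.vcf.gz        # DeepVariant calls
PHASED=./phased.vcf        # assumed output name

# Print each command; execute it only when RUN=1.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

phase_small_variants() {
  # WhatsHap uses the reads to link heterozygous variants into haplotype blocks.
  run whatshap phase --reference "$REF" -o "$PHASED" "$VCF" "$BAM"
}

phase_small_variants
```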
Slide 17
Slide 17 text
NIST GENOME IN A BOTTLE (GIAB) BENCHMARK
Consortium dedicated to authoritative characterization of benchmark human genomes
https://www.nist.gov/programs-projects/genome-bottle
Samples: HG002, HG003, HG004
doi:10.1101/664623
Benchmark (or "High-confidence") variant calls and regions
• Structural variants: Currently available for HG002 on GRCh37
• Small variants in more difficult regions: Currently available for HG002 on GRCh37 and GRCh38
Slide 18
Slide 18 text
GENOME IN A BOTTLE BENCHMARK AND COVERAGE
Wenger, Peluso, et al. (2019) https://www.nature.com/articles/s41587-019-0217-9
[Figure: four panels plotted against HiFi fold coverage, with 15-fold HiFi coverage marked. Samples: HG002, HG003, HG004.]
Article | Published: 12 August 2019
Slide 19
Slide 19 text
VARIANT DETECTION BENCHMARKING (HG002)
Recall | Precision (%)
HiFi Coverage   SNVs            Indels          SVs
15-fold         99.44 | 99.69   95.41 | 96.57   97.41 | 94.48
30-fold         99.97 | 99.87   98.78 | 98.90   98.00 | 95.29
SNV and indel calls are from DeepVariant 0.10.0 and evaluated against the GIAB v3.3.2 small variant benchmark using Hap.py.
SV calls are from pbsv 2.2.2 and evaluated against the GIAB v0.6 SV benchmark using Truvari.
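Recall and precision can be combined into a single F1 score, F1 = 2 x recall x precision / (recall + precision). The F1 values are not on the slide; the sketch below derives them from the 30-fold row of the table, and the function name `f1_score` is hypothetical.

```shell
#!/bin/sh
set -eu

# F1 = 2 * recall * precision / (recall + precision), inputs in percent.
f1_score() {
  awk -v r="$1" -v p="$2" 'BEGIN { printf "%.2f\n", 2 * r * p / (r + p) }'
}

# Recall and precision copied from the 30-fold row of the table.
echo "SNVs   F1: $(f1_score 99.97 99.87)"
echo "Indels F1: $(f1_score 98.78 98.90)"
echo "SVs    F1: $(f1_score 98.00 95.29)"
```

Swapping in the 15-fold row gives the corresponding scores at the lower coverage.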
Slide 20
Slide 20 text
HIFI DATA ADDS NEW VARIATION TO GIAB BENCHMARKS
-HiFi datasets for 7 GIAB samples are being used to improve SV and small variant benchmarks.
-Upcoming small variant benchmark release v4.1 for HG002 will add:
  -~6% reference bases
  -~300,000 SNVs
  -~50,000 indels
-Benchmark updates for other samples will follow.
-HiFi datasets are included in the precisionFDA Truth V2 Challenge, which focuses on difficult-to-map regions.
Slide 21
Slide 21 text
COMPREHENSIVE VARIANT DETECTION WITH HIFI READS
-HiFi = mappability of long reads + base quality of short reads
-Structural variants: SMRT Link or pbmm2 + pbsv
  -Added support for duplications and copy number variations
-Small variants: DeepVariant
  -Added support for amplified fragments
-Recommend 15-fold coverage for most discovery applications.
Datasets for the Ashkenazi trio (15 kb and 20 kb libraries) are deposited on SRA:
HG002 (PRJNA586863) HG003 (PRJNA626365) HG004 (PRJNA626366)
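The coverage recommendation translates into a sequencing yield target as coverage multiplied by genome size. The sketch below is a rough worked example; the 3.1 Gb human genome size is an assumption that does not come from the slides, and the function name `required_gb` is hypothetical.

```shell
#!/bin/sh
set -eu

# Gigabases of HiFi data needed = fold coverage * genome size (Gb).
required_gb() {
  awk -v cov="$1" -v size="$2" 'BEGIN { printf "%.1f\n", cov * size }'
}

GENOME_GB=3.1   # assumed human genome size in Gb (not from the slides)

echo "15-fold: $(required_gb 15 "$GENOME_GB") Gb of HiFi reads"
echo "30-fold: $(required_gb 30 "$GENOME_GB") Gb of HiFi reads"
```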
Slide 22
Slide 22 text
ACKNOWLEDGMENTS
Small variant detection
Andrew Carroll
Pi-Chuan Chang
Richard Hall
Alexey Kolesnikov
Maria Nattestad
Aaron Wenger
Justin Zook, Justin Wagner, and the Genome in a Bottle Consortium
Structural variant detection
Armin Töpfer
Aaron Wenger
Justin Zook, Nate Olson, and the Genome in a Bottle Consortium