Specifics of SMRT Sequencing Data

PacBio
August 01, 2013

Transcript

  1. FIND MEANING IN COMPLEXITY © Copyright 2013 by Pacific Biosciences

    of California, Inc. All rights reserved. Specifics of SMRT® Sequencing Data
  2. Overview of Release Specifics of SMRT® Sequencing Data

    Agenda
    • Factors that impact read length
    • Typical throughput per SMRT Cell
    • Understanding PacBio accuracy
    • PacBio® performance characteristics
    • Filtering, loading and quality values
    • Where to find additional information
  3. Highly Accurate Results

    SMRT® Sequencing can achieve greater than 99.999% (QV 50) accurate sequencing results for resequencing and de novo applications:
    1. Near perfect consensus accuracy
    2. Little or no sequence context bias
    3. Unambiguous mappability of sequence reads
  4. 2nd Gen Sequencing

    i. Generate sequence read
    ii. Map to reference
    iii. Generate consensus (10x coverage)
    [Figure: short reads aligned to a reference sequence, with the resulting consensus. Legend: Reference, Heterozygous SNP, Homozygous SNP, Reference match.]
  5. SMRT® Sequencing – Errors are Random

    i. Generate sequence read
    ii. Map to reference
    iii. Generate consensus (10x coverage)
    [Figure: long reads aligned to a reference sequence, with the resulting consensus. Legend: Reference, Heterozygous SNP, Homozygous SNP, Reference match.]
    Quiver: SMRT Analysis. BLASR: BMC Bioinformatics 13: 238.
  6. SMRT® Sequencing Accuracy

    [Plot: concordance / accuracy (QV) versus coverage (0 to 100) for E. coli, R. palustris and S. aureus, with a "Perfect consensus" label. Accuracy axis: 90% (QV 10), 99% (QV 20), 99.9% (QV 30), 99.99% (QV 40), 99.999% (QV 50), 99.9999% (QV 60), 99.99999% (QV 70).]
    Data generated with P4-C2 chemistry on PacBio® RS II; analyzed using Quiver with SMRT® Analysis 2.0.1.
  7. Verified in the Literature

    Accuracy is independent of read length. Also confirmed through simulation.
    Koren et al. (2012) Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nature Biotechnology 30, 693-700
  8. De Novo Assembly: Finished Genomes with >99.999% Accuracy Using Only PacBio® Reads

    Hierarchical Genome Assembly Process (HGAP)
    • Utilizes all PacBio data from a single, long-insert library
      – Longest reads for continuity
      – All reads for high consensus accuracy
    Chin et al. (2013) Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nature Methods 10, 563-569. doi:10.1038/nmeth.2474
  9. Meiothermus ruber from Start to Finish with 3 SMRT® Cells

    10 kb SMRTbell™ library 3 SMRT Cells (C2-C2 chemistry, PacBio® RS) Long seed reads (>5 kb) Pre-assembled long reads 5 contigs 1 contig Pre-assembly Single contig assembly 99.99965% concordance with reference 1 contig Celera® Assembler Minimus2 Quiver Collaboration with A. Clum, A. Copeland (Joint Genome Institute) 12
  10. Polish with Quiver for High Accuracy

    Organism: Meiothermus ruber
    Assembly size (bases): 3,098,781
    Differences with Sanger reference: 11
    Concordance with Sanger reference: 99.99965%
    Nominal QV: 54.5
    SNPs validated as correct PacBio® calls: 7
    Remaining differences: 4
    QV: 58.9
    [Figure labels: M. ruber Sanger reference, PacBio® reads, targeted Sanger validation.]
    https://github.com/PacificBiosciences/GenomicConsensus
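The concordance and QV figures on this slide can be reproduced from the difference counts and the assembly size. The sketch below is only an illustration of that arithmetic; the function names are hypothetical and are not part of Quiver or GenomicConsensus.

```python
import math


def concordance(differences: int, assembly_size: int) -> float:
    """Fraction of assembly positions that agree with the reference."""
    return 1.0 - differences / assembly_size


def nominal_qv(differences: int, assembly_size: int) -> float:
    """Phred-scaled quality: -10 * log10(error rate)."""
    if differences <= 0:
        raise ValueError("QV is unbounded when there are no differences")
    return -10.0 * math.log10(differences / assembly_size)


# Values taken from the slide: 3,098,781 bases, 11 differences,
# of which 7 were validated as correct PacBio calls (4 remain).
size = 3_098_781
print(f"{concordance(11, size):.5%}")   # 99.99965%
print(f"{nominal_qv(11, size):.1f}")    # 54.5
print(f"{nominal_qv(4, size):.1f}")     # 58.9
```

Removing the 7 validated calls from the 11 differences is what raises the QV from 54.5 to 58.9.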
  11. Most Uniform Coverage 14 Ross et al. (2013) Characterizing and

    measuring bias in sequence data. Genome Biology, 14:R51 doi:10.1186/gb-2013-14-5-r51.
  12. Resolve ‘Difficult-to-Sequence’ Regions

    Address genomic challenges with longer read lengths
    • Resolve long palindromes
    • Identify structural variants
    • Obtain accurate microsatellite lengths
    • Span homopolymeric, low-complexity, and highly repetitive regions
    • Delineate tandem repeats
    Fragile X gene with >2 kb of repeat regions: PacBio® reads span extreme CGG repeats and AT-rich regions.
    Loomis et al. (2013) Sequencing the unsequenceable: Expanded CGG-repeat alleles of the fragile X gene. Genome Research, 23(1):121-8
  13. Importance of Mapping for Sequencing Accuracy

    Mismapped short reads result in false-positive SNPs.
    [Figure: two panels, each labeled Illumina® HiSeq and PacBio® RS, with "?" markers.]
    Carneiro et al. (2012) Pacific Biosciences Sequencing Technology for Genotyping and Variation Discovery in Human Data. BMC Genomics 13, 375-383.
  14. Highly Accurate Results 17 SMRT® Sequencing achieves greater than 99.999%

    (QV 50) accurate sequencing results for resequencing and de novo applications 1. Near perfect consensus accuracy 2. Little to no sequence context bias 3. Unambiguous mappability of sequence reads SMRT Sequencing has excellent performance in all three areas
  15. Another Way to Use Long Reads: CCS 18 Double-stranded insert

    Very High Single Molecule Accuracy Circular Consensus Sequencing (CCS)
  16. Reliably Detect Variant Mutations Below 0.1% Frequency

    All minor variants reliably detected down to 0.08%:
    • L180M, 254, C → A
    • S202G, 320, A → G
    • M204V, 326, A → G
    A single SMRT® Cell provides enough data to detect HBV minor variants.
    Poster: Sensitive Detection of Minor Variants and Viral Haplotypes
  17. Typical Results: Accuracy Characteristics of PacBio® Polymerases and Chemistries

    10 kb library, PacBio® RS II, Stage Start, 120 min movies, SMRT® Analysis 2.0.1
    [Plots: quality value (20 to 70) versus coverage (20 to 80) for P4-C2 and P4-XL; one panel for E. coli and one for R. palustris.]
  18. Single-Molecule Circular Consensus Accuracy

    E. coli 2 kb library; PacBio® RS II, SMRT® Analysis 2.0.1
    [Plot: "CCS Accuracy versus Number of Passes" for P4-C2 and P4-XL; accuracy axis from 0.94 to 1, number of passes up to 18.]
  19. PacBio® Performance Summary - Accuracy • Greater than 99.999% (QV

    50) accurate sequencing results • Best coverage uniformity with no amplification and minimal GC bias • Improved mappability with longer average read lengths • Sensitivity to detect minor variants at frequency less than 0.1% • Chemistry choices available to optimize read length and accuracy 22
  20. Idealized SMRTbell™ Template Processing By Dual-Mode Polymerase

    • Template Pass 1 = Strand-Displacement Mode
    • Adaptor Pass 1 = Single-Stranded Mode
    • Template Pass 2 = Single-Stranded Mode
    • Adaptor Pass 2 = Single-Stranded Mode (some ds)
    • Template & Adaptor Passes 3+ = Strand-Displacement Mode
    Mode change induced by template change.
    [Diagram labels: Single-Stranded Mode, Strand-Displacement Mode.]
  21. From Polymerase Reads to Subreads or CCS Reads • Subreads

    (purple and gold) are separated by adapter sequences (green) • ≥2 full passes required for CCS • Either individual subreads or CCS reads can be used for subsequent analysis depending on application needs Polymerase Read Subread Circular Consensus Sequence (CCS) Read 25
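How a polymerase read breaks into subreads, and how the "≥2 full passes" rule can be checked, can be sketched as follows. This is a simplified illustration, not the SMRT Analysis implementation: the adapter coordinates are assumed to be already known, a "full pass" is assumed to mean a subread flanked by adapters on both sides, and all function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]            # half-open [start, end) coordinates
Subread = Tuple[int, int, bool]       # (start, end, is_full_pass)


def split_subreads(read_length: int, adapters: List[Interval]) -> List[Subread]:
    """Cut a polymerase read at adapter hits (sorted, non-overlapping)."""
    subreads: List[Subread] = []
    cursor = 0
    adapter_on_left = False
    for a_start, a_end in adapters:
        if a_start > cursor:
            # Bounded by an adapter on the right; a full pass only if
            # an adapter also precedes it.
            subreads.append((cursor, a_start, adapter_on_left))
        cursor = a_end
        adapter_on_left = True
    if cursor < read_length:
        # Trailing sequence has no adapter on the right: partial pass.
        subreads.append((cursor, read_length, False))
    return subreads


def count_full_passes(subreads: List[Subread]) -> int:
    return sum(1 for _, _, full in subreads if full)


def eligible_for_ccs(subreads: List[Subread], min_full_passes: int = 2) -> bool:
    return count_full_passes(subreads) >= min_full_passes


# Hypothetical 5,600-base polymerase read over a ~2 kb insert.
example = split_subreads(5600, [(900, 945), (2945, 2990), (4990, 5035)])
print(example)
print(eligible_for_ccs(example))
```

In this example the first and last subreads are partial passes, and the two middle subreads are full passes, so the read meets the two-pass minimum.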
  22. Read Definitions in SMRT® Analysis v2.0.1

    Read (of Insert)
    Definition:
    • Represents highest-quality single sequence for an insert, regardless of number of passes
    • Generalizes CCS for <2 passes & RQ <0.9
    • 1 or more passes
    • 1 molecule, 1 read
    Uses:
    • Library QC
    • Applications

    Subread
    Definition:
    • Single pass of template
    • Adapters removed
    • 1 molecule, >=1 subread
    Unique data:
    • Kinetic measurements
    • Rich QVs
    Uses:
    • Applications

    Polymerase Read
    Definition:
    • Sequence of nucleotides incorporated by polymerase while reading a template
    • Includes adapters
    • Often called “read”
    • 1 molecule, 1 pol. read
    Uses:
    • QC of instrument run
    • Benchmarking

    [Diagram label: SMRTbell™ Template.]
  23. Mapped Subread vs. Mapped Read Length

    Mapped Polymerase Read Length (4 kb in the example shown)
    • Measure of ZMW sequencing productivity
    • Upper bound set by speed and fidelity of the polymerase and movie time
    Mapped Subread Length (900 bp in the example shown)
    • Measure of scientifically applicable sequence
    • Upper bound set by insert size and loading effects
  24. Subread Length Does Not Always Equal Mapped Read Length

    • Library quality
      – DNA damage
      – Contaminants
    • Fragment size distribution is based on shearing and size selection methods
      – Target library size
      – Shearing methodology
      – Size selection options to remove small fragments
    • Diffusion-based loading favors loading of shorter fragments
    • Timing of movie collection
    [Figure labels: Start seq rxn; Start movie without Stage Start; Subread 1; Subread 2; 2 kb; 10-12 kb.]
  25. Comparison of Mapped Read Length & Subread Length Distributions for

    a 2 kb Library 29 Mapped Read Length Histogram Mapped Subread Length Histogram C2-C2 Chemistry, PacBio® RS, 1x90 movie
  26. Comparison of Mapped Read Length & Subread Length Distributions for

    a 10 kb Library 30 Mapped Read Length Histogram Mapped Subread Length Histogram C2-C2 Chemistry, PacBio® RS, 1x90 movie
  27. Stage Start Produces Longer Reads by Capturing Sequence as Soon as the Polymerase Begins Sequencing

    6 kb SMRTbell™ template
    • Without Stage Start: 1-2 kb not captured; 4 kb contiguous coverage
    • With Stage Start: capture early bases; read full insert; 6 kb contiguous coverage
    [Diagram stages: Data collection, Final reads.]
  28. Stage Start Useful For Increasing Subread Length in Large Insert

    Libraries (Example: Full Length HIV Amplicon – 9 kb) 32 No Stage Start Stage Start • Without stage start, reads pile up at 8 kb, missing the first 1 kb • With stage start, reads span the full HIV genome
  29. Removal of Adapter Dimers and Short Inserts: Diffusion vs. MagBead

    Loading Diffusion Loading MagBead Loading Additional Benefits: • Reduced input material: Prepare 10 kb libraries with 1 µg of gDNA • More robust performance: Higher and more consistent yield across samples 33
  30. MagBead Loading Increases Overall Mappable Reads and Subread Length Read

    Length Mapped subreads Increased mappable reads overall, Increases number of 4-10 kb subreads by 50% 34
  31. Size Selection Increases the Number of Long Subreads: Mammalian Example

    [Plot: fraction of data generated with subread length >X (0 to 1) versus subread length (0 to 30,000 bases).]
    Library              N50
    No Size Selection    4800
    With Size Selection  9100
    In collaboration with Jeff Rogers, Muthuswamy Raveendran at Baylor College of Medicine. Mouse lemur samples provided by Anne Yoder, Duke Lemur Center.
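N50 is the length L such that sequences of length L or longer contain at least half of all bases. The sketch below computes it for a list of lengths and shows how discarding short fragments raises the value. The data and the 4,000-base cutoff are made up for illustration and are not the mouse lemur data from this slide.

```python
from typing import Iterable, List


def n50(lengths: Iterable[int]) -> int:
    """Smallest length L such that reads >= L hold at least half the bases."""
    ordered: List[int] = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("N50 is undefined for an empty set of lengths")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for positive lengths


# Hypothetical library: many short fragments plus a few long ones.
unselected = [1000] * 20 + [5000, 5000, 10000]
# Hypothetical size selection that removes fragments under 4,000 bases.
selected = [length for length in unselected if length >= 4000]

print(n50(unselected))  # 5000
print(n50(selected))    # 10000
```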
  32. PacBio® Performance Summary – Read Length

    • Exponential read length distribution
    • Average read lengths of 4,000-5,000 bp, with some reads as long as 20,000 bp
    • Key terminology
      – Polymerase read
      – Subread
      – Read of Insert
    • Parameters that impact read length and subread length
      – Library quality
      – Insert size
      – Distribution of library size / removal of small fragments
      – MagBead loading vs. diffusion loading
      – Use of Stage Start
      – Use of size selection
      – Movie collection time
      – To be discussed next: chemistry choice, loading parameters
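If read lengths are treated as exactly exponential (an idealization; real distributions are shaped by insert size, loading and movie time), the fraction of reads longer than a given length follows directly from the mean. The sketch below uses a mean of 4,500 bases, the midpoint of the range quoted on this slide. The function name is hypothetical.

```python
import math


def fraction_longer_than(length: float, mean_length: float) -> float:
    """Survival function of an exponential distribution: exp(-x / mean)."""
    if mean_length <= 0:
        raise ValueError("mean_length must be positive")
    return math.exp(-length / mean_length)


mean = 4500.0
for cutoff in (5_000, 10_000, 20_000):
    share = fraction_longer_than(cutoff, mean)
    print(f"reads longer than {cutoff:>6} bases: {share:.1%}")
```

Under this assumption roughly 1% of reads exceed 20,000 bases, which is consistent with "some reads" reaching that length.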
  33. Library Quality Tied to Sequencing Performance • Potential system performance

    is highly dependent on sample quality & library insert size • Potential sources of variability – Sample damage – Sample degradation – Contaminants – Shearing size & distribution • Note that use of XL DNA Sequencing Kit 1.0 with known low quality or short insert libraries is Not Recommended 42
  34. Typical Microbial Performance for P4-C2 Chemistry, 10 kb Library

    [Plot: "Cumulative Subread Length Distribution; 10 kb Library": number of subreads per SMRT Cell with subread length >X, for B. subtilis, E. coli and R. palustris.]
    [Plot: "Throughput per SMRT® Cell, 10 kb Library": mean mapped reads (thousands) and mean mapped Mb for B. subtilis, E. coli and R. palustris with P4-C2.]
    • Instrument: PacBio® RS II
    • Chemistry: P4-C2
    • Library: 10 kb
    • Size Selection: None
    • Collection Time: 1 x 120 min
    • Stage Start
    • MagBead Loading
    Typical yields for microbial samples on the PacBio RS II are usually in the range of 175-250 Mb per SMRT® Cell for good-quality 10 kb libraries.
  35. CCS Throughput vs Total SMRT® Cell Throughput

    Trading single-molecule accuracy for number of reads (1 x 45 movie, PacBio® RS II)
    • Total SMRT Cell throughput is a function of read length and number of reads per SMRT Cell passing filtering criteria
    • CCS throughput will always be lower
      – CCS reads require a minimum of two passes
      – Usable throughput per SMRT® Cell depends on insert length
      – Amplicon size * # loaded ZMW = max throughput
      – ~50-60K reads per SMRT Cell
    • Number of reads per SMRT Cell will vary due to:
      – Insert length
      – Required # of passes to reach desired QV
      – Chemistry choice
      – Instrument run conditions (loading conc., movie times, etc.)
      – Sample quality
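The rule "amplicon size * # loaded ZMW = max throughput" can be worked through with the ~50-60K reads per SMRT Cell quoted above. Treating that read count as the number of loaded ZMWs is an assumption made for this sketch, and the function name is hypothetical.

```python
def max_ccs_throughput_bases(amplicon_size_bp: int, loaded_zmws: int) -> int:
    """Upper bound on CCS output: one consensus read per loaded ZMW."""
    return amplicon_size_bp * loaded_zmws


# A 500 bp amplicon at the low and high end of the quoted read count.
for zmws in (50_000, 60_000):
    bases = max_ccs_throughput_bases(500, zmws)
    print(f"{zmws} ZMWs x 500 bp = {bases / 1e6:.0f} Mb")
```

For a 500 bp amplicon this gives an upper bound of 25 to 30 Mb of CCS sequence per SMRT Cell, well below the total polymerase-read throughput of the same cell.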
  36. Estimated Number of Reads per SMRT® Cell: P4-C2 Chemistry, RS II, No Stage Start, 1 x 45 movie

    Insert Size   Full Pass Subreads*   2 pass CCS   3 pass CCS   4 pass CCS   5 pass CCS
    0.5 kb        380,000               47,000       44,000       40,000       36,000
    1.0 kb        190,000               40,000       34,000       26,000       17,000
    1.5 kb        100,000               33,000       22,000       4,200        25
    2.0 kb        70,000                26,000       4,500        5            0
    2.5 kb        30,000                12,000       35           0            0
    Estimated number of reads by insert size and number of passes, derived from a 500 bp amplicon dataset.
    Note: E. coli 2 kb library, SMRT Analysis 2.0.1. Post-filter polymerase reads for this data set were ~60K.
    [Plot: "CCS Accuracy by Number of Passes"; accuracy axis from 0.94 to 1, number of passes up to 10.]
    [Plot: "Estimated Reads by Insert Size and Passes"; estimated number of reads (0 to 180,000) for full pass subreads and 2 to 5 pass CCS, for 0.5, 1.0, 1.5, 2.0 and 2.5 kb inserts with 45 min movies.]
  37. Operational Duty Cycle and CCS Analysis

    • When collecting long movies for samples containing short inserts (<2 kb) to generate more full-pass CCS reads, take CCS analysis time into consideration
    • CCS analysis takes the most processing time during primary analysis
    • Running the instrument continuously under this use case will impact operational duty cycle as data storage reaches maximum capacity
      – Prevents a user from performing runs until primary analysis of all runs is complete
    • Next version of the software will address this constraint
      – Move CCS into Secondary Analysis
  38. Productivity • An estimate of the number of active polymerases

    in a ZMW • Number varies due to diffusion or MagBead loading • Goal: prod=1 51 prod=0 prod=1 prod=2
  39. High-Quality (HQ) Regions High-Quality Regions – Region in a raw

    read with high quality base calls – High quality region is extracted from a raw read during trimming Edges of read sometimes contain uninformative data – Often caused by multiply loaded ZMWs HQ Region detector delineates useable portion of read Filtering step extracts HQ region 52 HQ Region
  40. Optimizing Loading • Overloading may lead to an increase in

    output of MB per SMRT® Cell, but … • Leads to increase in multiply loaded ZMWs • High Quality (HQ) region filtering can “rescue” some multiply loaded ZMWs, increasing total number of reads / SMRT Cell • Multi-loaded ZMWs “rescued” by HQ filtering have – Shortened read lengths – Lower accuracy compared to single-loaded ZMWs • Loading can be optimized through titration prod=0 prod=1 prod=2 53
  41. Overloading Can Lead to Delay in Start Sites, Effectively Reducing

    the Read Length 54 Higher loading shifts the distributions rightward, causing a reduction in the fraction of high-quality sequences generated at the beginning of data acquisition
  42. Overloading Increases Yield of Mapped Read, But Reduces Read Length

    and Accuracy 55 Diffusion-loaded 2 kb lambda library
  43. Data Processing Steps Involved in Filtering

    Primary Analysis
      – Adapter location identification
      – Productivity assignment
      – HQ Regions
      – Read-Quality (RQ) assignment
    Filtering in Secondary Analysis
      – Raw-read trimming using HQ region definition
      – Filtering using user-specified minimum RQ and minimum length
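The two secondary-analysis filtering steps (trim each raw read to its HQ region, then drop reads below a minimum RQ or minimum length) can be sketched as below. This is an illustration only, not the SMRT Analysis code: the `RawRead` record, the function names and the thresholds are hypothetical, and applying the length cutoff to the trimmed read is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RawRead:
    name: str
    sequence: str
    hq_start: int        # start of HQ region (inclusive)
    hq_end: int          # end of HQ region (exclusive)
    read_quality: float  # RQ assigned during primary analysis


def trim_to_hq(read: RawRead) -> str:
    """Keep only the bases inside the HQ region."""
    return read.sequence[read.hq_start:read.hq_end]


def filter_reads(
    reads: List[RawRead], min_rq: float, min_length: int
) -> List[Tuple[str, str]]:
    """Return (name, trimmed sequence) for reads passing both cutoffs."""
    kept = []
    for read in reads:
        trimmed = trim_to_hq(read)
        if read.read_quality >= min_rq and len(trimmed) >= min_length:
            kept.append((read.name, trimmed))
    return kept


reads = [
    RawRead("a", "NNACGTACGTNN", 2, 10, 0.85),  # passes both cutoffs
    RawRead("b", "ACGTACGT", 0, 8, 0.60),       # RQ too low
    RawRead("c", "NNACNN", 2, 4, 0.90),         # HQ region too short
]
print(filter_reads(reads, min_rq=0.75, min_length=5))
```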
  44. Read Quality (RQ Value) A trained prediction of a read’s

    mapped subread accuracy Based on its pulse and base-file characteristics, such as: – Peak signal-to-noise ratio – Average base QV – Interpulse duration Used during secondary analysis filtering 57
  45. Quality Values for Single-Pass Reads

    Phred-like QV provided in the bas.h5 and FASTQ files: QV = -10 * log10(1 - p), where p is the predicted probability that the base call is correct.
      – In addition to the overall base-accuracy QV, mismatch, deletion, insertion, and merge QVs are also provided.
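The QV formula can be applied in both directions, and per-base QVs can be read back from a FASTQ quality string. The sketch below assumes that p is the probability that a call is correct and that the FASTQ file uses the common Phred+33 character encoding; both are assumptions of this example, and the function names are hypothetical.

```python
import math
from typing import List


def qv_from_accuracy(p: float) -> float:
    """QV = -10 * log10(1 - p), with p the probability of a correct call."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return -10.0 * math.log10(1.0 - p)


def accuracy_from_qv(qv: float) -> float:
    """Inverse of the formula above: p = 1 - 10^(-QV / 10)."""
    return 1.0 - 10.0 ** (-qv / 10.0)


def decode_fastq_quality(quality: str, offset: int = 33) -> List[int]:
    """Convert a FASTQ quality string to integer QVs (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]


print(round(qv_from_accuracy(0.99999), 3))   # QV for 99.999% accuracy
print(accuracy_from_qv(20))                  # accuracy at QV 20
print(decode_fastq_quality("!+5"))           # [0, 10, 20]
```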
  46. PacBio® Performance Summary - Throughput

    • Throughput per SMRT® Cell impacted by
      – Library quality and fragment distribution
      – Loading parameters
      – Chemistry and instrument parameters
      – Filtering criteria
    • For a good-quality (10 kb) library, throughput per SMRT Cell typically ranges between 200 and 250 Mb
    • Loading follows a Poisson distribution; overloading may increase read yield, but reduces read length and accuracy
    • CCS throughput lower than total per SMRT Cell throughput
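Because loading follows a Poisson distribution, the expected share of empty, singly loaded and multiply loaded ZMWs depends only on the mean number of polymerases per ZMW. The sketch below illustrates that relationship; the mean values tried are arbitrary and the function names are hypothetical.

```python
import math
from typing import Dict


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events for a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def loading_fractions(mean_per_zmw: float) -> Dict[str, float]:
    """Expected fraction of ZMWs with 0, exactly 1, or 2+ polymerases."""
    empty = poisson_pmf(0, mean_per_zmw)
    single = poisson_pmf(1, mean_per_zmw)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}


for mean in (0.5, 1.0, 2.0):
    fractions = loading_fractions(mean)
    summary = ", ".join(f"{name} {value:.1%}" for name, value in fractions.items())
    print(f"mean {mean}: {summary}")
```

The singly loaded fraction peaks at a mean of one polymerase per ZMW (about 37%). Pushing the mean higher fills more ZMWs but mostly adds multiply loaded ones, which matches the trade-off described for overloading.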
  47. Summary • SMRT® Sequencing can achieve: – Greater than 99.999%

    (QV 50) accurate sequencing results – Average read lengths in the range of 4 – 5 kb – Yields of 200 Mb to 250 Mb per SMRT® Cell • System performance is highly tune-able, impacted by many factors – Sample quality – Insert size – Instrument run parameters (Stage Start, MagBead vs. diffusion loading, collection times, etc.) – Chemistry choice – Loading conditions • Reads are separated into subreads; CCS is the consensus of the subreads generated by sequencing a single template molecule • Overloading may increase read yield, but reduces read length and accuracy • Data processing concepts: productivity, HQ regions, read quality, QVs 61
  48. Where to Get Additional Information • PacBio® RS II Brochure

    • Perspective: Understanding Accuracy in SMRT® Sequencing • User Bulletin: SMRT® Cell Loading Recommendations • Experimental Design Trainings 62
  49. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, and SMRTbell

    are trademarks of Pacific Biosciences in the United States and/or other countries. All other trademarks are the sole property of their respective owners.