Track 1: Amplicon Sequencing

PacBio
October 15, 2014

Transcript

  1. Barcoding for Amplicon Sequencing Using SMRT® Analysis V2.3

    Nicole Rapicavoli, Field Applications Scientist. October 2014. FIND MEANING IN COMPLEXITY. © Copyright 2014 by Pacific Biosciences of California, Inc. All rights reserved. For Research Use Only. Not for use in diagnostic procedures.
  2. Learning Objectives

    Scientists
    • Interested in multiplexing amplicons for full-length sequencing.
    After the training, you will be able to:
    • Choose the best multiplexing strategy for your experimental design.
    • Understand the Long Amplicon Analysis method.
    • Run a Long Amplicon Analysis job and understand the reports generated in SMRT® Portal.
    • SMRT® Technology
    • PacBio® System Workflow
    • General understanding of SMRT Portal
  3. Targeted Sequencing: High-Resolution Insights

    Exquisite sensitivity and specificity to fully characterize genetic complexity
    – Multi-kilobase reads
    – Achieves 99.999% consensus accuracy
    – Linear variant detection to <0.1% frequency
    – Access to the entire genome
    Applications shown: SNP Detection and Validation; Repeat Expansions; Compound Mutations and Haplotype Phasing; Minor-variant Detection; Iso-Seq™ Full-Length Transcript Sequencing
    www.pacb.com/target
  4. Benefits of PacBio® System for Targeted Sequencing Applications

    • SMRT® Sequencing can achieve results that are more than 99.999% accurate (QV 50) for targeted sequencing applications
      – Near-perfect consensus accuracy
      – Best coverage uniformity with no amplification and minimal GC bias
      – Improved mappability with longer average read lengths
    • Flexibility in amplicon size allows for customized solutions for complex SNP detection applications
    • Sequencing of long amplicons with multi-kilobase read lengths provides direct strand-specific haplotype phasing
    • Single-molecule sequencing detects low-frequency minor variants at high resolution
  5. Diverse Range of Structural Variation Detection

    Discovery or validation of many types of structural variation
    [Figure: variant types plotted against size of variant (bp), on an axis from 1 bp to 100 Mb, with the region one PacBio® read spans marked]
    • SNPs: SNP phasing
    • Small in/dels: small insertions / deletions phasing
    • STRs: repeat expansion
    • Fine-scale SVs: VNTR and other structural changes
    • Retro-element insertions: LINE1 elements
    • Splice variants: alternative splicing
    • Intermediate SVs: tandem repeats, duplications, inversions
    • Large SVs: haplotype-level changes
    • Chromosomal SVs: chromosomal structural rearrangement validation
  6. Phasing of Accurate SNPs at Multi-kilobase Distances

    • Generate long haploid reads
    • Detect novel alleles
    • Accurate genotyping with phase information
  7. Amplicon Experimental Design Recommendations

  8. Barcoding Background

    [Figure: an insert flanked by a barcode at each end]
    • Short insert: the polymerase will go around multiple times, giving multiple opportunities to view the barcode.
    • Long insert: few polymerases may make >1 pass; many polymerases may not see the first barcode (or the second one).
  9. Barcode Scoring Modes

    • Symmetric Mode:
      – Barcode sequences are the same on both sides of the insert.
      – Recommended for inserts longer than 3 kb.
    • Paired Mode (aka Paired/Asymmetric):
      – Different barcode sequences on either end of the insert.
      – Only recommended for sequences shorter than 3 kb.
      – Not yet validated.
  10. Barcoding Solution Works Well for Short Inserts (tested to 6 kb)

    • Barcode during amplification: 450 16-bp barcodes can be synthesized into primers.
    • Barcode after amplification/fragmentation: 12 adapters with 7-bp barcodes in the stems.
    [Figure labels: Forward Primer, Forward Barcode, Reverse Primer, Reverse Barcode, Barcode Adapter, PCR]
  11. Examples of Applications Depend on Multiplexing

    Barcoding During Amplification (Primers)
      Inserts <5 kb: • Amplicon Panels • HLA Class I Genes • Clone Validation • Targeted Viral
      Inserts ≥5 kb: • HLA Class II Genes • Clone Validation • Full-length HIV • Structural Variants
    Barcoding After Amplification or Fragmentation (Adapters)
      Inserts <5 kb: • Amplicon Panels • Targeted Viral
      Inserts ≥5 kb: • Not recommended. • Instead pool non-overlapping and assemble in HGAP: BACs, Fosmids
    Key Questions
    • How do you incorporate the barcode?
    • How do you optimize the number of barcodes you see?
  12. Barcoding Examples – Amplicons

  13. Barcoded Amplicon Use Case – Enzyme Engineering Design Cycle

    [Cycle diagram; stages labeled: Formulate design hypotheses; High-throughput cloning & purification; Screening & data interpretation; Detailed studies on interesting mutants]
  14. Project Overview: Validating 384 Individual Clones 1.7 kb in Length

    Goal
    • To compare a Sanger workflow with a PacBio® workflow (including barcoding) for clone validation
    Scope
    • Validating 384 distinct clones, each 1.7 kb in length, with high sequence homology (~99%)
    Project Design
    • Sanger: Five amplicons of ~750 bp for each clone
    • PacBio: One amplicon of 1.7 kb for each clone
    Results
    • Sequenced all clones with 100% accuracy using one barcoded library and one SMRT® Cell
  15. Overview of PacBio’s SMRT® Sequencing Method

    Workflow steps shown: Primer Design; Pool Barcoded Amplicons; Prepare SMRTbell™ Library; Sequencing on 1 SMRT Cell; Analysis.
    [Figure: each clone (1,700 bp, flanked by vector) is amplified with universal primers that carry the same barcode at both ends: Barcode #1 for Clone #1, Barcode #2 for Clone #2, and so on for 384 clones.]
  16. Sequencing & Informatics Workflow

    • Each SMRT® Cell will generate ~50,000 barcoded sequences.
    • Barcodes are identified and sequences are binned (384 bins).
    • Sequences from each bin are aligned and bases are called:
      – ~100x coverage per clone
      – Q50 accuracy at ~30x coverage
    • Single FASTA file at ≥Q50.
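The throughput figures on this slide can be sanity-checked with a short calculation. The sketch below is illustrative only: it assumes perfectly even pooling and that each barcoded sequence spans one full clone, so reads per bin approximate fold coverage. The function names are hypothetical and are not part of SMRT® Analysis.

```python
def expected_reads_per_barcode(total_barcoded_reads: int, n_barcodes: int) -> float:
    """Mean number of barcoded sequences per bin under perfectly even pooling (an assumption)."""
    if n_barcodes <= 0:
        raise ValueError("n_barcodes must be positive")
    return total_barcoded_reads / n_barcodes


def coverage_margin(mean_coverage: float, required_coverage: float) -> float:
    """How many times the mean coverage exceeds the coverage required for the target accuracy."""
    if required_coverage <= 0:
        raise ValueError("required_coverage must be positive")
    return mean_coverage / required_coverage


if __name__ == "__main__":
    # Figures quoted on the slide: ~50,000 sequences per SMRT Cell, 384 bins, Q50 at ~30x.
    mean_cov = expected_reads_per_barcode(50_000, 384)
    print(f"mean reads per clone: {mean_cov:.0f}")
    print(f"margin over 30x: {coverage_margin(mean_cov, 30):.1f}x")
```

Under these assumptions the even-pooling mean is about 130 sequences per clone, the same order as the ~100x quoted on the slide and several times the ~30x at which Q50 is reported. Real pools are uneven, so the lowest-coverage bin matters more than the mean.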
  17. Coverage Required for Accurate Sequencing

    • Assemblies performed with subsets of data at differing levels of coverage
    • At 45X coverage, errors detected with a frequency of 10^-5
    • Above 50X coverage, no errors detected in ~700 kb of sequence
    [Figure: error rates at differing coverage, with points labeled 6.4 x 10^-5 and 2.1 x 10^-5]
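Error rates like these map onto the QV scale used elsewhere in the deck through the standard Phred relation, QV = -10 · log10(error probability). The sketch below applies it, reading the two figure labels as 6.4 × 10^-5 and 2.1 × 10^-5 (an assumption about superscripts lost in the transcript). The function names are illustrative.

```python
import math


def error_rate_to_qv(error_rate: float) -> float:
    """Phred-scaled quality value for a per-base error probability."""
    if not 0.0 < error_rate <= 1.0:
        raise ValueError("error_rate must be in (0, 1]")
    return -10.0 * math.log10(error_rate)


def qv_to_accuracy(qv: float) -> float:
    """Per-base accuracy implied by a Phred quality value."""
    return 1.0 - 10.0 ** (-qv / 10.0)


if __name__ == "__main__":
    for rate in (6.4e-5, 2.1e-5, 1e-5):
        print(f"error rate {rate:.1e} -> QV {error_rate_to_qv(rate):.1f}")
    print(f"QV 50 -> accuracy {qv_to_accuracy(50):.5%}")
```

An error frequency of 10^-5 corresponds to QV 50, which matches the 99.999% accuracy figure quoted earlier in the deck.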
  18. Coverage Levels in Dataset

    Simple pooling of PCR products produced >100X coverage for all 384 clones in a single run.
    [Figures: "Per-base Coverage by Barcode" (x-axis: Barcode Number) and "Sorted Coverage Levels" (x-axis: Coverage Rank); reference lines at 50X, 100X, and 200X coverage; 385X mean coverage]
  19. PacBio – No Assembly Required!

    Complexity of Sanger assembly scales with insert size.
    [Figure: vector-flanked inserts labeled 1 kb and 6 kb, shown for PacBio and for Sanger assembly]
  20. Sanger vs PacBio® Sequencing

    • Sanger Sequencing: 384 plasmids; 384 x 5 sequencing reactions = 1,920 reactions; reads of 700-850 bp across Vector / Gene / Vector.
    • PacBio Sequencing: 384 PCR reactions; 1 template prep & 1 SMRT® Cell; one 1,700 bp read across Vector / Gene / Vector.
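The reaction counts on this slide follow from simple multiplication, sketched below for reuse with other project sizes. The function name is hypothetical; the inputs are the slide's own numbers.

```python
def sanger_reactions(n_templates: int, reads_per_template: int) -> int:
    """Total Sanger sequencing reactions when every template needs a fixed number of reads."""
    if n_templates < 0 or reads_per_template < 0:
        raise ValueError("counts must be non-negative")
    return n_templates * reads_per_template


if __name__ == "__main__":
    clones = 384
    # Five reads per 1.7 kb clone on the Sanger side.
    print(f"Sanger: {sanger_reactions(clones, 5):,} sequencing reactions")
    # On the PacBio side the slide lists one PCR per clone, then one pooled library on one SMRT Cell.
    print(f"PacBio: {clones} PCR reactions, 1 template prep, 1 SMRT Cell")
```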
  21. Process for Long-Amplicon Analysis for Barcoded Samples

    [Workflow diagram; stages labeled: Separate By Barcode, Overlap, Quiver, Post-Processing Filters, Consensus Sequence]
  22. Barcoding Examples – HLA

  23. Advantages to Using SMRT® Sequencing for HLA Typing

    Sanger Sequencing: cis/trans SNP phasing ambiguity due to non-clonal sequencing.
    [Figure: Allele 1 and Allele 2, with ambiguous positions marked "?"]
  24. Advantages to Using SMRT® Sequencing for HLA Typing

    Short-Read Sequencing: high-throughput, but short reads miss phasing information.
    [Figure: Allele 1 and Allele 2, annotated "AC or GT?"]
  25. Challenges of Phasing SNPs with Short-Read Assembly

    • Assembly of short reads is a labor-intensive informatics process
    • SNP-poor regions are difficult to phase, resulting in allele ambiguity
    [Figure: Allele 1 and Allele 2 across a SNP-rich region, a SNP-poor region, and a second SNP-rich region; phase across the SNP-poor region is marked "???" (unknown)]
  26. Advantages to Using SMRT® Sequencing for HLA Typing

    PacBio® Sequencing: PacBio technology avoids all of these problems.
    [Figure: Allele 1 and Allele 2]
  27. PacBio’s Solution: Fully-phased, Allele-specific HLA Sequencing

    Full-length, continuous long reads covering the heterozygous HLA-A gene.
    Example shown from collaboration with Stanford Genome Technology Center, allele pair from cell line 280599.
  28. Allele-level Genotyping: Phasing Across Exons and Introns

    Single SMRT® Cell provides full information (all exons + all introns) for allele-level genotyping along with SNP phasing information.
  29. Phase Information for Full-length HLA Class I and II Genes

    2013 ASHG Poster: Allele-Level Sequencing and Phasing of Full-length HLA Class I and II Genes

    Sample  HLA-A                         HLA-B                         HLA-C                         HLA-DRB1
    ID      Allele 1       Allele 2       Allele 1       Allele 2       Allele 1       Allele 2       Allele 1            Allele 2
    TU01    A*02:06:01     A*11:01:01     B*40:02:01     B*55:02:01:02  C*01:02:01     C*03:03:01     DRB*09:01:02:01/02  DRB*15:01:01:03
    TU02    A*02:01:01:01  A*31:01:02     B*51:02:01     B*56:01:01:02  C*01:02:01     C*03:04:01:02  DRB*09:01:02:02     DRB*14:05:01:02
    TU03    A*24:02:01:01  A*31:01:02     B*07:02:01     B*35:01:01:02  C*03:03:01     C*07:02:01:03  DRB*01:01:01        DRB*14:05:01:02
    TU04    A*02:06:01     A*02:07:01     B*40:02:01     B*44:03:01     C*03:03:01     C*14:03        DRB*04:10:03:01     DRB*14:54:01:02
    TU05    A*26:01:01     A*31:01:02     B*15:01:01:01  B*35:01:01:02  C*03:04:01:02  C*07:02:01:04  DRB*09:01:02:01     DRB*13:02:01:02
    TU06    A*26:03:01     A*33:03:01     B*15:11:01     B*44:03:01     C*03:03:01     C*14:03        DRB*04:05:01:01     DRB*13:02:01:02
    TU07    A*02:03:01     A*24:02:01:01  B*38:02:01     B*54:01:01     C*01:02:01     C*07:02:01:05  DRB*04:03:01:02     DRB*08:03:02:02
    TU08    A*24:02:01:01  A*33:03:01     B*44:03:01     B*48:01:01     C*08:03:01     C*14:03        DRB*13:02:01:02     DRB*16:02:01:02
    TU09    A*02:01:01:01  A*02:06:01     B*40:06:01:01  B*48:01:01     C*08:01:01     C*15:02:01     DRB*14:05:01:02     -
    TU10    A*11:01:01     A*31:01:02     B*40:01:02     B*51:01:01     C*07:02:01:01  C*15:02:01     DRB*09:01:02:01     DRB*12:01:01:02
    TU21    A*03:02:01     A*24:02:01:01  B*07:02:01     B*13:02:01     C*06:02:01:01  C*07:02:01:03  DRB*01:01:01        DRB*07:01:01:01

    Example PacBio® Result: HLA class I (A, B and C) and class II (DRB1) genes showed:
    • 100% concordance with cDNA reference
    • One mismatch in intron 2 of TU04 versus SS-SBT generated reference
    • Resolved allele ambiguities from PCR-SSO typing when compared to Tokai University Reference Database
  30. PacBio® Reads Provide Evidence to Identify New Allele Types

    • Four novel HLA alleles were identified using PacBio sequence data and have subsequently been submitted to the IMGT/HLA database:
      – A*68:01:02:02
      – B*52:01:01:03
      – C*02:02:02:02
      – C*08:02:01:02
    • One HLA B allele was corrected in the IMGT/HLA database:
      – B*27:05:02
    Upcoming presentation by Collaborator at EFI 2014
  31. Allele-level HLA Typing Using SMRT® Sequencing: Multiplex Option

    48 x 3 Full-Length HLA Class I Genes
    • Each SMRT® Cell generates ~50,000 barcoded sequences.
    • Barcodes are identified; sequences are binned (48 bins).
    • Sequences from each bin are clustered by gene type & allele; consensus sequences are generated (~100x coverage per allele).
    • FASTA files per allele at ≥Q50.
    • Sequence run time = 2 hours.
  32. PacBio® Sequencing Provides Most Value (48 samples x 3 HLA class I genes = 144 PCR reactions)

    • Sanger Sequencing: 144 amplicons x 6 seq. primers = 864 seq. rxn; sample throughput per week: ~900; exons 2, 3 & 4 only.
    • PacBio Sequencing: 144 pooled amp. = 1 seq. rxn; 1 template prep. & 1 SMRT® Cell; sample throughput per week: ~2,600; 3,500 bp, fully-phased allele-level.
    [Figure: full-length sequences of Allele 1 and Allele 2. Image from GenDx SBTengine™ Instruction Manual]
  33. Process for Long-Amplicon Analysis

    [Workflow diagram; stages labeled: Optionally Separate by Barcode, Overlap, Cluster, Quiver, Phasing, Quiver, Post-Processing Filters; outputs: Haplotype 1, Haplotype 2]
  34. Long Amplicon Analysis Walk-Through

  35. Long Amplicon Analysis Use Cases

    LAA Mode                        Cluster?  Phase?  Example
    Multiple gene, multiple phases  X         X       HLA
    Single gene, multiple phases              X       Human amplicon with phasing
    Single gene, single phase                         Clone validation

    All use cases allow barcoding.

    HLA Analysis Type                   Class I  Class II  Note
    Just HLA Class I                    X
    Just HLA Class II                            X
    Combined HLA Class I and Class II   X        X         Supported in SMRT® Analysis 2.3
  36. Create New Long Amplicon Analysis Job in SMRT® Portal 2.3

  37. Create New Long Amplicon Analysis Job in SMRT® Portal 2.3

  38. Create New Long Amplicon Analysis Job in SMRT® Portal 2.3

  39. LAA Protocol Parameters in SMRT® Portal 2.3

    Activate barcodes
  40. LAA Barcode Parameters in SMRT® Portal 2.3

    My library has DNA Barcodes that are:
    - Symmetric in most cases.
    Minimum Barcode score:
    - Maximum score is 2 x (barcode length), which in this case is 2 x 16 = 32.
    - For 16-bp barcodes, a minimum score of 22 results in less than 1% false positive scores.
    Barcode FASTA file:
    - Enter the location of your barcode file here.
    - Default is PacBio set of 384 barcodes.
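The threshold arithmetic on this slide can be sketched in a few lines. This is not the SMRT® Portal implementation: the scoring itself is not reproduced here, and the tuple layout (read ID, barcode name, score) and function names are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: (read_id, barcode_name, barcode_score)
BarcodeCall = Tuple[str, str, int]


def max_barcode_score(barcode_length: int) -> int:
    """Maximum score as stated on the slide: 2 x barcode length."""
    if barcode_length <= 0:
        raise ValueError("barcode_length must be positive")
    return 2 * barcode_length


def filter_calls(calls: Iterable[BarcodeCall], min_score: int = 22) -> List[BarcodeCall]:
    """Keep only barcode calls whose score meets the minimum threshold."""
    return [call for call in calls if call[2] >= min_score]


if __name__ == "__main__":
    calls = [("read1", "bc001", 32), ("read2", "bc002", 21), ("read3", "bc001", 22)]
    print(max_barcode_score(16))              # 32 for 16-bp barcodes
    print(filter_calls(calls, min_score=22))  # read2 is dropped
```

Raising the minimum score trades yield for fewer false-positive assignments; the value of 22 is the slide's recommendation for 16-bp barcodes and would need revisiting for other barcode lengths.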
  41. LAA Amplicon Parameters in SMRT® Portal 2.3

    Minimum subread length:
    - Set to 80% of your insert size.
    Coarse Cluster Subreads by Gene Family:
    - Keep checked for HLA.
    - Uncheck for amplicon consensus calling.
    Maximum number of subreads:
    - Set to 200.
    Phase Alleles:
    - Keep checked for HLA or other applications where you expect 2 alleles.
    - Uncheck for amplicon with a single allele.
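These recommendations can be collected into a small helper, sketched below under stated assumptions. The class, field, and argument names are hypothetical and do not correspond to actual SMRT® Portal parameter identifiers; only the 80% rule, the 200-subread cap, and the checkbox guidance come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaaSettings:
    """Illustrative container for the four settings discussed on the slide."""
    min_subread_length: int
    max_subreads: int
    coarse_cluster: bool
    phase_alleles: bool


def suggest_settings(insert_size_bp: int, hla: bool = False,
                     expect_two_alleles: bool = False) -> LaaSettings:
    """Apply the slide's rules of thumb to one amplicon design."""
    if insert_size_bp <= 0:
        raise ValueError("insert_size_bp must be positive")
    return LaaSettings(
        min_subread_length=insert_size_bp * 80 // 100,  # 80% of insert size, integer math
        max_subreads=200,
        coarse_cluster=hla,                        # keep on for HLA, off for plain consensus
        phase_alleles=hla or expect_two_alleles,   # on whenever two alleles are expected
    )


if __name__ == "__main__":
    print(suggest_settings(1700))            # clone validation: single gene, single phase
    print(suggest_settings(3500, hla=True))  # full-length HLA class I amplicon
```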
  42. Process for Long-Amplicon Analysis

    [Workflow diagram; stages labeled: Optionally Separate by Barcode, Overlap, Cluster, Quiver, Phasing, Quiver, Post-Processing Filters; outputs: Haplotype 1, Haplotype 2]
  43. LAA Output - Overview

  44. Bioinformatics Workflow

    Output is a multiFASTA file – One Consensus Sequence per Barcode
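A multiFASTA file of this kind can be read with a few lines of standard-library Python. The sketch below is a minimal parser, not part of SMRT® Analysis; the header names in the example (such as `barcode1_consensus`) are invented for illustration, and the real LAA header format may differ.

```python
from typing import Dict


def parse_multifasta(text: str) -> Dict[str, str]:
    """Return {record ID: sequence}, using the first word of each '>' header as the ID."""
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


if __name__ == "__main__":
    example = ">barcode1_consensus\nACGTAC\nGTTA\n>barcode2_consensus\nGGCCTTAA\n"
    for name, seq in parse_multifasta(example).items():
        print(name, len(seq))
```

With one consensus per barcode, the number of records returned should equal the number of barcodes that passed filtering, which makes a quick completeness check for a pooled run.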
  45. For Research Use Only. Not for use in diagnostic procedures.

    Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, and Iso-Seq are trademarks of Pacific Biosciences in the United States and/or other countries. All other trademarks are the sole property of their respective owners.