SMRT® Portal Protocols

Resequencing
• RS_Resequencing_GATK: Align against reference and call variants using GATK
• RS_Resequencing: Align against reference and generate consensus
• RS_Resequencing_GATK_Barcode: Identify barcodes, align against reference and call variants using GATK
• RS_Modification_Detection: Align against reference and identify base modification positions
• RS_Modification_and_Motif_Analysis: Map bacterial modifications m6A, m4C and m5C and analyze motifs
• RS_Minor_and_Compound_Variants: Align CCS against a reference and call minor and compound variants

Assembly
• RS_PreAssembler_Allora: Construct de novo assembly from a single long insert library using the HGAP method with ALLORA
• RS_PreAssembler: Generate high quality pre-assembled long reads as a first step for use in de novo assembly (HGAP method)
• RS_Allora_Assembly: De novo assembly using ALLORA
• RS_Allora_Assembly_EC: Hybrid assembly using P_ErrorCorrection and ALLORA
• RS_AHA_Scaffolding: Scaffolding assembly using AHA
• RS_Celera_Assembler: Use pacBioToCA and Celera® Assembler to combine PacBio® CLR and CCS or short reads

Other
• RS_cDNA_Mapping: Align spliced reads against genomic reference with GMAP
• RS_Filter: Filter to generate filtered_subreads.fastq

SMRT® Analysis Algorithms – De Novo Assembly
• HGAP – de novo assembly
  – De novo assembly from a single long insert library preparation
• Celera® Assembler – de novo assembly
  – Combines PacBio® long reads with short reads or CCS
  – Scales to plant and mammalian-sized genomes
• ALLORA (“A Long Read Assembler”) – de novo assembly
  – Tailored to PacBio long reads and error profile
  – Uses overlap-layout-consensus approach
  – Outputs contigs as FASTA sequence and HDF5 files
• AHA (“A Hybrid Assembler”) – scaffolding of contigs
  – Combines PacBio sequence with high confidence contigs from an existing assembly, joining them into larger contigs
  – The high confidence contigs can be generated from 2nd generation sequencing technologies or from Sanger sequencing

SMRT® Analysis Algorithms – Targeted Sequencing
• BLASR (“Basic Local Alignment with Successive Refinement”) – reference-based alignment
  – Maps reads to reference genomes and sequences
  – Designed to handle error profile of PacBio® reads
• Quiver – consensus and variant caller
  – Uses PacBio’s rich quality values (QVs) to choose the optimal consensus
  – Then calls haploid SNPs and indels
  – Can achieve Q50 for de novo assembly and resequencing
• GATK (Genome Analysis Tool Kit) – variant caller
  – Identifies haploid and diploid SNPs using the Broad’s Unified Genotyper
• GMAP (Genomic Mapping and Alignment Program)
  – Aligns spliced reads against genomic reference for full length cDNA discovery
  – Developed at Genentech
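
To put the Q50 figure in context: quality values are conventionally Phred-scaled, Q = -10 · log10(p), where p is the probability that a base is wrong. The short sketch below applies that standard definition; the function names are illustrative only and are not part of SMRT Analysis.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality value Q to an error probability p."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability p (0 < p <= 1) to a Phred-scaled Q."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Compare a few common quality levels with Q50.
    for q in (20, 30, 50):
        p = phred_to_error_prob(q)
        print(f"Q{q}: roughly 1 error per {round(1 / p):,} bases")
```

Under this definition, a Q50 consensus corresponds to about one expected error per 100,000 bases.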

Unique Computational Demands in Mapping SMRT® Sequences
Mapping PacBio® data requires different tools compared to 2nd Gen:
• Datasets are larger than what traditional database search methods such as BLAST handle
• Read lengths are exponentially distributed and much longer than what short-read aligners are designed for
• Error profile is different than what short-read aligners are designed for
• Detailed alignment should benefit from rich quality values
[Figures: read length histogram; accuracy by position]

BLASR Combines Methods from Multiple Applications
• Sparse dynamic programming: rearrangements
• BWT-FM index and suffix array search for rapid mapping
• Detailed banded dynamic programming alignments
Chaisson and Tesler (2012) Mapping single molecule sequencing reads using Basic Local Alignment with Successive Refinement (BLASR): Application and Theory. BMC Bioinformatics 13, 238.
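
As an illustration of the banded dynamic programming idea, the sketch below computes a global alignment score while only filling cells within a fixed distance of the main diagonal. It is a minimal teaching example, not BLASR's implementation: the function name, the scoring values (match +1, mismatch -1, gap -1) and the fixed band are all assumptions, and BLASR's own alignment also makes use of quality values.

```python
def banded_align_score(a: str, b: str, band: int,
                       match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment score of a vs b using only cells with |i - j| <= band.

    Returns None when the end cell (len(a), len(b)) lies outside the band.
    Scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        return None
    neg_inf = float("-inf")
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = {j: j * gap for j in range(min(m, band) + 1)}
    for i in range(1, n + 1):
        cur = {}
        lo, hi = max(0, i - band), min(m, i + band)
        for j in range(lo, hi + 1):
            if j == 0:
                cur[j] = i * gap  # a[:i] against an empty prefix of b
                continue
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(
                prev.get(j - 1, neg_inf) + sub,  # match or mismatch
                prev.get(j, neg_inf) + gap,      # gap in b
                cur.get(j - 1, neg_inf) + gap,   # gap in a
            )
        prev = cur
    return prev[m]


if __name__ == "__main__":
    print(banded_align_score("ACGT", "ACGT", band=1))
    print(banded_align_score("ACGT", "AGT", band=1))
```

Restricting the computation to a band reduces the work from O(n·m) cells to roughly O(n·band), which is what makes detailed alignment of long reads affordable once an approximate location is already known.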

BLASR Alignment: Suffix Array/BWT Mapping + Refinement
• Map short subsequences of a read to a reference genome using a suffix array or BWT-FM index (based on short-read mapping)
• Find high-scoring sets of anchors using global chaining (based on whole-genome alignment)
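
The two steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not BLASR's code: a plain dictionary of k-mers stands in for the suffix array or BWT-FM index, anchors are fixed-length exact matches, and the chain score is simply the number of non-overlapping, co-linear anchors found with a quadratic-time dynamic program. All function names are hypothetical.

```python
from collections import defaultdict


def find_anchors(read: str, ref: str, k: int):
    """Return exact k-mer matches as (read_pos, ref_pos) pairs.

    A dict-based k-mer index is used here as a simple stand-in for a
    suffix array or BWT-FM index.
    """
    index = defaultdict(list)
    for j in range(len(ref) - k + 1):
        index[ref[j:j + k]].append(j)
    anchors = []
    for i in range(len(read) - k + 1):
        for j in index.get(read[i:i + k], ()):
            anchors.append((i, j))
    return anchors


def best_chain(anchors, k: int):
    """Return the longest chain of anchors that is co-linear in read and ref.

    Anchor x may precede anchor y only if x ends at or before the start of y
    in both sequences. Simple O(n^2) dynamic program; the score is the
    number of anchors in the chain (an illustrative choice).
    """
    anchors = sorted(anchors)
    n = len(anchors)
    if n == 0:
        return []
    score = [1] * n
    back = [-1] * n
    for y in range(n):
        for x in range(y):
            fits = (anchors[x][0] + k <= anchors[y][0]
                    and anchors[x][1] + k <= anchors[y][1])
            if fits and score[x] + 1 > score[y]:
                score[y] = score[x] + 1
                back[y] = x
    # Trace back from the best-scoring end anchor.
    end = max(range(n), key=score.__getitem__)
    chain = []
    while end != -1:
        chain.append(anchors[end])
        end = back[end]
    return chain[::-1]


if __name__ == "__main__":
    ref = "CCACGTTTGATCAGG"
    read = "ACGTAGATCA"  # differs from the reference between the two anchors
    anchors = find_anchors(read, ref, k=4)
    print(anchors)
    print(best_chain(anchors, k=4))
```

In a real mapper the chained anchors would then define the region and the band for the detailed alignment step.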

Summary of Key Points
• Secondary Analysis consists of multiple parts
  – SMRT® Portal is the GUI
  – SMRT® Pipe is the command-line script
  – Web Services API for automation
• Protocols can be configured for multiple workflows

Additional Resources Available on DevNet
Topic: Where to look
• Installing SMRT® Portal: SMRT Analysis Software Installation (v1.4); Running SMRT Analysis on Amazon
• SMRT Portal Administration: SMRT Portal Help
• SMRT Portal Network Setup: SMRT Analysis Software Installation (v1.4)
• PacBio® RS Network Diagram: PacBio RS IT Site Prep Document
• Using SMRT Portal: SMRT Portal Help
• Setting Module Parameters: SMRT Pipe Reference Guide (v1.4)
• Troubleshoot SMRT Portal: DevNet Discussion Forums; PacBio Technical Support

Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, and SMRTbell are trademarks of Pacific Biosciences in the United States and/or other countries. All other trademarks are the sole property of their respective owners.