variants using GATK
RS_Resequencing: Align against reference and generate consensus
RS_Resequencing_GATK_Barcode: Identify barcodes, align against reference and call variants using GATK
RS_Modification_Detection: Align against reference and identify base modification positions
RS_Modification_and_Motif_Analysis: Map bacterial modifications m6A, m4C and m5C and analyze motifs
RS_Minor_and_Compound_Variants: Align CCS reads against a reference and call minor and compound variants

Assembly
RS_PreAssembler_Allora: Construct de novo assembly from single long insert library using HGAP method with ALLORA
RS_PreAssembler: Generate high quality pre-assembled long reads as a first step for use in de novo assembly (HGAP method)
RS_Allora_Assembly: De novo assembly using ALLORA
RS_Allora_Assembly_EC: Hybrid assembly using P_ErrorCorrection and ALLORA
RS_AHA_Scaffolding: Scaffolding assembly using AHA
RS_Celera_Assembler: Use pacBioToCA and Celera® Assembler to combine PacBio® CLR and CCS or short reads

Other
RS_cDNA_Mapping: Align splice reads against genomic reference with GMAP
RS_Filter: Filter to generate filtered_subreads.fastq
performed: Remove adaptors and filter reads (e.g. >0.75 RQ and >50 bp); align subreads to reference; generate consensus; make SNP calls
Module Name: P_Filter, P_Mapping, P_Consensus, P_GATKVC
Algorithm: BLASR, GenomicConsensus, GATK
Outputs: filtered_subreads.fastq, filtered_subreads.fasta, aligned_reads.sam, aligned_reads.bam, aligned_reads.cmp.h5, consensus.fastq, variants.gff, variants.vcf
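The filtering step above can be illustrated with a minimal sketch. This is not the P_Filter implementation; the `Read` class, its `read_quality` field, and the function names are hypothetical and exist only to show how the example thresholds (RQ above 0.75, length above 50 bp) would be applied.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical container for one read; not a SMRT Analysis type."""
    name: str
    sequence: str
    read_quality: float  # assumed per-read quality score (RQ) in [0, 1]


def passes_filter(read, min_rq=0.75, min_length=50):
    """Keep a read only if its RQ exceeds min_rq and its length exceeds min_length bp."""
    return read.read_quality > min_rq and len(read.sequence) > min_length


def filter_reads(reads, min_rq=0.75, min_length=50):
    """Return the reads that pass both thresholds, in their original order."""
    return [r for r in reads if passes_filter(r, min_rq, min_length)]


reads = [
    Read("r1", "A" * 120, 0.85),  # passes both thresholds
    Read("r2", "A" * 40, 0.90),   # too short
    Read("r3", "A" * 500, 0.60),  # quality too low
]
kept = filter_reads(reads)
print([r.name for r in kept])
```

Both thresholds are parameters so the same sketch covers protocols that use different cutoffs.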
de novo assembly
  – De novo assembly from a single long insert library preparation
• Celera® Assembler – de novo assembly
  – Combines PacBio® long reads with short reads or CCS
  – Scales to plant and mammalian-sized genomes
• ALLORA (“A Long Read Assembler”) – de novo assembly
  – Tailored to PacBio long reads and error profile
  – Uses overlap-layout-consensus approach
  – Outputs contigs as FASTA sequence and HDF5 files
• AHA (“A Hybrid Assembler”) – scaffolding of contigs
  – Combines PacBio sequence with high confidence contigs from an existing assembly, joining them into larger contigs
  – High confidence contigs can come from 2nd generation sequencing technologies or Sanger sequencing
Alignment with Successive Refinement”) – reference-based alignment
  – Maps reads to reference genomes and sequences
  – Designed to handle error profile of PacBio® reads
• Quiver – consensus and variant caller
  – Uses PacBio’s rich QVs to choose the optimal consensus
  – Then calls haploid SNPs and indels
  – Can achieve Q50 for de novo assembly and resequencing
• GATK (Genome Analysis Tool Kit) – variant caller
  – Identifies haploid and diploid SNPs using the Broad Institute’s Unified Genotyper
• GMAP (Genomic Mapping and Alignment Program)
  – Aligns splice reads against genomic reference for full length cDNA discovery
  – Developed at Genentech
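To put the Q50 figure in context, the sketch below converts between quality values and error probabilities. It assumes the standard Phred scaling, Q = -10 · log10(p); the function names are illustrative and are not part of Quiver.

```python
import math


def phred_to_error_probability(q):
    """Per-base error probability for a Phred-scaled quality value q (assumed scaling)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred-scaled quality value for a per-base error probability p."""
    return -10 * math.log10(p)


# Under this scaling, Q50 corresponds to about one error per 100,000 bases.
p_q50 = phred_to_error_probability(50)
print(p_q50)

# Expected number of consensus errors in a hypothetical 5 Mb genome at Q50.
print(5_000_000 * p_q50)
```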
requires different tools compared to 2nd Gen:
• Datasets are larger than what traditional database search methods such as BLAST handle
• Read lengths are exponentially distributed and much longer than what short-read aligners are designed for
• Error profile differs from what short-read aligners are designed for
• Detailed alignment should benefit from rich quality values
[Figures: Read length histogram; Accuracy by position]
BWT-FM index and suffix array search for rapid mapping
Detailed banded dynamic programming alignments
Chaisson, M.J. and Tesler, G. (2012) Mapping single molecule sequencing reads using Basic Local Alignment with Successive Refinement (BLASR): Application and Theory. BMC Bioinformatics 13, 238.
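As a rough illustration of banded dynamic programming, the sketch below computes an edit distance while only filling cells within a fixed distance of the main diagonal. It is not BLASR's alignment code: the unit costs, the function name, and the `band` parameter are assumptions made for this example.

```python
def banded_edit_distance(a, b, band):
    """Edit distance between a and b, exploring only cells with |i - j| <= band.

    Returns None when the length difference exceeds the band, since no
    alignment path can then stay inside it. Unit costs are assumed for
    mismatch, insertion and deletion.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        return None
    inf = float("inf")
    # Row 0: aligning an empty prefix of a against prefixes of b.
    prev = [inf] * (m + 1)
    for j in range(min(m, band) + 1):
        prev[j] = j
    for i in range(1, n + 1):
        # For clarity each row is allocated at full width; a production
        # implementation would store only the cells inside the band.
        cur = [inf] * (m + 1)
        lo = max(0, i - band)
        hi = min(m, i + band)
        for j in range(lo, hi + 1):
            if j == 0:
                cur[j] = i
                continue
            best = prev[j - 1] + (a[i - 1] != b[j - 1])  # match or mismatch
            best = min(best, prev[j] + 1)                # gap in b
            best = min(best, cur[j - 1] + 1)             # gap in a
            cur[j] = best
        prev = cur
    return prev[m]


print(banded_edit_distance("ACGTTGCA", "ACGTGCA", band=2))
```

Restricting the computation to a band is what keeps the refinement step tractable for long reads: the number of scored cells grows with read length times band width rather than with the product of the two sequence lengths.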
using a suffix array or BWT-FM index (based on short-read mapping)
Find high-scoring sets of anchors using global chaining (based on whole-genome alignment)
BLASR Alignment: Suffix Array/BWT Mapping + Refinement
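The chaining step can be sketched as a small dynamic program over anchors. This is a simplified quadratic-time illustration, not BLASR's chaining algorithm; the anchor representation `(read_start, ref_start, length)` and the scoring by total matched length are assumptions for the example.

```python
def chain_anchors(anchors):
    """Return the highest-scoring collinear chain of anchors.

    Each anchor is (read_start, ref_start, length) with length > 0.
    An anchor may follow another only if it starts at or after the end of
    the previous one in both the read and the reference. The score of a
    chain is the total matched length (an assumed, simplified scoring).
    """
    if not anchors:
        return []
    anchors = sorted(anchors)          # order by read position
    n = len(anchors)
    score = [a[2] for a in anchors]    # best chain score ending at each anchor
    back = [-1] * n                    # predecessor index for traceback
    for i in range(n):
        qi, ri, li = anchors[i]
        for j in range(i):
            qj, rj, lj = anchors[j]
            collinear = qj + lj <= qi and rj + lj <= ri
            if collinear and score[j] + li > score[i]:
                score[i] = score[j] + li
                back[i] = j
    # Trace back from the best-scoring chain end.
    best = max(range(n), key=score.__getitem__)
    chain = []
    while best != -1:
        chain.append(anchors[best])
        best = back[best]
    return chain[::-1]


anchors = [(0, 100, 10), (12, 112, 8), (15, 50, 20), (25, 125, 10)]
print(chain_anchors(anchors))
```

In this example the anchor at reference position 50 is the longest single match, but it is not collinear with the others, so the chain of three shorter anchors wins.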
• SMRT® Portal
  – SMRT Analysis Software Installation (v1.4)
  – Running SMRT Analysis on Amazon
• SMRT Portal Administration
  – SMRT Portal Help
• SMRT Portal Network Setup
  – SMRT Analysis Software Installation (v1.4)
  – PacBio® RS Network Diagram
  – PacBio RS IT Site Prep Document
• Using SMRT Portal
  – SMRT Portal Help
• Setting Module Parameters
  – SMRT Pipe Reference Guide (v1.4)
• Troubleshoot SMRT Portal
  – DevNet Discussion Forums
  – PacBio Technical Support