novo assembly projects
After the training, you will:
• Have knowledge of standard assembly metrics
• Understand the limitations of standard assembly metrics
• Know what additional advanced tools are available
• SMRT® Technology
• PacBio® RS Workflow
of contigs
  – N50: The size of the contig you reach if you sort contigs from largest to smallest and walk down the list until the running total covers 50% of the total assembled sequence
  – Max contig size
• Example, for contigs of 10, 4, 1, 1, 1, and 1 bp:
  – N50 = 10 bp
  – Mean contig length = 3 bp
• Limitation of these metrics:
  – They do not capture information about assembly accuracy!
  – Example: You can trivially concatenate all reads together and get one contig
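The N50 walk described above can be sketched in a few lines of Python. This is a minimal illustration, not code from any PacBio tool: the function names `n50` and `mean_length` are hypothetical, and the contig sizes (10, 4, 1, 1, 1, 1 bp) are assumed from the slide's example figure.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (in bp).

    Sort contigs from largest to smallest and walk down the list until
    the running total reaches at least half of the total sequence.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe test for "at least 50%"
            return length


def mean_length(contig_lengths):
    """Return the mean contig length (in bp)."""
    return sum(contig_lengths) / len(contig_lengths)


# Contig sizes assumed from the slide's example.
example = [10, 4, 1, 1, 1, 1]
print("N50:", n50(example))
print("Mean contig length:", mean_length(example))
print("Max contig size:", max(example))

# The limitation in action: gluing everything into one contig
# maximizes N50 without making the assembly any more correct.
print("N50 after concatenation:", n50([sum(example)]))
```

Note that the concatenated "assembly" scores best on every size metric, which is why these numbers need to be read alongside accuracy checks.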
genome that are incorrectly joined together
• Base-level errors: Result of sequencing error
• Scientific goals
  – Can you detect the genes in which you’re interested?
  – Can you see relevant structural variation?
Colored blocks: Stretches of one or more contigs that align continuously to the reference (“locally collinear blocks”, or LCBs)
  – Multiple blocks represent misassemblies
[Figure: alignment of the Reference against the Assembly]
Aaron C.E. Darling et al. 2004. Genome Research. 14(7):1394-1403.
needed: just use the assembly
• Tools
  – SMRT® Portal Resequencing and SMRT View
  – BWA-SW and IGV/Tablet
• Tips:
  – Use long reads for more accurate mapping
  – “Polish” the assembly by getting consensus from the resequencing job
  – If using BWA-SW, make sure that parameters are tuned for PacBio® data
important in assemblies:
  – SNP detection in assemblies
  – Gene prediction using open reading frames
  – Differentiating repeats using single-base differences
• How to increase per-base accuracy?
  – Assembly polishing using Quiver
  – Available now on GitHub, coming to SMRT® Analysis in early 2013
multiple reads of a given DNA template, outputs best guess of template’s identity
• QV-aware hidden Markov model to account for sequencing errors; a greedy algorithm to find the maximum likelihood template
• Can achieve accuracy >Q50 (i.e. >99.999%) using pure PacBio raw reads
• Same underlying algorithm currently used for CCS generation
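To make the ">Q50 (i.e. >99.999%)" figure concrete, the sketch below converts between a quality value and per-base accuracy. It assumes the standard Phred scale (QV = -10 · log10 of the error probability), which matches the numbers on the slide. The function names are hypothetical and this is not part of Quiver.

```python
import math


def qv_to_accuracy(qv):
    """Convert a Phred-scaled quality value to per-base accuracy (0..1)."""
    return 1.0 - 10.0 ** (-qv / 10.0)


def accuracy_to_qv(accuracy):
    """Convert per-base accuracy (0 <= accuracy < 1) to a Phred-scaled QV."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


for qv in (20, 30, 40, 50):
    accuracy = qv_to_accuracy(qv)
    # Expected number of base errors per megabase at this quality.
    errors_per_mb = (1.0 - accuracy) * 1_000_000
    print(f"Q{qv}: accuracy {accuracy:.5%}, about {errors_per_mb:g} errors per Mb")
```

Each 10-point step in QV is a tenfold drop in error rate, so moving from Q40 to Q50 matters a great deal for tasks such as SNP detection and gene prediction.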
do not capture assembly quality
• Consider also misassemblies, base-level accuracy, and your scientific goals
• Mauve and Nucmer are tools to assess assembly quality
• Quiver can be used to polish assemblies

Where to Find More Information
• Mauve: http://gel.ahabs.wisc.edu/mauve/
• Nucmer: http://mummer.sourceforge.net/
• Quiver: https://github.com/PacificBiosciences/GenomicConsensus/blob/master/doc/HowToQuiver.rst