Profiling Full-Length cDNAs by SMRT Sequencing

PacBio
April 02, 2013



Transcript

  1. Profiling Full-Length cDNAs by SMRT® Sequencing

    FIND MEANING IN COMPLEXITY. © Copyright 2013 by Pacific Biosciences of California, Inc. All rights reserved.
  2. Short Reads Fall Short in the Era of “Alternative Events”

    [Figure: genome and transcriptome schematic; labels: Alternative transcription start sites, Alternative splicing, Alternative polyA sites]
  3. Bringing Full-Length cDNA Sequencing to the PacBio® RS

    PacBio read lengths offer a unique opportunity for transcriptome biology

    Methods development/optimization
    • Robust synthesis of full-length cDNA libraries
    • Sample normalization
    • Target enrichment
    • Sequencing recommendations
    • Paths to analysis

    [Figure: Cumulative distribution of human transcripts in 2 databases; x-axis: Length (nucleotides). Au et al. PLoS ONE, 2012]
  4. Full-Length cDNA Synthesis Kit from Invitrogen

    • Very time-consuming
    • Large polyA RNA input
    • More stringent about full-length cDNA because of cap selection
    • 5 μg of polyA RNA
    • 50-100 ng of ds cDNA
    • Successfully scaled down the input to 1 μg polyA RNA
  5. “Full-Length” cDNA: Template Switching

    • Clontech® SMART™ kit
    • Evrogen® Mint-2 kit
    • Uses less material
    • Less time-consuming/fewer steps
    • Less stringent about selecting full-length cDNA

    Matz M, Shagin D, Bogdanova E, Britanova O, Lukyanov S, Diatchenko L, Chenchik A (1999) Amplification of cDNA ends based on template-switching effect and step-out PCR. Nucleic Acids Res. 27, 1558-1560.
  6. cDNA SMRTbell™ Libraries Display Size Distributions That Correlate with Expected Full-Length mRNA Sizes

    [Figure: library size distributions for S. cerevisiae and H. sapiens cerebellum, size axis marked at 500 and 5k; workflow labels: cDNA synthesis method, PCR, PacBio SMRTbell™ library prep]
  7. Both Kits Produce Good Results with a High Quality Sample

    Clontech® Human Cerebellum polyA RNA, aligned to Gencode by BLASR

    [Figure: 5’ to 3’ coverage panels labeled Invitrogen and Evrogen; annotation: More 3’ bias]
  8. Lower Quality RNA Benefits from the 7mG Enrichment Used in Invitrogen® Method

    MAQC-B (Human brain cDNA), 1-2 kb cDNA fraction

    [Figure: panels labeled Clontech and Invitrogen]
  9. Normalization During cDNA Sample Preparation

    • 1 μg input
    • 4-7 hr procedure
    • Unknown yield into PCR
    • PCR of 15-20 cycles to get DNA for SMRTbell™ template prep

    Zhulidov PA et al. A method for the preparation of normalized cDNA libraries enriched with full-length sequences. Bioorg Khim. 2005; 31 (2):186-94.
    Zhulidov PA et al. Simple cDNA normalization using kamchatka crab duplex-specific nuclease. Nucleic Acids Res. 2004; 32 (3):e37.
  10. Normalization Increases the Sequence Breadth of a Sample

    Yeast full-length cDNA aligned to genome by BLASR

    [Figure: coverage along Chr. IV and Chr. VII, Normalized vs. Non-normalized]
  11. Normalization Increases the Sequence Breadth of a Sample

    [Figure: Number of genes observed in a single SMRT® Cell of transcript data, Normalized vs. Non-Normalized; values shown: 3485, 1736, 306]
  12. Improving Subread Lengths for cDNA Libraries

    H1 human stem cell cDNA: Clontech, no size selection

    [Figure: SMRTbell™ library size (Bioanalyzer, size axis marked at 500 and 5k) and Subread Length Distribution]
  13. Improving Subread Lengths for cDNA Libraries

    [Figure: H1 SMRTbell™ library size (Bioanalyzer) and Subread Length Distribution for the 1-2 kb fraction, 2-3 kb fraction, and >3 kb fraction]
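To compare sequenced subread lengths against the size fractions named on this slide, lengths can be tallied into the same bins. The following is a minimal sketch, not the analysis behind the slide: the function names, the example lengths, the added "<1 kb" catch-all bin, and the choice to place exactly 3000 bases in the 2-3 kb bin are all assumptions made for illustration.

```python
from collections import Counter


def size_fraction(length):
    """Map a subread length in bases to a size-fraction label.

    The 1-2 kb, 2-3 kb and >3 kb labels come from the slide; the
    "<1 kb" bin and the exact boundary handling are assumptions.
    """
    if length < 1000:
        return "<1 kb"
    if length < 2000:
        return "1-2 kb"
    if length <= 3000:
        return "2-3 kb"
    return ">3 kb"


def fraction_counts(lengths):
    """Count how many subread lengths fall into each size fraction."""
    return Counter(size_fraction(n) for n in lengths)


# Hypothetical subread lengths, for illustration only.
example_lengths = [850, 1200, 1999, 2000, 3000, 3001, 4500]
counts = fraction_counts(example_lengths)
```

In practice the lengths would be read from the subread FASTA/FASTQ output of a run rather than typed in by hand.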
  14. Agilent® SureSelect® Enrichment to Increase the Coverage of a Subset of cDNAs

    [Figure: Full-Length ds_cDNA and Enriched Full-Length ds_cDNA]
  15. Analysis Options: Transcriptome Reference

    • If alignment to a transcript database (RefSeq/Gencode) is desired, BLASR and BWA-SW can be used for alignment
    • BLASR: https://github.com/PacificBiosciences/blasr
    • BWA-SW: http://bio-bwa.sourceforge.net/
  16. Analysis Options: Genomic Reference-Based Alignment

    • BLAT and GMAP can be used to align PacBio® CLR and CCS reads to the genomic reference.
    • BLAT: http://www.soe.ucsc.edu/~kent/src
    • GMAP: http://research-pub.gene.com/gmap/
  17. GMAP: Genomic Mapping and Alignment Program

    • Thomas D. Wu and Colin K. Watanabe. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 2005; 21:1859-1875.
    • Can be used in SMRT® Portal via RS_cDNA_Mapping protocol
  18. SMRT® Portal Output Files

    • SMRT View support – example to follow
    • Native GMAP output is a SAM file
    • Protocol converts SAM output to PacBio® native cmp.h5
    • gff file of coverage stats, used in SMRT View visualization, can also be used with external viewers
  19. SMRT® View

    [Figure: SMRT® View screenshot; labels: Reads mapping across introns, Reads mapping to single location, Gene of interest from annotation file]
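Reads that map across introns can also be picked out directly from the SAM output mentioned on the previous slide, because the SAM format records a skipped reference region as an `N` operation in the CIGAR string. The sketch below is illustrative only and is not part of the RS_cDNA_Mapping protocol: the function names and the example records are hypothetical, and it assumes the aligner reports introns as `N` operations.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def intron_lengths(cigar):
    """Return the lengths of all N (skipped reference region) operations."""
    return [int(n) for n, op in CIGAR_OP.findall(cigar) if op == "N"]


def spliced_read_names(sam_lines, min_intron=1):
    """Return names of mapped reads whose alignment spans at least one intron."""
    names = []
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag, cigar = int(fields[1]), fields[5]
        if flag & 0x4 or cigar == "*":  # unmapped or no alignment
            continue
        if any(n >= min_intron for n in intron_lengths(cigar)):
            names.append(fields[0])
    return names


# Hypothetical SAM records, for illustration only.
example_sam = [
    "@HD\tVN:1.4\tSO:unsorted",
    "read_a\t0\tchrIV\t100\t40\t200M1500N300M\t*\t0\t0\t*\t*",
    "read_b\t0\tchrIV\t5000\t40\t450M\t*\t0\t0\t*\t*",
    "read_c\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
spliced = spliced_read_names(example_sam)
```

Here only `read_a` carries an `N` operation, so it is the only read reported as spanning an intron; `read_b` maps to a single contiguous location and `read_c` is unmapped.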
  20. cDNA Takeaways

    Workflow stages: Sample Prep; Run Design; Sequencing on the PacBio® RS and primary analysis; Secondary Analysis; Tertiary Analysis

    • Two cDNA prep methods show promising results but differ in input and stringency
    • Double-stranded cDNA is converted to SMRTbell™ libraries with the PacBio® Large Insert Kit
    • Normalization can be an effective means to increase breadth, but use caution if characterizing rare isoforms
    • Agilent® SureSelect® system is effective for custom enrichment of select genes
    • Size selection can enrich for larger cDNAs
    • Run design depends on size
        • <2k fractions: 2x45 or 2x55 min movies, C2/C2, diffusion loading, CPS start
        • >2k fractions: 1x120 min movies, mag loading, XL/C2, stage start
        • Non-size selected: run both conditions above to cover all sizes, or select for desired size range
    • Full-pass subreads represent putative full-length isoforms
    • Transcript Reference: BLASR, BWA-SW
    • Genomic Reference: BLAT, GMAP
    • RS_cDNA_Mapping protocol
    • SMRT® View
    • Error Correction with Short Reads: pacBioToCA, P_ErrorCorrection, LSC (Au et al. PLoS ONE, 2012)
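The run-design recommendations on this slide can be written down as a small lookup table, which makes the size-dependent choice explicit. This is only a sketch of the slide's guidance: the dictionary keys, field names, and function name are hypothetical and do not correspond to any PacBio software setting.

```python
# Run conditions as listed on the slide; keys and field names are illustrative.
RUN_DESIGNS = {
    "<2k": {
        "movies": ["2x45 min", "2x55 min"],
        "chemistry": "C2/C2",
        "loading": "diffusion",
        "start": "CPS",
    },
    ">2k": {
        "movies": ["1x120 min"],
        "chemistry": "XL/C2",
        "loading": "mag",
        "start": "stage",
    },
}


def run_designs_for(fraction):
    """Return the list of run conditions suggested for a library fraction.

    A non-size-selected library gets both conditions, following the slide's
    advice to run both to cover all sizes.
    """
    if fraction == "non-size-selected":
        return [RUN_DESIGNS["<2k"], RUN_DESIGNS[">2k"]]
    return [RUN_DESIGNS[fraction]]


short_fraction = run_designs_for("<2k")
unselected = run_designs_for("non-size-selected")
```

The alternative named on the slide for a non-size-selected library, selecting a desired size range first, would then reduce to one of the two keyed cases.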
  21. Short Reads and Genomic-Reference-Based Alignment

    • If short-read data is available, error correction can be done prior to alignment.

    [Figure: workflow labels: Short Read / CCS Data; pacBioToCA, P_ErrorCorrection, LSC; Genomic Reference; BLAT, GMAP]
  22. LSC: A New Tool for Error Correction Demonstrated on Brain cDNA

    Uncovering a new 3’ UTR choice in GPM6B

    Au KF, Underwood JG, Lee L, Wong WH (2012) Improving PacBio Long Read Accuracy by Short Read Alignment. PLoS ONE 7(10): e46679. doi:10.1371/journal.pone.0046679
  23. Researchers at USDA Use PacBio® System to Characterize Bovine Immunoglobulin G (IgG) Antibody Repertoire

    • Sequenced cDNA amplicons spanning the entire IgG variable regions from the IgG repertoires of four cattle
    • PacBio’s high accuracy combined with long reads allowed differentiation between transcripts occurring in mixed cDNA samples
    • Characterization of immune system diversity in antigen binding residues of complementarity determining regions (CDRs) revealed unusually long CDR3 sequences: 21-25 amino acids rather than the usual 5-6, with some as long as 62
    • Hypothesis: clustering of expressed sequences from controlled experiments will identify novel natural antigen binding for specific pathogens
  24. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, and SMRTbell are trademarks of Pacific Biosciences in the United States and/or other countries. All other trademarks are the sole property of their respective owners.