lengths offer a unique opportunity for transcriptome biology
Methods development/optimization
• Robust synthesis of full-length cDNA libraries
• Sample normalization
• Target enrichment
• Sequencing recommendations
• Paths to analysis
[Figure: Cumulative distribution of human transcripts in 2 databases; x-axis: Length (nucleotides). Au et al. PLoS ONE, 2012]
Britanova O, Lukyanov S, Diatchenko L, Chenchik A (1999) Amplification of cDNA ends based on template-switching effect and step-out PCR. Nucleic Acids Res. 27, 1558-1560.
• Clontech® SMART™ kit
• Evrogen® Mint-2 kit
• Uses less material
• Less time consuming/fewer steps
• Less stringent about selecting full-length cDNA
hr procedure
Unknown yield into PCR
PCR of 15-20 cycles to get DNA for SMRTbell™ template prep
Zhulidov PA et al. A method for the preparation of normalized cDNA libraries enriched with full-length sequences. Bioorg Khim. 2005; 31(2):186-94.
Zhulidov PA et al. Simple cDNA normalization using kamchatka crab duplex-specific nuclease. Nucleic Acids Res. 2004; 32(3):e37.
and Colin K. Watanabe. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 2005; 21:1859-1875.
• Can be used in SMRT® Portal via RS_cDNA_Mapping protocol
example to follow
• Native GMAP output is a SAM file
• Protocol converts SAM output to PacBio® native cmp.h5
• GFF file of coverage stats, used in SMRT View visualization, can also be used with external viewers
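Because native GMAP output is a SAM file, a quick sanity check is possible before any conversion. The sketch below is an illustration only, not part of the RS_cDNA_Mapping protocol: it reads SAM-formatted lines and counts mapped alignments per reference sequence. The function name `count_mapped_reads` and the example records are hypothetical; the field layout (FLAG in column 2, RNAME in column 3, unmapped bit 0x4) follows the SAM format.

```python
from collections import Counter


def count_mapped_reads(sam_lines):
    """Count mapped alignment records per reference name (hypothetical helper)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # SAM records have 11 mandatory fields
        flag = int(fields[1])
        rname = fields[2]
        if flag & 0x4 or rname == "*":
            continue  # unmapped read
        counts[rname] += 1
    return counts


# Made-up records for illustration; real input would be the GMAP SAM output.
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t40\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "read2\t16\tchr1\t200\t40\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\t*",
]
print(count_mapped_reads(example))
```

Note that this counts every mapped record, so a read with secondary or supplementary alignments would be counted more than once.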
RS and primary analysis
Secondary Analysis
Tertiary Analysis
• Two cDNA prep methods show promising results but differ in input and stringency
• Double-stranded cDNA is converted to SMRTbell™ libraries with the PacBio® Large Insert Kit
• Normalization can be an effective means to increase breadth, but use caution if characterizing rare isoforms
• Agilent® SureSelect® system is effective for custom enrichment of select genes
• Size selection can enrich for larger cDNAs
• Run design depends on size
  • <2k fractions: 2x45 or 2x55 min movies, C2/C2, diffusion loading, CPS start
  • >2k fractions: 1x120 min movies, mag loading, XL/C2, stage start
  • Non-size selected: run both conditions above to cover all sizes, or select for desired size range
• Full-pass subreads represent putative full-length isoforms
• Transcript Reference
  • BLASR
  • BWA-SW
• Genomic Reference
  • BLAT
  • GMAP
  • RS_cDNA_Mapping protocol
  • SMRT® View
• Error Correction: Short Reads
  • pacBioToCA
  • P_ErrorCorrection
  • LSC (Au et al. PLoS ONE, 2012)
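The run-design recommendations in the summary above can be written as a small lookup. This is a minimal sketch, not a PacBio tool: the function name `recommend_run` and the dictionary keys are hypothetical, and it assumes "2k" means 2,000 nucleotides and that a fraction of exactly 2,000 falls in the larger group (the slide does not say).

```python
# Conditions copied from the summary slide; key names are illustrative.
SHORT_FRACTION = {
    "movies": "2x45 or 2x55 min",
    "chemistry": "C2/C2",
    "loading": "diffusion",
    "start": "CPS start",
}
LONG_FRACTION = {
    "movies": "1x120 min",
    "chemistry": "XL/C2",
    "loading": "mag",
    "start": "stage start",
}


def recommend_run(fraction_size=None):
    """Return the list of run conditions for a cDNA size fraction.

    fraction_size: typical insert size in nucleotides, or None for a
    non-size-selected library (assumption: 2k = 2,000 nt).
    """
    if fraction_size is None:
        # Non-size selected: run both conditions to cover all sizes.
        return [SHORT_FRACTION, LONG_FRACTION]
    if fraction_size < 2000:
        return [SHORT_FRACTION]
    return [LONG_FRACTION]


for size in (1500, 3000, None):
    print(size, recommend_run(size))
```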
cDNA
Uncovering a new 3’ UTR choice in GPM6B
Au KF, Underwood JG, Lee L, Wong WH (2012) Improving PacBio Long Read Accuracy by Short Read Alignment. PLoS ONE 7(10): e46679. doi:10.1371/journal.pone.0046679
G (IgG) Antibody Repertoire
• Sequenced cDNA amplicons spanning the entire IgG variable regions from the IgG repertoires of four cattle
• PacBio’s high accuracy combined with long reads allowed differentiation between transcripts occurring in mixed cDNA samples
• Characterization of immune system diversity in antigen-binding residues of complementarity determining regions (CDRs) revealed unusually long CDR3 sequences: 21-25 amino acids rather than the usual 5-6, with some as long as 62
• Hypothesis: clustering of expressed sequences from controlled experiments will identify novel natural antigen binding for specific pathogens