Brassica rapa Transcriptome in the Cloud through the iPlant Collaborative and XSEDE
Upendra Kumar Devisetty, Postdoctoral Researcher, Maloof Lab, UC Davis

Research in Maloof Lab
B. rapa mapping population: R500 (oilseed cultivar) and IMB211 (rapid-cycling cultivar)
• Reference transcriptome
• Genome annotation

Current annotation mainly relied on in silico gene models and EST data (Wang et al. 2011)
– In silico gene models (GENSCAN, GlimmerHMM, Fgenesh) do not accurately predict:
  • short exons
  • very long exons
  • non-translated exons
  • genes that encode non-coding RNAs
– ESTs:
  • miss 20-40% of novel transcripts
  • miss transcripts expressed only under highly specific tissue, environmental or treatment conditions
  • are 3′ biased
  • are short

Why is there a need for accurate genome annotation?
• Accurate and comprehensive genome annotation (e.g. gene models) is essential for functional studies
• It is needed for accurate estimates of mRNA abundance and for detecting eQTLs (expression QTLs) in mapping populations

Objectives
• To detect transcripts that are absent from the existing B. rapa reference genome (novel transcripts)
• To update the existing gene models of the B. rapa genome
UK Devisetty et al. 2014 G3: Genes|Genomes|Genetics

Samples
Growth chamber, greenhouse, field; apical meristem; R500

Library construction
TruSeq RNA-Seq kit (Illumina): high throughput and easy to use

Sequencing
• 128 RNA-Seq libraries
• 17 lanes of PE100 sequencing on the Illumina GAIIx
• 3,354 million raw paired-end reads

Quality control
o Atmosphere and iRODS
o 2,550 million quality-controlled paired-end reads (888 GB)

• Servers (iPlant Atmosphere)
• XX-TB storage (iPlant Data Store and EBS)
• Users
Now everyone can share data without sharing resources!
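The quality-control step reduced 3,354 million raw paired-end reads to 2,550 million, roughly 76% retained. The deck does not name the QC tools or thresholds, so the sketch below is only an illustration of what paired-end filtering generally involves, not the pipeline that was run on Atmosphere. It assumes Phred+33 quality encoding, a hypothetical 3′ trimming threshold of Q20 and a hypothetical minimum length of 35 bases; the function names `parse_fastq`, `trim_3prime` and `filter_pairs` are illustrative only.

```python
"""Minimal paired-end read QC sketch (hypothetical thresholds, not the deck's pipeline)."""
from typing import Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield (header, seq, qual) tuples from 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        yield header, seq, qual


def trim_3prime(rec: Record, min_q: int = 20, offset: int = 33) -> Record:
    """Drop bases from the 3' end while their Phred score is below min_q (assumes Phred+33)."""
    header, seq, qual = rec
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return header, seq[:end], qual[:end]


def filter_pairs(mate1: List[str], mate2: List[str],
                 min_q: int = 20, min_len: int = 35) -> List[Tuple[Record, Record]]:
    """Trim both mates; keep the pair only if both remain at least min_len bases long."""
    kept = []
    for r1, r2 in zip(parse_fastq(mate1), parse_fastq(mate2)):
        t1, t2 = trim_3prime(r1, min_q), trim_3prime(r2, min_q)
        if len(t1[1]) >= min_len and len(t2[1]) >= min_len:
            kept.append((t1, t2))
    return kept


# Toy data: 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
mate1 = ["@p1/1", "ACGT" * 10, "+", "I" * 36 + "#" * 4,
         "@p2/1", "ACGT" * 10, "+", "I" * 30 + "#" * 10]
mate2 = ["@p1/2", "TTGCA" * 8, "+", "I" * 40,
         "@p2/2", "TTGCA" * 8, "+", "I" * 40]

kept = filter_pairs(mate1, mate2)
retained = len(kept) / (len(mate1) // 4)  # fraction of input pairs that pass QC
```

Pairs are kept or dropped as a unit so that the two mate files stay synchronized for paired-end alignment; in the toy data the second pair is discarded because its first mate falls below the minimum length after trimming.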