Slide 1

Slide 1 text

RNA-seq data analysis and deliverables from your Bioinformatics Core
Stephen D. Turner, Ph.D.
UVA Bioinformatics Core Director
bioinformatics.virginia.edu

Slide 2

Slide 2 text

Outline
1. UVA Bioinformatics Core
2. RNA-seq overview & data analysis
3. Deliverables

Slide 3

Slide 3 text

UVA Bioinformatics Core
• Established Oct 2011
• Supported by UVA School of Medicine + user fees
• Mission: help collaborators publish and fund their work by providing expert and timely bioinformatics consulting, data analysis, and training.

Slide 4

Slide 4 text

Services
• Gene expression:
  - RNA-seq
  - Microarrays (Affy, Illumina)
  - Other (Nanostring, ...)
• DNA Variation:
  - NGS (exome, custom amplicons, WGS, …)
  - Array-based (GWAS, ...)
• DNA Methylation:
  - NGS (MeDIP, RRBS, …)
  - Array-based (Illumina Infinium chips)
• DNA Binding / ChIP-Seq
• “Pathway Analysis”
• Metagenomics
  - Microbiome studies (16S, whole-genome)
  - Pathogen detection / characterization
  - Phylogenetic / compositional analysis
  - Functional Metagenomics
• Acquisition / analysis of publicly available data (GEO, dbGaP), & deposition
• Grant / Manuscript support
• Custom development

Slide 5

Slide 5 text

Services

Slide 6

Slide 6 text

RNA-seq Overview
Step 1: Bioinformatics. Study design and planning. Consult your bioinformatician!
Then: Prep samples, Prep libraries, Sequence, Bioinformatics (QA/QC, Analysis, …)

Slide 7

Slide 7 text

RNA-seq common question #1: Depth
• Question: How much sequence do I need?
• Answer: It’s complicated.
• Oversimplified answer: 20-50 million paired-end (PE) reads / sample.
• Depends on:
  - Size & complexity of transcriptome.
  - Application: differential gene expression, transcript discovery.
  - Tissue type, RNA quality, library preparation.
  - Sequencing type: length, paired-end vs single-end, etc.
• Find a publication in your field with similar goals.
• Good news: a fraction of a HiSeq lane is good enough.
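As a rough illustration of the depth arithmetic, the sketch below estimates how many samples could share one lane at a target depth, and how many lanes a study would need. The lane yield of 200 million reads is a hypothetical placeholder, not a figure from these slides or a vendor specification; substitute the yield quoted by your sequencing facility. The function names are illustrative as well.

```python
import math

# ASSUMPTION: hypothetical lane yield, for illustration only.
ASSUMED_LANE_YIELD_READS = 200_000_000


def samples_per_lane(lane_yield_reads: int, reads_per_sample: int) -> int:
    """Number of whole samples that fit in one lane at the target depth."""
    if lane_yield_reads <= 0 or reads_per_sample <= 0:
        raise ValueError("yield and depth must be positive")
    return lane_yield_reads // reads_per_sample


def lanes_needed(n_samples: int, lane_yield_reads: int, reads_per_sample: int) -> int:
    """Lanes required if all samples are pooled evenly across lanes."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    total_reads = n_samples * reads_per_sample
    return math.ceil(total_reads / lane_yield_reads)


if __name__ == "__main__":
    depth = 25_000_000  # within the slide's oversimplified 20-50 million range
    print(samples_per_lane(ASSUMED_LANE_YIELD_READS, depth))
    print(lanes_needed(12, ASSUMED_LANE_YIELD_READS, depth))
```

This ignores uneven pooling and reads lost to QC, so treat the result as a planning estimate rather than a guarantee.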

Slide 8

Slide 8 text

RNA-seq common question #2: Sample size
• Question: How many samples should I sequence?
• Oversimplified answer: Never fewer than 3 biological replicates per condition.
• Depends on:
  - Application
  - Goals (prioritization, biomarker discovery, etc.)
  - Effect size, desired power, statistical significance
• Find a publication with similar goals.
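To show how effect size, power, and significance level interact, here is a minimal sketch using the textbook normal-approximation formula for comparing two group means. It is an assumption-laden simplification and not the Core's method: it treats a single gene's log2 expression as normally distributed with a known standard deviation, and it ignores multiple-testing correction and the count-based models that RNA-seq tools normally use. The function name and the default values are illustrative.

```python
import math
from statistics import NormalDist


def replicates_per_group(effect_log2: float, sd_log2: float,
                         alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate replicates per condition for a two-group comparison.

    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / effect)^2 and never
    returns fewer than 3, following the slide's rule of thumb.
    """
    if effect_log2 <= 0 or sd_log2 <= 0:
        raise ValueError("effect size and standard deviation must be positive")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = std_normal.inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd_log2 / effect_log2) ** 2
    return max(3, math.ceil(n))


if __name__ == "__main__":
    # Hypothetical inputs: a 2-fold change (1 log2 unit) at two noise levels.
    print(replicates_per_group(effect_log2=1.0, sd_log2=0.5))
    print(replicates_per_group(effect_log2=1.0, sd_log2=1.0))
```

Halving the detectable effect or doubling the variability roughly quadruples the required replicates, which is why the answer depends so heavily on the study's goals.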

Slide 9

Slide 9 text

RNA-seq common question #3: Workflow
“How do I analyze the data?”
Perception:
ACACTCGCATCCGCACATCGCACTA
GGTCAGCATACGCCGACTCCGACCG
GCGCTATCGCCAGCGGAAATCGCAA
Sequence Data

Slide 10

Slide 10 text

RNA-seq common question #3: Workflow
Reality: a bit more complicated…
Eyras et al. Methods to Study Splicing from RNA-Seq. http://dx.doi.org/10.6084/m9.figshare.679993
Turner SD. RNA-seq Workflows and Tools. http://dx.doi.org/10.6084/m9.figshare.662782

Slide 11

Slide 11 text

RNA-seq workflow #1: Differential Gene Expression
Turner, Stephen D. (2015): RNA-seq workflows. http://dx.doi.org/10.6084/m9.figshare.1430386

Slide 12

Slide 12 text

RNA-seq workflow #2: Differential Isoform Expression, Exon Usage
Turner, Stephen D. (2015): RNA-seq workflows. http://dx.doi.org/10.6084/m9.figshare.1430386

Slide 13

Slide 13 text

Beware of Pipelineitis
• “Pipelines” can kill your creativity and force you to think too rigidly.
• Don’t “pipeline” too early, if at all.
• Does it even need to be pipeline-ified?
• Who’s running it?
  - You, once: don’t pipeline-ify. Document, move along.
  - You, 2-5 times: documented script?
  - You, 10+ times: consider pipeline-ifying.
  - Others: create sharable pipeline (VM, Galaxy, containerize, makefiles, …)

Slide 14

Slide 14 text

Specific Deliverables
1. Sequence data QA/QC
2. Alignment
   2.1. Alignment Files
   2.2. Alignment Statistics
   2.3. Alignment QA
3. Quantification
   3.1. Genes: Raw + normalized counts of reads mapping to genes
   3.2. Transcripts: FPKM, TPM, etc.
4. Expression QA/QC
   4.1. PCA
   4.2. Sample distance heatmap
   4.3. Hierarchical clustering
5. Differential Expression
   5.1. Genes: list of DE genes, fold changes, p-values, …
   5.2. Transcripts: isoforms, splicing, TSS usage, coding output, …
   5.3. Visualizations: heatmaps, volcano plots, MA-plots, bar plots, …
6. Pathway Analysis
   6.1. Gene ontology enrichment
   6.2. Gene Set Enrichment Analysis
   6.3. Other pathway analysis (impact analysis, regulatory network analysis, …)
7. Biological Interpretation, custom analysis
8. Manuscript / Grant Preparation

Slide 15

Slide 15 text

Specific Deliverables

Slide 16

Slide 16 text

Specific Deliverables
Figure panels: Junction Analysis; Reads Genomic Origin; Alignment files for visualization

Slide 17

Slide 17 text

Specific Deliverables

Count table for genes
         ctrl_1  ctrl_2  exp_1  exp_2
geneA    10      11      56     45
geneB    0       0       128    54
geneC    4205    4156    5944   4198
geneD    103     122     1      23
geneE    10      23      14     56
geneF    0       1       2      0
…        …       …       …      …

TPM/FPKM table for transcripts
         ctrl_1  ctrl_2  exp_1  exp_2
txpA.1   15      19      454    452
txpA.2   1       0       128    54
txpB.1   87      78      12     56
txpC.1   154     146     21     5
txpC.2   101     320     414    870
txpC.3   0       0       10     0
…        …       …       …      …
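For readers unfamiliar with the TPM unit, the sketch below shows the standard transcripts-per-million calculation from per-transcript read counts and transcript lengths. The counts and lengths in the usage example are hypothetical and are not taken from the tables on this slide; the function name is illustrative, and real quantification tools typically use effective lengths and handle multi-mapping reads, which this sketch does not.

```python
from typing import List, Sequence


def tpm(counts: Sequence[float], lengths_bp: Sequence[float]) -> List[float]:
    """Convert read counts to TPM for one sample.

    Each count is divided by its transcript length, then the resulting
    rates are rescaled so that they sum to one million.
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    if any(length <= 0 for length in lengths_bp):
        raise ValueError("transcript lengths must be positive")
    rates = [count / length for count, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)  # no reads in this sample
    return [rate / total * 1_000_000 for rate in rates]


if __name__ == "__main__":
    # Hypothetical counts and lengths for three transcripts in one sample.
    example_counts = [150, 10, 870]
    example_lengths = [1000, 2000, 1500]
    for value in tpm(example_counts, example_lengths):
        print(round(value, 1))
```

Because each sample's TPM values sum to the same total, they are convenient for comparing relative abundance within a sample; differential expression testing is usually run on counts instead.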

Slide 18

Slide 18 text

Specific Deliverables

Slide 19

Slide 19 text

Specific Deliverables

Slide 20

Slide 20 text

Specific Deliverables

Slide 21

Slide 21 text

Specific Deliverables

Slide 22

Slide 22 text

Specific Deliverables
This is a stopping point for some, and a checkpoint for others. Up to this point analysis is straightforward and relatively well-characterized.

Slide 23

Slide 23 text

Specific Deliverables

Slide 24

Slide 24 text

Specific Deliverables

Slide 25

Slide 25 text

Specific Deliverables
• “What is the biological significance?”
• “How does this lead to disease?”
• “Make this figure.”
• “Write a methods/results/discussion section.”

Slide 26

Slide 26 text

Specific Deliverables
Documentation: All methods, code, results, summaries of discussion, and billing info recorded on a wiki, accessible only to client & collaborators.
• Permanent
• Searchable
• Version Controlled
• Secure
• Convenient

Slide 27

Slide 27 text

Education
• Bioinformatics: sub-discipline of molecular biology
  - Critical for molecular biologists to understand computational biology.
  - Same brain considers both biology and bioinformatics.
• Lots of biologists approach genomics as if rules of experimental design don't matter.
  - N=1
  - No controls or improper controls
  - False comfort provided by P-values, etc.

Slide 28

Slide 28 text

Education
• Workshops / short courses:
  - Browsing Genes & Genomes with Ensembl
  - Using Galaxy for data intensive biology
  - Introduction to R for Life Sciences
  - RNA-seq data analysis bootcamp
• Software Carpentry:
  - Software Carpentry Software Skills Bootcamp (Unix, Python, automation, version control)
  - Software Carpentry Instructor Training

Slide 29

Slide 29 text

Education
All courseware is open-source
• Source code: github.com/bioconnector/workshops/
• Rendered course materials: bioconnector.org/workshops

Slide 30

Slide 30 text

Infrastructure
• rstudio.bioconnector.virginia.edu
• galaxy.bioconnector.virginia.edu

Slide 31

Slide 31 text

Web: bioinformatics.virginia.edu
E-Mail: [email protected]
Blog: GettingGeneticsDone.com
Twitter: @genetics_blog
Facebook: facebook.com/UVABioinformaticsCore