
MADSSCi RNAseq

This talk was given at the 2015 ABRF MADSSCi meeting. Before it, another speaker covered the wet-lab aspects of sequencing, focusing on the efficiency of different prep kits and sequencing platforms.

This talk focuses on the analytical side of things, including a discussion of experimental design, primary data QC and typical analytical deliverables from a core facility.

Stephen Turner

June 04, 2015

Transcript

  1. RNA-seq data analysis and deliverables from your bioinformatics core

    Stephen D. Turner, Ph.D., UVA Bioinformatics Core Director, bioinformatics.virginia.edu
  2. UVA Bioinformatics Core

    • Established Oct 2011
    • Supported by UVA School of Medicine + user fees
    • Mission: help collaborators publish and fund their work by providing expert and timely bioinformatics consulting, data analysis, and training.
  3. Services

    • Gene expression:
      - RNA-seq
      - Microarrays (Affy, Illumina)
      - Other (Nanostring, ...)
    • DNA Variation:
      - NGS (exome, custom amplicons, WGS, …)
      - Array-based (GWAS, ...)
    • DNA Methylation:
      - NGS (MeDIP, RRBS, …)
      - Array-based (Illumina Infinium chips)
    • DNA Binding / ChIP-Seq
    • “Pathway Analysis”
    • Metagenomics
      - Microbiome studies (16S, whole-genome)
      - Pathogen detection / characterization
      - Phylogenetic / compositional analysis
      - Functional Metagenomics
    • Acquisition / analysis of publicly available data (GEO, dbGaP), & deposition
    • Grant / Manuscript support
    • Custom development
  4. RNA-seq Overview

    Workflow: Prep samples → Prep libraries → Sequence → Bioinformatics (QA/QC, Analysis, …)
    Step 1: Bioinformatics. Study design and planning. Consult your bioinformatician!
  5. RNA-seq common question #1: Depth

    • Question: how much sequence do I need?
    • Answer: it’s complicated.
    • Oversimplified answer: 20-50 million PE reads / sample.
    • Depends on:
      - Size & complexity of transcriptome.
      - Application: differential gene expression, transcript discovery.
      - Tissue type, RNA quality, library preparation.
      - Sequencing type: length, paired-end vs single-end, etc.
    • Find a publication in your field with similar goals.
    • Good news: a fraction of a HiSeq lane is often good enough.
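The "fraction of a HiSeq lane" point comes down to simple division. The sketch below is illustrative only: `LANE_READ_PAIRS` is a hypothetical lane yield chosen for the arithmetic, not a figure from the talk, and the function names are made up for this example. Check the actual yield with your sequencing core.

```python
# Hypothetical lane yield (assumption, not from the talk): read pairs per lane.
LANE_READ_PAIRS = 200_000_000


def samples_per_lane(lane_read_pairs: int, pairs_per_sample: int) -> int:
    """How many samples fit in one lane at the target depth per sample."""
    if pairs_per_sample <= 0:
        raise ValueError("pairs_per_sample must be positive")
    return lane_read_pairs // pairs_per_sample


def lane_fraction(lane_read_pairs: int, pairs_per_sample: int) -> float:
    """Fraction of one lane consumed by a single sample."""
    return pairs_per_sample / lane_read_pairs


# The slide's oversimplified range: 20-50 million PE reads per sample.
for depth in (20_000_000, 50_000_000):
    n = samples_per_lane(LANE_READ_PAIRS, depth)
    frac = lane_fraction(LANE_READ_PAIRS, depth)
    print(f"{depth:,} pairs/sample: {n} samples per lane ({frac:.0%} of a lane each)")
```

Under that assumed yield, each sample uses between a tenth and a quarter of a lane, which is why multiplexing several samples per lane is common.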
  6. RNA-seq common question #2: Sample size

    • Question: how many samples should I sequence?
    • Oversimplified answer: never fewer than 3 biological replicates per condition.
    • Depends on:
      - Application
      - Goals (prioritization, biomarker discovery, etc.)
      - Effect size, desired power, statistical significance
    • Find a publication with similar goals.
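To make the dependence on effect size, power, and significance concrete, here is a rough sketch using the textbook normal-approximation formula for comparing two group means, applied to log2 expression of a single gene. This is a generic approximation swapped in for illustration, not the method used by the core or by any RNA-seq package; it ignores multiple testing and count-based dispersion. The function name and the `min_replicates=3` floor (taken from the slide's rule of thumb) are assumptions of this example.

```python
import math
from statistics import NormalDist


def replicates_per_group(effect_log2: float, sd_log2: float,
                         alpha: float = 0.05, power: float = 0.8,
                         min_replicates: int = 3) -> int:
    """Approximate replicates per condition for a two-group comparison.

    effect_log2: log2 fold change you want to detect.
    sd_log2: between-replicate standard deviation of log2 expression.
    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / effect)^2, then
    applies the floor of three biological replicates.
    """
    if effect_log2 <= 0 or sd_log2 <= 0:
        raise ValueError("effect_log2 and sd_log2 must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd_log2 / effect_log2) ** 2
    return max(min_replicates, math.ceil(n))


# Smaller effects or noisier samples need more replicates.
for effect in (2.0, 1.0, 0.5):
    print(f"log2FC={effect}: {replicates_per_group(effect, sd_log2=0.5)} per group")
```

Treat the output as a lower bound for planning conversations; a dedicated power analysis with pilot data is the better basis for a real design.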
  7. RNA-seq common question #3: Workflow

    “How do I analyze the data?”
    Perception: [figure: raw reads such as ACACTCGCATCCGCACATCGCACTA, labeled “Sequence Data”]
  8. RNA-seq common question #3: Workflow

    Reality: a bit more complicated…
    Eyras et al. Methods to Study Splicing from RNA-Seq. http://dx.doi.org/10.6084/m9.figshare.679993
    Turner SD. RNA-seq Workflows and Tools. http://dx.doi.org/10.6084/m9.figshare.662782
  9. RNA-seq workflow #1: Differential Gene Expression

    Turner, Stephen D. (2015): RNA-seq workflows. http://dx.doi.org/10.6084/m9.figshare.1430386
  10. RNA-seq workflow #2: Differential Isoform Expression, Exon Usage

    Turner, Stephen D. (2015): RNA-seq workflows. http://dx.doi.org/10.6084/m9.figshare.1430386
  11. Beware of Pipelineitis

    • “Pipelines” can kill your creativity and force you to think too rigidly.
    • Don’t “pipeline” too early, if at all.
    • Does it even need to be pipeline-ified?
    • Who’s running it?
      - You, once: don’t pipeline-ify. Document, move along.
      - You, 2-5 times: documented script?
      - You, 10+ times: consider pipeline-ifying.
      - Others: create sharable pipeline (VM, Galaxy, containerize, makefiles, …)
  12. Specific Deliverables

    1. Sequence data QA/QC
    2. Alignment
      2.1. Alignment Files
      2.2. Alignment Statistics
      2.3. Alignment QA
    3. Quantification
      3.1. Genes: Raw+normalized counts of reads mapping to genes
      3.2. Transcripts: FPKM, TPM, etc.
    4. Expression QA/QC
      4.1. PCA
      4.2. Sample distance heatmap
      4.3. Hierarchical clustering
    5. Differential Expression
      5.1. Genes: list of DE Genes, Fold changes, p-values, …
      5.2. Transcripts: isoforms, splicing, TSS usage, coding output, …
      5.3. Visualizations: heatmaps, volcano plots, MA-plots, bar plots, …
    6. Pathway Analysis
      6.1. Gene ontology enrichment
      6.2. Gene Set Enrichment Analysis
      6.3. Other pathway analysis (impact analysis, regulatory network analysis, …)
    7. Biological Interpretation, custom analysis
    8. Manuscript / Grant Preparation
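As an illustration of the expression QA/QC step, the sketch below computes the sample-to-sample distances that a sample distance heatmap would display. It is a minimal standard-library version under stated assumptions: the counts are toy values for six genes, the sample name `exp_2` and the helper names are hypothetical, and a plain `log2(count + 1)` transform stands in for the variance-stabilizing transforms that dedicated packages provide.

```python
import math

# Toy gene-level counts (illustrative values), one list of six genes per sample.
counts = {
    "ctrl_1": [10, 0, 4205, 103, 10, 0],
    "ctrl_2": [11, 0, 4156, 122, 23, 1],
    "exp_1": [56, 128, 5944, 1, 14, 2],
    "exp_2": [45, 54, 4198, 23, 56, 0],
}


def log_transform(values):
    """log2(count + 1) so highly expressed genes do not dominate the distance."""
    return [math.log2(v + 1) for v in values]


def euclidean(a, b):
    """Euclidean distance between two equal-length expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(samples):
    """Pairwise distances between samples, keyed by (sample_a, sample_b)."""
    logged = {name: log_transform(vals) for name, vals in samples.items()}
    return {(a, b): euclidean(logged[a], logged[b]) for a in logged for b in logged}


dm = distance_matrix(counts)
for a in counts:
    print(a, " ".join(f"{dm[(a, b)]:6.2f}" for b in counts))
```

In a healthy experiment, replicates of the same condition should sit closer to each other than to samples from the other condition; a replicate that clusters with the wrong group is worth investigating before any differential expression testing.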
  14. Specific Deliverables (outline as on slide 12)

    Figure panels: Junction Analysis; Reads Genomic Origin; Alignment files for visualization
  15. Specific Deliverables (outline as on slide 12)

    Count table for genes

             ctrl_1  ctrl_2  exp_1  exp_2
    geneA        10      11     56     45
    geneB         0       0    128     54
    geneC      4205    4156   5944   4198
    geneD       103     122      1     23
    geneE        10      23     14     56
    geneF         0       1      2      0
    …             …       …      …      …

    TPM/FPKM table for transcripts

             ctrl_1  ctrl_2  exp_1  exp_2
    txpA.1       15      19    454    452
    txpA.2        1       0    128     54
    txpB.1       87      78     12     56
    txpC.1      154     146     21      5
    txpC.2      101     320    414    870
    txpC.3        0       0     10      0
    …             …       …      …      …
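The two tables differ in units: the first holds raw read counts, the second length-normalized abundances. As a sketch of how TPM relates to counts, the function below applies the standard definition (reads per kilobase, rescaled so each sample sums to one million). The feature lengths are hypothetical values invented for this example, and real quantification tools use effective lengths and handle multi-mapping reads, which this sketch does not.

```python
def tpm(counts, lengths_kb):
    """Transcripts per million from raw counts and feature lengths in kilobases."""
    if len(counts) != len(lengths_kb):
        raise ValueError("counts and lengths_kb must have the same length")
    # Reads per kilobase: corrects for longer features collecting more reads.
    rates = [c / length for c, length in zip(counts, lengths_kb)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Rescale so the values within one sample sum to one million.
    return [r / total * 1e6 for r in rates]


# One sample's counts for six features, with hypothetical lengths in kb.
sample_counts = [10, 0, 4205, 103, 10, 0]
feature_lengths_kb = [2.0, 1.5, 3.0, 1.0, 2.5, 0.8]

for value in tpm(sample_counts, feature_lengths_kb):
    print(f"{value:12.1f}")
```

Because every sample is rescaled to the same total, TPM values are convenient for comparing relative abundance, but count-based differential expression tools generally expect the raw counts instead.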
  20. Specific Deliverables (outline as on slide 12)

    This is a stopping point for some, and a checkpoint for others. Up to this point analysis is straightforward and relatively well-characterized.
  23. Specific Deliverables (outline as on slide 12)

    • “What is the biological significance?”
    • “How does this lead to disease?”
    • “Make this figure.”
    • “Write a methods/results/discussion section.”
  24. Specific Deliverables (outline as on slide 12)

    Documentation: all methods, code, results, summaries of discussion, and billing info are recorded on a wiki, accessible only to the client & collaborators.
    • Permanent
    • Searchable
    • Version controlled
    • Secure
    • Convenient
  25. Education

    • Bioinformatics: sub-discipline of molecular biology
      - Critical for molecular biologists to understand computational biology.
      - Same brain considers both biology and bioinformatics.
    • Lots of biologists approach genomics as if rules of experimental design don't matter.
      - N=1
      - No controls or improper controls
      - False comfort provided by P-values, etc.
  26. Education

    • Workshops / short courses:
      - Browsing Genes & Genomes with Ensembl
      - Using Galaxy for data intensive biology
      - Introduction to R for Life Sciences
      - RNA-seq data analysis bootcamp
    • Software Carpentry:
      - Software Carpentry Software Skills Bootcamp (Unix, Python, automation, version control)
      - Software Carpentry Instructor Training