Slide 1

Slide 1 text

Lecture 30 RNA-Seq Data Analysis

Slide 2

Slide 2 text

RNA-Seq steps
Traditional way:
1. Align
2. Quantify
3. Compare

Slide 3

Slide 3 text

A new style of analysis
Around 2014: a new style of analysis
1. Align + Quantify (in a single step)
2. Compare

Slide 4

Slide 4 text

Fast Abundance Estimation A new school of thought that applies to transcriptomes. We may not need alignments at all. Turns RNA-Seq into a classification problem. Two tools: Kallisto, Salmon
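To make the "classification" idea concrete, here is a toy sketch: each read is assigned to the set of transcripts that share its k-mers, with no base-level alignment. The sequences, the names `tx_A` and `tx_B`, the k-mer size, and the helper functions are all made up for illustration; this is not the actual algorithm or API of Kallisto or Salmon, only the flavor of the approach.

```python
from collections import Counter, defaultdict

K = 5  # toy k-mer size (an assumption; real tools use much longer k-mers)

def kmers(seq, k=K):
    """Return all overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_index(transcripts, k=K):
    """Map every k-mer to the set of transcripts that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return index

def classify(read, index, k=K):
    """Intersect the transcript sets of the read's k-mers.

    K-mers missing from the index (e.g. sequencing errors) are skipped.
    """
    compatible = None
    for km in kmers(read, k):
        hits = index.get(km)
        if not hits:
            continue
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()

# Hypothetical transcriptome and reads.
transcripts = {
    "tx_A": "ACGTACGTTAGCCGATAGGCT",
    "tx_B": "TTGACCGATAGGCTAACGGTA",
}
reads = ["ACGTTAGCC", "CGATAGGCT", "GGCTAACGG"]

index = build_index(transcripts)
classes = Counter(frozenset(classify(r, index)) for r in reads)
for members, n in classes.items():
    print(sorted(members), n)
```

The middle read is compatible with both transcripts; resolving such ambiguous reads into abundances is the statistical part that the real tools handle and this sketch leaves out.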

Slide 5

Slide 5 text

A curious case of too much reproducibility See link in Lecture 6 for the Kallisto vs Salmon controversy of 2017. The approaches are extremely fast and convenient. They will probably replace traditional methods (for cases when a transcriptome is available).

Slide 6

Slide 6 text

How to make sense of data
Two major classes of problems:
1. Pairwise comparisons (reasonably well defined methods). Compare two conditions: C1 vs C2
2. Non-pairwise comparisons (needs a matching design and statistical modeling). Compare more than two conditions: C1 and C2 and C3

Slide 7

Slide 7 text

Pairwise comparisons
You end up with a count table with conditions C1, C2 and replicates R1, R2...

              C1_R1  C1_R2  C2_R1  C2_R2
Transcript A    100    200     10    120
Transcript B    200    320     88     39
Transcript C    150    123     63      8

P-value meaning: When compared across replicates and conditions, what is the chance of observing the variation of the size observed right now.
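As a first look at such a table, before any p-values, one can compute a log2 fold change per transcript from the replicate means. The sketch below uses the counts from the slide; the dictionary layout and function names are illustrative assumptions, and it skips the library-size normalization that real differential expression tools apply.

```python
import math

# Count table from the slide: two conditions (C1, C2), two replicates each.
counts = {
    "Transcript A": {"C1": [100, 200], "C2": [10, 120]},
    "Transcript B": {"C1": [200, 320], "C2": [88, 39]},
    "Transcript C": {"C1": [150, 123], "C2": [63, 8]},
}

def mean(values):
    """Arithmetic mean of the replicate counts."""
    return sum(values) / len(values)

def log2_fold_change(row):
    """log2(mean C2 / mean C1); a negative value means lower in C2."""
    return math.log2(mean(row["C2"]) / mean(row["C1"]))

summary = {name: log2_fold_change(row) for name, row in counts.items()}
for name, lfc in summary.items():
    print(f"{name}: log2 fold change = {lfc:.2f}")
```

All three transcripts come out lower in C2, but note how much the replicates disagree (10 vs 120 for Transcript A). Whether a change is believable given that spread is the question the p-value tries to answer.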

Slide 8

Slide 8 text

How can different methods produce different p-values? Each p-value is defined the same way: “The p-value is the chance of observing the variation of the size observed right now.” Which one is right? What is the typically unstated, tacit assumption?

Slide 9

Slide 9 text

The truth about p-values The missing statement that you need to say before any definition is: “If our model is correct, then the p-value is the chance of observing the variation of the size observed right now.”

Slide 10

Slide 10 text

Ok. Is the statistical model ever correct?

Slide 11

Slide 11 text

Ahem. No. A model is NEVER fully correct.

Slide 12

Slide 12 text

The corrected statement The missing statement that you need to say before any definition is: “If our statistical model were correct, then the p-value would be the chance of observing the variation of the size observed right now.” Since our model is not quite correct, we still hope the difference is not that substantial, so the p-value will still apply to some extent. Good luck, mate.
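A small sketch of why the model matters: the same four counts (Transcript A from the count table) give very different p-values under two different sets of assumptions. Both tests here are illustrative choices, not what DESeq or edgeR do (those use negative binomial models); the Poisson test additionally assumes equal library sizes across samples.

```python
from itertools import combinations
from math import comb

c1 = [100, 200]  # Transcript A, condition C1 replicates
c2 = [10, 120]   # Transcript A, condition C2 replicates

def poisson_model_p(c1, c2):
    """Two-sided exact test assuming Poisson counts with equal rates.

    Conditional on the grand total, the C1 sum is Binomial(total, 0.5).
    Replicate-to-replicate variation is ignored by this model.
    """
    total = sum(c1) + sum(c2)
    distance = abs(sum(c1) - total / 2)
    tail = sum(comb(total, k) for k in range(total + 1)
               if abs(k - total / 2) >= distance)
    return tail / 2 ** total

def permutation_p(c1, c2):
    """Two-sided exact permutation test on the difference of means.

    Assumes only that sample labels are exchangeable under the null.
    """
    pooled = c1 + c2
    observed = abs(sum(c1) / len(c1) - sum(c2) / len(c2))
    hits = splits = 0
    for idx in combinations(range(len(pooled)), len(c1)):
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in idx]
        diff = abs(sum(g1) / len(g1) - sum(g2) / len(g2))
        splits += 1
        if diff >= observed - 1e-12:  # tolerance for float comparison
            hits += 1
    return hits / splits

print("Poisson model p-value:    ", poisson_model_p(c1, c2))
print("Permutation test p-value: ", permutation_p(c1, c2))
```

The Poisson model reports an extremely small p-value, while the permutation test reports 4 out of 6 label assignments as at least as extreme. Neither is "the" right answer; each is only as good as the assumptions behind it.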

Slide 13

Slide 13 text

P-values are guidance. Additional evidence.

Slide 14

Slide 14 text

Scientists love to argue about methods
My method is better than your method. You can perform your pairwise comparison in many ways:
1. DESeq1
2. DESeq2
3. edgeR
The handbook has many detailed explanations on each and a script to do all three in parallel.

Slide 15

Slide 15 text

I ran my comparison script, now what? You end up with a list of transcripts, genes, features. Most publications are about interpreting these lists of genes. Go back to Lecture 5: How do I interpret a list of genes?

Slide 16

Slide 16 text

And now for something different The Handbook is going in a new direction

Slide 17

Slide 17 text

Bioinformatics Recipes We are putting scripts on the web: https://psu.bioinformatics.recipes Tip: You can finish the last homework by looking at the results of the RNA-Seq Recipe. Help us make it better. Turn your project into a recipe.

Slide 18

Slide 18 text

Current state We expect to change a lot.

Slide 19

Slide 19 text

What is a recipe? A "megaton" script with a web interface. Runnable by someone else. Shareable between users. Modifiable, customizable. It is still a script, but a web-enabled one. Borrow each other's scripts, learn and create full examples.

Slide 20

Slide 20 text

The RNA-Seq Recipe

Slide 21

Slide 21 text

Bioinformatics Recipes This is the first public announcement! Developed while teaching this course. Now joining other tools that came out of this course: Galaxy, Biostar, Biostar Handbook and now the new baby: Bioinformatics Recipes. We have very high hopes and expectations. I think this time next year the bioinformatics world will be running on recipes.