
Lecture 30: RNA-Seq data analysis

Istvan Albert
November 20, 2017


Transcript

  1. A new style of analysis

    Around 2014, a new style of analysis emerged:
    1. Align + Quantify (in a single step)
    2. Compare
  2. Fast Abundance Estimation

    A new school of thought that applies to transcriptomes. We may not need alignments at all. This turns RNA-Seq into a classification problem. Two tools: Kallisto, Salmon.
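To make the "classification problem" idea concrete, here is a toy sketch that assigns each read to the transcripts that share its k-mers, with no alignment step. It only illustrates the general idea and is not how Kallisto or Salmon are implemented. The transcript names, sequences, reads, and the tiny k-mer length are all made up, and real tools resolve ambiguous reads statistically rather than skipping them.

```python
from collections import defaultdict

K = 5  # toy k-mer length (assumption); real tools use much longer k-mers


def kmers(seq, k=K):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(transcripts, k=K):
    """Map each k-mer to the set of transcript names that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def classify(read, index, k=K):
    """Return the transcripts compatible with every indexed k-mer of the read."""
    compatible = None
    for kmer in kmers(read, k):
        hits = index.get(kmer)
        if not hits:
            continue  # k-mer not found in any transcript; ignore it
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()


# Hypothetical toy transcriptome and reads, invented for illustration.
transcripts = {
    "tx_A": "ACGTACGTTAGCCGATAGGCT",
    "tx_B": "TTGACCGATAGGCTAACGGTA",
}
reads = ["ACGTACGTTAGC", "GACCGATAGGCTAA", "CCGATAGGCT"]

index = build_index(transcripts)
counts = defaultdict(int)
for read in reads:
    hits = classify(read, index)
    if len(hits) == 1:  # this sketch counts only unambiguous reads
        counts[next(iter(hits))] += 1

for name in sorted(counts):
    print(name, counts[name])
```

The third read is shared by both toy transcripts, so this sketch leaves it uncounted; deciding what to do with such reads is where the real tools do most of their work.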
  3. A curious case of too much reproducibility

    See the link in Lecture 6 for the Kallisto vs Salmon controversy of 2017. The approaches are extremely fast and convenient. They will probably replace traditional methods (for cases when a transcriptome is available).
  4. How to make sense of data

    Two major classes of problems:
    1. Pairwise comparisons (reasonably well-defined methods). Compare two conditions: C1 vs C2
    2. Non-pairwise comparisons (needs a matching design and statistical modeling). Compare more than two conditions: C1 and C2 and C3
  5. Pairwise comparisons

    You end up with a count table with conditions C1, C2 and replicates R1, R2...

                    C1_R1  C1_R2  C2_R1  C2_R2
    Transcript A      100    200     10    120
    Transcript B      200    320     88     39
    Transcript C      150    123     63      8

    P-value meaning: when compared across replicates and conditions, what is the chance of observing the variation of the size observed right now.
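As a first look at such a count table, one can compute the log2 fold change between the mean counts of the two conditions. The sketch below uses the numbers from the slide. The pseudocount of 1 is an assumption added to avoid dividing by zero, and no library-size normalization is applied, which a real analysis would need.

```python
import math

# Count table from the slide; column names follow the condition_replicate pattern.
columns = ["C1_R1", "C1_R2", "C2_R1", "C2_R2"]
counts = {
    "Transcript A": [100, 200, 10, 120],
    "Transcript B": [200, 320, 88, 39],
    "Transcript C": [150, 123, 63, 8],
}


def mean(values):
    return sum(values) / len(values)


def log2_fold_change(row, pseudocount=1.0):
    """log2 of mean(C2) over mean(C1); the pseudocount is an illustrative assumption."""
    c1 = [v for col, v in zip(columns, row) if col.startswith("C1_")]
    c2 = [v for col, v in zip(columns, row) if col.startswith("C2_")]
    return math.log2((mean(c2) + pseudocount) / (mean(c1) + pseudocount))


lfc = {name: log2_fold_change(row) for name, row in counts.items()}
for name, value in lfc.items():
    print(f"{name}\t{value:.2f}")
```

A fold change alone says nothing about how much the replicates disagree with each other, which is why the p-value enters the picture.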
  6. How can different methods produce different p-values?

    Each p-value is defined the same way: "The p-value is the chance of observing the variation of the size observed right now." Which one is right? What is the typically unstated, tacit assumption?
  7. The truth about p-values

    The missing statement that you need to say before any definition is: If our model is correct then "The p-value is the chance of observing the variation of the size observed right now."
  8. The corrected statement

    The missing statement that you need to say before any definition is: If our statistical model were correct then "The p-value would be the chance of observing the variation of the size observed right now." Since our model is not quite correct, we still hope the difference is not that substantial, so the p-value will still apply to some extent. Good luck mate.
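The dependence on the model can be seen by computing a p-value for the same counts under two different assumptions. The sketch below takes the Transcript A counts from the count table and compares a simple Poisson model (assuming equal library sizes and equal replicate numbers) with an exact permutation test on the difference of means. Both functions are illustrative assumptions; neither is what DESeq or edgeR does, since those fit negative binomial models.

```python
from itertools import combinations
from math import comb


def poisson_model_pvalue(c1, c2):
    """Two-sided exact p-value, assuming Poisson counts and equal library sizes.

    Under that model, the condition 1 total given the grand total
    follows Binomial(n, 0.5).
    """
    x = sum(c1)
    n = sum(c1) + sum(c2)
    observed = comb(n, x)
    # Add up every outcome that is at least as unlikely as the observed one.
    tail = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return tail / 2 ** n


def permutation_pvalue(c1, c2):
    """Two-sided exact permutation p-value for the difference in means."""
    pooled = list(c1) + list(c2)
    observed = abs(sum(c1) / len(c1) - sum(c2) / len(c2))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(c1)):
        group1 = [pooled[i] for i in idx]
        group2 = [pooled[i] for i in range(len(pooled)) if i not in idx]
        diff = abs(sum(group1) / len(group1) - sum(group2) / len(group2))
        extreme += diff >= observed
        total += 1
    return extreme / total


# Transcript A from the count table: C1 replicates vs C2 replicates.
c1, c2 = [100, 200], [10, 120]
p_poisson = poisson_model_pvalue(c1, c2)
p_permutation = permutation_pvalue(c1, c2)
print(f"Poisson model:    {p_poisson:.3g}")
print(f"Permutation test: {p_permutation:.3g}")
```

The Poisson model ignores the variation between replicates and reports a tiny p-value, while the permutation test, with only two replicates per condition, cannot return a small one. Same data, same definition, different models.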
  9. Scientists love to argue about methods

    My method is better than your method. You can perform your pairwise comparison in many ways:
    1. DESeq1
    2. DESeq2
    3. edgeR
    The handbook has many detailed explanations on each and a script to do all three in parallel.
  10. I ran my comparison script, now what?

    You end up with a list of transcripts, genes, features. Most publications are about interpreting these lists of genes. Go back to Lecture 5: How do I interpret a list of genes?
  11. Bioinformatics Recipes

    We are putting scripts on the web: https://psu.bioinformatics.recipes
    Tip: You can finish the last homework by looking at the results of the RNA-Seq Recipe. Help us make it better. Turn your project into a recipe.
  12. What is a recipe?

    A "megaton" script with a web interface. Runnable by someone else. Shareable between users. Modifiable, customizable. It is still a script, but a web-enabled one. Borrow each other's scripts, learn and create full examples.
  13. Bioinformatics Recipes

    This is the first public announcement! Developed while teaching this course. Now joining other tools that came out of this course: Galaxy, Biostar, Biostar Handbook and now the new baby: Bioinformatics Recipes. We have very high hopes and expectations - I think this time next year the bioinformatics world will be running on recipes.