Differential expression analysis tools

Alyssa Frazee
September 07, 2014

Talk given at the ECCB 2014 workshop (http://www.eccb14.org/program/workshops/rna-seq) and at the University of Zurich, on tools we've built to do differential expression analysis without relying on existing gene/exon/isoform annotation.

Transcript

  1. 5.

    [Figure: tracks labeled 332 to 338 drawn along genomic position, 24376000 to 24384000.]

    but read-counting is challenging
  2. 7.
  3. 10.

    DER Finder (PMID 24398039). Idea: scan the genome base-by-base and highlight segments showing differential expression signal.
  4. 11.

    DER Finder. [Figure: chr22:17684448-17684670, normal vs. tumor, plotted against genomic position. Panels: read coverage (log2(count+1)), DE signal (t statistic), states.]
  5. 12.

    DER Finder. [Figure repeated from slide 11.]
  6. 13.

    Find signal at each nucleotide. [Equation labels: expression, covariate of interest, confounders; samples indexed by i, locations indexed by l_j, confounders indexed by k.]
  7. 14.

    Find signal at each nucleotide. [Equation labels repeated from slide 13: samples indexed by i, locations indexed by l_j, confounders indexed by k.]
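The equation itself did not survive in the transcript. One form consistent with the labels on slides 13 and 14 is sketched below. The symbols are assumptions introduced here for illustration: $Y_{ij}$ is the coverage of sample $i$ at location $l_j$, $g$ is a transformation such as $\log_2(\cdot + 1)$, $X_i$ is the covariate of interest, and $W_{ik}$ is the $k$-th confounder.

$$
g(Y_{ij}) = \alpha(l_j) + \beta(l_j)\,X_i + \sum_{k=1}^{K} \gamma_k(l_j)\,W_{ik} + \varepsilon_{ij}
$$

Under this reading, the "signal" at nucleotide $l_j$ is a test statistic for $\beta(l_j) = 0$, computed separately at every location.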
  8. 15.

    Segment genome into groups of nucleotides with similar signal. [Diagram: statistics t_1 to t_5, one per nucleotide, grouped into DE and not DE segments.]
  9. 16.

    Segment genome into groups of nucleotides with similar signal: Hidden Markov Model. [Diagram: statistics t_1 to t_5, one per nucleotide, grouped into DE and not DE segments.]
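The segmentation step on slides 15 and 16 can be illustrated with a small Viterbi decoder in base R. This is a minimal sketch, not the DER Finder implementation: the two states, the Gaussian emission parameters, the transition probability `stay`, and the function name `viterbi_de` are all assumptions chosen for illustration.

```r
# Viterbi decoding for a two-state HMM ("notDE", "DE") over per-nucleotide signal.
# Emission means, sd, and transition probabilities are illustrative guesses.
viterbi_de <- function(x,
                       means = c(notDE = 0, DE = 4),
                       sd = 1,
                       stay = 0.99,
                       init = c(0.9, 0.1)) {
  n <- length(x)
  K <- length(means)
  log_trans <- log(matrix(c(stay, 1 - stay,
                            1 - stay, stay), nrow = K, byrow = TRUE))
  # Log emission density of every observation under every state (n x K).
  log_emit <- matrix(dnorm(rep(x, K), mean = rep(means, each = n),
                           sd = sd, log = TRUE), nrow = n)

  score <- matrix(-Inf, nrow = n, ncol = K)  # best log-probability ending in state k
  back  <- matrix(0L, nrow = n, ncol = K)    # back-pointers for the traceback
  score[1, ] <- log(init) + log_emit[1, ]
  for (i in seq_len(n)[-1]) {
    for (k in seq_len(K)) {
      cand <- score[i - 1, ] + log_trans[, k]
      back[i, k]  <- which.max(cand)
      score[i, k] <- max(cand) + log_emit[i, k]
    }
  }

  # Trace back the most likely state path.
  path <- integer(n)
  path[n] <- which.max(score[n, ])
  for (i in rev(seq_len(n - 1))) path[i] <- back[i + 1, path[i + 1]]
  names(means)[path]
}

# Toy t statistics: a run of large values flanked by values near zero.
t_stat <- c(0.1, -0.3, 0.2, 3.8, 4.5, 4.1, 3.9, 0.4, -0.2, 0.0)
states <- viterbi_de(abs(t_stat))

# Collapse the state path into contiguous candidate regions.
runs <- rle(states)
ends <- cumsum(runs$lengths)
regions <- data.frame(state = runs$values,
                      start = ends - runs$lengths + 1,
                      end   = ends)
print(regions)
```

Decoding on `abs(t_stat)` treats over- and under-expression as one DE state; a model with separate states for each direction would be a natural extension.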
  10. 17.

    [Figure: chr22:17684448-17684670, normal vs. tumor, plotted against genomic position. Panels: log2(count+1) coverage, t statistic, exons, states.] Pipeline so far: linear models, then HMM (candidate DERs).
  11. 18.

    [Figure repeated from slide 17.] Pipeline: linear models, then HMM, then permutation tests for statistical significance.
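The permutation step named on slide 18 can be sketched for a single candidate region as follows. This is a simplified illustration under stated assumptions, not the DER Finder procedure: the simulated data, the region statistic (mean absolute two-sample t statistic), and the helper `region_stat` are all hypothetical.

```r
# Permutation p-value for one candidate region (simplified illustration).
set.seed(42)
group <- rep(c(0, 1), each = 6)

# Hypothetical coverage matrix for the region: rows = nucleotides, columns = samples.
Y <- matrix(rnorm(30 * 12, mean = 5), nrow = 30)
Y[, group == 1] <- Y[, group == 1] + 1.5   # planted group difference

# Region statistic: mean absolute two-sample t statistic across nucleotides.
region_stat <- function(Y, group) {
  t_vals <- apply(Y, 1, function(y) {
    t.test(y[group == 1], y[group == 0])$statistic
  })
  mean(abs(t_vals))
}

observed <- region_stat(Y, group)

# Null distribution: recompute the statistic after shuffling the group labels.
n_perm <- 200
null_stats <- replicate(n_perm, region_stat(Y, sample(group)))

# Add one to numerator and denominator so the p-value is never exactly zero.
p_value <- (1 + sum(null_stats >= observed)) / (1 + n_perm)
print(p_value)
```

Shuffling sample labels keeps the correlation between neighboring nucleotides intact, which is why the whole region is re-scored under each permutation rather than each nucleotide separately.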
  12. 19.

    [Figure repeated from slide 17.] Match to annotation if desired: CECR1, “may play a role in regulating cell proliferation”.
  13. 20.

    engineering challenges
    •  creating and handling nucleotide-by-sample matrix
    •  efficient linear model fitting (solution: lmFit)
    •  efficient segmentation with HMM
    •  efficient p-value calculations

    Initial solution: https://github.com/alyssafrazee/derfinder
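To show why fitting one model per nucleotide can be made efficient, here is a base R sketch that fits the same design to every row of a nucleotide-by-sample matrix with a single matrix solve. It does not reproduce lmFit or its internals; the simulated data and all variable names are assumptions for illustration.

```r
# Fit the same linear model at every nucleotide with one matrix solve.
# All names and data here are hypothetical.
set.seed(1)
n_samples <- 10
n_bases   <- 200
group <- rep(c(0, 1), each = n_samples / 2)   # covariate of interest
conf  <- rnorm(n_samples)                     # one confounder

# Nucleotide-by-sample matrix of coverage, with a planted DE region.
counts <- matrix(rpois(n_bases * n_samples, lambda = 20), nrow = n_bases)
counts[51:100, group == 1] <- counts[51:100, group == 1] + 40L
Y <- log2(counts + 1)

X <- cbind(1, group, conf)   # design matrix: intercept, group, confounder
g <- 2                       # column of X holding the group covariate
XtX_inv <- solve(crossprod(X))

# Coefficients for all nucleotides at once: one column of B per nucleotide.
B <- XtX_inv %*% t(X) %*% t(Y)
resid  <- t(Y) - X %*% B
df     <- n_samples - ncol(X)
sigma2 <- colSums(resid^2) / df

# t statistic for the group coefficient at each nucleotide.
t_stat <- B[g, ] / sqrt(sigma2 * XtX_inv[g, g])
```

Because the design matrix is shared by all nucleotides, its inverse cross-product is computed once, and the per-nucleotide work reduces to matrix products and column sums.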
  14. 22.

    Ballgown. bioRxiv preprint: http://biorxiv.org/content/early/2014/03/30/003665; Bioconductor package “ballgown”.

    Pipeline, starting from paired-end RNA-seq reads: Align Reads (e.g. TopHat); Assemble Transcripts (e.g. Cufflinks); Estimate Expression (Cufflinks via Tablemaker, RSEM); Differential Expression Tests (default Ballgown models, limma, edgeR, DESeq, …).

    Ballgown as connecting framework between transcriptome assembly pipelines and R/Bioconductor DE analysis.
  15. 23.

    S4 class for transcript assemblies. [Diagram: ballgown object data structure, with components labeled GRanges (exon, intron, transcript), expr matrices (exon, intron, transcript), indexes (e2t, i2t, t2g, pData, bamfiles), and data frames.]
  16. 24.

    easy exploration and DE analysis: plotting functions. [Figure: tracks labeled 332 to 338 drawn along genomic position, 24376000 to 24384000.]
  17. 25.

    easy exploration and DE analysis: statistical tests (drop-in replacement for Cuffdiff)

        stat_results = stattest(my_assembly, feature='transcript', meas='FPKM', covariate='group')
        head(stat_results)
        ## feature     id  pval        qval
        ## transcript  10  0.01381576  0.105212332
        ## transcript  25  0.26773622  0.791149753
        ## transcript  35  0.01085070  0.089518254
        ## transcript  41  0.47108019  0.902537475
        ## transcript  45  0.08402948  0.489348136
        ## transcript  67  0.27317385  0.79114975
  18. 26.

    easy exploration and DE analysis: statistical tests. Easily connects to existing DE packages. [Diagram: ballgown object data structure, with exon, intron, and transcript components, expr, and indexes (e2t, i2t, t2g, pData, bamfiles).]
  19. 27.

    easy exploration and DE analysis: annotation functions. Get corresponding gene names, match assembled and annotated transcripts, plot assembly alongside annotation. [Figure: “Assembled and Annotated Transcripts”, annotated and assembled tracks along genomic position, 24615000 to 24640000.]
  20. 30.

    Thanks

    Collaborators: Jeff Leek (advisor), Steven Salzberg, Ben Langmead, Andrew Jaffe, Rafa Irizarry, Sarven Sabunciyan, Kasper Hansen, Geo Pertea, Leonardo Collado Torres

    Contact: alyssafrazee.com, alyssa.frazee@jhu.edu, @acfrazee (Twitter)
  21. 31.

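    differential expression model

    For each transcript, compare the fits of the following models using an F-test. The null hypothesis is that the fits of model (a) and model (b) are equally good; the alternative is that (a) fits better.

    (a) $$\text{expression}_i = \alpha + \beta_0\,\text{group}_i + \sum_{p=1}^{P} \beta_p\,\text{confounder}_{ip} + \text{noise}_{ip}$$

    (b) $$\text{expression}_i = \alpha^* + \sum_{p=1}^{P} \beta_p^*\,\text{confounder}_{ip} + \text{noise}^*_{ip}$$

    With a time covariate in place of group:

    $$\text{expression}_i = \alpha + \sum_{k=1}^{K} \gamma_k\,\text{spline}_k(\text{time}_i) + \sum_{p=1}^{P} \beta_p\,\text{confounder}_{ip} + \text{noise}_{ip}$$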
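The nested-model comparison described on this slide can be tried on simulated data with base R's `lm` and `anova`. This is a sketch of the general technique for a single transcript, not Ballgown's own code; the data, effect sizes, and variable names are made up for illustration.

```r
# F-test comparing model (a), with group, against model (b), confounders only.
set.seed(7)
n <- 40
group      <- rep(c(0, 1), each = n / 2)
confounder <- rnorm(n)

# Simulated expression for one transcript, with a true group effect of 1.
expression <- 2 + 1 * group + 0.5 * confounder + rnorm(n, sd = 0.5)

fit_a <- lm(expression ~ group + confounder)  # model (a)
fit_b <- lm(expression ~ confounder)          # model (b)

# anova() on nested fits reports the F statistic and its p-value in row 2.
f_test <- anova(fit_b, fit_a)
pval   <- f_test[["Pr(>F)"]][2]
print(f_test)
```

With a single group coefficient, the F statistic equals the square of that coefficient's t statistic in model (a); the F-test formulation matters when the term of interest spans several coefficients, as with a spline basis.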