jc2013

Leonardo Collado-Torres

October 03, 2013

Transcript

  1. Fast annotation-agnostic differential expression analysis across groups with biological replicates

    Leonardo Collado-Torres (Twitter: @fellgernon; blog: tinyurl.com/FellBit)
  2. @fellgernon #biostatJC2013 Field overview

    Ultimate goal: What is the biological (genomic) cause, if any, of disease X?
    Currently: What are the most likely genomic difference(s) between two or more groups?
  3. Tools

    • Molecular biology: reverse transcriptase
    • High-throughput sequencing
    • $$ → large number of biological replicates
    • Computers
    • Biostatistics
    Image: http://bit.ly/15MVhSU
  4. Split by chromosome and filter

    n samples → ~760 million nt
    Rows (bases) kept: at least 1 sample with coverage > 5
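A minimal sketch of the per-chromosome filter, written in Python/NumPy as an illustrative analogue (the deck's own tools are R). It assumes each chromosome's coverage is already loaded as a bases × samples matrix; `filter_coverage` and `filter_by_chromosome` are hypothetical names, and the cutoff of 5 comes from the slide.

```python
import numpy as np


def filter_coverage(coverage, cutoff=5):
    """Keep bases (rows) where at least one sample has coverage > cutoff.

    coverage: 2-D array, rows = bases of one chromosome, columns = samples.
    Returns the filtered matrix and the boolean mask of kept rows.
    """
    coverage = np.asarray(coverage)
    keep = (coverage > cutoff).any(axis=1)
    return coverage[keep], keep


def filter_by_chromosome(coverage_by_chr, cutoff=5):
    """Apply the filter one chromosome at a time (hypothetical dict input)."""
    return {chrom: filter_coverage(cov, cutoff)[0]
            for chrom, cov in coverage_by_chr.items()}


# Toy chromosome: 6 bases x 3 samples (made-up numbers)
cov = np.array([
    [0, 0, 0],
    [2, 6, 1],
    [5, 5, 5],
    [9, 0, 0],
    [1, 1, 1],
    [0, 0, 7],
])
filtered, keep = filter_coverage(cov, cutoff=5)
print(keep.tolist())   # [False, True, False, True, False, True]
print(filtered.shape)  # (3, 3)
```

Working one chromosome at a time keeps only one coverage matrix in memory at once, which is the point of the "split by chromosome" step.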
  5. How can we make it fast?

    • Avoid input/output as much as possible
    • Work by chromosome
    • Reduce memory
      – Run Length Encoding (IRanges::Rle): 0000111111222 = values (0, 1, 2), lengths (4, 6, 3)
    • Use multiple cores (parallel::mclapply)
      – Split data to use cores efficiently
    • Calculate F-stats using Rcpp (has pros and cons)
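To make the run-length idea concrete, here is a small Python sketch that reproduces the slide's toy example. It mirrors what an `IRanges::Rle` object stores (run values and run lengths), but it is not the IRanges implementation, and `rle_encode` / `rle_decode` are hypothetical helper names.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a sequence into (run values, run lengths)."""
    runs = [(v, len(list(group))) for v, group in groupby(values)]
    run_values = [v for v, _ in runs]
    run_lengths = [n for _, n in runs]
    return run_values, run_lengths


def rle_decode(run_values, run_lengths):
    """Expand runs back into the full per-base sequence."""
    out = []
    for v, n in zip(run_values, run_lengths):
        out.extend([v] * n)
    return out


coverage = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2]  # 0000111111222
vals, lens = rle_encode(coverage)
print(vals, lens)  # [0, 1, 2] [4, 6, 3]
assert rle_decode(vals, lens) == coverage
```

Coverage tracks contain long stretches of identical values (especially zeros), so storing 3 runs instead of 13 bases is where the memory saving comes from.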
  6. Finding candidate DERs (differentially expressed regions): example

    [Figure: stacked panels dataRegions, segs, pieces and ders over positions 450 to 600; bottom panel plots f (0.0 to 2.0) against Index]
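The slides compute per-base F-statistics (in Rcpp) and then look for candidate regions, but they do not spell out how the segs/pieces/ders panels are derived. The Python sketch below assumes a one-way ANOVA F-statistic per base (group means vs. one overall mean) and treats a candidate region as a contiguous run of bases whose F exceeds a fixed cutoff. Both the cutoff rule and the function names (`fstats_by_group`, `candidate_regions`) are assumptions for illustration, not the deck's method.

```python
import numpy as np


def fstats_by_group(coverage, groups):
    """Per-base one-way ANOVA F-statistic.

    coverage: (bases x samples) array; groups: one label per sample.
    Full model: one mean per group. Reduced model: one overall mean.
    """
    y = np.asarray(coverage, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n, k = y.shape[1], len(labels)
    # Residual sum of squares under the reduced (intercept-only) model
    rss0 = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    # Residual sum of squares under the full (group-means) model
    rss1 = np.zeros(y.shape[0])
    for g in labels:
        yg = y[:, groups == g]
        rss1 += ((yg - yg.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    # Bases with zero within-group variance give inf or nan; left as-is here
    with np.errstate(divide="ignore", invalid="ignore"):
        return ((rss0 - rss1) / (k - 1)) / (rss1 / (n - k))


def candidate_regions(fstats, cutoff):
    """(start, end) index pairs, end exclusive, of runs where F > cutoff."""
    above = np.asarray(fstats) > cutoff
    # Pad with False on both sides so every run has a start edge and an end edge
    padded = np.concatenate(([False], above, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))


# Toy data: 2 bases, 4 samples in two groups (made-up numbers)
f = fstats_by_group([[1, 3, 11, 13], [1, 3, 1, 3]], ["A", "A", "B", "B"])
print(f)
print(candidate_regions([0.5, 3, 4, 0.1, 5, 0], cutoff=2))  # [(1, 3), (4, 5)]
```

The first toy base separates the groups cleanly (F = 50), while the second has identical group means (F = 0).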
  7. Example: re-cap

    [Figure: same panels as slide 6 (dataRegions, segs, pieces, ders over positions 450 to 600; f against Index)]
  8. Example: result

    [Figure: cluster for the region with name COL6A1 and q-value 0.8256, chr21, positions ~47,411,000 to 47,411,600. Panels: coverage by group (CEU, YRI); mean coverage by group; regions, colored by significantQval (TRUE/FALSE); transcripts, labeled tx_name (gene_id).]
  9. Public datasets

    • derfinderExample:
      – Blood, CEU vs YRI, unrelated individuals
    • derHippo:
      – Brain hippocampus from cocaine addicts, alcohol addicts, and controls
    • derSnyder:
      – Michael Snyder time course (~1 year): 2 disease periods, plus recovery and healthy periods
    • derStem:
      – 5 stem cell types, 2 replicates per group
  10. Coverage adjustment?

    [Figure (derSnyder, chr1): two scatter plots against total coverage, with points colored by the groups 0to186, 186to294, 294to322 and 322to400. Panel 1: total coverage vs coverage at bases < quantile 0.9. Panel 2: total coverage vs number of bases with coverage > 0.]
    Similar to metagenomeSeq::cumNorm
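The slide poses coverage adjustment as an open question and compares it to metagenomeSeq::cumNorm, which scales each sample by the cumulative sum of its counts up to a sample-specific quantile. The Python sketch below shows one possible adjustment in that spirit. It assumes the per-sample factor is the total coverage at bases below that sample's 0.9 quantile (strict "<", read from the plot label), with the quantile taken over bases with coverage > 0. The function names are hypothetical and this is not the cumNorm implementation.

```python
import numpy as np


def quantile_coverage_factors(coverage, q=0.9):
    """Per-sample sum of coverage at bases below that sample's q-quantile.

    coverage: (bases x samples) array. The quantile is computed over bases
    with coverage > 0 (an assumption, in the spirit of cumNorm).
    """
    y = np.asarray(coverage, dtype=float)
    factors = np.empty(y.shape[1])
    for j in range(y.shape[1]):
        col = y[:, j]
        threshold = np.quantile(col[col > 0], q)
        factors[j] = col[col < threshold].sum()
    return factors


def adjust_coverage(coverage, q=0.9):
    """Rescale each sample so its factor matches the median factor."""
    y = np.asarray(coverage, dtype=float)
    factors = quantile_coverage_factors(y, q)
    return y / factors * np.median(factors)


# Toy data: sample 2 is sample 1 sequenced twice as deeply (made-up numbers)
cov = np.array([[0, 0], [1, 2], [2, 4], [3, 6], [4, 8], [10, 20]])
print(quantile_coverage_factors(cov))  # [10. 20.]
adjusted = adjust_coverage(cov)
```

Using only bases below a high quantile keeps a few very highly covered bases from dominating the scaling factor, which is the usual motivation for this kind of adjustment over plain total coverage.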
  11. Acknowledgements

    Leek Group: Jeffrey Leek, Alyssa Frazee
    Hopkins: Sarven Sabunciyan, Ben Langmead
    Lieber Institute (LIBD): Andrew Jaffe
    Harvard: Rafa Irizarry
    Funding: NIH (Aug 2012 to July 2013), LIBD (Aug 2013 to now), CONACyT México