Slide 1

Slide 1 text

dbFinder
Leonardo Collado-Torres
November 30, 2015

Slide 2

Slide 2 text

DER finder approach
• Find contiguous base pairs with Differential Expression signal → DE Regions or DERs
• Find the nearest annotated feature
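The two steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the derfinder implementation: the function names `find_regions` and `nearest_feature`, the cutoff, the per-base statistics, and the toy annotation are all made up for the example, and intervals are assumed to be 0-based and half-open.

```python
def find_regions(stats, cutoff):
    """Return half-open (start, end) index pairs for runs of stats above cutoff."""
    regions = []
    start = None
    for i, value in enumerate(stats):
        if value > cutoff:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            regions.append((start, i))  # the run ended at the previous base
            start = None
    if start is not None:
        regions.append((start, len(stats)))  # run reaches the end of the vector
    return regions


def nearest_feature(region, features):
    """Return the (name, start, end) feature closest to region (distance 0 if they overlap)."""
    start, end = region

    def distance(feature):
        _, f_start, f_end = feature
        if end <= f_start:
            return f_start - end  # feature lies downstream of the region
        if f_end <= start:
            return start - f_end  # feature lies upstream of the region
        return 0  # the two intervals overlap

    return min(features, key=distance)


# Hypothetical per-base statistics and annotation, for illustration only.
f_stats = [0.5, 3.2, 4.1, 0.9, 0.2, 5.0, 6.3, 7.1, 1.0]
features = [("geneA", 0, 2), ("geneB", 10, 20)]

candidate_ders = find_regions(f_stats, cutoff=3.0)
annotated = [(der, nearest_feature(der, features)[0]) for der in candidate_ders]
```

Here `candidate_ders` holds the contiguous runs above the cutoff, and `annotated` pairs each run with the name of its closest toy feature.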

Slide 3

Slide 3 text

Figure: read coverage along the genome (DNA), summarized as a coverage vector (example values: 2 6 0 11 6). Adapted from @jtleek
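As a rough sketch of how a coverage vector is obtained, the snippet below counts how many reads overlap each base. The read coordinates are invented and do not reproduce the values in the figure; reads are assumed to be 0-based, half-open intervals, and `coverage_vector` is a hypothetical helper name.

```python
def coverage_vector(reads, genome_length):
    """Count reads overlapping each base using a difference array."""
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        diff[start] += 1  # coverage rises where a read starts
        diff[end] -= 1    # and falls just past its last base
    coverage = []
    depth = 0
    for change in diff[:genome_length]:
        depth += change
        coverage.append(depth)
    return coverage


# Made-up aligned reads on an 8 bp toy genome.
reads = [(0, 3), (1, 4), (1, 5), (6, 8)]
coverage = coverage_vector(reads, genome_length=8)
```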

Slide 4

Slide 4 text

Jaffe et al, Nat. Neuroscience, 2015

Slide 5

Slide 5 text

Single-base F-statistics
• Null model
• Alternative model
• F-statistic
Indices: i = base pair, j = sample
Collado-Torres et al, bioRxiv, 2015
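The model formulas did not survive on this slide, so the following is only a sketch of a nested-model F-statistic computed at each base i across samples j. It assumes the simplest case: an intercept-only null model, an alternative model that adds one covariate (for example age), and a log2(coverage + 1) transform. The offset of 1, the toy data, and the name `f_statistic` are assumptions made for the example.

```python
import math


def f_statistic(y, x):
    """F-statistic comparing y ~ 1 (null) against y ~ 1 + x (alternative)."""
    n = len(y)
    mean_y = sum(y) / n
    mean_x = sum(x) / n
    # Null model: intercept only, so residuals are deviations from the mean.
    rss_null = sum((yj - mean_y) ** 2 for yj in y)
    # Alternative model: ordinary least squares with one covariate.
    sxx = sum((xj - mean_x) ** 2 for xj in x)
    sxy = sum((xj - mean_x) * (yj - mean_y) for xj, yj in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss_alt = sum((yj - intercept - slope * xj) ** 2 for xj, yj in zip(x, y))
    df_null, df_alt = 1, 2  # number of coefficients in each model
    return ((rss_null - rss_alt) / (df_alt - df_null)) / (rss_alt / (n - df_alt))


# Toy data: rows are bases (i), columns are samples (j).
ages = [1.0, 2.0, 3.0, 4.0]
coverage = [
    [2, 4, 3, 9],
    [5, 5, 6, 4],
]
per_base_f = [
    f_statistic([math.log2(c + 1) for c in row], ages) for row in coverage
]
```

A large value at a base means the covariate explains much more of the coverage variation there than the intercept alone.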

Slide 6

Slide 6 text

Single-base F-statistics Collado-Torres et al, bioRxiv, 2015 BrainSpan data

Slide 7

Slide 7 text

Compare DERs vs annotation Collado-Torres et al, bioRxiv, 2015 BrainSpan data

Slide 8

Slide 8 text

Common ChIP-seq analysis pipeline
• Call peaks separately in each sample: Sample 1, Sample 2, Sample 3, …, Sample N, Sample N-1, Sample N-2
• # unique peaks per sample: 2100, 4230, 7654, 1236, 5400, 5954
• Merge* into all unique peaks (40000)
• Identify which merged peaks are differentially expressed using coverage (40000 tests)
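The merge step can be illustrated with a small interval-union sketch. The slide marks the merge with an asterisk but does not spell out the rule, so taking the union of overlapping peaks is an assumption here, as are the peak coordinates and the helper name `merge_peaks`.

```python
def merge_peaks(peaks_per_sample):
    """Pool per-sample peaks and fuse any that overlap into one merged peak."""
    all_peaks = sorted(peak for peaks in peaks_per_sample for peak in peaks)
    merged = []
    for start, end in all_peaks:
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous merged peak: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(peak) for peak in merged]


# Made-up peak calls from three samples.
sample_peaks = [
    [(100, 200), (500, 600)],
    [(150, 250), (900, 950)],
    [(580, 700)],
]
merged_peaks = merge_peaks(sample_peaks)
```

Each merged peak would then be tested once for a difference in coverage, which is where the one-test-per-merged-peak count on the slide comes from.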

Slide 9

Slide 9 text

Common ChIP-seq analysis pipeline (same pipeline as Slide 8), annotated with two limitations:
• Biological variability within a group is not incorporated into finding peaks
• Variability across peaks is not formally incorporated into the merging step

Slide 10

Slide 10 text

Base-resolution differential binding ChIP-seq analysis pipeline: “dbFinder”
• Samples: Sample 1, Sample 2, Sample 3, …, Sample N, Sample N-1, Sample N-2
• Identify differentially bound peaks using single base-level derfinder analysis
• Single list of candidate peaks
• Empirical p-values via permutations and FDRs
• Significant peaks for differential binding
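The permutation step can be sketched as follows, assuming each candidate peak has a summary statistic and that a pool of null statistics has been collected by repeating the analysis with permuted sample labels. The add-one p-value estimator and the Benjamini-Hochberg adjustment are choices made for this example, not necessarily what dbFinder uses, and all numbers are invented.

```python
def empirical_p_values(observed, null):
    """Share of null statistics at least as large as each observed one (add-one estimator)."""
    n = len(null)
    return [(1 + sum(1 for v in null if v >= obs)) / (1 + n) for obs in observed]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented statistics: three candidate peaks and nine null peaks from permutations.
observed_stats = [50.0, 12.0, 3.0]
null_stats = [1, 2, 4, 5, 8, 10, 11, 15, 20]

p_values = empirical_p_values(observed_stats, null_stats)
fdr = benjamini_hochberg(p_values)
significant = [i for i, q in enumerate(fdr) if q < 0.10]
```

With so few null statistics the smallest attainable p-value is 1/10, so no toy peak passes a 10% FDR threshold; more permutations yield a larger null pool and finer resolution.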

Slide 11

Slide 11 text

Re-analysis of H3K4me3 data from developing and aging human brain
• Downloaded H3K4me3 data: NeuN+ fraction of postnatal frontal cortex samples (Shulha et al, PLoS Genetics 2013)
• Modeled linear age-related changes in coverage across the genome
• Identified 561 dbPeaks at FDR < 10% (using 100 permutations, 2.5 hours on JHPCE)

Slide 12

Slide 12 text

Re-analysis of H3K4me3 data from developing and aging human brain

Slide 13

Slide 13 text

Re-analysis of H3K4me3 data from developing and aging human brain
Post-hoc analysis on mean coverage per sample per dbPeak
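A post-hoc summary of this kind reduces each dbPeak to one number per sample. The sketch below averages per-base coverage inside each peak; the sample names, coverage values, peak coordinates, and the helper `mean_coverage_per_peak` are all hypothetical, and intervals are again 0-based and half-open.

```python
def mean_coverage_per_peak(coverage_by_sample, peaks):
    """Map each peak to {sample: mean per-base coverage inside that peak}."""
    summary = {}
    for start, end in peaks:
        summary[(start, end)] = {
            sample: sum(cov[start:end]) / (end - start)
            for sample, cov in coverage_by_sample.items()
        }
    return summary


# Made-up per-base coverage for two samples over a 6 bp stretch.
coverage_by_sample = {
    "s1": [0, 2, 4, 6, 0, 0],
    "s2": [1, 1, 3, 5, 2, 0],
}
db_peaks = [(1, 4), (4, 6)]
peak_means = mean_coverage_per_peak(coverage_by_sample, db_peaks)
```

The resulting per-sample means can then be plotted or regressed against a covariate such as age, one peak at a time.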

Slide 14

Slide 14 text

Re-analysis of H3K4me3 data from developing and aging human brain
Overlap with the 1157 published peaks (742 decrease across age, 415 increase)

Published peaks overlapping significant dbPeaks:
                   Down    Up
Not in dbPeaks      605   397
In dbPeaks          137    18

Significant dbPeaks overlapping published peaks:
                          Down    Up
In published peaks         278    21
Not in published peaks     262
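The two tables count overlaps in opposite directions, which is why their totals differ: 137 + 18 = 155 published peaks overlap a dbPeak, while 278 + 21 = 299 dbPeaks overlap a published peak. A toy sketch shows how this asymmetry arises when several short peaks fall inside one long peak. The coordinates and the helper `count_overlapping` are invented for the example.

```python
def count_overlapping(query, subject):
    """Number of query intervals that overlap at least one subject interval."""
    return sum(
        1
        for q_start, q_end in query
        if any(q_start < s_end and s_start < q_end for s_start, s_end in subject)
    )


# Hypothetical peaks: long published peaks, short dbPeaks.
published = [(1000, 3000), (5000, 7000)]
db_peaks = [(1100, 1200), (2500, 2600), (8000, 8100)]

published_hit = count_overlapping(published, db_peaks)  # published peaks with a dbPeak
db_hit = count_overlapping(db_peaks, published)         # dbPeaks inside a published peak
```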

Slide 15

Slide 15 text

Re-analysis of H3K4me3 data from developing and aging human brain
• Much shorter peaks in the dbFinder analysis: median width of 104 bp (IQR: 87-132) versus 2047 bp (IQR: 1490-2959) in the published peaks
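Width summaries like these can be computed directly from peak coordinates. The sketch below uses Python's `statistics` module on invented peaks; its default quantile method may differ slightly from the one used for the slide, so the quartiles are illustrative only.

```python
import statistics


def width_summary(peaks):
    """Return (median, first quartile, third quartile) of peak widths in bp."""
    widths = [end - start for start, end in peaks]
    q1, _, q3 = statistics.quantiles(widths, n=4)
    return statistics.median(widths), q1, q3


# Made-up half-open peak coordinates.
toy_peaks = [(0, 90), (100, 204), (300, 430), (500, 585), (700, 820)]
median_width, q1_width, q3_width = width_summary(toy_peaks)
```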

Slide 16

Slide 16 text

Future directions
• Add smoothing to test statistics prior to dbPeak finding
• Analyze other datasets:
  – differential binding by tissue/cell type from ENCODE across multiple groups
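The slide does not say which smoother is planned. As one possible illustration, a centered running mean over the per-base test statistics could look like this; the window size, the toy values, and the name `running_mean` are assumptions.

```python
def running_mean(values, window):
    """Centered moving average; the window shrinks at both ends of the vector."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# A single-base spike in otherwise flat statistics.
raw_stats = [0, 0, 9, 0, 0]
smoothed_stats = running_mean(raw_stats, window=3)
```

Smoothing spreads an isolated spike over its neighbours and lowers its height, which would change which bases pass a cutoff in the region-finding step.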

Slide 17

Slide 17 text

Acknowledgements
Hopkins: Jeffrey Leek
LIBD: Andrew Jaffe, Indigo Rose
Funding: NIH, LIBD, CONACyT México