
The q-value and comparing DE across datasets

This presentation discusses the meaning of the q-value in statistics and covers appropriate ways to compare differential-expression results in genomics across datasets.

Nicholas Eagles

March 05, 2025


  1. The q-value and Comparing DE Results Across Datasets

    Presented by Nick Eagles, 2025/02/05
    - Hoffman et al., DOI: 10.1016/j.biopsych.2021.03.020
    - Storey et al., DOI: 10.1073/pnas.1530509100
  2. p-value, q-value, and FDR

    - p-value: describes the false positive rate
      - At a cutoff of p ≤ 0.01, about 1% of truly null tests are called significant
    - FDR: false discovery rate
      - Proportion of significant results that are truly null
    - q-value: expected proportion of truly null tests among all tests called significant when a particular feature is called significant (the smallest FDR at which that feature is significant)
    - Calling every feature with q-value ≤ 0.05 significant corresponds to a 5% FDR
    - Provides interpretable control over false positives when testing many features (e.g. genes in DE)
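The relationship between p-values, FDR, and q-values on this slide can be sketched in a few lines. This is a minimal illustration in standard-library Python, not the qvalue package's implementation: the function name `q_values` and the example p-values are hypothetical, and `pi0` defaults to 1, in which case the result matches Benjamini-Hochberg adjusted p-values.

```python
def q_values(pvals, pi0=1.0):
    """Step-up q-values: q_i = min over t >= p_i of pi0 * m * t / #{p_j <= t}.

    pi0 is the assumed proportion of truly null tests (hypothetical default
    of 1.0, the most conservative choice).
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each q-value is the smallest
    # estimated FDR over all thresholds at or above that p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[i] / rank)
        q[i] = running_min
    return q


# Hypothetical p-values for five features (not from the presentation).
pvals = [0.001, 0.008, 0.039, 0.041, 0.60]
qvals = q_values(pvals)

# Calling q <= 0.05 significant targets a 5% FDR among the calls.
significant = [q <= 0.05 for q in qvals]
print(qvals)
print(significant)
```

Note that the features with p = 0.039 and p = 0.041 pass a p-value cutoff of 0.05 but not a q-value cutoff of 0.05, which is the multiple-testing correction at work.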
  3. π0 Statistic

    - An estimate of the proportion of tests that are truly null
    - “What fraction of genes are not differentially expressed?”
    - Note: cannot know which features are null/alternative
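One common way to estimate π0 relies on the fact that truly null p-values are uniform on [0, 1], so p-values above some cutoff λ are mostly null. The sketch below uses a single fixed λ; the function name `estimate_pi0`, the choice λ = 0.5, and the example p-values are assumptions for illustration, and the qvalue package's default (which combines estimates over a grid of λ values) is not reproduced here.

```python
def estimate_pi0(pvals, lam=0.5):
    """Fixed-lambda estimate: pi0 = #{p > lam} / (m * (1 - lam)), capped at 1."""
    m = len(pvals)
    n_above = sum(1 for p in pvals if p > lam)
    return min(1.0, n_above / (m * (1.0 - lam)))


# Hypothetical p-values: a few small ones plus a roughly flat tail.
pvals_example = [0.001, 0.004, 0.02, 0.03, 0.2, 0.35, 0.55, 0.7, 0.8, 0.95]
pi0 = estimate_pi0(pvals_example)
print(pi0)  # 4 of 10 p-values exceed 0.5, so 4 / (10 * 0.5) = 0.8
```

The estimate says how many features are likely null overall; it does not identify which ones.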
  4. qvalue Bioconductor package

    Calculating q-values

        qobj <- qvalue(p = hedenfalk$p)

    Examining q-value object

        summary(qobj)
        ## Call:
        ## qvalue(p = hedenfalk$p)
        ##
        ## pi0: 0.669926
        ##
        ## Cumulative number of significant calls:
        ##
        ##           <1e-04 <0.001 <0.01 <0.025 <0.05 <0.1
        ## p-value       15     76   265    424   605  868
        ## q-value        0      0     1     73   162  319
        ## local FDR      0      0     3     29    85  167
        ##             <1
        ## p-value   3170
        ## q-value   3170
        ## local FDR 2239
  5. Background

    - Exploring sex differences among schizophrenia cases
    - 281 females and 497 males from two cohorts: MSSM-Penn-Pitt and NIMH-HBCC
    - Performed DE within each cohort separately, within the merged cohorts, and made comparisons across cohorts
    - Did not use p-value or FDR cutoffs when comparing overlap of DEGs across cohorts
    - DEGs under hard thresholds are sensitive to power, a dataset-specific property!
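The point about power can be made concrete with a small calculation: the same true effect gives very different p-values at different sample sizes, so a gene can pass a hard threshold in one cohort and fail it in another. This is a rough sketch using a normal approximation to a two-group comparison; the effect size, group sizes, and the helper `two_sided_p` are hypothetical and are not taken from the cohorts described above.

```python
import math


def two_sided_p(effect, n_per_group, sd=1.0):
    """Two-sided p-value for a difference in means between two equal-sized
    groups, using a normal approximation (assumes a known, common sd)."""
    se = sd * math.sqrt(2.0 / n_per_group)
    z = effect / se
    return math.erfc(abs(z) / math.sqrt(2.0))


# Same hypothetical effect size, two hypothetical cohort sizes.
p_small = two_sided_p(effect=0.3, n_per_group=50)
p_large = two_sided_p(effect=0.3, n_per_group=200)

print(p_small, p_large)
# The gene is "DE" at p < 0.05 only in the larger cohort, although the
# underlying effect is identical in both.
```

This is why overlap of thresholded DEG lists can understate agreement between datasets, and why comparing the test statistics themselves is more informative.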
  6. Concordance of t-statistics

    - Comparison of case-control t-statistics across cohorts
    - Reported 2 associated metrics:
      - Spearman correlation: 0.343
      - p-value: 10^-300
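A threshold-free comparison like the one on this slide amounts to a rank correlation between per-gene t-statistics from the two cohorts. The sketch below computes a Spearman correlation in standard-library Python; the t-statistics are made-up values for five genes, and the helper names are hypothetical. In practice a library routine would also supply the accompanying p-value.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical case-control t-statistics for the same five genes.
t_cohort_a = [2.1, -0.4, 1.3, -1.8, 0.2]
t_cohort_b = [1.5, 0.3, 0.9, -2.2, -0.1]
rho = spearman(t_cohort_a, t_cohort_b)
print(rho)
```

Because every gene contributes, this measure does not depend on where a significance cutoff falls in either cohort.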
  7. DE Results in Combined Data

    - Merging cohorts and performing case-control DE on the full data
    - Captures shared schizophrenia-related signal