Benchmarking Deconvolution Methods in the Human Brain

Internal Seminar at the Lieber Institute for Brain Development (4/23/2024)
Presenting work from our Deconvolution Benchmark Preprint: https://doi.org/10.1101/2024.02.09.579665

Louise Huuki-Myers

April 24, 2024

Transcript

  1. Benchmarking Deconvolution Methods in the Human Brain

    Louise Huuki-Myers, Staff Scientist 1
    @lahuuki
    lahuuki.github.io
    Download these slides: speakerdeck.com/lahuuki
  2. Background: Cell Types in the Brain

    • The brain is made of complex tissues consisting of different types of cells
    • Some diagnoses (Dx) are associated with changes in cell type-specific expression
      ◦ Ex. Pitt-Hopkins syndrome and oligodendrocytes (Phan et al., Nature Neuroscience, 2020)
  3. How can we connect bulk RNA-seq to cell type information?

    [Diagram labels: Tissue; Bulk RNA-seq; snRNA-seq; Deconvolution; Estimated proportions. Cost annotations: $$$, $, Free!]
  4. What is Deconvolution?

    Computational method that...
    • Infers the composition of different cell types in bulk RNA-seq data
    • Utilizes single cell data to obtain cell type gene expression profiles
  5. Why is Deconvolution Important?

    • Tissue is heterogeneous
      ◦ Different cell types express genes at different levels
    • Samples can differ in cell type composition due to biology or dissection
      ◦ Check for differences in case vs. control
    • Controlling for cell fractions between samples can make case vs. control analysis cleaner
      ◦ Quality control
      ◦ Cell composition is a confounding factor in differential expression analysis; controlling for it helps prevent false positives and false negatives
  6. How do you run deconvolution?

    deconvolution(Y, Z) = Proportion of Cell Types
    • Y: gene expression of the bulk RNA-seq samples
    • Z: gene expression of the scRNA-seq cell type populations
    • deconvolution: computational algorithm
    • Output: proportion of each cell type in the bulk samples
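To make the deconvolution(Y, Z) notation concrete, here is a minimal sketch that treats Y as one bulk expression vector and Z as a genes x cell types reference matrix, then solves a non-negative least squares problem with multiplicative updates. The function name, the toy numbers, and the choice of solver are assumptions for illustration only; none of the six benchmarked methods is implemented this way.

```python
# Toy reference-based deconvolution: find non-negative proportions p such that
# the bulk profile y is approximately Z @ p, then rescale p to sum to 1.
# All names and numbers are illustrative, not taken from the talk.

def deconvolve_nnls(y, Z, n_iter=2000, eps=1e-12):
    """Estimate cell type proportions with multiplicative NNLS updates.

    y: bulk expression, one non-negative value per gene.
    Z: reference matrix, Z[g][k] = expression of gene g in cell type k.
    """
    n_genes, n_types = len(Z), len(Z[0])
    # Z^T y and Z^T Z do not change between iterations, so compute them once.
    zty = [sum(Z[g][k] * y[g] for g in range(n_genes)) for k in range(n_types)]
    ztz = [[sum(Z[g][a] * Z[g][b] for g in range(n_genes)) for b in range(n_types)]
           for a in range(n_types)]
    p = [1.0 / n_types] * n_types  # start from equal proportions
    for _ in range(n_iter):
        ztzp = [sum(ztz[a][b] * p[b] for b in range(n_types)) for a in range(n_types)]
        # The multiplicative update keeps every entry non-negative.
        p = [p[k] * zty[k] / (ztzp[k] + eps) for k in range(n_types)]
    total = sum(p)
    return [value / total for value in p]


# Reference profiles: 4 genes (rows) x 3 hypothetical cell types (columns).
Z = [[10.0, 1.0, 0.0],
     [1.0, 8.0, 1.0],
     [0.0, 1.0, 9.0],
     [2.0, 2.0, 2.0]]
true_p = [0.5, 0.3, 0.2]
# Simulated bulk sample: an exact mixture of the reference profiles.
y = [sum(Z[g][k] * true_p[k] for k in range(3)) for g in range(4)]

estimated_p = deconvolve_nnls(y, Z)
print([round(value, 3) for value in estimated_p])
```

Because the simulated bulk sample is an exact mixture of the reference profiles, the estimate should land very close to true_p. Real bulk data adds noise and assay differences between bulk and single cell data, which is the gap the published methods try to handle.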
  7. Which Method Should We Use?

    • There are 20+ single cell reference-based methods published
    deconvolution(Y, Z) = Proportion of Cell Types
  8. Which Method is the Most Accurate?

    • Benchmarking shows that different methods perform best on different data sets (Cobos et al., Nature Communications, 2020)
    • Benchmarking results from different papers on “real” data
      ◦ MuSiC paper: MuSiC > NNLS > BSEQ-sc > CIBERSORT
        ▪ Pancreatic Islet: Beta cells vs. HbA1c (Fig 2a)
      ◦ Bisque paper: Bisque > MuSiC > CIBERSORT
        ▪ DLPFC: Microglia vs. Braak stage, Neuron vs. Cognitive diagnostic category (Fig 4)
      ◦ Cobos benchmark: DWLS > MuSiC > Bisque > deconvSeq
        ▪ Human PBMC flow sorted (Fig 7)
      ◦ Jin et al. benchmark: CIBERSORT, MuSiC > EPIC*, TIMER, DeconRNASeq
        ▪ Human whole blood, simulations
      ◦ Dai et al. benchmark: dtangle > Bisque > other methods
        ▪ Human brain IHC & scRNA-seq data
  9. The previous benchmarks didn’t agree on a deconvolution method...

    ...So we’re going to benchmark deconvolution methods
  10. Goals of Deconvolution Benchmark

    • Build multi-assay dataset with orthogonal cell type measurements
    • Test top deconvolution methods that employ different strategies
    • Assess impact of other factors in deconvolution
      ◦ Bulk RNA-seq data types
      ◦ snRNA-seq features
      ◦ Marker genes
  11. Selected Six Deconvolution Methods

    Method | Citation | Approach | Marker Gene Selection | Availability | Top Benchmark Performance
    DWLS (Dampened weighted least-squares) | Tsoucas et al., Nature Comm, 2019 [5] | Weighted least squares | - | R package on CRAN | Cobos et al. [18]
    Bisque | Jew et al., Nature Comm, 2020 [6] | Bias correction: Assay | - | R package on GitHub | Dai et al. [17]
    MuSiC (Multi-subject Single-cell) | Wang et al., Nature Communications, 2019 [7] | Bias correction: Source | Weights genes | R package on GitHub | Jin et al. [20]
    BayesPrism | Chu et al., Nature Cancer, 2022 [8] | Bayesian | Pairwise t-test | Webtool, R package on GitHub | Hippen et al. [22]
    hspe (dtangle) (hybrid-scale proportion estimation) | Hunt and Gagnon-Bartsch, Ann. Appl. Stat., 2021 [9, 45] | High collinearity adjustment | Multiple options, default “ratio” (1vALL mean expression ratio) | R package on GitHub | Dai et al. [17]
    CIBERSORTx | Newman et al., Nat Biotech, 2019 [11] | Machine learning | Differential gene expression | Webtool, Docker image | Jin et al. [20]
  12. How can we build on previous benchmarks?

    Previous Strategies to Assess Accuracy
    • Use pseudobulk samples
      ◦ Known or simulated composition
      ◦ May not reflect real bulk RNA-seq data
    • Compare with immunofluorescence data
    • Cell flow sorting
      ◦ Difficult to label nuclei by cell type

    Our Strategy
    • Use paired orthogonal imaging data to measure cell type proportions & evaluate method accuracy
    • Focus on brain tissue
  13. Orthogonal Data

    • Alternative measurement of the same thing (cell type proportions)
      ◦ Multiple independent measurements build confidence
    • “Gold standard”
      ◦ *All methods have biases
  14. Single Nucleus RNA-seq References

    Huuki-Myers et al., bioRxiv, 2023, 10.1101/2023.02.15.528722
    • 10 Donors (n=19)
    • Seven broad cell types
    • 56k nuclei
  15. Bulk RNA-seq

    • n = 110
    • 6 library type + RNA extraction combinations
    [Figure axes: Library Type, RNA Extraction]
  16. RNAScope/IF Experiment Design

    • Measure the abundance of 6 broad cell types
    • Filtered for high quality images
    Kelsey Montgomery
  17. RNAScope vs. snRNA-seq Proportions

    Comparing Cell Type Proportions
    • Pearson’s correlation (cor)
    • Root Mean Squared Error (rmse)
    • Relative rmse (rrmse)
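The three comparison metrics can be sketched in a few lines. The slide does not define rrmse, so the version below assumes it is the rmse divided by the mean of the reference proportions; the function names and the example values are also illustrative rather than taken from the benchmark.

```python
import math

def pearson_cor(x, y):
    """Pearson's correlation between two equal-length lists of proportions."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def rmse(estimate, reference):
    """Root mean squared error of the estimates against the reference."""
    squared = [(e - r) ** 2 for e, r in zip(estimate, reference)]
    return math.sqrt(sum(squared) / len(squared))

def rrmse(estimate, reference):
    """Relative rmse. Assumption: rmse scaled by the mean reference value."""
    return rmse(estimate, reference) / (sum(reference) / len(reference))

# Hypothetical proportions for four samples of one cell type.
reference = [0.10, 0.20, 0.30, 0.40]   # e.g. an orthogonal measurement
estimate = [0.15, 0.25, 0.35, 0.45]    # e.g. a deconvolution output

print(pearson_cor(estimate, reference), rmse(estimate, reference), rrmse(estimate, reference))
```

In this example the estimates track the reference perfectly in rank and spacing, so the correlation is essentially 1, yet every value is off by 0.05. That is why a high correlation and a low rmse are both needed before calling a method accurate.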
  18. Experimental Design Connection to Benchmark

    deconvolution(Y, Z) = Proportion of Cell Types
    Six Methods
    1. DWLS
    2. Bisque
    3. MuSiC
    4. BayesPrism
    5. hspe
    6. CIBERSORTx
  19. Evaluate Deconvolution Methods

    Method
    1. What is the most accurate deconvolution method for brain tissue?
    2. Is accuracy impacted by type of bulk RNA-seq?
       a. Library type?
       b. RNA extraction?
  20. Run Deconvolution

    deconvolution(Y, Z) = Proportion of Cell Types
    • 110 bulk samples
    • Paired snRNA-seq
    • 7 cell types
  21. Methods return a wide range of proportion estimates

    [Figure panel: B2720_post]
    • Each tissue block has 6 bulk RNA-seq samples
  22. Bisque and hspe are Most Accurate Methods Compared to RNAScope/IF

    Accurate methods have:
    • High Pearson’s correlation (cor)
    • Low Root Mean Squared Error (rmse)
  23. Evaluate Six Deconvolution Methods

    Method
    1. What is the most accurate deconvolution method for brain tissue? hspe & Bisque
    2. Is accuracy impacted by type of bulk RNA-seq? Yes
       a. Library type? Bisque more accurate in polyA, hspe in RiboZeroGold
       b. RNA extraction? Some impact but inconsistent
  24. Select Effective Marker Genes

    Marker Genes
    1. Does selecting marker genes improve deconvolution?
    2. How to best select good sets of marker genes?
  25. Marker Gene Selection

    • Filter for genes expressed in snRNA-seq and bulk data
    • Looking for genes expressed in only one cell type
      ◦ Test for specificity of each gene for each cell type
    • Observe expression of selected marker genes
      ◦ Heat maps of pseudobulked data
    [Figure: The Ideal Heatmap, snRNA-seq data pseudobulked by cell type]
    Stephanie C Hicks
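Pseudobulking by cell type can be sketched as summing the counts of all nuclei that share a label. The data layout (one dict of gene counts per nucleus), the gene names, and the decision to sum rather than average are assumptions made for this example; the talk does not specify how the pseudobulking was implemented.

```python
from collections import defaultdict

def pseudobulk_by_cell_type(counts, cell_types):
    """Sum per-nucleus counts within each cell type.

    counts: list of dicts, one per nucleus, mapping gene -> count.
    cell_types: list of cell type labels, one per nucleus.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for nucleus_counts, label in zip(counts, cell_types):
        for gene, count in nucleus_counts.items():
            totals[label][gene] += count
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {label: dict(genes) for label, genes in totals.items()}

# Three hypothetical nuclei with counts for two placeholder genes.
counts = [
    {"GeneA": 5, "GeneB": 0},
    {"GeneA": 3, "GeneB": 1},
    {"GeneA": 0, "GeneB": 7},
]
cell_types = ["Astro", "Astro", "Oligo"]

pseudobulk = pseudobulk_by_cell_type(counts, cell_types)
print(pseudobulk)
```

A good marker gene shows up in such a table as a large value in one cell type and values near zero in all the others, which is the pattern the ideal heat map illustrates.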
  26. Marker Gene Sets Tested

    1. Full (17,804 genes)
       a. Set of genes common between the bulk and snRNA-seq datasets
    2. 1vALL top25 (145 genes)
       a. Top 25 genes ranked by fold change for each cell type, then filtered to common genes
    3. MeanRatio top25 (151 genes)
       a. Top 25 genes ranked by MeanRatio for each cell type, then filtered to common genes
    4. MeanRatio over2 (557 genes)
       a. All genes for each cell type with MeanRatio > 2
    5. MeanRatio MAD3 (520 genes)
       a. All genes for each cell type with MeanRatio more than 3 median absolute deviations (MADs) above the median of all MeanRatios > 1
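The MeanRatio-based sets can be sketched as follows. This slide does not define MeanRatio, so the code assumes it is a gene's mean expression in the target cell type divided by its highest mean expression in any other cell type. The unscaled MAD, the function names, and the toy values are likewise assumptions for illustration.

```python
import statistics

def mean_ratio(gene_means, target):
    """Assumed definition: target mean divided by the highest non-target mean."""
    highest_other = max(v for ct, v in gene_means.items() if ct != target)
    if highest_other == 0:
        return float("inf")  # no expression anywhere outside the target
    return gene_means[target] / highest_other

def rank_markers(means, target):
    """Return (gene, ratio) pairs sorted from most to least specific."""
    ratios = [(gene, mean_ratio(gm, target)) for gene, gm in means.items()]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)

def top_n(means, target, n=25):
    """Genes with the n highest ratios, as in the 'top25' sets."""
    return [gene for gene, _ in rank_markers(means, target)[:n]]

def over_threshold(means, target, threshold=2.0):
    """Genes whose ratio exceeds a fixed cutoff, as in the 'over2' set."""
    return [gene for gene, ratio in rank_markers(means, target) if ratio > threshold]

def mad3_threshold(ratios):
    """Median + 3 * MAD over ratios > 1 (unscaled MAD, an assumption)."""
    kept = [r for r in ratios if r > 1]
    med = statistics.median(kept)
    mad = statistics.median([abs(r - med) for r in kept])
    return med + 3 * mad

# Hypothetical mean expression per gene and cell type.
means = {
    "GeneA": {"Astro": 10.0, "Oligo": 1.0, "Micro": 0.5},
    "GeneB": {"Astro": 2.0, "Oligo": 9.0, "Micro": 4.0},
    "GeneC": {"Astro": 3.0, "Oligo": 3.0, "Micro": 9.0},
    "GeneD": {"Astro": 5.0, "Oligo": 4.0, "Micro": 4.5},
}

print(top_n(means, "Astro", n=2))
print(over_threshold(means, "Oligo"))
print(mad3_threshold([1.2, 1.5, 2.0, 4.0, 0.5]))
```

Dividing by the highest non-target mean, rather than by the average of all other cell types, penalizes a gene that is also high in just one other cell type. GeneD illustrates this: it is highest in Astro but nearly as high elsewhere, so its ratio stays close to 1.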
  27. Select Effective Marker Genes

    Marker Genes
    1. Does selecting marker genes improve deconvolution? Depends on the method
       ◦ hspe more sensitive than Bisque
    2. How to best select good sets of marker genes? MeanRatio top25
       ◦ MeanRatio top25 balanced rmse and correlation in Bisque & hspe
  28. Other Factors Can Impact Method Performance

    Dataset Features
    1. What features of the snRNA-seq reference dataset can impact deconvolution accuracy?
       a. Number of donors?
       b. Donor diversity?
       c. Existing proportion of cell types?
  29. Features of Other DLPFC snRNA-seq Datasets

    • Tran, Maynard et al., Neuron, 2021
    • Mathys et al., Nature, 2019
    • Paired snRNA-seq
  30. Other Factors Can Impact Method Performance

    Dataset Features
    1. What features of the snRNA-seq reference dataset can impact deconvolution accuracy?
       a. Number of donors? Bisque performs poorly with <4 donors
       b. Donor diversity? Bisque and hspe were unaffected by inclusion of AD cases
       c. Existing proportion of cell types? Bisque is biased to snRNA-seq proportions
  31. Benchmark Conclusions

    Method: hspe & Bisque are top performing methods
    • hspe better for RiboZeroGold
    Marker Genes: Mean Ratio effectively selects cell type-specific genes
    • MeanRatio top25 improves performance of top methods
    Dataset Features: Many factors impact deconvolution accuracy
    • Bisque is sensitive to low donor numbers and input cell proportions
  32. How do our conclusions compare to other benchmarks?

    Benchmark | Strategy | Tissue | Top Methods
    Cobos et al. | Pseudobulk, flow sorting | Blood, pancreas, kidney | DWLS
    Jin et al. | Flow sorting | Blood | CIBERSORT, MuSiC
    Dai et al. | Immunohistochemistry, scRNA-seq pseudobulk | Brain 🧠 | dtangle (hspe), Bisque
  33. How do our conclusions compare to other benchmarks?

    Benchmark | Strategy | Tissue | Top Methods
    Cobos et al. | Pseudobulk, flow sorting | Blood, pancreas, kidney | DWLS
    Jin et al. | Flow sorting | Blood | CIBERSORT, MuSiC
    Dai et al. | Immunohistochemistry, scRNA-seq pseudobulk | Brain 🧠 | dtangle (hspe), Bisque
    LIBD (new! ✅) | RNAScope/IF | Brain 🧠 | hspe, Bisque
  34. Resources

    • DeconvoBuddies R package in development
      ◦ R/Bioconductor package with tools for marker finding & plotting
      ◦ github.com/LieberInstitute/DeconvoBuddies
      ◦ Access paired dataset
        ▪ Bulk RNA-seq
        ▪ snRNA-seq data
        ▪ RNAScope proportions
    • Deconvolution code tutorial + video
      ◦ Updated version at LIBD Rstats club on May 3rd
  35. Acknowledgements

    Leonardo Collado-Torres, Kristen Maynard, Stephanie C Hicks, Kelsey Montgomery, Sang Ho Kwon, Sean Maden, Nick Eagles, Sophia Cinquemani, Daianna Gonzalez-Padilla
    NIMH Grants: R01 MH123183 & R01 MH111721
    Thank you! Any Questions?
    Download these slides: speakerdeck.com/lahuuki
    @lahuuki
  36. Comparing Estimates

    • Bisque and hspe predict similar proportions
      ◦ Cor = 0.938
    • Bisque has highest cor with snRNA-seq
      ◦ Cor = 0.743
  37. Method Predictions over 13 Brain Regions

    GTEx v8 Brain dataset
    Expected patterns
    • Cerebellum contains more inhibitory neurons (Inhib)
    • Caudate has an increased proportion of inhibitory neurons compared to frontal cortex
  38. Dai et al., bioRxiv Benchmark

    • Top deconvolution methods: dtangle (hspe) and Bisque
    • Cell type-specific expression methods: bMIND
    [Figures shown: Figure 2, Figure 3]