
WCS-LA-2024


Single-cell genomics webinar LA speaker at WCS

Leonardo Collado-Torres

September 25, 2024


Transcript

  1. Benchmarking cell type deconvolution methods with human brain data
     Leonardo Collado Torres, LIBD Investigator + Asst. Prof. Johns Hopkins Biostatistics
     Single-cell genomics webinar LA speaker at WCS, Sept 26 2024
     @lcolladotor lcolladotor.github.io lcolladotor.github.io/bioc_team_ds
     Slides available at speakerdeck.com/lcolladotor
  2. What defines me
     • Bioinformatics
     • R and Bioconductor
     • Reproducibility and best practices
     • Outreach and community building
     • Back in 2005 at @LCGUNAM: I like math and coding; biology provides the challenging problems
  3. History 2005-2009 Undergrad in Genomic Sciences 2009-2011 2011-2016 August 2016+

    Data Science Division Leader PIs: • Jeff Leek: 2012+ • Andrew Jaffe: 2013+ Ph.D. Biostatistics Staff Scientist I → II → Research Scientist → Investigator Data Science Team I PIs: • Andrew Jaffe 2016-2020 • Myself 2020+ Division Leader: Keri Martinowich 2024+
  4. 2008+ • BioC 2008-2011, 2014, 2017, 2019-2023 • useR!2013, 2021

    • rOpenSci unconf 2018 • RStudio::conf 2019-2021 @lcolladotor 2010+ @LIBDrstats 2018+ @CDSBMexico 2018+ Defunct: BmoreBiostats, Biostats Cultural Mixers Guest @RLadiesBmore #RLadiesMx Blog: http://lcolladotor.github.io 2011+ FB: 75k, Tw: 66k weekly Interests
  5. Background: Cell Types in the Brain
     • The brain is made of complex tissues consisting of different types of cells
     • Some diagnoses are associated with changes in cell type specific expression
       ◦ Ex. Pitt-Hopkins syndrome and oligodendrocytes (Phan et al, Nature Neuroscience, 2020)
     Louise Huuki-Myers @lahuuki speakerdeck.com/lahuuki/benchmarking-deconvolution-methods-in-the-human-brain
  6. How can we connect bulk RNA-seq to cell type information?
     [Diagram: Tissue, Bulk RNA-seq, snRNA-seq, Deconvolution, Estimated proportions; cost labels: $$$, $, Free!]
  7. What is Deconvolution?
     Computational method that...
     • Infers the composition of different cell types in bulk RNA-seq data
     • Utilizes single cell data to obtain cell type gene expression profiles
  8. Why is Deconvolution Important?
     • Tissue is heterogeneous
       ◦ Different cell types express genes at different levels
     • Samples can differ in cell type composition due to biology or dissection
       ◦ Check for differences in case vs. control
     • Controlling for cell fractions between samples can make case vs. control analysis cleaner
       ◦ Quality control
       ◦ Confounding factor in differential expression analysis: controlling for it prevents false positives and false negatives
  9. How do you run deconvolution?
     deconvolution(Y, Z) = Proportion of Cell Types
     • deconvolution: Computational Algorithm
     • Y: Gene Expression, Bulk RNA-seq Sample
     • Z: Gene Expression, scRNA-seq cell type Populations
     • Result: Bulk Samples Proportion
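To make the deconvolution(Y, Z) idea concrete, here is a minimal sketch using non-negative least squares solved by projected gradient descent. It is not any of the benchmarked methods (DWLS, Bisque, MuSiC, BayesPrism, hspe, CIBERSORTx); it only assumes that Y is the bulk expression of one sample, that Z holds one mean expression profile per cell type, and that bulk expression is roughly a proportion-weighted mix of those profiles. The function name `deconvolve` and the toy numbers are hypothetical.

```python
def deconvolve(bulk, signatures, n_iter=5000):
    """Estimate cell type proportions for one bulk sample (illustrative only).

    bulk: list of G expression values, one per gene (Y for one sample).
    signatures: G rows, each with K cell type mean expressions (Z).
    Returns K non-negative proportions that sum to 1.
    """
    n_genes, n_types = len(signatures), len(signatures[0])
    # Safe step size: 1 / (sum of squared entries of Z) bounds the curvature.
    step = 1.0 / sum(z * z for row in signatures for z in row)
    weights = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        # Residual of the current fit: Z w - y
        residual = [
            sum(signatures[g][k] * weights[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        # Gradient of 0.5 * ||Z w - y||^2 is Z^T (Z w - y)
        grad = [
            sum(signatures[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step, then clip at zero to keep weights non-negative.
        weights = [max(0.0, w - step * d) for w, d in zip(weights, grad)]
    total = sum(weights)
    if total == 0:
        raise ValueError("all estimated weights are zero")
    return [w / total for w in weights]


# Toy example: 4 genes x 3 cell types, invented values.
toy_signatures = [[10, 1, 0], [1, 8, 1], [0, 2, 9], [5, 5, 5]]
true_props = [0.5, 0.3, 0.2]
toy_bulk = [sum(z * p for z, p in zip(row, true_props)) for row in toy_signatures]
estimated = deconvolve(toy_bulk, toy_signatures)
print([round(p, 3) for p in estimated])
```

On noise-free toy data the estimate recovers the mixing proportions; real bulk and snRNA-seq data differ in assay biases, which is what the published methods try to correct for.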
  10. Which Method Should We Use?
     • There are 20+ single cell reference based methods published
     deconvolution(Y, Z) = Proportion of Cell Types
  11. Which Method is the Most Accurate?
     • Benchmarking shows that different methods perform best on different data sets (Cobos et al, Nature Communications, 2020)
     • Benchmarking results from different papers on “real” data
       ◦ MuSiC paper: MuSiC > NNLS > BSEQ-sc > CIBERSORT
         ▪ Pancreatic Islet: Beta cells vs. HbA1c (Fig 2a)
       ◦ Bisque paper: Bisque > MuSiC > CIBERSORT
         ▪ DLPFC: Microglia vs. Braak stage, Neuron vs. Cognitive diagnostic category (Fig 4)
       ◦ Cobos et al. benchmark: DWLS > MuSiC > Bisque > deconvSeq
         ▪ Human PBMC flow sorted (Fig 7)
       ◦ Jin et al. benchmark: CIBERSORT, MuSiC > EPIC*, TIMER, DeconRNASeq
         ▪ Human Whole Blood, simulations
       ◦ Dai et al. benchmark: dtangle > Bisque > Other Methods
         ▪ Human brain IHC & scRNA-seq data
  12. Goals of Deconvolution Benchmark
     • Build multi-assay dataset with orthogonal cell type measurements
     • Test top deconvolution methods that employ different strategies
     • Assess impact of other factors in deconvolution
       ◦ Bulk RNA-seq data types
       ◦ snRNA-seq features
       ◦ Marker genes
  13. How can we build on previous benchmarks?
     Previous Strategies to Assess Accuracy
     • Use pseudobulk samples
       ◦ Known or simulated composition
       ◦ May not reflect real bulk RNA-seq data
     • Compare with Immunofluorescence Data
     • Cell flow sorting
       ◦ Difficult to label nuclei by cell type
     Our Strategy
     • Use paired orthogonal imaging data to measure cell type proportions & evaluate method accuracy
     • Focus on brain tissue
  14. Orthogonal Data
     • Alternative measurement of the same thing (cell type proportions)
       ◦ Multiple independent measurements build confidence
     • “Gold standard”
       ◦ *All methods have biases
  15. Single Nucleus RNA-seq References
     Huuki-Myers et al, Science, 2024 10.1126/science.adh1938
     • 10 Donors (n=19)
     • Seven broad cell types
     • 56k nuclei
  16. Bulk RNA-seq
     • n = 110
     • 6 library type + RNA extraction combinations
     [Figure axes: Library Type, RNA Extraction]
  17. RNAScope/IF Experiment Design
     • Measure the abundance of 6 broad cell types
     • Filtered for high quality images
     Kelsey Montgomery
  18. RNAScope vs. snRNA-seq Proportions
     Comparing Cell Type Proportions
     • Pearson’s correlation (cor)
     • Root Mean Squared Error (rmse)
     • Relative rmse (rrmse)
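The three comparison metrics can be sketched in a few lines of plain Python. The slides name the metrics without giving formulas, so the `rrmse` definition below (RMSE divided by the mean of the reference proportions) is an assumption, and the function names are hypothetical.

```python
import math


def pearson_cor(estimated, reference):
    """Pearson's correlation between two equal-length lists of proportions."""
    n = len(estimated)
    mean_e = sum(estimated) / n
    mean_r = sum(reference) / n
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(estimated, reference))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    return cov / math.sqrt(var_e * var_r)


def rmse(estimated, reference):
    """Root mean squared error between estimates and reference values."""
    n = len(estimated)
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(estimated, reference)) / n)


def rrmse(estimated, reference):
    """Relative RMSE. ASSUMPTION: RMSE scaled by the mean reference value."""
    return rmse(estimated, reference) / (sum(reference) / len(reference))


# Toy proportions for one sample (invented numbers, 3 cell types).
deconvolution_props = [0.5, 0.3, 0.2]
imaging_props = [0.4, 0.4, 0.2]
print(round(pearson_cor(deconvolution_props, imaging_props), 3))
print(round(rmse(deconvolution_props, imaging_props), 3))
print(round(rrmse(deconvolution_props, imaging_props), 3))
```

A high correlation with a high rmse would indicate that a method ranks samples correctly but is off in absolute scale, which is why both are reported together.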
  19. Connection to Benchmark
     deconvolution(Y, Z) = Proportion of Cell Types
     Six Methods: 1. DWLS 2. Bisque 3. MuSiC 4. BayesPrism 5. hspe 6. CIBERSORTx
     vs. Experimental Design
  20. Method: Evaluate Deconvolution Methods
     1. What is the most accurate deconvolution method for brain tissue?
     2. Is accuracy impacted by type of bulk RNA-seq?
        a. Library type?
        b. RNA extraction?
  21. Run Deconvolution
     deconvolution(Y, Z) = Proportion of Cell Types
     • 110 bulk samples
     • Paired snRNA-seq
     • 7 cell types
  22. Methods return a wide range of proportion estimates
     [Figure: B2720_post]
     • Each Tissue Block has 6 Bulk RNA-seq samples
  23. Bisque and hspe are Most Accurate Methods Compared to RNAScope/IF
     Accurate Methods have:
     • High Pearson’s correlation (cor)
     • Low Root Mean Squared Error (rmse)
  24. Method: Evaluate Six Deconvolution Methods
     1. What is the most accurate deconvolution method for brain tissue? hspe & Bisque
     2. Is accuracy impacted by type of bulk RNA-seq? Yes
        a. Library type? Bisque more accurate in polyA, hspe in RiboZeroGold
        b. RNA extraction? Some impact but inconsistent
  25. Marker Genes: Select Effective Marker Genes
     1. Does selecting marker genes improve deconvolution?
     2. How to best select good sets of marker genes?
  26. Marker Gene Selection
     • Filter for genes expressed in snRNA-seq and bulk data
     • Looking for genes expressed in only one cell type
       ◦ Test for specificity of each gene for each cell type
     • Observe expression of selected marker genes
       ◦ Heat maps of pseudobulked data
     [Figure: The Ideal Heatmap; snRNA-seq data, pseudobulked by cell type]
     Stephanie C Hicks
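Pseudobulking by cell type, as used for the heat maps, can be illustrated with a small sketch. It assumes pseudobulking means summing raw counts over all nuclei that share a cell type label, which is a common convention but is not spelled out in the slides; the function name and toy data are hypothetical.

```python
from collections import defaultdict


def pseudobulk_by_cell_type(nucleus_counts, cell_type_labels):
    """Sum per-nucleus gene counts within each cell type.

    nucleus_counts: list of dicts {gene: count}, one dict per nucleus.
    cell_type_labels: list of cell type names, one per nucleus.
    Returns {cell_type: {gene: summed count}}.
    """
    if len(nucleus_counts) != len(cell_type_labels):
        raise ValueError("need exactly one cell type label per nucleus")
    totals = defaultdict(lambda: defaultdict(int))
    for counts, label in zip(nucleus_counts, cell_type_labels):
        for gene, count in counts.items():
            totals[label][gene] += count
    # Convert nested defaultdicts to plain dicts for a clean return value.
    return {label: dict(genes) for label, genes in totals.items()}


# Toy data: three nuclei, two genes (invented counts and gene names).
toy_counts = [
    {"geneA": 3, "geneB": 0},
    {"geneA": 1, "geneB": 2},
    {"geneA": 0, "geneB": 5},
]
toy_labels = ["Astro", "Astro", "Oligo"]
pseudobulk = pseudobulk_by_cell_type(toy_counts, toy_labels)
print(pseudobulk)
```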
  27. Marker Gene Sets Tested
     1. Full (17,804 genes)
        a. set of genes common between the bulk and snRNA-seq datasets
     2. 1vALL top25 (145 genes)
        a. top 25 genes ranked by fold change for each cell type, then filtered to common genes
     3. MeanRatio top25 (151 genes)
        a. top 25 genes ranked by MeanRatio for each cell type, then filtered to common genes
     4. MeanRatio over2 (557 genes)
        a. All genes for each cell type with MeanRatio > 2
     5. MeanRatio MAD3 (520 genes)
        a. All genes for each cell type with MeanRatio > 3 median absolute deviations (MADs) greater than the median of all MeanRatios > 1
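A rough sketch of the MeanRatio top-N selection follows. The slides name the statistic without defining it, so this assumes MeanRatio for a gene and a target cell type is the target's mean expression divided by the highest mean expression among the other cell types. The functions `mean_ratio` and `top_markers` and the toy values are hypothetical, and this is not the DeconvoBuddies implementation.

```python
def mean_ratio(mean_expr, target):
    """Ratio of target mean expression to the highest non-target mean.

    mean_expr: {gene: {cell_type: mean expression}}.
    ASSUMPTION: this is the definition of MeanRatio; the slides do not state it.
    """
    ratios = {}
    for gene, by_type in mean_expr.items():
        highest_other = max(v for ct, v in by_type.items() if ct != target)
        if highest_other > 0:
            ratios[gene] = by_type[target] / highest_other
        elif by_type[target] > 0:
            # Expressed only in the target cell type: perfectly specific.
            ratios[gene] = float("inf")
        # Genes with zero expression everywhere are skipped.
    return ratios


def top_markers(mean_expr, target, n=25):
    """Top n genes by MeanRatio, keeping only genes with ratio > 1."""
    ratios = mean_ratio(mean_expr, target)
    ranked = sorted(ratios, key=ratios.get, reverse=True)
    return [gene for gene in ranked if ratios[gene] > 1][:n]


# Toy mean expression per cell type (invented values and gene names).
toy_means = {
    "geneA": {"Astro": 8.0, "Oligo": 1.0, "Excit": 2.0},
    "geneB": {"Astro": 1.0, "Oligo": 9.0, "Excit": 3.0},
    "geneC": {"Astro": 4.0, "Oligo": 4.0, "Excit": 4.0},
}
astro_markers = top_markers(toy_means, "Astro")
oligo_markers = top_markers(toy_means, "Oligo")
print(astro_markers, oligo_markers)
```

Dividing by the highest non-target mean, rather than the average of all other cell types, penalizes genes that are also high in just one other cell type, which is the kind of specificity the "ideal heatmap" is after.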
  28. Marker Genes: Select Effective Marker Genes
     1. Does selecting marker genes improve deconvolution? Depends on the method
        ◦ hspe more sensitive than Bisque
     2. How to best select good sets of marker genes? Mean Ratio top25
        ◦ Mean Ratio top25 balanced rmse and correlation in Bisque & hspe
  29. Dataset Features: Other Factors Can Impact Method Performance
     1. What features of snRNA-seq reference dataset can impact deconvolution accuracy?
        a. Number of donors?
        b. Donor diversity?
        c. Existing proportion of cell types?
  30. Features of Other DLPFC snRNA-seq Datasets
     • Tran, Maynard et al., Neuron, 2021
     • Mathys et al., Nature, 2019
     • Paired snRNA-seq
  31. Dataset Features: Other Factors Can Impact Method Performance
     1. What features of snRNA-seq reference dataset can impact deconvolution accuracy?
        a. Number of donors? Bisque performs poorly with <4 donors
        b. Donor diversity? Bisque and hspe were unaffected by inclusion of AD cases
        c. Existing proportion of cell types? Bisque is biased to snRNA-seq proportions
  32. Benchmark Conclusions
     Method: hspe & Bisque are top performing methods
     • hspe better for RiboZeroGold
     Marker Genes: Mean Ratio effectively selects cell type specific genes
     • MR Top 25 improves performance of top methods
     Dataset Features: Many factors impact deconvolution accuracy
     • Bisque is sensitive to low donors and input cell proportions
  33. How do our conclusions compare to other benchmarks?
     Benchmark: Strategy; Tissue; Top Methods
     • Cobos et al.: Pseudobulk, Flow sorting; Blood, pancreas, kidney; DWLS
     • Jin et al.: Flow sorting; Blood; CIBERSORT, MuSiC
     • Dai et al.: Immunohistochemistry, scRNA-seq pseudobulk; Brain 🧠; dtangle (hspe), Bisque
  34. How do our conclusions compare to other benchmarks?
     Benchmark: Strategy; Tissue; Top Methods
     • Cobos et al.: Pseudobulk, Flow sorting; Blood, pancreas, kidney; DWLS
     • Jin et al.: Flow sorting; Blood; CIBERSORT, MuSiC
     • Dai et al.: Immunohistochemistry, scRNA-seq pseudobulk; Brain 🧠; dtangle (hspe), Bisque
     • LIBD (new! ✅): RNAScope/IF; Brain 🧠; hspe, Bisque
  35. Resources
     • DeconvoBuddies R package
       ◦ R/Bioconductor package with tools for marker finding & plotting
       ◦ https://research.libd.org/DeconvoBuddies/
       ◦ Access paired dataset
         ▪ Bulk RNA-seq
         ▪ snRNA-seq data
         ▪ RNAScope Proportions
     • Deconvolution code tutorial + video
       ◦ updated version at LIBD Rstats club on May 3rd
  36. Acknowledgements
     Louise Huuki-Myers, Kristen Maynard, Stephanie C Hicks, Kelsey Montgomery, Sang Ho Kwon, Sean Maden, Nick Eagles, Sophia Cinquemani, Daianna Gonzalez-Padilla
     NIMH Grant: R01 MH123183 & R01 MH111721
     Thank you! Any Questions?
     Download these slides: speakerdeck.com/lahuuki @lahuuki
  37. Selected Six Deconvolution Methods
     Method: Citation; Approach; Marker Gene Selection; Availability; Top Benchmark Performance
     • DWLS (Dampened weighted least-squares): Tsoucas et al, Nature Comm, 2019 [5]; weighted least squares; -; R package on CRAN; Cobos et al. [18]
     • Bisque: Jew et al, Nature Comm, 2020 [6]; Bias correction: Assay; -; R package on GitHub; Dai et al. [17]
     • MuSiC (Multi-subject Single-cell): Wang et al, Nature Communications, 2019 [7]; Bias correction: Source; Weights Genes; R package on GitHub; Jin et al. [20]
     • BayesPrism: Chu et al., Nature Cancer, 2022 [8]; Bayesian; Pairwise t-test; Webtool, R package on GitHub; Hippen et al. [22]
     • hspe (dtangle) (hybrid-scale proportion estimation): Hunt and Gagnon-Bartsch, Ann. Appl. Stat. 2021 [9, 45]; High collinearity adjustment; Multiple options, default “ratio” 1vALL mean expression ratio; R package on GitHub; Dai et al. [17]
     • CIBERSORTx: Newman et al., Nat Biotech, 2019 [11]; Machine Learning; Differential Gene expression; Webtool, Docker Image; Jin et al. [20]
  38. Comparing Estimates
     • Bisque vs. hspe predict similar proportions
       ◦ Cor = 0.938
     • Bisque has highest cor with snRNA-seq
       ◦ Cor = 0.743
  39. Method Predictions over 13 Brain Regions
     GTEx v8 Brain dataset
     Expected patterns
     • Cerebellum contains more inhibitory neurons (Inhib)
     • Caudate has a higher proportion of inhibitory neurons than frontal cortex
  40. Dai et al. benchmark
     • Top deconvolution methods: dtangle (hspe) and Bisque
     • Cell type specific expression methods: bMIND
     [Figures: Figure 2, Figure 3]