Introduction to Deconvolution - Seminar at UCL ICH

An introduction to our work on deconvolution at LIBD. Presented as a seminar to the Developmental Biology & Cancer group at the University College London Great Ormond Street Institute of Child Health on May 22, 2023.

Louise Huuki-Myers

May 22, 2023


  1. Introduction to Cell Type Deconvolution

    Louise Huuki-Myers, Staff Scientist 1
    @lahuuki
    lahuuki.github.io
    Download these slides: speakerdeck.com/lahuuki
  2. About the Lieber Institute for Brain Development

    • Non-profit research institute in Baltimore, MD
    • Studies the genetics of neuropsychiatric disorders 🧬
    • 139 multidisciplinary scientists
    • Affiliated with the Johns Hopkins Medical School
  3. Our R/Bioconductor Powered Data Science Team

    • Led by Leonardo Collado-Torres
    • Computational lab specializing in:
      ◦ RNA-seq analysis
        ▪ Bulk, single cell, spatial
      ◦ Open source software development
      ◦ Knowledge sharing
        ▪ Data Science Guidance Sessions
        ▪ Rstat Club: videos available at www.youtube.com/@lcolladotor
    • Team website
      ◦ lcolladotor.github.io/
  4. About Me

    • Staff Scientist at LIBD
      ◦ Joined in 2020
      ◦ Working on bulk RNA-seq, single cell RNA-seq, spatial transcriptomics
    • Master's in Bioinformatics from Temple University, Philadelphia, PA
      ◦ Previously worked on evolutionary time trees
    • Other interests:
      ◦ Running, rowing, baking
    @lahuuki
  5. Background: Cell Types in the Brain

    • The brain is made of complex tissues consisting of different types of cells
    • Some diagnoses (Dx) are associated with changes in cell type-specific expression
      ◦ Ex. Pitt-Hopkins syndrome and oligodendrocytes (Phan et al, Nature Neuroscience, 2020)
  6. What is Deconvolution?

    • Inferring the composition of different cell types in bulk RNA-seq data
    • Utilize single cell data to obtain cell type gene expression profiles
  7. Why is Deconvolution Important?

    • Tissue is heterogeneous
      ◦ Different cell types express genes at different levels
    • Samples can differ in cell type composition due to biology or dissection
      ◦ Check for differences in case vs. control
    • Controlling for cell fractions between samples can make case vs. control analysis cleaner
      ◦ Quality control
      ◦ Cell type composition is a confounding factor in differential expression analysis: accounting for it helps prevent false positives and false negatives
  8. How do you run deconvolution?

    deconvolution(Y, Z) = proportion of cell types

    • Y: gene expression of the bulk RNA-seq samples
    • Z: gene expression of the scRNA-seq cell type populations
    • deconvolution: the computational algorithm
    • Output: cell type proportions for each bulk sample
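To make the idea on this slide concrete, here is a minimal sketch of deconvolution as a non-negative least squares (NNLS) problem, assuming Y is the bulk expression vector and Z holds the reference expression of each cell type. It uses a toy projected gradient descent solver and made-up numbers. It is not the implementation used by MuSiC, Bisque, SCDC, or DWLS, and the function name `nnls_proportions` is hypothetical.

```python
def nnls_proportions(bulk, reference, iterations=500):
    """Estimate non-negative cell type proportions by projected gradient descent.

    bulk: expression per gene for one bulk sample (Y).
    reference: one row per gene, holding mean expression in each cell type (Z).
    """
    n_types = len(reference[0])
    # Step size 1 / trace(Z^T Z) never exceeds 1 / (largest eigenvalue of Z^T Z),
    # which keeps the gradient descent stable.
    step = 1.0 / sum(value * value for row in reference for value in row)
    weights = [0.0] * n_types
    for _ in range(iterations):
        # Residual of the current fit for each gene: Z w - Y
        residuals = [
            sum(z * w for z, w in zip(row, weights)) - y
            for row, y in zip(reference, bulk)
        ]
        # Gradient of 0.5 * ||Z w - Y||^2 is Z^T (Z w - Y)
        gradient = [
            sum(row[k] * r for row, r in zip(reference, residuals))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto w >= 0
        weights = [max(0.0, w - step * g) for w, g in zip(weights, gradient)]
    # Rescale so the estimates sum to 1 and read as proportions
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical reference: 3 marker genes x 2 cell types
reference = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
# Bulk sample simulated as 30% of the first cell type and 70% of the second
bulk = [3.0, 7.0, 5.0]
proportions = nnls_proportions(bulk, reference)
print([round(p, 3) for p in proportions])
```

Because the simulated bulk sample is an exact mixture of the reference profiles, the estimate recovers the mixing proportions. Real data adds noise and technical differences between bulk and single cell assays, which is what the methods on the next slide try to handle.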
  9. Method Summary

    • MuSiC (Wang et al, Nature Communications, 2019)
      ◦ Regression: W-NNLS regression (weighted non-negative least squares)
      ◦ Correction for technical variation: None
      ◦ Other features: Tree-guided deconvolution, good for closely related cell types
    • Bisque (Jew et al, Nature Communications, 2020)
      ◦ Regression: NNLS regression
      ◦ Correction for technical variation: Gene-specific transformation of bulk data
      ◦ Other features: Leverages overlapping bulk & sc data
    • SCDC (Dong et al, Briefings in Bioinformatics, 2020)
      ◦ Regression: W-NNLS framework proposed by MuSiC
      ◦ Correction for technical variation: Option for gene-specific transformation of bulk data (from Bisque)
      ◦ Other features: Multiple reference datasets can be used, results combined with ENSEMBLE weights
    • DWLS (Tsoucas et al, Nature Communications, 2019)
      ◦ Regression: Dampened weighted least squares
      ◦ Correction for technical variation: None
  10. Which Method is the Most Accurate?

    • Benchmarking shows that different methods perform best on different data sets (Cobos et al, Nature Communications, 2020)
    • Benchmarking results from different papers on “real” data
      ◦ MuSiC paper: MuSiC > NNLS > BSEQ-sc > CIBERSORT
        ▪ Pancreatic islet: Beta cells vs. HbA1c (Fig 2a)
      ◦ Bisque paper: Bisque > MuSiC > CIBERSORT
        ▪ DLPFC: Microglia vs. Braak stage, Neuron vs. cognitive diagnostic category (Fig 4)
      ◦ SCDC paper: SCDC > MuSiC > Bisque > DWLS > CIBERSORT
        ▪ Pancreatic islet: Beta cells vs. HbA1c (Fig 4b)
      ◦ Cobos benchmark: DWLS > MuSiC > Bisque > deconvSeq
        ▪ Human PBMC, flow sorted (Fig 7)
  11. Why we like Bisque

    • Benchmarked with a DLPFC dataset
    • Robust to marker set
    • Robust to library prep
    • More reasonable estimates on GTEx dataset
    Stay tuned: methods benchmark in the works!
  12. Important Factors

    • Number and diversity of donors (4+)
    • Resolution of cell types
    • Does it match the bulk data?
      ◦ Same tissue or region?
      ◦ Same experimental conditions?
    • Same cellular fraction?
      ◦ Brain tissue is often limited to single nucleus
  13. Single Nucleus RNA-seq References

    Tran, Maynard et al, Neuron, 2021 (10.1016/j.neuron.2021.09.001)
    • 5 brain regions + 8 donors
      ◦ Amygdala, sACC, Hippocampus, NAc, DLPFC
    • Utilize “pan brain” annotation to maximize donors
    Pictured: Matthew N Tran, Kelsey Montgomery
  14. Single Nucleus RNA-seq References

    Huuki-Myers et al, bioRxiv, 2023 (10.1101/2023.02.15.528722)
    • DLPFC + 10 donors (n=19)
    • Layer-level cell type annotation
    • Access with spatialLIBD
  15. What are Marker Genes?

    • “Define” cell types
      ◦ Differentially expressed between cell types
    • Historically
      ◦ Known markers associated with key cell types
      ◦ Ex. MBP: major constituent of the myelin sheath, marker for oligodendrocytes
    • What does the data tell us?
      ◦ Human vs. model organisms
      ◦ Regional differences
      ◦ Technical differences
  16. Marker Gene Selection

    • Filter for genes expressed in snRNA-seq and bulk data
    • Looking for genes expressed in only one cell type
      ◦ Test for specificity of each gene for each cell type
    • Observe expression of selected marker genes
      ◦ Heat maps of pseudobulked data
        ▪ Summation of counts from nuclei from one donor + cell type
      ◦ Violin plots by cell type
    [Figures: marker genes shared by sn & bulk; the ideal heatmap (snRNA-seq data, pseudobulked by cell type + donor)]
    Pictured: Stephanie C Hicks
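The pseudobulking step described on this slide (summing counts from all nuclei of one donor + cell type) can be sketched as follows. The record layout, donor names, and counts are hypothetical and only serve the illustration. In practice this would run on a SingleCellExperiment-style object rather than plain dictionaries.

```python
from collections import defaultdict


def pseudobulk(nuclei):
    """Sum gene counts over all nuclei sharing a (donor, cell type) pair."""
    totals = defaultdict(lambda: defaultdict(int))
    for nucleus in nuclei:
        key = (nucleus["donor"], nucleus["cell_type"])
        for gene, count in nucleus["counts"].items():
            totals[key][gene] += count
    # Convert nested defaultdicts to plain dicts for easier inspection
    return {key: dict(genes) for key, genes in totals.items()}


# Hypothetical nuclei with made-up counts for two genes
nuclei = [
    {"donor": "donor1", "cell_type": "Oligo", "counts": {"MBP": 12, "GAD1": 0}},
    {"donor": "donor1", "cell_type": "Oligo", "counts": {"MBP": 8, "GAD1": 1}},
    {"donor": "donor1", "cell_type": "Inhib", "counts": {"MBP": 1, "GAD1": 9}},
    {"donor": "donor2", "cell_type": "Oligo", "counts": {"MBP": 15, "GAD1": 0}},
]
profiles = pseudobulk(nuclei)
for key, genes in sorted(profiles.items()):
    print(key, genes)
```

Each resulting profile corresponds to one column of the pseudobulked heat map: one donor and cell type combination.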
  17. Exploring Marker Expression

    Where does this noise come from?
    • Outliers in one or more non-target cell types
      ◦ Here OPCs are expressing MBP
  18. Our Solution: Mean Ratio

    Mean Ratio = (mean expression of the target cell type) / (mean expression of the highest non-target cell type)

    The higher the mean ratio:
    • the more specific that gene is to the target cell type
    • the better a marker gene it is
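As a worked example of the mean ratio, the sketch below divides the mean expression in the target cell type by the mean expression in the highest non-target cell type. The function name and the expression values are made up for illustration. This is not the DeconvoBuddies implementation.

```python
from statistics import mean


def mean_ratio(expression_by_cell_type, target):
    """Mean expression in the target divided by the highest non-target mean."""
    target_mean = mean(expression_by_cell_type[target])
    highest_non_target = max(
        mean(values)
        for cell_type, values in expression_by_cell_type.items()
        if cell_type != target
    )
    if highest_non_target == 0:
        # No expression outside the target: treat the gene as perfectly specific
        return float("inf")
    return target_mean / highest_non_target


# Hypothetical expression values for one gene across nuclei, by cell type
mbp = {
    "Oligo": [8.0, 10.0, 12.0],  # mean 10
    "OPC": [1.0, 2.0, 3.0],      # mean 2, the highest non-target
    "Excit": [0.5, 0.5],         # mean 0.5
}
ratio = mean_ratio(mbp, target="Oligo")
print(ratio)
```

Here the target mean is 10 and the highest non-target mean is 2, so the mean ratio is 5.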
  19. Mean Ratio vs. Fold Change

    • Genes with high mean ratio also have high fold changes
    • Not all genes with high fold changes have high mean ratios
    • Selecting marker genes by mean ratio helps avoid “noisy” genes
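A small numeric sketch shows why a high fold change does not guarantee a high mean ratio. It assumes fold change is the target mean over the average of the remaining cell type means, which is only one possible definition and may differ from the statistic used for the slide. All values are invented.

```python
def fold_change(means, target):
    """Target mean over the average of all non-target means (assumed definition)."""
    others = [value for cell_type, value in means.items() if cell_type != target]
    return means[target] / (sum(others) / len(others))


def mean_ratio_from_means(means, target):
    """Target mean over the single highest non-target mean."""
    others = [value for cell_type, value in means.items() if cell_type != target]
    return means[target] / max(others)


# Hypothetical per-cell-type mean expression for two candidate Oligo markers
clean_gene = {"Oligo": 10.0, "OPC": 2.0, "Excit": 2.0, "Inhib": 2.0}
noisy_gene = {"Oligo": 10.0, "OPC": 6.0, "Excit": 0.0, "Inhib": 0.0}

for name, means in [("clean", clean_gene), ("noisy", noisy_gene)]:
    fc = fold_change(means, "Oligo")
    mr = mean_ratio_from_means(means, "Oligo")
    print(f"{name}: fold change = {fc:.2f}, mean ratio = {mr:.2f}")
```

Both genes have the same fold change under this definition, but the gene with strong expression in one non-target cell type (OPC) gets a much lower mean ratio, so ranking by mean ratio pushes it down the list.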
  20. How Many Markers?

    • As many as look like outliers in the “worst” cell type
      ◦ Least amount of signal
      ◦ Balance overfitting vs. adding noise
      ◦ Looking at Inhib: we chose 25 markers
    • Same number for each cell type
  21. How Many Markers?

    • This becomes more difficult with more specific cell types
    • We are looking for genes with big differences between cell types
  22. Current LIBD Pipeline

    • Method: Bisque
    • Reference data
      ◦ Pan-brain (Tran, Maynard et al., Neuron, 2021)
      ◦ Broad cell types
    • Marker genes
      ◦ Top 25 ranked with mean ratio (150 total)
  23. Application in Differential Expression Analysis

    • High correlation between gene t-stats for models with and without deconvolution terms
    • Many of the significant genes stay significant
    • Deconvolution models are more exclusive
    • Which model would you choose?

    Model without deconvolution terms:
    ~Dx * BrainRegion + Age + Sex + snpPC + qc metrics + qSVS

    Model with deconvolution terms:
    ~Dx * BrainRegion + Age + Sex + snpPC + qc metrics + qSVS + proportions
  24. Validation Strategies

    How do we know we are right?
    • Region trends: does it make sense?
    • RNAscope: use cell type markers to check composition of tissue
  25. Region Trends

    • Expect different patterns of composition across brain regions
    • Ruzicka et al, bioRxiv, 2021 (DOI: 10.1101/2021.01.21.426000)
      ◦ Perform deconvolution on 3k bulk RNA-seq samples from 15 regions
        ▪ GTEx, MAYO, ROSMAP data
        ▪ SPLITR method
        ▪ 48 donor reference scRNA-seq (10X)
        ▪ Method and reference data are not available
      ◦ Validate method using region composition
  26. RNAscope

    • Multiplex single-molecule fluorescent in situ hybridization (smFISH)
    • Visualize cell type specific markers in tissue
    • What we can observe:
      ◦ Cell type proportions in the tissue
      ◦ Individual cell sizes
      ◦ Total RNA content in different cell types using “total RNA expression genes”
    [Figure: Maynard, et al, Nucleic Acids Research, 2020, Fig. 5. Panel labels: Neurons, Excit, Inhib, Oligo]
    Future work
    Pictured: Kristen Maynard
  27. Considering Variation in Cell Size & Transcription

    • Sosina et al, bioRxiv, 2020: Is deconvolution predicting the amount of RNA from a cell type, or the cellular fraction?
      ◦ RNA fraction vs. cellular fraction
      ◦ Neurons are more transcriptionally active: more RNA
      ◦ Cell sizes differ across cell types
    • Most current methods don't account for cell size
    • Future work!
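The difference between RNA fraction and cellular fraction can be illustrated with a short sketch. It assumes a per-cell RNA content (a "cell size" factor) is known for each cell type. The values below are invented, and the function name `rna_to_cell_fraction` is hypothetical.

```python
def rna_to_cell_fraction(rna_fraction, rna_per_cell):
    """Convert RNA fractions to cellular fractions given RNA content per cell."""
    # Dividing by RNA per cell gives a quantity proportional to cell counts
    scaled = {
        cell_type: rna_fraction[cell_type] / rna_per_cell[cell_type]
        for cell_type in rna_fraction
    }
    total = sum(scaled.values())
    return {cell_type: value / total for cell_type, value in scaled.items()}


# Hypothetical sample: 60% of the RNA comes from neurons, 40% from glia
rna_fraction = {"Neuron": 0.6, "Glia": 0.4}
# Assumed relative RNA content: one neuron carries 3x the RNA of one glial cell
rna_per_cell = {"Neuron": 3.0, "Glia": 1.0}

cell_fraction = rna_to_cell_fraction(rna_fraction, rna_per_cell)
print({cell_type: round(value, 3) for cell_type, value in cell_fraction.items()})
```

In this toy case neurons contribute most of the RNA but only about a third of the cells, which is the kind of gap a method that ignores cell size cannot see.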
  28. Benchmark Experiment

    • Linked bulk, snRNA-seq, and RNAscope experiment
    • Check deconvolution prediction accuracy with RNAscope orthogonal measurement
    • Impact of cellular fraction in bulk tissue
      ◦ Is snRNA-seq good enough?
    • Marker gene selection and more!
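One way to check predicted proportions against an orthogonal measurement such as RNAscope is to compute an error and a correlation between the two. The sketch below uses root mean squared error (RMSE) and Pearson correlation. The choice of metrics is an assumption rather than something stated on the slide, and the proportions are invented.

```python
from math import sqrt


def rmse(predicted, observed):
    """Root mean squared error between two equal-length lists."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sqrt(sum(squared) / len(squared))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical proportions for three cell types in one tissue sample
deconvolution_estimate = [0.5, 0.3, 0.2]
rnascope_measurement = [0.4, 0.4, 0.2]

error = rmse(deconvolution_estimate, rnascope_measurement)
correlation = pearson(deconvolution_estimate, rnascope_measurement)
print(f"RMSE = {error:.3f}, Pearson r = {correlation:.3f}")
```

A lower RMSE and a higher correlation would indicate closer agreement between the deconvolution estimate and the RNAscope measurement for that sample.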
  29. Resources

    • DeconvoBuddies
      ◦ R package with tools for marker finding & plotting
      ◦ github.com/LieberInstitute/DeconvoBuddies
    • Coming soon: deconvolution code tutorial + video
    • DLPFC snRNA-seq data available through spatialLIBD
  30. Acknowledgements

    Leonardo Collado-Torres, Kristen Maynard, Stephanie C Hicks, Kelsey Montgomery, Sang Ho Kwon, Sean Maden, Nick Eagles, Sophia Cinquemani

    Thank you! Any questions?
    Download these slides: speakerdeck.com/lahuuki
    @lahuuki