Introduction to Deconvolution - Seminar at UCL ICH

An introduction to our work on deconvolution at LIBD. Presented as a seminar to the Developmental Biology & Cancer group at the University College London Great Ormond Street Institute of Child Health on May 22, 2023.

Louise Huuki-Myers

May 22, 2023

Transcript

  1. Introduction to Cell Type Deconvolution
    Louise Huuki-Myers
    Staff Scientist
    @lahuuki
    lahuuki.github.io
    Download these slides: speakerdeck.com/lahuuki

  2. About Lieber Institute for
    Brain Development
    ● Non-profit Research Institute in Baltimore, MD
    ● Study the genetics of neuropsychiatric disorders 🧬
    ● 139 multidisciplinary scientists
    ● Affiliated with the Johns Hopkins Medical School

  3. Our R/Bioconductor Powered Data Science Team
    ● Led by Leonardo Collado-Torres
    ● Computational lab specializing in:
    ○ RNA-seq analysis
    ■ Bulk, single cell, spatial
    ○ Open Source software development
    ○ Knowledge sharing
    ■ Data Science Guidance Sessions
    ■ Rstats Club: videos available at www.youtube.com/@lcolladotor
    ● Team website
    ○ lcolladotor.github.io/

  4. About Me
    ● Staff Scientist at LIBD
    ○ Joined in 2020
    ○ Working on Bulk RNA-seq, single cell RNA-seq, spatial transcriptomics
    ● Master's in Bioinformatics from Temple University, Philadelphia, PA
    ○ Previously worked on evolutionary time trees
    ● Other interests:
    ○ running, rowing, baking

  5. Studying Gene Expression
    in the Human Brain
    Bulk RNA-seq
    Single nucleus
    RNA-seq

  6. Background: Cell Types in the Brain
    ● The brain is made of complex tissues consisting
    of different types of cells
    ● Some diagnoses (Dx) are associated with changes in cell type
    specific expression
    ○ Ex. Pitt-Hopkins syndrome and oligodendrocytes (Phan
    et al, Nature Neuroscience, 2020)

  7. What is Deconvolution?
    [Diagram labels: Tissue, Bulk RNA-seq, snRNA-seq, Deconvolution, Estimated proportions; cost labels: $$$, $, Free!]

  8. What is Deconvolution?
    ● Inferring the composition of
    different cell types in bulk
    RNA-seq data
    ● Utilize single cell data to
    obtain cell type gene
    expression profiles

  9. Why is Deconvolution Important?
    ● Tissue is heterogeneous
    ○ Different cell types express genes at different levels
    ● Samples can differ in cell type composition due to biology or dissection
    ○ Check for differences in case vs. control
    ● Controlling for cell fractions between samples can make case vs. control
    analysis cleaner
    ○ Quality control
    ○ Cell composition is a confounding factor in differential expression analysis; controlling for it
    helps prevent false positives and false negatives

  10. How do you run deconvolution?
    deconvolution(Y, Z) = Proportion of Cell Types
    ● Y: gene expression of the bulk RNA-seq samples
    ● Z: gene expression of the scRNA-seq cell type populations
    ● deconvolution: computational algorithm
    ● Result: proportion of each cell type in each bulk sample
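
To make the deconvolution(Y, Z) idea concrete, here is a minimal sketch of plain non-negative least squares, solved with projected gradient descent in pure Python. It is not the implementation used by Bisque, MuSiC, SCDC, or DWLS; the function name `deconvolve`, the toy reference matrix, and the "true" proportions are all assumptions made up for illustration.

```python
# Toy NNLS-style deconvolution for a single bulk sample.
# All numbers are invented for illustration; this is not Bisque or MuSiC.

def deconvolve(bulk, reference, n_iter=2000):
    """Estimate non-negative proportions p that minimize ||reference * p - bulk||^2.

    bulk: expression values for one bulk sample, one per gene (Y).
    reference: one row per gene; each row holds the mean expression of
        that gene in every cell type (Z).
    """
    n_genes, n_types = len(reference), len(reference[0])
    # Conservative step size: 1 / squared Frobenius norm of the reference.
    step = 1.0 / sum(v * v for row in reference for v in row)
    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [
            sum(reference[g][k] * p[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        gradient = [
            sum(reference[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step, then project back onto the non-negative orthant.
        p = [max(0.0, p[k] - step * gradient[k]) for k in range(n_types)]
    total = sum(p)
    if total == 0:
        raise ValueError("all estimated coefficients are zero")
    # Rescale so the estimates can be read as proportions.
    return [v / total for v in p]


# Hypothetical reference: 4 genes (rows) x 3 cell types (columns).
reference = [
    [10.0, 0.0, 1.0],
    [0.0, 8.0, 1.0],
    [1.0, 1.0, 9.0],
    [2.0, 2.0, 2.0],
]
true_proportions = [0.5, 0.3, 0.2]
# Simulate a noise-free bulk sample as a mixture of the reference profiles.
bulk = [sum(z * p for z, p in zip(row, true_proportions)) for row in reference]

estimated = deconvolve(bulk, reference)
print([round(v, 3) for v in estimated])
```

On this noise-free toy mixture the estimate should recover the simulated proportions; real bulk data adds technical and biological variation that the published methods handle in different ways.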

  11. Methods
    deconvolution(Y, Z) = Proportion of Cell Types

  12. Method Summary
    ● MuSiC (Wang et al, Nature Communications, 2019)
    ○ Regression: W-NNLS regression (weighted non-negative least squares)
    ○ Correction for technical variation: None
    ○ Other features: Tree guided deconvolution, good for closely related cell types
    ● Bisque (Jew et al, Nature Communications, 2020)
    ○ Regression: NNLS regression
    ○ Correction for technical variation: Gene specific transformation of bulk data
    ○ Other features: Leverage overlapping bulk & sc data
    ● SCDC (Dong et al, Briefings in Bioinformatics, 2020)
    ○ Regression: W-NNLS framework proposed by MuSiC
    ○ Correction for technical variation: Option for gene specific transformation of bulk data (from Bisque)
    ○ Other features: Multiple reference datasets can be used, results combined with ENSEMBLE weights
    ● DWLS (Tsoucas et al, Nature Communications, 2019)
    ○ Regression: Dampened weighted least squares
    ○ Correction for technical variation: None

  13. Which Method is the Most Accurate?
    ● Benchmarking shows that different methods perform best on
    different data sets (Cobos et al, Nature Communications, 2020)
    ● Benchmarking results from different papers on “real” data
    ○ MuSiC paper: MuSiC > NNLS > BSEQ-sc > CIBERSORT
    ■ Pancreatic Islet: Beta cells vs. HbA1c (Fig 2a)
    ○ Bisque paper: Bisque > MuSiC > CIBERSORT
    ■ DLPFC: Microglia vs. Braak stage, Neuron vs. Cognitive diagnostic category
    (Fig 4)
    ○ SCDC paper: SCDC > MuSiC > Bisque > DWLS > CIBERSORT
    ■ Pancreatic Islet: Beta cells vs. HbA1c (Fig 4b)
    ○ Cobos benchmark: DWLS > MuSiC > Bisque > deconvSeq
    ■ Human PBMC flow sorted (Fig 7)

  14. Why we like Bisque
    ● Benchmarked with a DLPFC dataset
    ● Robust to marker set
    ● Robust to library prep
    ● More reasonable estimates on GTEx dataset
    Stay Tuned: Methods benchmark in the works!

  15. Reference Single Cell
    Data
    deconvolution(Y, Z) = Proportion of Cell Types

  16. Important Factors
    ● Number and diversity of donors (4+)
    ● Resolution of cell types
    ● Does it match the bulk data?
    ○ Same tissue or region? Same experimental conditions?
    ● Same cellular fraction?
    ○ Brain tissue is often limited to single-nucleus data

  17. Single Nucleus RNA-seq
    References
    Tran, Maynard et al, Neuron, 2021
    10.1016/j.neuron.2021.09.001
    ● 5 Brain Regions + 8 Donors
    ○ Amygdala, sACC, Hippocampus, NAc, DLPFC
    ● Utilize “pan brain” annotation to maximize
    donors
    Matthew N Tran
    Kelsey Montgomery

  18. Single Nucleus RNA-seq References
    Huuki-Myers et al, bioRxiv, 2023 (DOI: 10.1101/2023.02.15.528722)
    ● DLPFC + 10 Donors (n=19)
    ● Layer level cell type annotation
    ● Access with spatialLIBD

  19. Marker Finding
    deconvolution(Y, Z) = Proportion of Cell Types

  20. What are Marker Genes?
    ● “Define” cell types
    ○ Differentially expressed between cell types
    ● Historically
    ○ Known markers associated with key cell types
    ○ Ex. MBP: major constituent of the myelin sheath, marker for oligodendrocytes
    ● What does the data tell us?
    ○ Human vs. model organisms
    ○ Regional
    ○ Technical differences

  21. Marker Gene Selection
    ● Filter for genes expressed in snRNA-seq and
    bulk data
    ● Looking for genes expressed in only one cell
    type
    ○ Test for specificity of each gene for each cell
    type
    ● Observe expression of selected marker genes
    ○ Heat maps of pseudobulked data
    ■ Summation of counts from nuclei from
    one donor + cell type
    ○ Violin plots by cell type
    [Figures: marker genes shared by sn & bulk; “The Ideal Heatmap” of snRNA-seq data, pseudobulked by cell type + donor]
    Stephanie C Hicks
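
The pseudobulking step described on this slide (summing counts from all nuclei of one donor + cell type) can be sketched as follows. This is a minimal pure-Python illustration, not the pipeline's code; the function name `pseudobulk`, the donor labels, and the counts are hypothetical.

```python
from collections import defaultdict


def pseudobulk(counts, donors, cell_types):
    """Sum per-nucleus counts within each (donor, cell type) group.

    counts: list of per-nucleus count vectors (one value per gene).
    donors, cell_types: labels for each nucleus, in the same order as counts.
    Returns {(donor, cell_type): summed count vector}.
    """
    n_genes = len(counts[0])
    sums = defaultdict(lambda: [0] * n_genes)
    for vector, donor, cell_type in zip(counts, donors, cell_types):
        group = sums[(donor, cell_type)]
        for g, value in enumerate(vector):
            group[g] += value
    return dict(sums)


# Hypothetical data: 5 nuclei x 2 genes.
counts = [[5, 0], [7, 1], [0, 4], [6, 0], [1, 3]]
donors = ["D1", "D1", "D1", "D2", "D2"]
cell_types = ["Oligo", "Oligo", "Excit", "Oligo", "Excit"]

pseudobulked = pseudobulk(counts, donors, cell_types)
for key, summed in sorted(pseudobulked.items()):
    print(key, summed)
```

Each resulting vector corresponds to one column of a pseudobulked heatmap: one donor + cell type combination.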

  22. Exploring Marker Expression
    ● T-test between two groups
    ● Fold change between groups

  23. Exploring Marker Expression
    Where does this noise come from?
    ● Outliers in one or more non-target
    cell types
    ○ Here OPCs are expressing MBP

  24. Our Solution: Mean Ratio
    Mean Ratio = (mean expression in target cell type) / (mean expression in highest non-target cell type)
    [Plot labels: Target, Highest non-target]
    Higher mean ratio:
    ● the more specific that gene is to
    the target cell type
    ● the better a marker gene it is
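
A small sketch of the mean ratio calculation, assuming expression values are already grouped by cell type. The function names and the toy values are hypothetical and are not taken from DeconvoBuddies; the second toy gene mimics the "noisy" case where an outlier in a non-target cell type (here OPC) pulls the ratio down.

```python
def mean(values):
    return sum(values) / len(values)


def mean_ratio(expression_by_type, target):
    """Mean expression in the target cell type divided by the mean
    expression in the highest-expressing non-target cell type."""
    target_mean = mean(expression_by_type[target])
    highest_non_target = max(
        mean(values)
        for cell_type, values in expression_by_type.items()
        if cell_type != target
    )
    # Note: raises ZeroDivisionError if no non-target cell type expresses the gene.
    return target_mean / highest_non_target


# Hypothetical expression values for two genes, grouped by cell type.
specific_gene = {
    "Oligo": [8.0, 10.0, 12.0],
    "OPC": [1.0, 1.0, 1.0],
    "Excit": [0.5, 0.5, 0.5],
}
noisy_gene = {
    "Oligo": [8.0, 10.0, 12.0],
    "OPC": [0.0, 0.0, 15.0],  # one outlier nucleus raises the OPC mean
    "Excit": [0.5, 0.5, 0.5],
}

specific_ratio = mean_ratio(specific_gene, "Oligo")
noisy_ratio = mean_ratio(noisy_gene, "Oligo")
print(specific_ratio, noisy_ratio)
```

Both toy genes have the same mean expression in the target cell type, but the gene with a clean non-target background gets the higher mean ratio.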

  25. Mean Ratio vs. Fold Change
    ● Genes with high mean ratio also have high
    fold changes
    ● Not all genes with high fold changes have
    high mean ratios
    ● Selecting marker genes by mean ratio
    helps avoid “noisy” genes

  26. 1vAll Markers vs. Mean Ratio Markers

  27. 1vAll Markers vs. Mean Ratio Markers

  28. How Many Markers?
    ● As many genes as look like outliers in the “worst”
    cell type
    ○ Least amount of signal
    ○ Balance overfitting vs. adding noise
    ○ Looking at Inhib: we chose 25 markers
    ● Same number for each cell type

  29. How Many Markers?
    ● This becomes more difficult with more
    specific cell types
    ● We are looking for genes with big
    differences between cell types

  30. Tran, Maynard, et al.
    Top 25 Markers

  31. Huuki-Myers, et al.
    Top 25 Markers
    * Only plotted 10/25 genes in this heatmap

  32. Results + Validation
    deconvolution(Y, Z) = Proportion of Cell Types

  33. Current LIBD Pipeline
    ● Method: Bisque
    ● Reference Data
    ○ Pan-brain (Tran, Maynard et al., Neuron, 2021)
    ○ Broad cell types
    ● Marker genes
    ○ Top 25 ranked with mean ratio (150 total)
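
Picking the top markers per cell type by mean ratio can be sketched as below. With 25 markers per cell type, a total of 150 corresponds to six cell types (150 / 25). The function name `top_markers`, the gene names, and the ratio values are hypothetical, and the toy example uses n=2 to stay short.

```python
def top_markers(mean_ratios, n):
    """Return {cell_type: [genes]} holding the n genes with the highest
    mean ratio for each cell type, best first."""
    return {
        cell_type: sorted(ratios, key=ratios.get, reverse=True)[:n]
        for cell_type, ratios in mean_ratios.items()
    }


# Hypothetical mean ratios, keyed by cell type and then by gene.
mean_ratios = {
    "Oligo": {"gene_a": 12.0, "gene_b": 3.5, "gene_c": 7.1},
    "Inhib": {"gene_d": 2.2, "gene_e": 4.8, "gene_f": 1.9},
}

markers = top_markers(mean_ratios, n=2)
total_markers = sum(len(genes) for genes in markers.values())
print(markers, total_markers)
```

Using the same n for every cell type keeps the marker set balanced across cell types.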

  34. MDDSeq Data

  35. Application in Differential Expression Analysis
    ● High correlation
    between gene t-stats
    for models with and
    without deconvolution
    terms
    ● Many of the significant
    genes stay significant
    ● Deconvolution models
    are more exclusive
    ● Which model would you
    choose?
    Model without deconvolution terms:
    ~Dx * BrainRegion + Age + Sex + snpPC + qc metrics + qSVs
    Model with deconvolution terms:
    ~Dx * BrainRegion + Age + Sex + snpPC + qc metrics + qSVs + proportions

  36. Validation Strategies
    How do we know we are right?
    ● Region Trends - does it make sense?
    ● RNAscope - use cell type markers to check composition of tissue

  37. Region Trends
    ● Expect different patterns of composition
    across brain regions
    ● Ruzicka et al, bioRxiv, 2021 (DOI:
    10.1101/2021.01.21.426000)
    ○ Perform deconvolution on 3k bulk RNAseq
    samples from 15 regions
    ■ GTEx, MAYO, ROSMAP data
    ■ SPLITR method
    ■ 48 donor reference scRNA-seq - 10X
    ■ Method and reference data are not
    available
    ○ Validate method using region composition

  38. RNAscope
    ● Multiplex single-molecule
    fluorescent in situ hybridization
    (smFISH)
    ● Visualize cell type specific
    markers in tissue
    ● What we can observe:
    ○ Cell type proportions in the tissue
    ○ Individual cell sizes
    ○ Total RNA content in different cell
    types using “total RNA
    expression genes”
    Maynard, et al, Nucleic Acids Research, 2020, Fig. 5
    Future Work
    Kristen Maynard
    [Figure labels: Neurons, Excit, Inhib, Oligo]

  39. Looking Ahead

  40. Considering variation in Cell Size & Transcription
    ● Sosina et al, bioRxiv, 2020 : Is deconvolution predicting the amount of RNA from a cell type, or
    the cellular fraction?
    ○ RNA fraction vs. Cellular fraction
    ○ Neurons are more transcriptionally active: more RNA
    ○ Cell sizes differ across cell types
    ● Most current methods don't account for cell size
    ● Future work!
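
The difference between RNA fraction and cellular fraction can be illustrated with a little arithmetic. The conversion below is an assumption for illustration, not a method from the talk or from Sosina et al: it supposes each cell of a type contributes a fixed, known amount of RNA, divides each RNA fraction by that amount, and renormalizes. The per-cell RNA values are invented.

```python
def rna_to_cell_fractions(rna_fractions, rna_per_cell):
    """Convert RNA fractions to cellular fractions, assuming every cell of a
    given type contributes a fixed relative amount of RNA (rna_per_cell)."""
    scaled = {k: rna_fractions[k] / rna_per_cell[k] for k in rna_fractions}
    total = sum(scaled.values())
    return {k: v / total for k, v in scaled.items()}


# Hypothetical values: neurons contribute 60% of the RNA, but each neuron
# is assumed to carry three times as much RNA as an oligodendrocyte.
rna_fractions = {"Neuron": 0.6, "Oligo": 0.4}
rna_per_cell = {"Neuron": 3.0, "Oligo": 1.0}

cell_fractions = rna_to_cell_fractions(rna_fractions, rna_per_cell)
print({k: round(v, 3) for k, v in cell_fractions.items()})
```

Under these made-up numbers, neurons account for most of the RNA but only about a third of the cells, which is why the two kinds of fraction should not be used interchangeably.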

  41. New Commentary Preprint
    on arXiv!
    Sean Maden
    https://doi.org/10.48550/arXiv.2305.06501

  42. Benchmark Experiment
    ● Linked Bulk, snRNA-seq, and RNAscope
    experiment
    ● Check deconvolution prediction accuracy
    against the orthogonal RNAscope measurement
    ● Impact of cellular fraction in bulk tissue
    ○ Is snRNA-seq good enough?
    ● Marker gene selection and more!

  43. Resources
    ● DeconvoBuddies
    ○ R Package with tools for marker finding & plotting
    ○ github.com/LieberInstitute/DeconvoBuddies
    ● Coming Soon: Deconvolution code tutorial + video
    ● DLPFC snRNA-seq data available through spatialLIBD

  44. Acknowledgements
    ● Leonardo Collado-Torres
    ● Kristen Maynard
    ● Stephanie C Hicks
    ● Kelsey Montgomery
    ● Sang Ho Kwon
    ● Sean Maden
    ● Nick Eagles
    ● Sophia Cinquemani
    Thank you! Any Questions?