Team lcolladotor Journal Club April 2025

Manisha Barse
April 23, 2025

Team lcolladotor Journal Club: Accurate sample deconvolution of pooled snRNA-seq using sex-dependent gene expression patterns
doi: https://doi.org/10.1101/2024.11.29.626066
Presented By: Manisha Barse
Date: April 23, 2025
Recording link to our journal club meeting: https://youtu.be/zpeCFwYfV3o

This presentation explores a machine learning approach that deconvolves pooled snRNA-seq data by predicting cell sex, enabling cost-effective demultiplexing without experimental labels.

Transcript

  1. Accurate sample deconvolution of pooled snRNA-seq using sex-dependent gene expression patterns
    Twa et al. (2024), doi: https://doi.org/10.1101/2024.11.29.626066
    Team lcolladotor Journal Club
    Presented By: Manisha Barse
    April 23, 2025
  2. Background
    • Single-nucleus RNA sequencing (snRNA-seq) allows profiling of cell-type-specific gene expression.
    • High sequencing costs and limited sample material often necessitate pooling nuclei from multiple samples.
    • Deconvolution need: after sequencing, each nucleus must be reassigned to its original sample to maintain biological interpretability.
    • Existing methods:
      ◦ Barcode hashing: requires additional sample preparation.
      ◦ Genotype-based demultiplexing: needs prior genotyping data.
    • Proposed solution: use inherent sex-dependent gene expression patterns for demultiplexing without extra preprocessing.
  3. Introduction
    • Key idea: sex-specific gene expression is detectable in snRNA-seq data.
    • Approach:
      ◦ Pool one male and one female per pooled sample.
      ◦ Use ML models to classify cell sex based on gene expression.
      ◦ No extra experimental labels or sample-specific barcoding.
    • This reduces cost and complexity.
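Once each cell has a predicted sex, the pooling design on this slide reduces demultiplexing to a lookup. A minimal sketch, assuming hypothetical pool and sample identifiers (`pool_A`, `rat_01`, and so on) and a hypothetical `assign_sample` helper; none of these names come from the paper:

```python
# Hypothetical pool layout: each pool holds exactly one male and one female animal.
POOL_TO_SAMPLES = {
    "pool_A": {"male": "rat_01", "female": "rat_02"},
    "pool_B": {"male": "rat_03", "female": "rat_04"},
}


def assign_sample(pool_id, predicted_sex, layout=POOL_TO_SAMPLES):
    """Map a cell's pool and predicted sex back to its sample of origin."""
    if predicted_sex not in ("male", "female"):
        raise ValueError(f"unexpected sex label: {predicted_sex!r}")
    return layout[pool_id][predicted_sex]


# Each cell is (pool it was sequenced in, sex predicted by a classifier).
cells = [("pool_A", "female"), ("pool_A", "male"), ("pool_B", "female")]
assigned = [assign_sample(pool, sex) for pool, sex in cells]
print(assigned)
```

The lookup only works because each pool contains at most one animal of each sex, which is why the design is limited to two samples per pool.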
  4. Methods Overview
    Used a previously published snRNA-seq dataset from the rat ventral tegmental area (VTA), consisting of over 22,000 cells. The workflow includes:
    • Preprocessing the data and annotating cell types.
    • Using Boruta, a feature selection algorithm, to identify sex-informative genes.
    • Training machine learning classifiers (Random Forest, SVM, MLP, Logistic Regression) using these genes.
    • Evaluating the classifiers on a held-out test dataset.
  5. Boruta feature selection algorithm
    • Identifies genes with significant importance for cell sex classification within the VTA training partition.
    • Adds "shadow" features (shuffled copies of the original features) as a baseline against which the importance of each real feature is compared.
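The shadow-feature idea can be sketched in a few lines. This is a simplified illustration, not Boruta itself: real Boruta takes importances from a random forest and applies a statistical test over the hit counts, whereas this sketch swaps in a stand-in importance score (absolute difference between class means) and only counts hits. The function names, gene names, and toy data are all hypothetical.

```python
import random
import statistics


def importance(column, labels):
    """Stand-in importance score: absolute difference between class means.

    Real Boruta uses random forest importances; this is a simplification.
    """
    group0 = [x for x, y in zip(column, labels) if y == 0]
    group1 = [x for x, y in zip(column, labels) if y == 1]
    return abs(statistics.mean(group1) - statistics.mean(group0))


def shadow_hits(features, labels, n_rounds=50, seed=0):
    """Count how often each feature beats the best shuffled 'shadow' feature."""
    rng = random.Random(seed)
    real_scores = {name: importance(col, labels) for name, col in features.items()}
    hits = {name: 0 for name in features}
    for _ in range(n_rounds):
        shadow_scores = []
        for column in features.values():
            shadow = list(column)
            rng.shuffle(shadow)  # shuffling breaks any link to the labels
            shadow_scores.append(importance(shadow, labels))
        best_shadow = max(shadow_scores)
        for name, score in real_scores.items():
            if score > best_shadow:
                hits[name] += 1
    return hits


# Toy data: 0 = male, 1 = female; gene names are placeholders.
data_rng = random.Random(1)
labels = [0] * 100 + [1] * 100
features = {
    "informative_gene": [data_rng.gauss(0, 1) + 3 * y for y in labels],
    "noise_gene": [data_rng.gauss(0, 1) for _ in labels],
}
hits = shadow_hits(features, labels)
print(hits)
```

A feature that consistently outscores the best shadow carries signal beyond what shuffled data produces by chance; one that rarely does is a candidate for rejection.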
  6. Sex predictions = probabilities (0 → male, 1 → female)
    Thresholding low-confidence predictions improves accuracy:
    • Narrow: exclude [0.4–0.6] → 94% accuracy (loses 5% of cells)
    • Wide: exclude [0.25–0.75] → 96% accuracy (loses ~13%)
    Low-confidence cells = often female + non-neuronal
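The trade-off between accuracy and cell loss can be illustrated with a short sketch. It assumes the exclusion windows are inclusive at both ends and that retained cells are called female when the probability exceeds 0.5; the slide states neither detail. The helper name `threshold_predictions` and the probabilities below are made up for illustration and are not results from the paper.

```python
def threshold_predictions(probs, true_labels, low=0.4, high=0.6):
    """Drop cells whose predicted probability lies in [low, high], then score the rest.

    probs: predicted probability of female (0 -> male, 1 -> female).
    true_labels: 0 for male, 1 for female.
    Returns (accuracy on retained cells, fraction of cells excluded).
    """
    kept = [(p, y) for p, y in zip(probs, true_labels) if not (low <= p <= high)]
    if not kept:
        raise ValueError("every cell fell inside the exclusion window")
    correct = sum((p > 0.5) == bool(y) for p, y in kept)
    accuracy = correct / len(kept)
    excluded = 1 - len(kept) / len(probs)
    return accuracy, excluded


# Made-up probabilities and labels, for illustration only.
probs = [0.05, 0.45, 0.90, 0.55, 0.70, 0.20, 0.35, 0.98]
truth = [0, 1, 1, 0, 0, 0, 1, 1]

narrow = threshold_predictions(probs, truth, 0.4, 0.6)
wide = threshold_predictions(probs, truth, 0.25, 0.75)
print(narrow, wide)
```

Widening the window discards more cells but leaves a retained set in which the classifier is more confident, which mirrors the narrow-versus-wide pattern on the slide.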
  7. Increased RNA count per cell increases probability of correct classification for all models.
    Logistic regression of ventral tegmental area test partition cell UMI (unique molecular identifier) count and model classification (correct: 1, incorrect: 0).
    Dashed red lines indicate the RNA count corresponding to a 95% probability of correct classification by a model.
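The RNA count marked by the dashed lines can be recovered from a fitted logistic curve by inverting it. A minimal sketch, assuming the predictor is the raw UMI count (the slide does not say whether counts were transformed) and using hypothetical coefficients rather than values from the paper:

```python
import math


def count_for_probability(intercept, slope, target=0.95):
    """Solve logistic(intercept + slope * x) = target for x."""
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    if slope == 0:
        raise ValueError("slope must be non-zero")
    logit = math.log(target / (1 - target))
    return (logit - intercept) / slope


def probability_correct(intercept, slope, x):
    """Predicted probability of a correct classification at predictor value x."""
    return 1 / (1 + math.exp(-(intercept + slope * x)))


# Hypothetical coefficients, not values from the paper.
intercept, slope = 0.5, 0.001
umi_95 = count_for_probability(intercept, slope)
print(round(umi_95), probability_correct(intercept, slope, umi_95))
```

A model whose curve reaches 95% at a lower count needs less RNA per cell to classify reliably, so this value gives one way to compare classifiers.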
  8. Neuronal vs non-neuronal
    • 55/70 neuronal and 232/364 non-neuronal DEG
    • Classification in non-neuronal cells is limited by the quality of the signal in their transcriptome.
  9. Conclusion
    • Results demonstrate that the feature selection and training strategies did not overfit the models to the VTA dataset.
    • Models generalize to cell sex classification in a distinct dataset from a different brain region.
    • Requires no additional experimental labeling or preprocessing.
    • Currently limited to binary sex classification; may need adaptation for use in more complex or heterogeneous tissues.