Team lcolladotor Journal Club April 2025

Accurate sample deconvolution of pooled snRNA-seq using sex-dependent gene expression patterns
Twa et al. (2024), doi: https://doi.org/10.1101/2024.11.29.626066
Team lcolladotor Journal Club
Presented by: Manisha Barse
April 23, 2025

Background
• Single-nucleus RNA sequencing (snRNA-seq) allows profiling of cell-type-specific gene expression.
• High sequencing costs and limited sample material often necessitate pooling nuclei from multiple samples.
• Need for deconvolution: after sequencing, each nucleus must be reassigned to its sample of origin to maintain biological interpretability.
• Existing methods:
◦ Barcode hashing: requires additional sample preparation.
◦ Genotype-based demultiplexing: requires prior genotyping data.
• Proposed solution: use inherent sex-dependent gene expression patterns for demultiplexing, without extra preprocessing.

Introduction
• Key idea: sex-specific gene expression is detectable in snRNA-seq data.
• Approach:
◦ Pool nuclei from one male and one female animal in each pooled sample.
◦ Use machine learning (ML) models to classify cell sex from gene expression.
◦ No extra experimental labels or sample-specific barcoding.
• This reduces cost and complexity.
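The pooling idea can be sketched as follows, assuming each pool holds nuclei from exactly one male and one female animal and that a trained classifier already outputs P(female) for every nucleus. The function name, the sample IDs and the `margin` parameter are hypothetical; nuclei whose score is too close to 0.5 are left unassigned instead of being forced into a sample.

```python
def demultiplex_pool(p_female, male_sample, female_sample, margin=0.1):
    """Assign each nucleus in a two-sample pool to its sample of origin.

    p_female: per-nucleus predicted probability of being female.
    Nuclei whose probability lies strictly within `margin` of 0.5 are
    returned as None (unassigned) rather than guessed.
    """
    assignments = []
    for p in p_female:
        if p >= 0.5 + margin:
            assignments.append(female_sample)
        elif p <= 0.5 - margin:
            assignments.append(male_sample)
        else:
            assignments.append(None)  # too uncertain to call
    return assignments


# Hypothetical pool of one male (rat_M1) and one female (rat_F1) animal
print(demultiplex_pool([0.97, 0.03, 0.52, 0.81], "rat_M1", "rat_F1"))
# ['rat_F1', 'rat_M1', None, 'rat_F1']
```

With `margin=0.1` the unassigned band roughly corresponds to excluding predictions between 0.4 and 0.6.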

Methods Overview
The authors used a previously published snRNA-seq dataset from the rat ventral tegmental area (VTA) comprising over 22,000 cells. The workflow:
• Preprocess the data and annotate cell types.
• Use Boruta, a feature selection algorithm, to identify sex-informative genes.
• Train machine learning classifiers (random forest, support vector machine (SVM), multilayer perceptron (MLP), logistic regression) on these genes.
• Evaluate the classifiers on a held-out test partition.
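A held-out evaluation needs training and test partitions with comparable sex balance. The sketch below stratifies by sex label in plain Python; the 80/20 ratio, the seed and the function name are assumptions for illustration, not the authors' reported settings.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split cell indices into train/test partitions with the same sex ratio.

    labels: per-cell sex labels (e.g. 0 = male, 1 = female).
    Returns (train_indices, test_indices), each sorted.
    """
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(set(labels)):
        # Shuffle the cells of one sex, then carve off the test share
        idx = [i for i, y in enumerate(labels) if y == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


labels = [0] * 50 + [1] * 30  # hypothetical: 50 male and 30 female cells
train, test = stratified_split(labels)
print(len(train), len(test))  # 64 16
```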

Model Training Workflow

Boruta Feature Selection
• Identifies genes with significant importance for cell sex classification within the VTA training partition.
• Adds "shadow" features (shuffled copies of the original features); a gene is retained only if its importance consistently exceeds that of the best-performing shadow feature.
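A simplified, Boruta-style shadow-feature test is sketched below in plain Python. It keeps the core logic (shuffle every gene to build shadows, count how often a real gene beats the best shadow, then test the hit count against chance with a binomial test) but swaps in an assumed importance score, the absolute difference in mean expression between female and male cells, where Boruta itself uses random-forest importance and a more elaborate confirm/reject procedure. The toy genes and function names are hypothetical.

```python
import math
import random


def importance(column, labels):
    # Stand-in importance (assumption): |mean in females - mean in males|.
    # Boruta proper uses random-forest feature importance here.
    fem = [x for x, y in zip(column, labels) if y == 1]
    mal = [x for x, y in zip(column, labels) if y == 0]
    return abs(sum(fem) / len(fem) - sum(mal) / len(mal))


def binom_upper_tail(k, n, p=0.5):
    # P(X >= k) for X ~ Binomial(n, p)
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def shadow_feature_selection(X, labels, n_rounds=20, alpha=0.05, seed=0):
    """Keep genes that beat the best shadow feature more often than chance.

    X: cells x genes matrix (list of rows); labels: 0 = male, 1 = female.
    """
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*X)]
    hits = [0] * len(columns)
    for _ in range(n_rounds):
        # Shadow features: a shuffled copy of every gene, so any association
        # with sex is destroyed while the value distribution is kept.
        best_shadow = max(importance(rng.sample(col, len(col)), labels) for col in columns)
        for j, col in enumerate(columns):
            if importance(col, labels) > best_shadow:
                hits[j] += 1
    # One-sided binomial test: is the hit rate above the 50% expected by chance?
    return [j for j, h in enumerate(hits) if binom_upper_tail(h, n_rounds) < alpha]


# Toy data (hypothetical): gene 0 female-biased (Xist-like),
# gene 1 male-biased (Uty-like), gene 2 unrelated to sex.
labels = [1] * 10 + [0] * 10
X = [[5 if y else 0, 0 if y else 4, i % 3] for i, y in enumerate(labels)]
print(shadow_feature_selection(X, labels))  # [0, 1]
```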

Feature Selection and Model Performance (figure; labeled genes include Uty and Xist)

Interpreting Model Features

Sex predictions are probabilities (0 → male, 1 → female).
• Excluding low-confidence predictions improves accuracy:
◦ Narrow: exclude [0.4–0.6] → 94% accuracy (loses 5% of cells)
◦ Wide: exclude [0.25–0.75] → 96% accuracy (loses ~13% of cells)
• Low-confidence cells are often female and non-neuronal.
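The thresholding trade-off can be reproduced on any set of predictions. This sketch assumes that cells inside the exclusion window (bounds inclusive) are dropped and the remaining cells are called female when P(female) > 0.5; the function name and toy values are hypothetical.

```python
def accuracy_with_exclusion(p_female, true_sex, low, high):
    """Drop cells whose P(female) lies in [low, high], then score the rest.

    Remaining cells are called female if P(female) > 0.5 (assumed cut-off).
    Returns (accuracy on retained cells, fraction of cells excluded).
    """
    kept = [(p, t) for p, t in zip(p_female, true_sex) if not low <= p <= high]
    correct = sum((1 if p > 0.5 else 0) == t for p, t in kept)
    accuracy = correct / len(kept) if kept else float("nan")
    return accuracy, 1 - len(kept) / len(p_female)


# Hypothetical predictions and ground truth (1 = female, 0 = male)
probs = [0.9, 0.1, 0.55, 0.45, 0.3, 0.8, 0.7, 0.2]
truth = [1, 0, 0, 1, 1, 1, 0, 0]
print(accuracy_with_exclusion(probs, truth, 0.4, 0.6))    # (0.6666666666666666, 0.25)
print(accuracy_with_exclusion(probs, truth, 0.25, 0.75))  # (1.0, 0.5)
```

Widening the window excludes more cells but keeps only the more confident calls, the same accuracy-versus-retention trade-off shown on the slide.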

Increased RNA count per cell increases the probability of correct classification for all models.
Figure: logistic regression of model classification outcome (correct: 1, incorrect: 0) on cell UMI (unique molecular identifier) count in the ventral tegmental area test partition. Dashed red lines indicate the RNA count corresponding to a 95% probability of correct classification by a model.
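The 95% line follows from inverting the fitted logistic curve. If P(correct) = 1 / (1 + exp(-(b0 + b1·x))), then P(correct) = 0.95 at x = (ln(0.95 / 0.05) - b0) / b1 = (ln 19 - b0) / b1. The sketch below fits the one-predictor model by Newton's method and solves for that value; using log10 UMI count as the predictor, the function names and the data are all assumptions.

```python
import math


def fit_logistic_1d(x, y, n_iter=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by maximum likelihood (Newton's method)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        p = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the negated Hessian, weighted by p(1 - p)
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^-1 g, with the 2x2 inverse written out
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def count_for_probability(b0, b1, target=0.95):
    """Predictor value at which the fitted P(correct) equals `target`."""
    return (math.log(target / (1 - target)) - b0) / b1


# Hypothetical test-partition cells: log10 UMI count and whether the
# model's sex call was correct (1) or not (0).
log_umi = [2.5, 2.8, 3.0, 3.1, 3.3, 3.5, 3.6, 3.8, 4.0, 4.2]
correct = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]
b0, b1 = fit_logistic_1d(log_umi, correct)
x95 = count_for_probability(b0, b1)
print(f"95% probability of a correct call at ~{10 ** x95:.0f} UMIs")
```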

Neuronal vs. Non-neuronal
• 55/70 neuronal and 232/364 non-neuronal DEGs (differentially expressed genes)
• Classification in non-neuronal cells is limited by the quality of the signal in their transcriptome.
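One way to surface a performance gap between cell classes is to score predictions separately per class. A minimal sketch with hypothetical labels and calls; in practice the neuronal/non-neuronal grouping would come from the cell-type annotation step.

```python
from collections import defaultdict


def accuracy_by_group(groups, predicted, true_sex):
    """Per-group classification accuracy, e.g. neuronal vs non-neuronal cells."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for g, pred, truth in zip(groups, predicted, true_sex):
        totals[g] += 1
        hits[g] += pred == truth  # True counts as 1
    return {g: hits[g] / totals[g] for g in totals}


# Hypothetical calls (1 = female, 0 = male)
groups = ["neuronal"] * 3 + ["non-neuronal"] * 4
predicted = [1, 0, 1, 1, 0, 0, 1]
true_sex = [1, 0, 1, 1, 1, 0, 0]
print(accuracy_by_group(groups, predicted, true_sex))
# {'neuronal': 1.0, 'non-neuronal': 0.5}
```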

Application to NAc Dataset
• Nucleus accumbens (NAc) dataset: 8 male and 8 female rats
• 39,254 cells, 16 cell types
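Applying the VTA-trained models to NAc cells requires each new cell to expose the same selected genes, in the same order, as the training matrix. The sketch below assumes per-cell expression stored as a gene-to-value dictionary and fills genes missing from the new dataset with 0; both choices, the function name and the feature list (Uty and Xist come from the slides, Gene3 is a placeholder) are hypothetical. Expression would also need the same normalization used during training, which is not shown.

```python
def align_features(cell_counts, feature_genes, fill_value=0.0):
    """Build a feature vector in the exact gene order the model was trained on.

    cell_counts: dict mapping gene symbol -> normalized expression for one cell.
    Genes absent from the new dataset are filled with `fill_value` (assumption).
    Returns (vector, list of missing genes) so gaps can be reported.
    """
    vector = [cell_counts.get(g, fill_value) for g in feature_genes]
    missing = [g for g in feature_genes if g not in cell_counts]
    return vector, missing


# Hypothetical trained feature order ("Gene3" is a placeholder) and one NAc cell
features = ["Xist", "Uty", "Gene3"]
nac_cell = {"Xist": 2.3, "Uty": 0.0, "Snap25": 4.1}
print(align_features(nac_cell, features))  # ([2.3, 0.0, 0.0], ['Gene3'])
```

Genes present in the new dataset but unused by the model (here `Snap25`) are simply ignored.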

Conclusion
• Results demonstrate that the feature selection and training strategies did not overfit the models to the VTA dataset.
• The models generalize to cell sex classification in a distinct dataset from a different brain region.
• Requires no additional experimental labeling or preprocessing.
• Currently limited to binary sex classification; may need adaptation for use in more complex or heterogeneous tissues.

Thank You!
