Data-driven Identification of Total RNA Expression Genes (TREGs) for
Estimation of RNA Abundance in Heterogeneous Cell Types
L.A. Huuki-Myers¹, K.D. Montgomery¹, S.H. Kwon¹,², S.C. Page¹, S.C. Hicks³, K.R. Maynard¹,⁴*, L. Collado-Torres¹,⁵*
1. Lieber Institute for Brain Development; 2. Department of Neuroscience, Johns Hopkins School of Medicine; 3. Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health; 4. Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine; 5. Center for Computational Biology, Johns Hopkins University
BAPTA
Abstract
Conclusion
Acknowledgements
Validation of TREGs with smFISH + RNAscope
Presenter & poster requests: [email protected], @lahuuki
Next-generation sequencing technologies have facilitated data-driven
identification of gene sets with different features including genes with
stable expression, cell-type specific expression, or spatially variable
expression. Here, we aimed to define and identify a new class of
"control" genes called Total RNA Expression Genes (TREGs), which
correlate with total RNA abundance in heterogeneous cell types of
different sizes and transcriptional activity. We provide a data-driven
method to identify TREGs from single cell RNA-sequencing (RNA-seq)
data, available as an R/Bioconductor package. We demonstrated the
utility of our method in the postmortem human brain using multiplex
single molecule fluorescent in situ hybridization (smFISH) and
compared candidate TREGs against classic housekeeping genes. We
identified AKT3 as a top TREG across five brain regions, especially in the
dorsolateral prefrontal cortex.
Rank Invariance Calculation
Software tools available as R/BioC package:
bioconductor.org/packages/TREG
i. Filter out lowly expressed genes.
ii. Compute the Expression Rank of each gene for each nucleus.
iii. Calculate the mean expression of each gene across all nuclei of one cell type, then rank these means (the mean Expression Rank).
iv. Per gene, find the difference between the Expression Rank in each nucleus of a given cell type and the mean Expression Rank.
v. Calculate the mean of the absolute Expression Rank differences for each gene.
vi. Rank the mean absolute Expression Rank differences.
vii. Repeat steps ii-vi for each cell type.
viii. Per gene, compute the sum of the previous ranks across all cell types, then rank these sums across genes such that the highest rank is given to the gene with the smallest sum. This is the final Rank Invariance value.
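Steps ii-viii can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the code of the TREG package. The function names are hypothetical, step i (filtering) is assumed to have been done already, ranks are assumed to run over genes within each nucleus, ties are assumed to receive average ranks, and step vi is assumed to rank in ascending order (smallest difference gets rank 1).

```python
from statistics import mean


def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def cell_type_rank(counts):
    """Steps ii-vi for one cell type; counts[g][z] is the count of gene g in nucleus z."""
    n_genes, n_nuclei = len(counts), len(counts[0])
    # ii. Expression Rank of every gene within each nucleus (assumed direction)
    nucleus_ranks = [
        average_ranks([counts[g][z] for g in range(n_genes)]) for z in range(n_nuclei)
    ]
    # iii. mean expression per gene across nuclei, then rank the means
    mean_rank = average_ranks([mean(row) for row in counts])
    # iv-v. mean absolute difference between per-nucleus rank and mean rank
    mean_abs_diff = [
        mean([abs(nucleus_ranks[z][g] - mean_rank[g]) for z in range(n_nuclei)])
        for g in range(n_genes)
    ]
    # vi. rank the mean absolute differences (ascending, an assumption)
    return average_ranks(mean_abs_diff)


def rank_invariance(counts_by_cell_type):
    """Steps vii-viii; counts_by_cell_type maps a cell type name to its count matrix."""
    per_type = [cell_type_rank(c) for c in counts_by_cell_type.values()]
    n_genes = len(per_type[0])
    rank_sums = [sum(r[g] for r in per_type) for g in range(n_genes)]
    # viii. the gene with the smallest sum receives the highest rank
    return average_ranks([-s for s in rank_sums])


# Toy data (made up): 4 genes, two cell types; gene 0 keeps the same rank everywhere.
toy = {
    "type_A": [[100, 100, 100], [5, 20, 2], [3, 3, 3], [10, 10, 10]],
    "type_B": [[50, 60], [1, 9], [4, 4], [7, 7]],
}
ri = rank_invariance(toy)
```

In the toy data, gene 0 never changes rank, so it receives the highest Rank Invariance value, while gene 1, whose rank moves most between nuclei, receives the lowest.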
Gene Filtering & TREG Selection
$c_{i,j,k,z}$ is the number of snRNA-seq counts for nucleus $z$, gene $i$, cell type $j$, and brain region $k$, and $n_{j,k}$ is the number of nuclei for cell type $j$ and brain region $k$.
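Under these definitions, the Proportion Zero of gene i in cell type j and brain region k can be written as the fraction of nuclei with zero counts. The notation below (the symbol PZ and the indicator function) is an assumed reconstruction from the definitions and the name of the quantity, not a copy of the poster's own equation:

```latex
PZ_{i,j,k} = \frac{1}{n_{j,k}} \sum_{z=1}^{n_{j,k}} \mathbb{1}\left(c_{i,j,k,z} = 0\right)
```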
1. Filter to the top 50% of genes by expression
2. Calculate Proportion Zero for each gene within each group (brain region & cell type)
3. Filter to genes whose maximum Proportion Zero across groups is < 0.75
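Steps 2 and 3 can be sketched as follows. This is a minimal illustration, not the package implementation. The function names, the nested-dictionary layout, and the toy counts are assumptions made for the example, and step 1 (the top 50% expression filter) is left out because the poster does not state the expression measure it uses.

```python
def proportion_zero(counts):
    """Fraction of nuclei with zero counts for one gene in one group."""
    return sum(1 for c in counts if c == 0) / len(counts)


def filter_genes(counts_by_group, max_prop_zero=0.75):
    """Keep genes whose maximum Proportion Zero across groups is below the cutoff.

    counts_by_group[group][gene] is a list of counts, one per nucleus, where a
    group is one (brain region, cell type) combination.
    """
    genes = list(next(iter(counts_by_group.values())).keys())
    kept = []
    for gene in genes:
        max_pz = max(
            proportion_zero(group[gene]) for group in counts_by_group.values()
        )
        if max_pz < max_prop_zero:
            kept.append(gene)
    return kept


# Toy data (made up): two groups, two hypothetical genes.
toy_groups = {
    ("DLPFC", "cell_type_1"): {"geneA": [3, 0, 5, 2], "geneB": [0, 0, 0, 1]},
    ("DLPFC", "cell_type_2"): {"geneA": [1, 1, 0, 0], "geneB": [2, 3, 1, 4]},
}
kept_genes = filter_genes(toy_groups)
```

Taking the maximum across groups means a gene is dropped as soon as it is mostly unobserved in any single brain region and cell type combination, even if it is well detected elsewhere. Here geneB is removed because its Proportion Zero reaches 0.75 in one group.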
Properties of candidate TREGs in snRNA-seq data
• Selected AKT3, ARID1B, and MALAT1 as candidate TREGs from the top Rank Invariance (RI) values
• Observed highly rank-invariant expression vs. the housekeeping (HK) gene POLR2A
• Candidate TREGs show less variable Expression Rank than HK POLR2A
• Observed strong linear relationship between TREG expression and
total nuclear RNA (estimated by the log2 sum of all counts) within
each cell type
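The relationship in the last bullet can be checked with a short sketch: estimate total nuclear RNA per nucleus as the log2 sum of all counts, then fit a least-squares slope of a gene's expression against it within one cell type. The function names and toy counts are hypothetical, and putting the gene's expression on a log2(count + 1) scale is an assumption made here, not something the poster states.

```python
from math import log2
from statistics import mean


def log2_total_counts(counts):
    """log2 of the summed counts per nucleus; counts[g][z] = gene g, nucleus z."""
    n_nuclei = len(counts[0])
    return [log2(sum(row[z] for row in counts)) for z in range(n_nuclei)]


def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


# Toy data (made up): 2 genes x 3 nuclei from a single cell type.
toy_counts = [[1, 3, 7], [3, 5, 9]]
total_rna = log2_total_counts(toy_counts)          # per-nucleus total RNA estimate
gene_expr = [log2(c + 1) for c in toy_counts[0]]   # assumed log2(count + 1) scale
slope = least_squares_slope(total_rna, gene_expr)
```

A gene that tracks total RNA well gives points close to a straight line, so the fit would be repeated per cell type and the slopes compared across candidate genes.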
• Three RNAscope probe combinations (TREG or POLR2A + cell type
markers) used to test the performance of the genes
• TREG puncta are observed in 86% or more of nuclei and over a dynamic range
• AKT3 expression is higher in transcriptionally active gray matter than in white matter
• MALAT1 reads are unreliable
• Rank Invariance is an effective way to find TREGs in
sn/scRNA-seq data and can be used to identify TREGs
relevant to a specific tissue or experimental setting
• AKT3 is an effective TREG in the human brain, specifically in the DLPFC
• TREGs represent an important class of genes that could be
used for a variety of assays and downstream analyses
Puncta vs. Total RNA Expression
Huuki-Myers et al., BioRxiv, 2022, 10.1101/2022.04.28.489923
Tran, Maynard et al., Neuron, 2021, 10.1101/2020.10.07.329839
Indica labs, HALO 3.3 FISH-IF
TREG paper git repo, 10.5281/zenodo.6502303
Gene     Prop. non-zero in DLPFC snRNA   Mean prop. non-zero   Mean n puncta
AKT3     0.92                            0.88                  4.09
ARID1B   0.94                            0.86                  3.08
MALAT1   1.00                            0.98                  2.07
POLR2A   0.30                            0.78                  2.75
Gene                     β           sd         Std.
AKT3                     -5.52       5.18       -1.07
ARID1B                   -2.63       3.42       -0.77
MALAT1                   -1.22       1.53       -0.8
POLR2A                   -3.49       3.34       -1.05
All genes in snRNA-seq   -21844.07   15560.76   -1.33
• AKT3 has the slope most similar to that of total expression measured by snRNA-seq across the observable cell types
• RNAscope + TREG allows the comparison of nuclear size and total RNA expression
Experimental Design
Proportion Zero Equation
Louise Huuki-Myers Kelsey D. Montgomery Sang Ho Kwon Stephanie C. Page Stephanie C. Hicks Kristen R. Maynard Leonardo Collado-Torres