Tensor-Decomposition-based Unsupervised Feature Extraction in Single-cell Multiomics Data Analysis

Y-h. Taguchi (Chuo University, Tokyo, Japan) and Turki Turki (King Abdulaziz University, Jeddah, Saudi Arabia)
Taguchi, Y.-h.; Turki, T. Tensor-Decomposition-Based Unsupervised Feature Extraction in Single-Cell Multiomics Data Analysis. Genes 2021, 12, 1442. https://doi.org/10.3390/genes12091442

Introduction
Integrating single-cell multiomics data sets is difficult because:
1) The number of features is huge (~10^8 for site-wise measurements).
2) The data are full of missing values (only a few percent of site-wise measurements are non-missing).
3) The omics layers have very different numbers of features (e.g., ~10^4 for mRNA), which makes them hard to integrate.

Conventional approaches:
1) Give up integrating the multiomics data (analyze each omics data set separately).
2) Screen for filled features (i.e., exclude features with missing values).
3) Fill in missing values artificially (e.g., using Bayes predictors).

The proposed approach:
Integrate multiomics data sets that are full of missing values and have distinct numbers of features as they are, without any preprocessing, using tensor decomposition (TD).

Dataset 1: GSE154762
Number of cells: 899
Gene expression + DNA methylation + DNA accessibility

Dataset 2: GSE121708
Number of cells: 852 (758 for gene expression)
Gene expression + DNA methylation + DNA accessibility

Preprocessing
Gene expression: none.
DNA methylation: encoded as -1 (unmethylated), 0 (missing value), 1 (methylated).
DNA accessibility: averaged over every 200-nucleotide region (corresponding to four histone proteins plus a linker protein).

Standardization
Gene expression: zero mean and unit variance.
DNA methylation and accessibility for data set 1: scaled so that the mean absolute value is one.
DNA methylation and accessibility for data set 2: none (because of heterogeneity).
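The preprocessing and standardization rules above can be sketched in NumPy. This is only an illustration under stated assumptions: the function names are hypothetical, the input layout (features in rows, cells in columns) is assumed, and the slides do not say over which axis the standardization is taken, so the per-cell (column-wise) choice here is a guess.

```python
import numpy as np

def encode_methylation(calls):
    """Encode calls as +1 (methylated), -1 (unmethylated), 0 (missing).
    Assumed input: 1 = methylated, 0 = unmethylated, NaN = not measured."""
    calls = np.asarray(calls, dtype=float)
    out = np.zeros_like(calls)      # missing values stay 0
    out[calls == 1] = 1.0
    out[calls == 0] = -1.0
    return out

def bin_average(signal, width=200):
    """Average a per-nucleotide signal over consecutive windows of `width`.
    A trailing partial window is dropped (an assumption of this sketch)."""
    signal = np.asarray(signal, dtype=float)
    n_bins = len(signal) // width
    return signal[: n_bins * width].reshape(n_bins, width).mean(axis=1)

def standardize_expression(x):
    """Zero mean and unit variance per cell (column); the axis is assumed."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)

def scale_mean_abs(x):
    """Scale each cell (column) so that its mean absolute value is one."""
    x = np.asarray(x, dtype=float)
    return x / np.abs(x).mean(axis=0)

# Toy usage on synthetic values (not data from the study)
encoded = encode_methylation([1, 0, np.nan, 1])
binned = bin_average(np.arange(400), width=200)
rng = np.random.default_rng(0)
expr = standardize_expression(rng.standard_normal((50, 4)))
acc = scale_mean_abs(rng.standard_normal((50, 4)))
```

Encoding missing values as 0 means they contribute nothing to the sums in the later decomposition, which is what lets the data be used as they are. Columns that are entirely zero would cause a division by zero in the two scaling functions and would need separate handling.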

For data set 1 or 2: $x_{ijk} \in \mathbb{R}^{N_k \times M \times 3}$
$N_k$: number of features of the $k$th omics data set ($k=1$: gene expression, $k=2$: DNA methylation, $k=3$: DNA accessibility).
$M$: number of cells.
Since the $N_k$ are not equal across omics layers, they must be reduced to one common value.

Full of missing values

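Apply singular value decomposition (SVD) to each omics layer $k$:
$x_{ijk} = \sum_{l=1}^{L} \lambda_l u_{lik} u_{ljk}$
Project each layer onto its first $L$ feature singular vectors to obtain a tensor of common size (we employ $L = 10$, and $K = 3$ omics layers):
$x_{ljk} = \sum_{i=1}^{N_k} x_{ijk} u_{lik} \in \mathbb{R}^{L \times M \times K}$
Apply TD to $x_{ljk}$ to get
$x_{ljk} = \sum_{l_1=1}^{L} \sum_{l_2=1}^{M} \sum_{l_3=1}^{3} G(l_1 l_2 l_3) u_{l_1 l} u_{l_2 j} u_{l_3 k}$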
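The projection step can be illustrated with a small NumPy sketch. It assumes that each omics layer is a dense feature-by-cell matrix with missing entries already set to 0, and it uses random matrices in place of real data. The function name `project_to_common_dim` is hypothetical.

```python
import numpy as np

def project_to_common_dim(x_k, L=10):
    """Project an (N_k, M) feature-by-cell matrix onto its first L
    feature-side singular vectors: x_{ljk} = sum_i x_{ijk} u_{lik}.
    Requires min(N_k, M) >= L."""
    u, s, vt = np.linalg.svd(x_k, full_matrices=False)
    u_l = u[:, :L]            # (N_k, L): feature singular vectors
    return u_l.T @ x_k        # (L, M): same size for every omics layer

# Toy example: three omics layers with different numbers of features
rng = np.random.default_rng(0)
M = 20                        # number of cells
omics = [rng.standard_normal((n_k, M)) for n_k in (50, 300, 120)]

# Stack the projected layers into one L x M x K tensor
tensor = np.stack([project_to_common_dim(x, L=10) for x in omics], axis=2)
print(tensor.shape)           # (10, 20, 3)
```

A dense SVD is only practical for toy sizes. With on the order of 10^8 site-wise features, a sparse or truncated solver would presumably be needed; the slides do not say which one was used.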

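What is tensor decomposition (TD)?
TD expands a tensor as a series of products of vectors. Here the tensor is $x_{ljk}$, with $l$: reduced dimension, $j$: cells, $k$: multiomics.
$x_{ljk} \simeq \sum_{l_1=1}^{L_1} \sum_{l_2=1}^{L_2} \sum_{l_3=1}^{L_3} G(l_1 l_2 l_3) u_{l_1 l} u_{l_2 j} u_{l_3 k}$
$G$: core tensor; $u_{l_1 l}$, $u_{l_2 j}$, $u_{l_3 k}$: singular value vectors attributed to the reduced dimension, the cells, and the omics layers, respectively.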
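One common way to compute an expansion of this form is the higher-order SVD (HOSVD). The slides do not name the algorithm, so the sketch below is an assumption. It runs on a random tensor, and `unfold` and `hosvd` are illustrative helpers rather than functions from any particular library.

```python
import numpy as np

def unfold(t, mode):
    """Matricize tensor `t` so that axis `mode` becomes the rows."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def hosvd(t):
    """Return (core, factors). Column l_n of factors[n] is the l_n-th
    singular value vector of mode n; `core` plays the role of G."""
    factors = []
    for mode in range(t.ndim):
        u, _, _ = np.linalg.svd(unfold(t, mode), full_matrices=False)
        factors.append(u)
    core = t
    for mode, u in enumerate(factors):
        # Contract axis `mode` of the core with u^T, keeping the axis order
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, mode)), 0, mode)
    return core, factors

rng = np.random.default_rng(0)
t = rng.standard_normal((10, 20, 3))      # L x M x K toy tensor
core, factors = hosvd(t)

# Rebuild the tensor as sum over l1, l2, l3 of G(l1 l2 l3) u u u
approx = np.einsum("abc,la,jb,kc->ljk", core, *factors)
print(np.allclose(approx, t))
```

Without truncation the expansion is exact. Keeping only the leading columns of each factor matrix gives the approximate form with $L_1$, $L_2$, $L_3$ terms.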

Select the $u_{l_2 j}$ associated with the classification $s$.
Data set 1: human oocyte maturation. Classification: cell types.
Data set 2: four time points of the mouse embryo. Classification: time points.
Check which $u_{l_2 j}$ coincide with the classes $s$ by regression:
$u_{l_2 j} = a_{l_2 s} \delta_{js} + b_{l_2}$
$a_{l_2 s}$, $b_{l_2}$: regression coefficients; $\delta_{js} = 1$ when $j \in s$, otherwise $0$.
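The categorical regression can be sketched as follows. For a model with one indicator per class plus an intercept, the least-squares fitted value of each cell is the mean of its class, so the fit reduces to a one-way analysis of variance. The function name `class_association` and the toy vectors are hypothetical, and the slides do not specify how significance was assessed; a P-value could be obtained from the F distribution with the degrees of freedom used below.

```python
import numpy as np

def class_association(u, labels):
    """Fit u_j = a_s * delta_js + b by least squares and return
    (R^2, F statistic) measuring how well u follows the classes."""
    u = np.asarray(u, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    fitted = np.empty_like(u)
    for s in classes:
        mask = labels == s
        fitted[mask] = u[mask].mean()     # least-squares fit = class mean
    ss_res = np.sum((u - fitted) ** 2)    # within-class variation
    ss_tot = np.sum((u - u.mean()) ** 2)  # total variation
    df_between = len(classes) - 1
    df_within = len(u) - len(classes)
    r2 = 1.0 - ss_res / ss_tot
    f_stat = ((ss_tot - ss_res) / df_between) / (ss_res / df_within)
    return r2, f_stat

# Toy usage: six cells in two classes
labels = [0, 0, 0, 1, 1, 1]
r2_a, f_a = class_association([0.0, 0.1, -0.1, 1.0, 1.1, 0.9], labels)
r2_b, f_b = class_association([1.0, 0.0, -1.0, 1.0, 0.0, -1.0], labels)
```

The first vector separates the two classes and gives a large F statistic. The second has identical class means, so its R^2 is zero and it would not be selected.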

18 (for data set 1) and 12 (for data set 2) of the $u_{l_2 j}$ are significantly correlated with the classifications. UMAP was applied to the top 30 $u_{l_2 j}$, yielding the two-dimensional embeddings shown in the following slides.

[Figure: two-dimensional UMAP embedding for data set 1]

[Figure: two-dimensional UMAP embedding for data set 2]

We also performed gene selection and biologically validated the selected genes using enrichment analysis, but there is no time to present those results.

Conclusions:
We applied TD to the integration of single-cell multiomics data sets. Without specific preprocessing, TD successfully obtained low-dimensional embeddings from which UMAP can generate embeddings coincident with the classifications.
