sets using tensor decomposition based unsupervised feature extraction
Y-h. Taguchi
Department of Physics, Chuo University, Tokyo, Japan
The contents of this poster were published in: Taguchi, Y.-h.; Turki, T. Tensor-Decomposition-Based Unsupervised Feature Extraction in Single-Cell Multiomics Data Analysis. Genes 2021, 12, 1442. doi:10.3390/genes12091442
is difficult because:
1. The number of features differs between individual omics (genes ~10⁴, DNA methylation/accessibility ~10⁸).
2. The data are full of missing values. The percentages of missing values are 70% for gene expression and >90% for DNA methylation and accessibility.
Careful pre-processing is usually required.
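HOSVD (Higher Order Singular Value Decomposition): the extension of SVD to tensors. An $N \times M \times K$ tensor $x_{ijk}$ is decomposed into an $L_1 \times L_2 \times L_3$ core tensor $G$ and singular value vectors $u_{l_1 i}$, $u_{l_2 j}$, $u_{l_3 k}$:
$$x_{ijk} \simeq \sum_{l_1=1}^{L_1}\sum_{l_2=1}^{L_2}\sum_{l_3=1}^{L_3} G(l_1 l_2 l_3)\, u_{l_1 i}\, u_{l_2 j}\, u_{l_3 k}$$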
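As a minimal sketch of the HOSVD equation (assuming NumPy and a dense tensor with no missing values), each factor matrix can be taken as the leading left singular vectors of the corresponding mode unfolding, and the core tensor $G$ is then obtained by projecting $x$ onto those factors. The function names `unfold`, `hosvd`, and `reconstruct` are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def unfold(x: np.ndarray, mode: int) -> np.ndarray:
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)


def hosvd(x: np.ndarray, ranks: tuple[int, int, int]):
    """Truncated HOSVD of a 3-mode tensor x (N x M x K).

    Returns the core tensor G (L1 x L2 x L3) and factor matrices
    U1 (N x L1), U2 (M x L2), U3 (K x L3); column l of each factor
    matrix holds the singular value vector u_{l i}, u_{l j}, or u_{l k}.
    """
    factors = []
    for mode, rank in enumerate(ranks):
        # Left singular vectors of each unfolding give that mode's factors.
        u, _, _ = np.linalg.svd(unfold(x, mode), full_matrices=False)
        factors.append(u[:, :rank])
    u1, u2, u3 = factors
    # G(l1 l2 l3) = sum_{ijk} x_ijk u_{l1 i} u_{l2 j} u_{l3 k}
    g = np.einsum("ijk,ia,jb,kc->abc", x, u1, u2, u3)
    return g, u1, u2, u3


def reconstruct(g, u1, u2, u3) -> np.ndarray:
    """x_ijk ~ sum_{l1 l2 l3} G(l1 l2 l3) u_{l1 i} u_{l2 j} u_{l3 k}"""
    return np.einsum("abc,ia,jb,kc->ijk", g, u1, u2, u3)
```

With full ranks the reconstruction is exact; choosing smaller $L_1$, $L_2$, $L_3$ gives the approximation written as $\simeq$.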
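Since individual omics data are associated with distinct features, the number of features depends on the omics, and the data are written as $x_{ijk} \in \mathbb{R}^{N_k \times M \times K}$, where
$N_k$: number of features of the $k$th omics data
$M$: number of single cells
$K$: number of omics ($K = 3$ in the present study)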
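Because $N_k$ differs between omics, the blocks cannot simply be stacked into one array. The poster does not spell out how this ragged feature axis is handled; one common workaround, assumed here purely for illustration, is to replace each $N_k \times M$ block by its $M \times M$ cell-by-cell Gram matrix $\sum_i x_{ijk} x_{ij'k}$, which yields an $M \times M \times K$ tensor that ordinary HOSVD can process. `gram_tensor` is a hypothetical helper, and missing values are assumed to have been filled (for example with zeros) beforehand.

```python
import numpy as np


def gram_tensor(omics: list[np.ndarray]) -> np.ndarray:
    """Stack per-omics cell-by-cell Gram matrices into an M x M x K tensor.

    Each element of `omics` is an N_k x M matrix (features x cells). N_k may
    differ between omics, but all blocks must share the same M cells in the
    same column order. Assumes NaNs were already filled. Output element:
    x_{j j' k} = sum_i x_{ijk} x_{ij'k}.
    """
    m = omics[0].shape[1]
    if any(block.shape[1] != m for block in omics):
        raise ValueError("all omics must share the same cells (columns)")
    # block.T @ block is M x M regardless of N_k; stack along a new last axis.
    return np.stack([block.T @ block for block in omics], axis=-1)
```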
from GEO ID GSE154762, which is denoted as Dataset 1 in this study, is composed of 899 single cells for which gene expression, DNA methylation, and DNA accessibility were measured. These single cells represent human oocyte maturation.
Dataset 2
The multiomics dataset retrieved from GEO ID GSE154762, which is denoted as Dataset 2 in this study, is composed of 852 single cells for which DNA methylation and DNA accessibility were measured, as well as 758 single cells for which gene expression was measured. These single cells represent four time points of the mouse embryo.
Categorical regression:
$$u_{l_2 j} = a_{l_2} + \sum_{s=1}^{S} b_{l_2 s}\, \delta_{js}$$
where $\delta_{js} = 1$ when $j$ belongs to the $s$th category and 0 otherwise, and $a_{l_2}$, $b_{l_2 s}$ are regression coefficients. Datasets 1 and 2: 18 $l_2$s are associated with corrected P-values less than 0.05.
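The categorical regression can be sketched with NumPy as follows (a minimal illustration; `categorical_regression` is a hypothetical name). Because the model contains an intercept plus one indicator $\delta_{js}$ per category, its least-squares fitted value for cell $j$ is simply the mean of $u_{l_2 j}$ over $j$'s category, and the significance of the fit reduces to a one-way ANOVA F statistic with $(S-1, M-S)$ degrees of freedom. Turning F into a P-value needs the F distribution's survival function (for instance `scipy.stats.f.sf`), and the poster does not state which multiple-testing correction was applied, so both steps are left out here.

```python
import numpy as np


def categorical_regression(u, labels):
    """Fit u_j = a + sum_s b_s delta_js; return (fitted, F statistic, R^2).

    u: length-M vector u_{l2 j}; labels: length-M array of category ids.
    """
    u = np.asarray(u, dtype=float)
    labels = np.asarray(labels)
    cats = np.unique(labels)
    m, s = u.size, cats.size
    fitted = np.empty(m)
    for c in cats:
        in_cat = labels == c  # cells with delta_js = 1 for this category
        fitted[in_cat] = u[in_cat].mean()
    ss_between = float(np.sum((fitted - u.mean()) ** 2))
    ss_within = float(np.sum((u - fitted) ** 2))
    # One-way ANOVA F statistic with (S - 1, M - S) degrees of freedom.
    f_stat = (ss_between / (s - 1)) / (ss_within / (m - s))
    r2 = ss_between / (ss_between + ss_within)
    return fitted, f_stat, r2
```

Applied to each $u_{l_2 j}$ in turn, the resulting P-values would be corrected across all $l_2$ before applying the 0.05 threshold.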
$l_2$s are restricted to the selected 18 $l_2$s, since such $l_1$ should be associated with the classification. For Datasets 1 and 2, $l_1 = 1$ has the largest value of
$$\sum_{l_2}\sum_{l_3=1}^{K} \left|G(l_1 l_2 l_3)\right|$$
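Once the 18 $l_2$s are known, picking $l_1$ is a single reduction over the core tensor. The sketch below assumes $G$ is a NumPy array indexed as `G[l1, l2, l3]` and uses 0-based indices, so the poster's $l_1 = 1$ corresponds to index 0; `select_l1` is a hypothetical helper.

```python
import numpy as np


def select_l1(g: np.ndarray, selected_l2: list[int]) -> int:
    """Return the 0-based l1 maximizing sum_{l2 in selected} sum_{l3} |G(l1 l2 l3)|."""
    # Keep only the selected l2 slices, then sum |G| over l2 and all l3.
    score = np.abs(g[:, selected_l2, :]).sum(axis=(1, 2))
    return int(np.argmax(score))
```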
Datasets 1 and 2, respectively, as those associated with adjusted P-values less than 0.01. Enrichment analysis was performed on these genes to validate the selected genes biologically.
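The poster does not restate how P-values are attributed to genes. In Taguchi's earlier descriptions of TD-based unsupervised FE, they are typically computed by assuming that $u_{l_1 i}$ follows a Gaussian distribution under the null hypothesis, so that $P_i = P_{\chi^2}\left[> \left(u_{l_1 i}/\sigma_{l_1}\right)^2\right]$, followed by Benjamini-Hochberg correction. The sketch below implements that scheme under those assumptions, estimating $\sigma_{l_1}$ by the sample standard deviation; the function names are hypothetical.

```python
import math

import numpy as np


def feature_p_values(u) -> np.ndarray:
    """P_i = P[chi^2_1 > (u_i / sigma)^2], assuming u_i ~ N(0, sigma^2) under the null."""
    u = np.asarray(u, dtype=float)
    z2 = (u / u.std()) ** 2
    # Survival function of chi-squared with 1 degree of freedom: erfc(sqrt(x / 2)).
    return np.array([math.erfc(math.sqrt(v / 2.0)) for v in z2])


def benjamini_hochberg(p) -> np.ndarray:
    """Benjamini-Hochberg adjusted P-values (step-up, monotone, capped at 1)."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity by taking running minima from the largest P-value down.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def select_features(u, threshold: float = 0.01) -> np.ndarray:
    """Indices i whose BH-adjusted P-value is below `threshold`."""
    return np.flatnonzero(benjamini_hochberg(feature_p_values(u)) < threshold)
```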
based on “ENCODE Histone Modifications 2015”; H3K36me3 is known to play critical roles during oocyte maturation [12]. Forty-seven genes were also targeted by MYC based on “ENCODE and ChEA Consensus TFs from ChIP-X”; Myc is known to play critical roles in oogenesis [13]. Forty-seven genes were also targeted by TAF7 based on “ENCODE and ChEA Consensus TFs from ChIP-X” and “ENCODE TF ChIP-seq 2015”; TAF7 is known to play critical roles during oocyte growth [14]. Forty-seven genes were also targeted by ATF2 based on “ENCODE and ChEA Consensus TFs from ChIP-X”; the expression of ATF2 is known to be altered during oocyte development [15].
by H3K36me3 based on “ENCODE Histone Modifications 2015”; H3K36me3 is known to play critical roles during gastrulation [17]. One hundred and seventy-five genes were also targeted by MYC based on “ENCODE and ChEA Consensus TFs from ChIP-X”; Myc is also known to play critical roles in gastrulation [18]. One hundred and seventy-five genes were also targeted by TAF7 based on “ENCODE and ChEA Consensus TFs from ChIP-X” and “ENCODE TF ChIP-seq 2015”; TAF7 is known to play critical roles during gastrulation [19]. One hundred and seventy-five genes were also targeted by ATF2 based on “ENCODE and ChEA Consensus TFs from ChIP-X”; the expression of ATF2 is known to be maintained during gastrulation [20].