described in the following book of mine, published by Springer International in Sep. 2019. I would be glad if the audience could buy it and learn how to apply this method to their own research!
HOSVD (Higher Order Singular Value Decomposition): extension of SVD to tensors
$$x_{ijk} \simeq \sum_{l_1=1}^{L_1} \sum_{l_2=1}^{L_2} \sum_{l_3=1}^{L_3} G(l_1 l_2 l_3)\, u_{l_1 i}\, u_{l_2 j}\, u_{l_3 k}, \qquad x_{ijk} \in \mathbb{R}^{N \times M \times K}$$
N: number of genes ($i$), M: number of samples ($j$), K: number of tissues ($k$), $x_{ijk}$: gene expression
Example: $x_{ij}$: expression of gene $i$ of sample $j$; $x_{kj}$: methylation of region $k$ of sample $j$
$$x_{ijk} \equiv x_{ij} \times x_{kj}$$
which is then decomposed by the same HOSVD.
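A minimal NumPy sketch of untruncated HOSVD for a 3-mode tensor, assuming the singular value vectors are taken from the SVD of each mode's unfolding (the standard HOSVD construction). The function name `hosvd` and the toy sizes are hypothetical. It also builds the example tensor $x_{ijk} = x_{ij} \times x_{kj}$ from a random expression matrix and a random methylation matrix.

```python
import numpy as np


def hosvd(x):
    """Untruncated HOSVD of a 3-mode tensor x[i, j, k].

    Returns the core tensor G and factor matrices u1, u2, u3 whose row l is
    the l-th singular value vector (u_{l1 i}, u_{l2 j}, u_{l3 k}).
    """
    factors = []
    for mode in range(3):
        # Mode-n unfolding: move `mode` to the front, flatten the other modes.
        unfolded = np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)
        u, _, _ = np.linalg.svd(unfolded, full_matrices=False)
        factors.append(u.T)  # row l = l-th left singular vector
    u1, u2, u3 = factors
    # G(l1, l2, l3) = sum_{ijk} x_ijk u_{l1 i} u_{l2 j} u_{l3 k}
    g = np.einsum("ijk,ai,bj,ck->abc", x, u1, u2, u3)
    return g, u1, u2, u3


rng = np.random.default_rng(0)
x_ij = rng.normal(size=(5, 4))  # toy expression: 5 genes x 4 samples
x_kj = rng.normal(size=(3, 4))  # toy methylation: 3 regions x 4 samples
x = np.einsum("ij,kj->ijk", x_ij, x_kj)  # x_ijk = x_ij * x_kj

g, u1, u2, u3 = hosvd(x)
x_rec = np.einsum("abc,ai,bj,ck->ijk", g, u1, u2, u3)
print(np.allclose(x, x_rec))  # True: nothing truncated, so reconstruction is exact
```

In practice only the first few $l_1, l_2, l_3$ are kept, which is where the dimension reduction happens.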
Drug Discovery from Gene Expression with Tensor Decomposition. Author(s): Y-h. Taguchi*, Turki Turki. Journal: Current Pharmaceutical Design, Volume 25, Issue 43, 2019 (Open Access)
Inference of drugs effective against Alzheimer's disease from mouse brain single-cell gene expression (without gene expression profiles of drug-treated samples)
Two genotypes (APP_NL-F-G and C57Bl/6), two tissues (cortex and hippocampus), four ages (3, 6, 12, and 21 weeks), two sexes (male and female), and four 96-well plates (one cell per well).
Aim: understanding Alzheimer's disease
$$x_{i j_1 j_2 j_3 j_4 j_5 j_6} = \sum_{l_1 l_2 l_3 l_4 l_5 l_6 l_7} G(l_1,l_2,l_3,l_4,l_5,l_6,l_7)\, u_{l_1 j_1} u_{l_2 j_2} u_{l_3 j_3} u_{l_4 j_4} u_{l_5 j_5} u_{l_6 j_6} u_{l_7 i}$$
(A) $u_{l_1 j_1}$: 96 wells (cells), $l_1=1$
(B) $u_{l_2 j_2}$: genotype, APP_NL-F-G vs. C57Bl/6, $l_2=1$
(C) $u_{l_3 j_3}$: cortex vs. hippocampus, $l_3=1$
(D) $u_{l_4 j_4}$: 3, 6, 12, 21 weeks, $l_4=2$
(E) $u_{l_5 j_5}$: female vs. male, $l_5=1$
(F) $u_{l_6 j_6}$: 4 plates, $l_6=1$
→ $l_7=2$ is selected, since $|G(1,1,1,2,1,1,l_7)|$ takes its largest value at $l_7=2$.
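Once HOSVD has produced the core tensor, choosing $l_7$ reduces to an argmax over one slice of $G$. A minimal sketch, assuming the core tensor is held as a NumPy array with the gene mode last; the helper name `select_gene_vector` and the toy core tensor are hypothetical, and the 1-based indices mirror the slide's notation.

```python
import numpy as np


def select_gene_vector(g, fixed):
    """Return the 1-based l7 maximizing |G(l1, ..., l6, l7)| for fixed l1..l6.

    g: core tensor whose last axis indexes the gene singular value vectors.
    fixed: 1-based indices (l1, ..., l6) chosen for the sample modes.
    """
    row = g[tuple(l - 1 for l in fixed)]  # 1-D slice over l7
    return int(np.argmax(np.abs(row))) + 1


# Toy core tensor; a real one would come from HOSVD of the 7-mode tensor.
rng = np.random.default_rng(0)
g = rng.normal(scale=0.1, size=(2, 2, 2, 3, 2, 2, 4))
g[0, 0, 0, 1, 0, 0, 1] = -5.0  # plant a large |G(1,1,1,2,1,1,2)|
print(select_gene_vector(g, (1, 1, 1, 2, 1, 1)))  # 2
```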
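Attributing P-values to genes:
$$P_i = P_{\chi^2}\left[ > \left(\frac{u_{2i}}{\sigma_2}\right)^2 \right]$$
where $\sigma_2$ is the standard deviation of $u_{2i}$. After correcting the P-values by the BH criterion, 401 genes with corrected $P_i < 0.01$ are selected.
→ Evaluate how these overlap with genes affected by treatment with known Alzheimer's disease drugs: the 401 genes are uploaded to Enrichr.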
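The P-value attribution and BH correction can be sketched in a few lines of NumPy. Assumptions: $\sigma$ is the population standard deviation of the gene singular value vector, the toy vector `u2` stands in for $u_{2i}$, and the helper names are hypothetical. For one degree of freedom the χ² upper tail equals $\mathrm{erfc}(\sqrt{x/2})$, so only the standard library is needed.

```python
import math

import numpy as np


def chi2_pvalues(u):
    """P_i = P_chi2[> (u_i / sigma)^2] with one degree of freedom."""
    sigma = np.std(u)  # assumption: population standard deviation of u
    z2 = (np.asarray(u, dtype=float) / sigma) ** 2
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return np.array([math.erfc(math.sqrt(v / 2.0)) for v in z2])


def bh_adjust(p):
    """Benjamini-Hochberg adjusted P-values."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Running minimum from the largest P-value down keeps adjusted values monotone.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


rng = np.random.default_rng(0)
u2 = rng.normal(size=1000)  # toy stand-in for u_{2i}
u2[:10] = 8.0               # ten genes with outlying values
selected = np.where(bh_adjust(chi2_pvalues(u2)) < 0.01)[0]
```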
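When we tried to apply other methods, the following two did not converge within 24 hours (the present method finished in 10 hours).
The first alternative method: CP decomposition (orthogonal tensor decomposition), which approximates the tensor by a sum of rank-one products:
$$x_{ijk} \simeq u_{1i}\, u_{1j}\, u_{1k} + u_{2i}\, u_{2j}\, u_{2k} + \cdots, \qquad x_{ijk} \in \mathbb{R}^{N \times M \times K}$$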
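To see why CP decomposition needs iteration, here is a minimal alternating least squares (ALS) sketch of a rank-$R$ CP decomposition in NumPy. It is a generic textbook ALS, not the specific implementation that was benchmarked; it imposes no orthogonality constraint, and the function names, iteration count and seed are assumptions. Each factor update depends on the other two, so the loop must be repeated until the fit stabilizes, whereas HOSVD needs one SVD per mode.

```python
import numpy as np


def khatri_rao(b, c):
    """Column-wise Kronecker product, shape (J*K) x R, rows ordered (j, k)."""
    return np.einsum("jr,kr->jkr", b, c).reshape(-1, b.shape[1])


def cp_als(x, rank, n_iter=200, seed=0):
    """Rank-`rank` CP decomposition x_ijk ~ sum_l a_il b_jl c_kl by ALS."""
    rng = np.random.default_rng(seed)
    n, m, k = x.shape
    a = rng.normal(size=(n, rank))
    b = rng.normal(size=(m, rank))
    c = rng.normal(size=(k, rank))
    x0 = x.reshape(n, -1)                     # columns ordered (j, k)
    x1 = np.moveaxis(x, 1, 0).reshape(m, -1)  # columns ordered (i, k)
    x2 = np.moveaxis(x, 2, 0).reshape(k, -1)  # columns ordered (i, j)
    for _ in range(n_iter):
        # Least-squares update of one factor with the other two held fixed.
        a = x0 @ np.linalg.pinv(khatri_rao(b, c)).T
        b = x1 @ np.linalg.pinv(khatri_rao(a, c)).T
        c = x2 @ np.linalg.pinv(khatri_rao(a, b)).T
    return a, b, c


rng = np.random.default_rng(1)
x = np.einsum("i,j,k->ijk", rng.normal(size=6), rng.normal(size=5), rng.normal(size=4))
a, b, c = cp_als(x, rank=1)
x_fit = np.einsum("ir,jr,kr->ijk", a, b, c)
```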
The second alternative method: coupled matrix and tensor factorization (CMTF) (supervised TD), whose objective is a CP decomposition term plus a penalty term. $v_i, v_j, v_k, \ldots$: various target vectors, e.g., time dependence, genotype dependence, cell dependence, plate dependence, and so on.
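The slide describes CMTF only as "CP decomposition + penalty term". As a hedged illustration, the sketch below evaluates one common CMTF form: a CP reconstruction error plus a squared-error term coupling the mode-$i$ factor to a matrix of target vectors. The exact penalty used in the comparison is not given here, so the coupling form, the weight `lam`, and all names are assumptions. Minimizing such an objective again requires an iterative optimizer.

```python
import numpy as np


def cmtf_loss(x, y, a, b, c, w, lam=1.0):
    """Hypothetical CMTF objective for a 3-mode tensor x[i, j, k].

    CP term: squared error of x_ijk ~ sum_r a_ir b_jr c_kr.
    Penalty: squared error of y ~ a @ w.T, where the columns of y are target
    vectors over mode i (e.g. a time- or genotype-dependence pattern).
    """
    cp_term = np.sum((x - np.einsum("ir,jr,kr->ijk", a, b, c)) ** 2)
    penalty = np.sum((y - a @ w.T) ** 2)
    return cp_term + lam * penalty


rng = np.random.default_rng(0)
a, b, c = rng.normal(size=(6, 2)), rng.normal(size=(5, 2)), rng.normal(size=(4, 2))
w = rng.normal(size=(3, 2))
x = np.einsum("ir,jr,kr->ijk", a, b, c)
y = a @ w.T  # three toy target vectors that the factors explain exactly
print(cmtf_loss(x, y, a, b, c, w))  # 0.0
```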
Model data: a 3-mode tensor generated by the summation of two products of three identical vectors,
$$x_{ijk} = v_i v_j v_k + v'_i v'_j v'_k$$
Panels: Model 1, Model 2 (each showing $v_i, v_j, v_k$ and $v'_i, v'_j, v'_k$)
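A small sketch of how such model data can be generated, assuming "three identical vectors" means that the same vector $v$ (and likewise $v'$) is used for all three modes; the size `n = 8` and the random vectors are hypothetical. It also checks that each unfolding has rank 2, which is why two singular value vectors per mode suffice to represent it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
v = rng.normal(size=n)        # hypothetical vector v (same in all three modes)
v_prime = rng.normal(size=n)  # hypothetical vector v'

# x_ijk = v_i v_j v_k + v'_i v'_j v'_k
x = np.einsum("i,j,k->ijk", v, v, v) + np.einsum("i,j,k->ijk", v_prime, v_prime, v_prime)

# Each mode-n unfolding has rank 2, so HOSVD needs only two singular value
# vectors per mode to represent this tensor exactly.
ranks = [int(np.linalg.matrix_rank(np.moveaxis(x, m, 0).reshape(n, -1))) for m in range(3)]
print(ranks)  # [2, 2, 2]
```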
Our method is the only method applicable to large-scale data, since only our method does not require iteration!
Data set: GSE76381, scRNA-seq of human and mouse midbrain development ($i$: genes; $j, k$: cells)
Purpose of the analysis: selection of genes associated with midbrain development common to human and mouse
Tensor generated from the product of cells, using the 13,384 genes common to human and mouse:
$$x_{ijk} = x_{ij} \times x_{ik} \in \mathbb{R}^{13384 \times 1977 \times 1907}$$
($i$: genes; $j, k$: cells). Size reduction is needed because the tensor is too large:
$$x_{jk} = \sum_i x_{ijk}$$
$x_{jk}$ is decomposed by singular value decomposition. $v_{lj}$: $l$th human cell singular value vector; $v_{lk}$: $l$th mouse cell singular value vector.
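The size reduction never requires the full tensor, which would hold about $5 \times 10^{10}$ entries (roughly 400 GB in float64): since $x_{jk} = \sum_i x_{ij} x_{ik}$, it is simply the product of the transposed human matrix and the mouse matrix. A minimal NumPy sketch, assuming both expression matrices are genes × cells with rows aligned on the common genes; the function name and toy sizes are hypothetical.

```python
import numpy as np


def cell_singular_vectors(x_human, x_mouse, n_vectors):
    """SVD of x_jk = sum_i x_ij x_ik without forming the 3-mode tensor.

    x_human: genes x human cells; x_mouse: genes x mouse cells (same gene order).
    Returns v_lj (rows: human cell vectors), singular values, v_lk (rows: mouse).
    """
    x_jk = x_human.T @ x_mouse  # (human cells) x (mouse cells)
    u, s, vt = np.linalg.svd(x_jk, full_matrices=False)
    return u[:, :n_vectors].T, s[:n_vectors], vt[:n_vectors]


rng = np.random.default_rng(0)
x_human = rng.normal(size=(30, 12))  # toy: 30 common genes x 12 human cells
x_mouse = rng.normal(size=(30, 10))  # toy: 30 common genes x 10 mouse cells
v_lj, s, v_lk = cell_singular_vectors(x_human, x_mouse, n_vectors=3)

# Same matrix as summing the explicit tensor x_ijk = x_ij * x_ik over genes i.
x_ijk = np.einsum("ij,ik->ijk", x_human, x_mouse)
assert np.allclose(x_ijk.sum(axis=0), x_human.T @ x_mouse)
```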
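$$v_{lj} = a_l + \sum_t b_{lt}\, \delta_{jt}, \qquad v_{lk} = a'_l + \sum_t b'_{lt}\, \delta_{kt}$$
$\delta_{jt}, \delta_{kt}$: 1 when cell $j$ (or $k$) is measured at time $t$, 0 otherwise. $v_{lj}$ and $v_{lk}$ with any kind of time dependence are selected with categorical regression (ANOVA).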
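The categorical regression can be sketched as an ordinary least-squares fit of one singular value vector on time-point indicators $\delta_{jt}$, followed by an F-test, which is what one-way ANOVA computes. Assumptions: NumPy and SciPy are available, time labels are given per cell, and the helper name and toy data are hypothetical. The intercept $a_l$ is absorbed by using one indicator column per time point, which spans the same model as $a_l + \sum_t b_{lt}\delta_{jt}$.

```python
import numpy as np
from scipy.stats import f as f_dist
from scipy.stats import f_oneway


def time_dependence_pvalue(v, times):
    """F-test P-value for v_j = a + sum_t b_t delta_jt (categorical regression)."""
    v = np.asarray(v, dtype=float)
    levels, group = np.unique(times, return_inverse=True)
    delta = np.zeros((v.size, levels.size))
    delta[np.arange(v.size), group] = 1.0  # delta_jt = 1 if cell j measured at t
    coef, *_ = np.linalg.lstsq(delta, v, rcond=None)
    ss_res = np.sum((v - delta @ coef) ** 2)
    ss_tot = np.sum((v - v.mean()) ** 2)
    df1, df2 = levels.size - 1, v.size - levels.size
    f_stat = ((ss_tot - ss_res) / df1) / (ss_res / df2)
    return f_dist.sf(f_stat, df1, df2)


rng = np.random.default_rng(0)
times = np.repeat([0, 1, 2], 20)  # hypothetical time-point labels
v_l = rng.normal(size=60) + np.repeat([0.0, 1.5, 3.0], 20)  # time-dependent toy vector
p = time_dependence_pvalue(v_l, times)

# Cross-check against SciPy's one-way ANOVA on the same groups.
p_check = f_oneway(*[v_l[times == t] for t in np.unique(times)]).pvalue
```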
$$u_{li}(j) = \sum_j v_{lj}\, x_{ij}, \qquad u_{li}(k) = \sum_k v_{lk}\, x_{ik}$$
$u_{li}(j)$: $l$th human gene singular value vector; $u_{li}(k)$: $l$th mouse gene singular value vector. P-values are attributed to genes from these gene singular value vectors using the χ² distribution and corrected by the BH criterion; genes with adjusted P-values less than 0.01 are selected.
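For one species, the projection and selection step can be sketched as follows. Assumptions: NumPy and SciPy are available, one selected cell vector is used at a time, $\sigma_l$ is the standard deviation of $u_{li}$, and the function name and toy data are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def select_genes(x, v_l, threshold=0.01):
    """Project a genes x cells matrix x on cell vector v_l and select genes.

    u_li = sum_j v_lj x_ij; P_i = P_chi2[> (u_li / sigma_l)^2];
    genes with BH-adjusted P < threshold are returned.
    """
    u = x @ v_l                              # gene singular value vector u_li
    p = chi2.sf((u / np.std(u)) ** 2, df=1)  # one degree of freedom
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]  # BH step-up
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return np.where(adjusted < threshold)[0]


rng = np.random.default_rng(0)
v_l = rng.normal(size=50)
v_l /= np.linalg.norm(v_l)      # toy unit-length cell singular value vector
x = rng.normal(size=(500, 50))  # toy: 500 genes x 50 cells
x[:5] += 20 * v_l               # genes 0-4 follow the cell pattern
genes = select_genes(x, v_l)
```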
[Figure: genes selected in human and mouse]
Fewer overlaps between human and mouse, and no brain-related biological terms are enriched. More comparisons are available in the following paper: Y-h. Taguchi, ICIC2018 (2018), "Principal Component Analysis-Based Unsupervised Feature Extraction Applied to Single-Cell Gene Expression Analysis", https://doi.org/10.1007/978-3-319-95933-7_90
applicable to massive single-cell RNA-seq data and is capable of selecting biologically reasonable genes. Since it is an unsupervised method, it is easy to use and applicable to a wide range of scRNA-seq data sets.