What is a tensor? An extension of a matrix.
Matrix: genes i × humans j (patients vs. healthy controls): $x_{ij}$
Tensor: genes i × humans j (patients vs. healthy controls) × tissues k: $x_{ijk}$
What are PCA and TD? They decompose a matrix or a tensor into vectors.
[Figure: a matrix (genes i × humans j, N × M) and a tensor (genes i × humans j × tissues k), each decomposed into vectors.]
By decomposing into vectors, we obtain "gene", "human", and "tissue" vectors and can interpret their "meaning". In practice, we need not a single vector but a set of vectors.
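To make "decompose a tensor into sets of vectors" concrete, here is a minimal sketch on random toy data. It assumes higher-order SVD (HOSVD) as the tensor decomposition; the slide does not say which TD is meant, so the choice of HOSVD, the function name `hosvd`, and the toy sizes are illustrative assumptions.

```python
import numpy as np

def hosvd(x):
    """Assumed TD: HOSVD. Returns a core tensor and one factor matrix per mode."""
    factors = []
    for mode in range(x.ndim):
        # Unfold: move `mode` to the front and flatten the remaining modes.
        unfolded = np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)
        u, _, _ = np.linalg.svd(unfolded, full_matrices=False)
        factors.append(u)  # columns are the singular vectors of this mode
    # Core tensor: project every mode onto its singular vectors.
    core = x
    for mode, u in enumerate(factors):
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, mode)), 0, mode)
    return core, factors

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4, 3))  # toy tensor: genes i x humans j x tissues k
core, (u_gene, u_human, u_tissue) = hosvd(x)

# The tensor is recovered from the core and the three sets of vectors.
x_rebuilt = np.einsum("abc,ia,jb,kc->ijk", core, u_gene, u_human, u_tissue)
```

Each column of `u_gene`, `u_human`, and `u_tissue` is one "gene", "human", or "tissue" vector, which is why a set of vectors per mode is obtained rather than a single one.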
In contrast to the usual usage of PCA, not samples but features are embedded into a Q-dimensional space.
[Figure: PCA of the N × M matrix X (numerical values, M samples). PC loadings (PC1, PC2) are attributed to samples; PC scores (PC1, PC2) are plotted as "+". Label: no distinction between classes.]
N(μ, 1/2): normal distribution with mean μ and SD 1/2. Distributions used: N(0, 1/2), N(m, 1/2), and [N(m, 1/2) + N(0, 1/2)]/2.
[Figure: PC1 vs. PC2 for m = 2; +: top 10 outliers.]
Thus, extracting outliers selects features that are distinct between the two classes in an unsupervised way.
Accuracy (100 trials): 89.5% (m = 2), 52.6% (m = 1).
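The idea of selecting outlying features can be sketched as follows. The slide only names the three distributions, so the layout used here is an assumption: 100 features, 20 samples in two equal classes, 10 features that follow N(0, 1/2) in one class and N(m, 1/2) in the other, and the remaining features following [N(m, 1/2) + N(0, 1/2)]/2. Per-sample standardization and the distance-from-origin outlier rule are also assumptions. The sketch is not meant to reproduce the accuracies reported above.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes and the class layout below are illustrative assumptions.
N, M, N_DISTINCT, m, sd = 100, 20, 10, 2.0, 0.5
is_class_b = np.arange(M) >= M // 2

# Background features: [N(m, 1/2) + N(0, 1/2)] / 2 for every sample.
x = (rng.normal(m, sd, (N, M)) + rng.normal(0.0, sd, (N, M))) / 2
# Distinct features: N(0, 1/2) in class A, N(m, 1/2) in class B.
x[:N_DISTINCT] = rng.normal(0.0, sd, (N_DISTINCT, M)) + m * is_class_b

# Standardize each sample (column) over features, then embed the features.
x = (x - x.mean(axis=0)) / x.std(axis=0)
u, s, vt = np.linalg.svd(x, full_matrices=False)
pc_scores = u[:, :2] * s[:2]  # one point per feature in the PC1-PC2 plane

# Outliers: the features farthest from the origin (class labels are not used).
distance = np.hypot(pc_scores[:, 0], pc_scores[:, 1])
selected = np.argsort(distance)[::-1][:N_DISTINCT]
hits = int(np.sum(selected < N_DISTINCT))
print(f"{hits} of {N_DISTINCT} distinct features recovered")
```

Class labels enter only when the toy data are generated, never in the selection step, which is what makes the selection unsupervised.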
GSE76381: scRNA-seq of human and mouse midbrain development (i: genes; j, k: cells)
Cell numbers and time points
Human: 6w: 287 cells, 7w: 131 cells, 8w: 331 cells, 9w: 322 cells, 10w: 509 cells, 11w: 397 cells; 1977 cells in total (w: week).
Mouse: E11.5: 349 cells, E12.5: 350 cells, E13.5: 345 cells, E14.5: 308 cells, E15.5: 356 cells, E18.5: 142 cells, unknown: 57 cells; 1907 cells in total.
(standardized) PCA attributes PC scores $u_{li}$ to genes and PC loadings $v_{lj}$ to samples (this differs from the usual usage).
The $u_{li}$ are assumed to obey a multivariate Gaussian distribution:
$$P_i = P_{\chi^2}\left[ > \sum_{l=1}^{L} \left( \frac{u_{li}}{\sigma_l} \right)^2 \right]$$
$P_i$ is corrected by the Benjamini-Hochberg procedure, and genes with corrected $P_i < 0.01$ are selected.
cf. Presentation No. O-17, "Examination of FDR cutoff levels for gene selection", 藤澤孝太, 宮田龍太
[Figure: gene selection, Human (L = 2) and Mouse (L = 3); numbers of genes shown: 63, 65, 53.]
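The selection rule above can be sketched in a few lines. Two points are assumptions rather than statements from the slide: $\sigma_l$ is taken to be the standard deviation of the lth PC score, and the $\chi^2$ distribution is given L degrees of freedom. The function name `select_features` and the toy data are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def select_features(u, alpha=0.01):
    """u: (n_features, L) array of PC scores. Returns a boolean selection mask."""
    n, L = u.shape
    sigma = u.std(axis=0)                     # assumed sigma_l: SD of each component
    stat = np.sum((u / sigma) ** 2, axis=1)   # sum_l (u_li / sigma_l)^2
    p = chi2.sf(stat, df=L)                   # upper-tail probability P_i (df = L assumed)
    # Benjamini-Hochberg adjustment of the P-values.
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    p_adj = np.empty(n)
    p_adj[order] = np.minimum(adjusted, 1.0)
    return p_adj < alpha

# Toy PC scores: 1000 features, L = 2, with 20 planted outliers.
rng = np.random.default_rng(0)
u = rng.normal(size=(1000, 2))
u[:20] += 10.0
mask = select_features(u)
print(mask.sum(), "features selected")
```

Because the outliers inflate the estimated $\sigma_l$, the test is conservative for the remaining features, which keeps false selections low in this toy setting.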
A tensor is generated from the product of the human and mouse cell profiles (i: genes; j, k: cells):
$$x_{ijk} = x_{ij} \times x_{ik} \in \mathbb{R}^{13384 \times 1977 \times 1907}$$
Size reduction is needed because the tensor is too large:
$$x_{jk} = \sum_i x_{ijk}$$
$x_{jk}$ is decomposed by singular value decomposition.
$v_{lj}$: lth human cell singular value vectors
$v_{lk}$: lth mouse cell singular value vectors
$v_{lj}$ and $v_{lk}$ with any kind of time dependence are selected by categorical regression (ANOVA):
$$v_{lj} = a_l + \sum_t b_{lt} \delta_{jt}$$
$$v_{lk} = a'_l + \sum_t b'_{lt} \delta_{kt}$$
$\delta_{jt}$, $\delta_{kt}$: 1 when cell j (or k) is measured at time point t, otherwise 0.
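A minimal sketch of this step on toy data follows. It relies on the identity $x_{jk} = \sum_i x_{ij} x_{ik}$, so the full tensor never has to be stored. One-way ANOVA via `scipy.stats.f_oneway` is used here as the test for the categorical regression, which is an assumption about how the regression is evaluated; the toy sizes, the planted signal, and the helper name `anova_p` are hypothetical.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)

# Toy sizes; the data on the slide are 13384 genes x 1977 human x 1907 mouse cells.
n_genes, n_human, n_mouse = 200, 30, 24
x_human = rng.normal(size=(n_genes, n_human))  # x_ij
x_mouse = rng.normal(size=(n_genes, n_mouse))  # x_ik
t_human = np.repeat(np.arange(3), 10)          # time-point label of each human cell
t_mouse = np.repeat(np.arange(3), 8)           # time-point label of each mouse cell

# Plant a time-dependent signal shared by both species in the first 20 genes.
x_human[:20] += 2.0 * t_human
x_mouse[:20] += 2.0 * t_mouse

# x_jk = sum_i x_ij * x_ik, computed without forming the full tensor x_ijk.
x_jk = x_human.T @ x_mouse

# SVD: columns of v_human are v_lj, columns of v_mouse are v_lk.
v_human, s, v_mouse_t = np.linalg.svd(x_jk, full_matrices=False)
v_mouse = v_mouse_t.T

def anova_p(v, labels):
    """One-way ANOVA P-value over time points for each column of v."""
    groups = [v[labels == t] for t in np.unique(labels)]
    return np.array([f_oneway(*[g[:, l] for g in groups]).pvalue
                     for l in range(v.shape[1])])

p_human = anova_p(v_human, t_human)
p_mouse = anova_p(v_mouse, t_mouse)
print("smallest P (human, mouse):", p_human.min(), p_mouse.min())
```

Singular vectors with small P-values in both species are the candidates for "any kind of time dependence"; in practice a multiple-testing correction over l would be applied before selecting them.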
[Figure labels: Human, Mouse; 32, 32.]
The $u_{li}$ are generated from $v_{lj}$ and $v_{lk}$:
$u_{li}^{(j)} = \sum_j v_{lj} x_{ij}$: lth human gene singular value vectors
$u_{li}^{(k)} = \sum_k v_{lk} x_{ik}$: lth mouse gene singular value vectors
P-values are attributed to the gene singular value vectors using the $\chi^2$ distribution and corrected by the BH criterion; genes with corrected P < 0.01 are selected.
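The two projections can be written directly from the index notation. This is a sketch on random toy data; the array names and sizes are hypothetical, and the cell singular value vectors are taken from an SVD of $x_{jk} = \sum_i x_{ij} x_{ik}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_human, n_mouse = 50, 12, 10
x_human = rng.normal(size=(n_genes, n_human))  # x_ij
x_mouse = rng.normal(size=(n_genes, n_mouse))  # x_ik

# Cell singular value vectors: v_human[j, l] = v_lj, v_mouse[k, l] = v_lk.
v_human, s, v_mouse_t = np.linalg.svd(x_human.T @ x_mouse, full_matrices=False)
v_mouse = v_mouse_t.T

# u_li^(j) = sum_j v_lj x_ij  and  u_li^(k) = sum_k v_lk x_ik
u_human = np.einsum("jl,ij->il", v_human, x_human)  # shape (n_genes, n_components)
u_mouse = np.einsum("kl,ik->il", v_mouse, x_mouse)
```

Each column of `u_human` and `u_mouse` assigns one value per gene, so the same $\chi^2$ test and BH correction used for PC scores can be applied to the selected columns.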
using PCA and TD. Because labels are lacking or few, this approach is well suited to scRNA-seq. I have published a monograph with Springer. I would be happy if you could buy it, although it is extremely expensive.