Tensor decomposition based unsupervised feature extraction with optimized standard deviation applied to identification of differential gene expression, DNA methylation and histone modification
Y-h. Taguchi, Department of Physics, Chuo University, Tokyo, Japan
The contents were published in the following paper and two preprints on bioRxiv:
https://rdcu.be/c0WE8 (Sci. Rep.)
https://doi.org/10.1101/2022.04.02.486807
https://doi.org/10.1101/2022.04.29.490081
ISAIC2022
1. Tensor decomposition (TD) based unsupervised feature extraction (FE) applied to the identification of differentially expressed genes (DEGs) can outperform various state-of-the-art methods, including DESeq2, when the standard deviations (SDs) used to generate the null hypothesis (Gaussian distribution of principal components) are optimized.
2. The same approach is applicable to the identification of differentially methylated cytosines (DMCs) and of differential histone modification, without any specific modification of the method.
1. Apply PCA to matrices (e.g., genes ⨉ samples) or TD to tensors (e.g., genes ⨉ samples ⨉ tissues) and obtain vectors attributed separately to genes, samples, or tissues.
2. Select the vectors of interest among those attributed to samples and tissues.
3. Select the genes that contribute more strongly to the corresponding vectors attributed to genes (judged against the null hypothesis that the components of those vectors follow a Gaussian distribution).
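The three steps above can be sketched for the PCA case as follows. This is only an illustration under stated assumptions, not the presenter's implementation: the function names are hypothetical, each sample is standardized before decomposition, a single component is chosen by the caller (step 2), the SD is taken as the plain sample SD of the gene vector, and P-values are adjusted with the Benjamini-Hochberg procedure (the slides mention adjusted P-values but do not name the method).

```python
import math
import numpy as np

def benjamini_hochberg(p):
    """Return Benjamini-Hochberg adjusted P-values (assumed correction)."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest P-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted

def pca_unsupervised_fe(x, component, threshold=0.1):
    """Hypothetical helper: select genes (rows) of a genes x samples matrix."""
    x = np.asarray(x, dtype=float)
    # Step 1: standardize each sample (column), then decompose.
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    gene_vec = u[:, component]    # vector attributed to genes
    sample_vec = vt[component]    # vector attributed to samples
    # Step 3: Gaussian null hypothesis, so (u_i / sigma)^2 follows a
    # chi-squared distribution with one degree of freedom.
    sigma = gene_vec.std()
    chi2 = (gene_vec / sigma) ** 2
    # Upper tail of chi-squared with 1 d.o.f.: P(X > c) = erfc(sqrt(c / 2)).
    p = np.array([math.erfc(math.sqrt(c / 2.0)) for c in chi2])
    selected = np.flatnonzero(benjamini_hochberg(p) < threshold)
    return selected, p, sample_vec
```

In practice `sample_vec` would be inspected for several components, and the component that separates the sample classes would be passed as `component`. For a tensor, the SVD would be replaced by a tensor decomposition, which this sketch does not cover.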
for various problems, they have some problems.
1. The histogram of 1-P does not fully obey the null hypothesis.
2. Too few genes are selected to believe that there are no false negatives.
[Figure: histogram of 1-P (x-axis: 1-P, from 0 to 1; y-axis: Frequency), with the DEGs marked.]
expression of the ith gene in the jth sample
Samples: seven Universal Human Reference RNA (UHRR) vs. seven Human Brain Reference RNA (HBRR)
Measured for 40933 genes (done by the presenter)
(*) https://www.fda.gov/science-research/bioinformatics-tools/microarraysequencing-quality-control-maqcseqc
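[Figure: optimization of the SD σ. Symbols shown: σ_h; h_n, the count in the nth bin of the histogram of 1-P_i; n_0, the bin at which the adjusted P-value equals 0.1 (Adjusted P(n_0) = 0.1). P_i is obtained from the cumulative χ² distribution, and genes with adjusted P_i < 0.1 are selected.]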
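One possible reading of this slide is that σ is tuned so that the histogram h_n of 1-P_i is as flat as possible over the bins n < n_0, which contain only genes that are not selected. The sketch below follows that reading and should be treated as an assumption throughout: the function names, the grid of candidate values, the Benjamini-Hochberg adjustment, and the use of the relative spread (SD divided by mean) of h_n are choices of this sketch, not details confirmed by the slides. The relative spread is used because the raw SD would trivially shrink when very few genes remain in the bins below n_0.

```python
import math
import numpy as np

def bh_adjust(p):
    """Benjamini-Hochberg adjusted P-values (assumed correction)."""
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted

def chi2_pvalues(u, sigma):
    """P(chi-squared with 1 d.o.f. > (u_i / sigma)^2) for each component."""
    z = np.abs(u) / sigma
    return np.array([math.erfc(v / math.sqrt(2.0)) for v in z])

def histogram_spread(u, sigma, n_bins=100, threshold=0.1):
    """Relative spread of h_n over the bins n < n_0 for a given sigma."""
    p = chi2_pvalues(u, sigma)
    significant = bh_adjust(p) < threshold
    h, _ = np.histogram(1.0 - p, bins=n_bins, range=(0.0, 1.0))
    n0 = n_bins
    if significant.any():
        # n_0: bin holding the least significant selected gene.
        cutoff = 1.0 - p[significant].max()
        n0 = min(int(cutoff * n_bins), n_bins - 1)
    null_bins = h[:n0]
    if n0 < 2 or null_bins.mean() == 0:
        return math.inf
    return null_bins.std() / null_bins.mean()

def optimize_sigma(u, factors=None):
    """Grid search for the sigma that makes the null part of h_n flattest."""
    if factors is None:
        factors = np.linspace(0.3, 1.2, 46)
    candidates = factors * u.std()   # scan around the naive sample SD
    scores = [histogram_spread(u, s) for s in candidates]
    return candidates[int(np.argmin(scores))]
```

When a minority of components are strong outliers, the naive sample SD is inflated by them, so the value returned by `optimize_sigma` is expected to be smaller than `u.std()`. Genes would then be selected by recomputing the P-values with the optimized σ and keeping those with adjusted P_i < 0.1.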
1. Tensor decomposition (TD) based unsupervised feature extraction (FE) applied to the identification of differentially expressed genes (DEGs) can outperform various state-of-the-art methods, including DESeq2, when the standard deviations (SDs) used to generate the null hypothesis (Gaussian distribution of principal components) are optimized.
2. The same approach is applicable to the identification of differentially methylated cytosines (DMCs) and of differential histone modification, without any specific modification of the method.