Tensor Decomposition-based Unsupervised Feature Extraction for Integrated Analysis of TCGA Data on MicroRNA Expression and Promoter Methylation of Genes in Ovarian Cancer
Y-H. Taguchi, Department of Physics, Chuo University, Tokyo, Japan, and Ka-Lok Ng, Department of Bioinformatics and Medical Engineering, Asia University, Taichung, Taiwan; Department of Medical Research, China Medical University Hospital, China Medical University, Taiwan. 10.1109/BIBE.2018.00045 https://doi.org/10.1101/380071
of several reasons, including
1. The numbers of features often differ greatly between data types (e.g., the number of miRNAs is $10^3$, that of mRNAs is $10^4$, and that of methylation sites is $>10^5$).
2. We might require multiple criteria to screen features, e.g., “mRNAs should be distinct between patients and healthy controls”, “miRNAs should be as well”, “mRNAs and miRNAs are expected to be negatively correlated”, and so on. As a result, often no or very few mRNAs and miRNAs pass all of the requirements.
3. …..
Y-h. Taguchi recently proposed “Tensor decomposition (TD) based unsupervised feature extraction (FE)” (Y-h. Taguchi, PloS ONE, 2017). In this study, we applied TD based unsupervised FE to miRNA expression profiles and promoter methylation of protein coding genes of ovarian cancers taken from TCGA (The Cancer Genome Atlas) and identified miRNAs and protein coding genes such that
1. Promoter methylation of protein coding genes is distinct between tumors and normal tissues.
2. miRNA expression is distinct between tumors and normal tissues.
3. miRNA expression and protein coding gene promoter methylation are correlated.
TD applied to multi omics datasets
Tensor decomposition: $x_{ijk} = x_{ij} \times x_{kj} \simeq \sum_{l_1,l_2,l_3} G_{l_1,l_2,l_3}\, x_{l_1 i}\, x_{l_2 j}\, x_{l_3 k}$
i: protein coding gene methylation
j: patients vs healthy controls
k: miRNA expression
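As an illustration of the decomposition above, the following is a minimal sketch of a higher-order singular value decomposition (HOSVD) in NumPy on a toy tensor built as x_ijk = x_ij × x_kj. The array sizes, the random data, and the helper names (`unfold`, `mode_product`, `hosvd`) are assumptions made for this example only; the source does not state which TD algorithm or software was used.

```python
import numpy as np

def unfold(t, mode):
    """Matricize tensor t so that rows index the chosen mode."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def mode_product(t, matrix, mode):
    """Multiply tensor t along `mode` by `matrix` (shape: new_dim x old_dim)."""
    return np.moveaxis(np.tensordot(matrix, t, axes=(1, mode)), 0, mode)

def hosvd(t):
    """Return the core tensor G and one factor matrix per mode."""
    factors = [np.linalg.svd(unfold(t, m), full_matrices=False)[0]
               for m in range(t.ndim)]
    core = t
    for m, u in enumerate(factors):
        core = mode_product(core, u.T, m)
    return core, factors

# Toy sizes (hypothetical): 6 genes, 5 samples, 4 miRNAs.
rng = np.random.default_rng(0)
x_meth = rng.normal(size=(6, 5))            # x_ij: methylation, genes x samples
x_mir = rng.normal(size=(4, 5))             # x_kj: expression, miRNAs x samples
x = np.einsum("ij,kj->ijk", x_meth, x_mir)  # x_ijk = x_ij * x_kj

core, (u_gene, u_sample, u_mirna) = hosvd(x)

# Rebuild the tensor from G and the factor matrices to check the decomposition.
rebuilt = core
for m, u in enumerate((u_gene, u_sample, u_mirna)):
    rebuilt = mode_product(rebuilt, u, m)
```

In this layout, column `l1` of `u_gene` plays the role of x_{l1 i}, column `l2` of `u_sample` that of x_{l2 j}, and column `l3` of `u_mirna` that of x_{l3 k}.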
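tumors and normal tissues
Select $x_{l_1 i}$ associated with $x_{l_2 j}$
Assume Gaussian for $x_{l_1 i}$
Detect outliers: $P_i = P\left[ > \sum_{l_1} \left( \frac{x_{l_1 i}}{\sigma} \right)^2 \right]$ (P-values by $\chi^2$ distribution)
Benjamini-Hochberg corrected P < 0.01
[Figure: distribution P(p) plotted against 1-p, from 0 to 1]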
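The outlier-detection step can be sketched with the Python standard library alone. This is an illustrative reading of the formula above, not the authors' code: it assumes σ is the standard deviation of each selected singular vector, that the degrees of freedom equal the number of selected vectors, and it uses hypothetical function names and planted toy data.

```python
import math
import random

def chi2_sf(x, df):
    """Upper tail P[chi2_df > x] for integer df, using the recurrence
    Q(s + 1, h) = Q(s, h) + h**s * exp(-h) / Gamma(s + 1), with h = x / 2."""
    h = x / 2.0
    if df % 2 == 0:
        s, q = 1.0, math.exp(-h)             # Q(1, h)
    else:
        s, q = 0.5, math.erfc(math.sqrt(h))  # Q(1/2, h)
    while s < df / 2.0:
        q += h ** s * math.exp(-h) / math.gamma(s + 1.0)
        s += 1.0
    return q

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def select_features(vectors, alpha=0.01):
    """Select indices i whose summed squared scaled components are outliers."""
    n = len(vectors[0])
    stat = [0.0] * n
    for v in vectors:
        mean = sum(v) / n
        sigma = math.sqrt(sum((a - mean) ** 2 for a in v) / n)  # assumed sigma
        for i, a in enumerate(v):
            stat[i] += (a / sigma) ** 2
    pvals = [chi2_sf(s, len(vectors)) for s in stat]
    adjusted = benjamini_hochberg(pvals)
    return [i for i in range(n) if adjusted[i] < alpha]

# Toy singular vector (hypothetical): 200 features, three planted outliers.
rnd = random.Random(0)
values = [0.01 * rnd.gauss(0.0, 1.0) for _ in range(200)]
for i in (5, 50, 150):
    values[i] = 1.0

selected = select_features([values])
```

Because the null hypothesis is that the components follow a Gaussian, the features that survive the corrected threshold are exactly those that contribute disproportionately to the selected singular vectors.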
TCGA
i: 24906 protein coding genes to which promoter methylation is attributed
j: 8 normal vs 569 tumor samples = 577 samples
k: 732 miRNA profiles
Tensor: $x_{ijk} \in \mathbb{R}^{24906 \times 577 \times 732}$ → too huge! → approximation (Y-h. Taguchi, PloS ONE, 2017)
$x_{ik} = \sum_j x_{ijk} \in \mathbb{R}^{24906 \times 732}$ → computable
$x_{l_2 j}^{\mathrm{miRNA}} = \sum_k x_{l_3 k} x_{kj}$
$x_{l_2 j}^{\mathrm{methyl}} = \sum_i x_{l_1 i} x_{ij}$
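The approximation above can be sketched in NumPy as follows. Summing x_ijk = x_ij × x_kj over j gives an ordinary matrix product, so a plain SVD suffices; this sketch assumes that SVD is the decomposition applied to x_ik, and the toy sizes, random data, and variable names are hypothetical.

```python
import numpy as np

# Toy sizes (hypothetical) standing in for 24906 genes, 577 samples, 732 miRNAs.
rng = np.random.default_rng(0)
n_genes, n_samples, n_mirnas = 30, 12, 8
x_meth = rng.normal(size=(n_genes, n_samples))   # x_ij
x_mir = rng.normal(size=(n_mirnas, n_samples))   # x_kj

# x_ik = sum_j x_ij * x_kj, computed without ever forming the full tensor.
x_ik = x_meth @ x_mir.T

# SVD of the summed matrix: rows of u.T act as x_{l1 i}, rows of vt as x_{l3 k}.
u, s, vt = np.linalg.svd(x_ik, full_matrices=False)

# Project back onto samples to recover sample-wise singular vectors.
sample_methyl = u.T @ x_meth   # x_{l2 j}^{methyl} = sum_i x_{l1 i} x_ij
sample_mirna = vt @ x_mir      # x_{l2 j}^{miRNA}  = sum_k x_{l3 k} x_kj
```

The point of the design is memory: the matrix x_ik has one entry per gene-miRNA pair, whereas the full tensor would need an additional factor equal to the number of samples.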
Seven miRNAs and 241 protein coding genes are distinct between normal tissues and tumors. Among the 1687 pairs (7 miRNAs × 241 protein coding genes), most (94%) are significantly correlated (P<0.01 after BH correction).
of miRNAs and protein coding genes using t test (normal tissues vs tumors): P<0.01 after BH correction → 214 out of 732 miRNAs and 19395 out of 24906 protein coding genes → too many miRNAs and protein coding genes. Correlation between the top 214 miRNAs and 19395 protein coding genes: only 6% of pairs are significantly correlated.
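For comparison, the conventional t test screening described here could look like the sketch below. It assumes a standard two-sample t test via `scipy.stats.ttest_ind`; the source does not specify the test variant, and the data, group sizes, and the helper `bh_adjust` are invented for illustration.

```python
import numpy as np
from scipy import stats

def bh_adjust(pvals):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

# Toy data (hypothetical): 100 features, 8 normal and 40 tumor samples.
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 48))
is_normal = np.arange(48) < 8
data[:5, ~is_normal] += 5.0   # plant a tumor-specific shift in 5 features

# Feature-wise two-sample t test, normal vs tumor.
result = stats.ttest_ind(data[:, is_normal], data[:, ~is_normal], axis=1)
adjusted = bh_adjust(result.pvalue)
selected = np.flatnonzero(adjusted < 0.01)
```

Each feature is tested on its own here, which is why such a screen can pass a very large fraction of features when tumors and normal tissues differ broadly.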
of miRNAs and protein coding genes with significant correlation (P<0.01 after BH correction), then select those distinct between normal tissues and tumors…. Only 10% of pairs are significantly correlated. Thus, a limited number of pairs is selected successfully. But….. the 608989 positively correlated pairs and 588783 negatively correlated pairs unfortunately include all of the miRNAs and protein coding genes… → useless for selecting miRNAs and protein coding genes…..
based unsupervised FE, recently proposed by one of the authors, Y-h. Taguchi (Y-h. Taguchi, PloS ONE, 2017), to miRNA expression and promoter methylation attributed to protein coding genes of ovarian cancers from TCGA. The selected seven miRNAs and 241 protein coding genes are distinct between eight normal tissues and 569 tumors. Most pairs of protein coding gene methylation and miRNA expression (94%) are significantly correlated. Ordinary screening using t test and correlation coefficients failed to achieve similar performance. The biological meanings of these pairs of gene promoter methylation and miRNA expression should be investigated further.
methods using TD for multi-omics data analysis. I have published a monograph with Springer. I would be happy if you could buy it, although it is extremely expensive.