Tensor Decomposition-based Unsupervised Feature Extraction for Integrated Analysis of TCGA Data on MicroRNA Expression and Promoter Methylation of Genes in Ovarian Cancer

Y-h. Taguchi
December 02, 2019

Presentation at SIGBIO60
http://www.ipsj.or.jp/kenkyukai/event/bio60.html
2nd Dec. 2019, at Fukuoka, Japan

Transcript

  1. Tensor Decomposition-based Unsupervised Feature Extraction for Integrated Analysis of TCGA Data on MicroRNA Expression and Promoter Methylation of Genes in Ovarian Cancer

    Y-H. Taguchi, Department of Physics, Chuo University, Tokyo, Japan, and Ka-Lok Ng, Department of Bioinformatics and Medical Engineering, Asia University, Taichung, Taiwan; Department of Medical Research, China Medical University Hospital, China Medical University, Taiwan. DOI: 10.1109/BIBE.2018.00045, https://doi.org/10.1101/380071
  2. Introduction

    Multi-omics data analysis is often difficult for several reasons, including:
    1. The numbers of features differ greatly between omics layers (e.g., there are about $10^3$ miRNAs, about $10^4$ mRNAs, and more than $10^5$ methylation sites).
    2. We might require multiple criteria to screen features, e.g., “mRNAs should be distinct between patients and healthy controls”, “so should miRNAs”, “mRNAs and miRNAs are expected to be negatively correlated”, and so on. This often leaves no or very few mRNAs and miRNAs that pass all of the requirements.
    3. …
  3. To address these difficulties, one of the authors, Y-h. Taguchi, recently proposed “tensor decomposition (TD) based unsupervised feature extraction (FE)” (Y-h. Taguchi, PLoS ONE, 2017).

    In this study, we applied TD based unsupervised FE to miRNA expression profiles and promoter methylation of protein-coding genes in ovarian cancers taken from TCGA (The Cancer Genome Atlas), and identified miRNAs and protein-coding genes such that:
    1. promoter methylation of the protein-coding genes is distinct between tumors and normal tissues;
    2. miRNA expression is distinct between tumors and normal tissues;
    3. miRNA expression and protein-coding gene promoter methylation are correlated.
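  4. TD applied to multi-omics datasets

    $x_{ij}$: promoter methylation of protein-coding gene $i$; $x_{kj}$: expression of miRNA $k$; $i$: protein-coding genes (methylation), $j$: patients vs healthy controls, $k$: miRNAs (expression). The product of the two matrices forms a tensor, which is decomposed as

    $$x_{ijk} = x_{ij} \times x_{kj} \simeq \sum_{l_1, l_2, l_3} G_{l_1 l_2 l_3}\, x_{l_1 i}\, x_{l_2 j}\, x_{l_3 k}$$

    [Figure: matrices $x_{ij}$ and $x_{kj}$, their product tensor $x_{ijk}$, and its decomposition into the core tensor $G$ and the vectors $x_{l_1 i}$, $x_{l_2 j}$, $x_{l_3 k}$]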
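The decomposition on this slide can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions: it obtains $G$ and the vectors $x_{l_1 i}$, $x_{l_2 j}$, $x_{l_3 k}$ by the higher-order singular value decomposition (HOSVD), which is one common choice and not necessarily the exact procedure behind these results; the function names `hosvd` and `reconstruct` and the toy data are hypothetical. Keeping all components reconstructs the tensor exactly; truncating the sums over $l_1, l_2, l_3$ gives the approximation written with $\simeq$.

```python
import numpy as np


def hosvd(x):
    """Higher-order SVD of a 3-mode tensor x[i, j, k] (illustrative sketch).

    Returns the core G[l1, l2, l3] and factor matrices a, b, c whose rows
    play the roles of x_{l1 i}, x_{l2 j}, x_{l3 k}.
    """
    factors = []
    for mode in range(3):
        # Unfold: bring this mode to the front and flatten the other two.
        unfolded = np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)
        u, _, _ = np.linalg.svd(unfolded, full_matrices=False)
        factors.append(u.T)  # row l holds the l-th singular vector
    a, b, c = factors
    # Core tensor: project x onto the singular vectors of every mode.
    g = np.einsum("ijk,ai,bj,ck->abc", x, a, b, c)
    return g, a, b, c


def reconstruct(g, a, b, c):
    """sum over l1, l2, l3 of G(l1, l2, l3) x_{l1 i} x_{l2 j} x_{l3 k}."""
    return np.einsum("abc,ai,bj,ck->ijk", g, a, b, c)


# Toy data (hypothetical): 4 genes x 3 samples, 5 miRNAs x 3 samples.
rng = np.random.default_rng(0)
x_ij = rng.normal(size=(4, 3))
x_kj = rng.normal(size=(5, 3))
x_ijk = np.einsum("ij,kj->ijk", x_ij, x_kj)  # x_ijk = x_ij * x_kj

g, a, b, c = hosvd(x_ijk)
print(np.allclose(reconstruct(g, a, b, c), x_ijk))  # True
```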
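  5. Gene selection

    1. Select $x_{l_2 j}$ that is distinct between tumors and normal tissues.
    2. Select $x_{l_1 i}$ associated with that $x_{l_2 j}$.
    3. Assuming $x_{l_1 i}$ follows a Gaussian distribution, detect outliers by computing P-values with the $\chi^2$ distribution:
       $$P_i = P_{\chi^2}\left[ > \sum_{l_1} \left( \frac{x_{l_1 i}}{\sigma_{l_1}} \right)^2 \right],$$
       where $\sigma_{l_1}$ is the standard deviation of $x_{l_1 i}$.
    4. Select $i$ with Benjamini-Hochberg corrected $P < 0.01$.

    [Figure: P-values by the $\chi^2$ distribution; labels $P(p)$ and $1-p$, range 0 to 1]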
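The outlier test on this slide can be sketched in plain Python. This is a minimal sketch assuming a single selected component $l_1$ (the results use $l_1 = 2$ only): with one term in the sum, the $\chi^2$ tail with one degree of freedom equals the two-sided Gaussian tail $\operatorname{erfc}(|z|/\sqrt{2})$, so the standard library suffices. The function names `bh_adjust` and `select_outliers` and the toy values are hypothetical; with several components, a general $\chi^2$ survival function such as `scipy.stats.chi2.sf` would be needed instead.

```python
import math
import statistics


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values (step-up, monotone, capped at 1)."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_outliers(x_l1i, threshold=0.01):
    """Indices i whose x_{l1 i} is an outlier under a Gaussian null.

    Single-component case: P(chi2_1 > z^2) = erfc(|z| / sqrt(2)), z = x / sigma.
    """
    sigma = statistics.pstdev(x_l1i)
    pvalues = [math.erfc(abs(x / sigma) / math.sqrt(2)) for x in x_l1i]
    adjusted = bh_adjust(pvalues)
    return [i for i, p in enumerate(adjusted) if p < threshold]


# Toy vector: 98 small values plus two clear outliers at the end.
values = [0.01 * ((i % 7) - 3) for i in range(98)] + [5.0, -5.0]
print(select_outliers(values))  # [98, 99]
```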
  6. Datasets: Ovarian cancer from TCGA

    $i$: 24906 protein-coding genes to which promoter methylation is attributed
    $j$: 8 normal vs 569 tumor samples (577 samples in total)
    $k$: 732 miRNA profiles

    The tensor $x_{ijk} \in \mathbb{R}^{24906 \times 577 \times 732}$ is too large, so we use an approximation (Y-h. Taguchi, PLoS ONE, 2017):
    $$x_{ik} = \sum_j x_{ijk} \in \mathbb{R}^{24906 \times 732},$$
    which is computable. The sample vectors are then computed as
    $$x^{\text{miRNA}}_{l_2 j} = \sum_k x_{l_3 k}\, x_{kj}, \qquad x^{\text{methyl}}_{l_2 j} = \sum_i x_{l_1 i}\, x_{ij}.$$
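Because $x_{ijk} = x_{ij} x_{kj}$, the sum over $j$ is an ordinary matrix product, $x_{ik} = \sum_j x_{ij} x_{kj}$, so the full tensor (about $1.05 \times 10^{10}$ entries, roughly 84 GB in double precision) never has to be built, whereas $x_{ik}$ has about $1.8 \times 10^7$ entries (roughly 146 MB). The sketch below additionally assumes, for illustration, that $x_{l_1 i}$ and $x_{l_3 k}$ are taken as the singular vectors of $x_{ik}$ with the same component index $l$; the helper name `approximate_td` and the toy data are hypothetical.

```python
import numpy as np


def approximate_td(x_methyl, x_mirna, n_components):
    """Sketch of the x_ik approximation (hypothetical helper).

    x_methyl: (genes i, samples j) promoter methylation.
    x_mirna:  (miRNAs k, samples j) miRNA expression.
    """
    # x_ik = sum_j x_ij x_kj as a matrix product; the 3-mode tensor is never formed.
    x_ik = x_methyl @ x_mirna.T
    # Assumption: singular vectors of x_ik serve as x_{l1 i} and x_{l3 k}.
    u, s, vt = np.linalg.svd(x_ik, full_matrices=False)
    x_l1i = u[:, :n_components].T  # rows: gene-side vectors
    x_l3k = vt[:n_components]      # rows: miRNA-side vectors
    # Sample-side vectors: sum_i x_{l i} x_ij and sum_k x_{l k} x_kj.
    x_l2j_methyl = x_l1i @ x_methyl
    x_l2j_mirna = x_l3k @ x_mirna
    return x_ik, s[:n_components], x_l1i, x_l3k, x_l2j_methyl, x_l2j_mirna


rng = np.random.default_rng(1)
x_methyl = rng.normal(size=(30, 12))  # toy: 30 genes, 12 samples
x_mirna = rng.normal(size=(8, 12))    # toy: 8 miRNAs, 12 samples
x_ik, s, x_l1i, x_l3k, v_methyl, v_mirna = approximate_td(x_methyl, x_mirna, 3)

# Same numbers as summing the explicit tensor over j (feasible only at toy sizes).
x_ijk = np.einsum("ij,kj->ijk", x_methyl, x_mirna)
print(np.allclose(x_ik, x_ijk.sum(axis=1)))  # True
print(v_methyl.shape, v_mirna.shape)         # (3, 12) (3, 12)
```

Under this assumption, for each component $l$ the inner product $\sum_j x^{\text{methyl}}_{lj} x^{\text{miRNA}}_{lj}$ equals the $l$-th singular value of $x_{ik}$, since it is $u_l^\top x_{ik} v_l$.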
  7. Results

    $x^{\text{miRNA}}_{l_2 j}$ and $x^{\text{methyl}}_{l_2 j}$ for $l_2 = 2$ are distinct between the 8 normal tissues and 569 tumors. Moreover, $x^{\text{miRNA}}_{l_2 j}$ and $x^{\text{methyl}}_{l_2 j}$ are significantly correlated: COR = 0.72 ($P = 10^{-9}$).
  8. 7 miRNAs are selected using $x_{l_3 = 2, k}$, and 241 protein-coding genes are selected using $x_{l_1 = 2, i}$.
  9. We found that seven miRNAs and 241 protein-coding genes are distinct between normal tissues and tumors.

    The 1681 pairs formed by the 7 miRNAs and 241 protein-coding genes are highly correlated (P < 0.01 after BH correction): most pairs (94%) are significantly correlated.
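Checking how many miRNA-gene pairs are significantly correlated can be sketched with NumPy. This is a minimal sketch, not the authors' code: it computes Pearson correlations across samples and, as a swapped-in simplification, converts them to P-values with the Fisher $z$ approximation ($\operatorname{atanh}(r)\sqrt{n-3}$ treated as standard normal) instead of an exact $t$ test, then applies Benjamini-Hochberg correction over all pairs. The function name `significant_pair_fraction` and the toy data are hypothetical.

```python
import math

import numpy as np


def significant_pair_fraction(x_mirna, x_methyl, threshold=0.01):
    """Fraction of (miRNA, gene) pairs significantly correlated across samples.

    x_mirna: (K, J) expression; x_methyl: (I, J) promoter methylation.
    Returns the fraction with BH-adjusted P < threshold and the (K, I) correlations.
    """
    n = x_mirna.shape[1]
    # Standardize each row; the mean product of z-scores is Pearson's r.
    za = (x_mirna - x_mirna.mean(axis=1, keepdims=True)) / x_mirna.std(axis=1, keepdims=True)
    zb = (x_methyl - x_methyl.mean(axis=1, keepdims=True)) / x_methyl.std(axis=1, keepdims=True)
    r = np.clip(za @ zb.T / n, -0.999999, 0.999999)
    # Fisher z approximation: atanh(r) * sqrt(n - 3) is roughly N(0, 1) when r = 0.
    z = np.abs(np.arctanh(r)).ravel() * math.sqrt(n - 3)
    p = np.array([math.erfc(v / math.sqrt(2)) for v in z])  # two-sided P-values
    # Benjamini-Hochberg adjustment over all K * I pairs.
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    return float(np.mean(adjusted < threshold)), r


rng = np.random.default_rng(2)
signal = rng.normal(size=200)                          # shared pattern over 200 samples
x_mirna = signal + 0.1 * rng.normal(size=(7, 200))     # 7 toy miRNAs
x_methyl = -signal + 0.1 * rng.normal(size=(20, 200))  # 20 toy genes, anti-correlated
frac, r = significant_pair_fraction(x_mirna, x_methyl)
print(frac)  # 1.0
```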
  10. Comparisons with conventional methods

    Selecting miRNAs and protein-coding genes by t test (normal tissues vs tumors, P < 0.01 after BH correction) yields 214 of 732 miRNAs and 19395 of 24906 protein-coding genes, which is too many. Among the pairs between these 214 miRNAs and 19395 protein-coding genes, only 6% are significantly correlated.
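The conventional screening on this slide, a per-feature t test between normal tissues and tumors followed by Benjamini-Hochberg correction, can be sketched with NumPy and SciPy. This is a sketch under assumptions: it uses `scipy.stats.ttest_ind` with its default Student's (equal-variance) test because the slide does not say which variant was used (`equal_var=False` gives Welch's test); the function name `ttest_select` and the toy data are hypothetical.

```python
import numpy as np
from scipy import stats


def ttest_select(x, is_tumor, threshold=0.01):
    """Indices of features (rows of x) distinct between tumors and normals."""
    # One t test per row, comparing tumor columns with normal columns.
    _, p = stats.ttest_ind(x[:, is_tumor], x[:, ~is_tumor], axis=1)
    # Benjamini-Hochberg adjustment (step-up, kept monotone, capped at 1).
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    return np.flatnonzero(adjusted < threshold)


rng = np.random.default_rng(3)
is_tumor = np.array([False] * 8 + [True] * 50)  # 8 normal, 50 tumor (toy sizes)
x = rng.normal(size=(100, is_tumor.size))       # 100 toy features
x[:5, is_tumor] += 5.0                          # only the first 5 truly differ
sel = ttest_select(x, is_tumor)
print(sel)
```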
  11. Correlation between the top 7 miRNAs and 241 protein-coding genes selected by t test

    The correlation is poorer than that of the miRNAs and genes selected by TD based unsupervised FE.
  12. Conversely, we might first select pairs of miRNAs and protein-coding genes that are significantly correlated (P < 0.01 after BH correction), and then select those distinct between normal tissues and tumors.

    Only 10% of pairs are significantly correlated, so a limited number of pairs is successfully selected. However, the 608989 positively correlated pairs and 588783 negatively correlated pairs unfortunately include all of the miRNAs and protein-coding genes, which makes this approach useless for selecting miRNAs and protein-coding genes.
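The point of this slide, that selecting significant pairs need not shrink the feature lists, can be made concrete with a small NumPy helper that counts positive and negative significant pairs and how many distinct miRNAs and genes they touch. The helper name `pair_coverage` and the toy matrix are hypothetical; in the toy case only 10% of pairs are significant, yet they cover every miRNA and every gene.

```python
import numpy as np


def pair_coverage(significant, r):
    """Summarize significant (miRNA, gene) pairs.

    significant: boolean array (miRNAs, genes); r: correlations, same shape.
    Returns (#positive pairs, #negative pairs, #miRNAs touched, #genes touched).
    """
    positive = int(np.sum(significant & (r > 0)))
    negative = int(np.sum(significant & (r < 0)))
    mirnas_touched = int(np.any(significant, axis=1).sum())
    genes_touched = int(np.any(significant, axis=0).sum())
    return positive, negative, mirnas_touched, genes_touched


# Toy case: 10 miRNAs x 10 genes; only the 10 diagonal pairs (10%) are significant.
significant = np.eye(10, dtype=bool)
r = np.ones((10, 10))
r[1::2] = -1.0  # odd-indexed miRNAs correlate negatively
print(pair_coverage(significant, r))  # (5, 5, 10, 10)
```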
  13. Biological evaluation of the selected miRNAs and protein-coding genes is important, since they might be selected for non-biological reasons owing to the unsupervised nature of this method.
  14. Biological evaluation of the selected 7 miRNAs using DIANA-miRPath
  15. Biological evaluation of the selected 241 protein-coding genes (MSigDB C6: oncogenic signatures, “epithelium”)
  16. Biological evaluation of the selected 241 protein-coding genes (MSigDB GO Molecular Function)

    Two hormones, estrogen and progesterone, are involved in ovarian cancer formation.
  17. Conclusions

    In this paper, we have applied TD based unsupervised FE, recently proposed by one of the authors, Y-h. Taguchi (Y-h. Taguchi, PLoS ONE, 2017), to miRNA expression and promoter methylation attributed to protein-coding genes of ovarian cancers from TCGA. The selected seven miRNAs and 241 protein-coding genes are distinct between the eight normal tissues and 569 tumors. Most (94%) of the protein-coding gene methylation-miRNA expression pairs are significantly correlated. Conventional screening using t tests and correlation coefficients failed to achieve similar performance. The biological meaning of these gene promoter methylation-miRNA expression pairs should be investigated further.
  18. Summary

    We can select biologically reasonable genes with unsupervised methods using TD for multi-omics data analysis. I have published a monograph with Springer. I would be happy if you could buy it, although it is extremely expensive.