Tensor decomposition approach for Bioinformatic data analysis

Seminar at Center for Research and Development of Engineering Technology, NCHU, Taichung, Taiwan

http://www.engineer.nchu.edu.tw/~crdet/news_detail.php?cid=2&news_id=172&p=1

Engineering Center Key Technology Forum @ National Chung Hsing University (NCHU), Taichung, Taiwan
31st Oct. 2018

Y-h. Taguchi

October 31, 2018

Transcript

  1. 1 Tensor decomposition approach for Bioinformatic data analysis

    Y-H. Taguchi, Department of Physics, Chuo University, Tokyo, Japan
  2. 2 Chuo University = 中央大学 ≠ 國立中央大學 (National Central University)

    Chuo University in Japan is one of the oldest Japanese private universities; it is ca. 125 years old. It is famous for law. World rank (THE): <1000. In Japan, it is ranked at “the bottom of the famous private universities”. The School of Engineering and Science is composed of 10 departments (Mathematics, Physics, Applied Chemistry, Precision Mechanical Engineering, Electronics, Civil Engineering, Management Engineering, Information Technology, Biological Science, Human Science and Engineering). Eager to have affiliated universities in Asia!
  3. 3 The reason why I am here.

    There are two distinct conferences called BIBE 2018 (*). One was in Shanghai in July 2018, where I made friends with Prof. Congo Tak-Shing Ching (both of us were keynote speakers). The other is at the Splendor Hotel Taichung, Oct. 29th-31st, 2018 (IEEE), where both of us are technical committee members. Prof. Ching then kindly invited me to give a talk at National Chung Hsing University, although unfortunately he is not here today. (*) Joint conference between bioinformatics (Y-h. Taguchi) and biomedical engineering (Prof. Ching).
  4. 4 Today’s topics

    What I am doing is data driven science. It might be the opposite of purpose oriented research. However, I hope that the audience here can understand what I am trying to do and get some hints for their own research.

    Typical bioinformatics data analysis: large p (e.g., number of genes) and small n (e.g., number of samples), which is difficult to analyze. Here p is the number of variables and n is the number of conditions (equations) → p >> n → no unique solutions, and so on.
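The "no unique solutions" point can be illustrated with a small linear system. The sketch below is only an illustration under assumed toy sizes (n = 5 equations, p = 20 unknowns, random Gaussian data); none of these values come from the talk. It shows two different coefficient vectors that fit the same data exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 20                      # hypothetical sizes: n equations, p unknowns (p >> n)
A = rng.standard_normal((n, p))   # design matrix, one row per sample
b = rng.standard_normal(n)        # observed responses

# One exact solution: the minimum-norm solution from the pseudoinverse.
x_min = np.linalg.pinv(A) @ b

# The full SVD gives p right singular vectors; the last p - n of them span the
# null space of A, so adding any of them to a solution leaves A @ x unchanged.
_, _, vt = np.linalg.svd(A)
null_vec = vt[-1]
x_other = x_min + 3.0 * null_vec

fits_min = np.allclose(A @ x_min, b)
fits_other = np.allclose(A @ x_other, b)
differ = not np.allclose(x_min, x_other)
print(fits_min, fits_other, differ)
```

Because infinitely many such vectors exist, the data alone cannot tell which variables matter, which is the motivation for the feature selection that follows.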
  5. 6 Multiomics data analysis is often difficult for several reasons, including:

    1. The numbers of features often differ greatly from each other (e.g., the number of miRNAs is ~10^3, that of mRNAs is ~10^4, that of methylation sites is >10^5).
    2. We might require multiple criteria to screen features, e.g., “mRNAs should be distinct between patients and healthy controls”, “miRNAs should be as well”, “mRNAs and miRNAs are expected to be negatively correlated”, and so on. As a result, often no or very few mRNAs and miRNAs pass all of the requirements.
    3. …..
  6. 7 In order to address these difficulties, I recently proposed “Tensor decomposition (TD) based unsupervised feature extraction (FE)” (Y-h. Taguchi, PLoS ONE, 2017).

    In this study, we applied TD based unsupervised FE to miRNA expression profiles and promoter methylation of protein coding genes of ovarian cancers taken from TCGA (The Cancer Genome Atlas) and identified miRNAs and protein coding genes such that:
    1. Promoter methylation of the protein coding genes is distinct between tumors and normal tissues.
    2. miRNA expression is distinct between tumors and normal tissues.
    3. miRNA expression and protein coding gene promoter methylation are correlated.
  7. 8 TD applied to multi omics datasets

    Tensor decomposition: $x_{ijl} = x_{ij} \times x_{il} \simeq \sum_{k_1,k_2,k_3} G_{k_1,k_2,k_3}\, x_{ik_1} x_{jk_2} x_{lk_3}$, where $x_{ij}$ is protein coding gene methylation and $x_{il}$ is miRNA expression.
    i: patients vs healthy controls
    j: protein coding gene methylation
    l: miRNA expression
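The product tensor and its decomposition into a core tensor G and one set of vectors per index can be sketched as follows. This is a minimal illustration that assumes a higher-order SVD (HOSVD) as the decomposition algorithm and uses small random matrices; the slide itself does not name the algorithm, and the function names `unfold` and `hosvd` are hypothetical helpers.

```python
import numpy as np

def unfold(t, mode):
    """Matricize tensor t so that rows are indexed by the given mode."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def hosvd(t):
    """Return the core tensor and one orthogonal factor matrix per mode."""
    factors = []
    for mode in range(t.ndim):
        u, _, _ = np.linalg.svd(unfold(t, mode), full_matrices=False)
        factors.append(u)
    core = t
    for mode, u in enumerate(factors):
        # Contract mode `mode` of the core with u^T, then restore axis order.
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, mode)), 0, mode)
    return core, factors

rng = np.random.default_rng(0)
x_ij = rng.standard_normal((6, 8))   # toy stand-in: samples x gene methylation
x_il = rng.standard_normal((6, 5))   # toy stand-in: samples x miRNA expression

# Product tensor x_ijl = x_ij * x_il (the shared sample index i is kept).
x_ijl = np.einsum("ij,il->ijl", x_ij, x_il)

G, (u_i, u_j, u_l) = hosvd(x_ijl)

# Rebuild the tensor from the core and the three sets of singular vectors.
approx = np.einsum("abc,ia,jb,lc->ijl", G, u_i, u_j, u_l)
print(G.shape, np.allclose(approx, x_ijl))
```

With all components kept the reconstruction is exact; in practice only the few components of interest (for example, those whose sample vectors separate the two classes) are inspected.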
  8. 10 [Figure: synthetic example. Recoverable labels: 50, 1000, “+20% noise”, “100% noise”, “No correlations”; a 50 × 1000 × 1000 tensor is passed to tensor decomposition.]
  9. 11 [Figure: plots of $x_{ik_1}$ for $k_1 = 1, 2, 3$ ($1 \le i \le 50$, samples), $x_{jk_2}$ for $k_2 = 1, 2$ ($1 \le j \le 1000$, gene methylation), and $x_{lk_3}$ for $k_3 = 1, 2$ ($1 \le l \le 1000$, miRNA).]
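  10. 12 Variable selection

    Select $x_{ik_1}$ distinct between tumors and normal tissues, then select $x_{jk_2}$ associated with $x_{ik_1}$. Assume a Gaussian distribution for $x_{jk_2}$ and detect outliers, with P-values computed from the $\chi^2$ distribution: $P_j = P_{\chi^2}\left[ > \sum_{k_2} \left( \frac{x_{jk_2}}{\sigma} \right)^2 \right]$. Features with Benjamini-Hochberg corrected P < 0.01 are selected.
    [Figure: histogram P(p) against 1 − p, range 0 to 1.]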
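The outlier-detection step can be sketched in a few lines. This is an illustration under stated assumptions: a single singular vector is used, so the $\chi^2$ distribution has one degree of freedom (with several components the degrees of freedom would equal the number of components summed), σ is estimated as the standard deviation of the vector, and the data are random numbers with three planted outliers. The helper names are hypothetical.

```python
import math
import numpy as np

def chi2_sf_1dof(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, starting from the largest P-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out

rng = np.random.default_rng(1)
u = rng.standard_normal(1000)        # toy stand-in for one singular vector over features
outliers = [3, 250, 777]
u[outliers] = [8.0, -9.0, 10.0]      # planted features that deviate from the Gaussian bulk

sigma = u.std()
pvals = [chi2_sf_1dof((v / sigma) ** 2) for v in u]
selected = np.flatnonzero(benjamini_hochberg(pvals) < 0.01)
print(selected)
```

Features that survive the corrected threshold are those whose components are too large to be explained by the assumed Gaussian null.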
  11. 13 Datasets: Ovarian cancer from TCGA

    i: 8 normal vs 569 tumor samples = 577 samples
    j: 24906 protein coding genes to which promoter methylation is attributed
    l: 732 miRNA profiles
    Tensor: $x_{ijl} \in \mathbb{R}^{577 \times 24906 \times 732}$ → too huge! → approximation (Y-h. Taguchi, PLoS ONE, 2017):
    $x_{jl} = \sum_i x_{ijl} \in \mathbb{R}^{24906 \times 732}$ → computable
    $x^{\mathrm{methyl}}_{k_1 i} = \sum_j x_{ij} x_{jk_2}$, $x^{\mathrm{miRNA}}_{k_1 i} = \sum_l x_{il} x_{lk_3}$
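The approximation can be sketched as a matrix product followed by an ordinary SVD. The sketch below assumes small random matrices in place of the TCGA data and assumes that sample-level vectors are recovered by projecting each data matrix onto its own singular vectors; variable names such as `methyl_scores` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
m, N, K = 12, 40, 15                  # toy sizes (the slide uses 577, 24906, 732)
x_ij = rng.standard_normal((m, N))    # samples x gene methylation
x_il = rng.standard_normal((m, K))    # samples x miRNA expression

# Summing the product tensor over samples gives a plain matrix product:
# x_jl = sum_i x_ij * x_il
x_jl = x_ij.T @ x_il                  # shape (N, K)

# SVD of the N x K matrix replaces the decomposition of the full tensor.
u, s, vt = np.linalg.svd(x_jl, full_matrices=False)
x_jk2 = u                             # gene singular vectors, one column per component
x_lk3 = vt.T                          # miRNA singular vectors, one column per component

# Project each data matrix onto its singular vectors to get per-sample vectors.
methyl_scores = x_ij @ x_jk2          # shape (m, number of components)
mirna_scores = x_il @ x_lk3           # shape (m, number of components)

# Element counts behind "too huge" versus "computable" at the slide's sizes.
full_elements = 577 * 24906 * 732
reduced_elements = 24906 * 732
print(full_elements, reduced_elements)
```

At 8 bytes per element the full tensor would need on the order of 84 GB, while the summed matrix needs well under 1 GB. A side effect of this construction is that the inner product of the two per-sample vectors for the same component equals the corresponding singular value, so components with large singular values are the ones where the two data types co-vary most.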
  12. 14 Results

    $x^{\mathrm{miRNA}}_{lk_3}$ and $x^{\mathrm{methyl}}_{jk_2}$ for $k_2 = 2$ are distinct between the 8 normal tissues and 569 tumors. → $x^{\mathrm{miRNA}}_{lk_3}$ and $x^{\mathrm{methyl}}_{jk_2}$ are also significantly correlated: COR = 0.72 ($P = 10^{-9}$).
  13. 15 → 7 miRNAs are selected using $x_{l,k_3=2}$ and 241 protein coding genes are selected using $x_{j,k_2=2}$.

    We found that the seven miRNAs and 241 protein coding genes are distinct between normal tissues and tumors. The 1687 pairs (= 7 miRNAs × 241 protein coding genes) are highly correlated: most of the pairs (94%) are significantly correlated (P < 0.01 after BH correction).
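The "fraction of significantly correlated pairs" figure can be computed along the following lines. This is a sketch only: the data are random toy matrices sharing a common signal, the correlation P-values use Fisher's z-transform with a normal approximation (the talk does not say which test was used), and the helper names are hypothetical.

```python
import math
import numpy as np

def bh_adjust(pvals):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out

def correlation_pvalue(r, n):
    """Two-sided P-value for Pearson r via Fisher's z (normal approximation)."""
    r = min(max(r, -0.999999), 0.999999)   # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))

rng = np.random.default_rng(3)
n = 200                                     # toy number of samples
shared = rng.standard_normal(n)             # signal common to both data types
mirna = shared[:, None] + 0.5 * rng.standard_normal((n, 7))    # 7 toy miRNAs
genes = shared[:, None] + 0.5 * rng.standard_normal((n, 30))   # 30 toy genes

# Pearson correlation for every miRNA-gene pair from standardized columns.
zm = (mirna - mirna.mean(axis=0)) / mirna.std(axis=0)
zg = (genes - genes.mean(axis=0)) / genes.std(axis=0)
corr = zm.T @ zg / n                        # shape (7, 30)

pvals = [correlation_pvalue(r, n) for r in corr.ravel()]
fraction = float(np.mean(bh_adjust(pvals) < 0.01))
n_pairs = corr.size
print(n_pairs, fraction)
```

The same routine applied to features chosen by a different method gives a directly comparable fraction, which is how the selection methods are contrasted in the slides that follow.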
  14. 16 Comparisons with conventional methods

    Selection of miRNAs and protein coding genes using the t test (normal tissues vs tumors), P < 0.01 after BH correction → 214 out of 732 miRNAs and 19395 out of 24906 protein coding genes → too many miRNAs and protein coding genes. Correlation between the top 214 miRNAs and 19395 protein coding genes: only 6% of the pairs are significantly correlated.
  15. 17 Correlation between the top 7 miRNAs and 241 protein coding genes selected by t test

    Poorer correlation than for those selected by TD based unsupervised FE.
  16. 18 Conversely, we might first select pairs of miRNAs and protein coding genes with significant correlation (P < 0.01 after BH correction) and then select those distinct between normal tissues and tumors.

    Only 10% of the pairs are significantly correlated, so a limited number of pairs is selected successfully. But the 608989 positively correlated pairs and 588783 negatively correlated pairs unfortunately include all of the miRNAs and protein coding genes → useless for selecting miRNAs and protein coding genes.
  17. 19 Although we have also evaluated the biological significance of the selected seven miRNAs (using DIANA-miRPath) and 241 protein coding genes (using MSigDB), there is no time to report it.

    Basically, they are highly related to ovarian cancers.
  18. 20 Conclusions in this part

    In this paper, we have applied TD based unsupervised FE, recently proposed by one of the authors (Y-h. Taguchi, PLoS ONE, 2017), to miRNA expression and promoter methylation attributed to protein coding genes of ovarian cancers from TCGA. The selected seven miRNAs and 241 protein coding genes are distinct between the 8 normal tissues and 569 tumors. Most of the protein coding gene methylation – miRNA expression pairs (94%) are significantly correlated. Conventional screening using the t test and correlation coefficients failed to achieve similar performance. The biological meanings of these gene promoter methylation – miRNA expression pairs should be investigated further.