
Multiomics Data Analysis Using Tensor Decomposition Based Unsupervised Feature Extraction –Comparison with DIABLO–

Y-h. Taguchi
August 04, 2019

Multiomics data analysis is a central issue in genomic science. Nevertheless, there are no well-defined methods for integrating multiomics data sets, which are formatted as matrices of different sizes. In this paper, I propose tensor decomposition based unsupervised feature extraction as a data mining tool for multiomics data sets. It successfully integrates miRNA expression, mRNA expression, and proteome data, which were used as the demonstration example for DIABLO, a recently proposed advanced method for the integrated analysis of multiomics data sets.

Transcript

  1. Multiomics Data Analysis Using Tensor Decomposition Based Unsupervised Feature Extraction –Comparison with DIABLO–

    Y-h. Taguchi, in De-Shuang Huang, Vitoantonio Bevilacqua, Prashan Premaratne (Eds.), Intelligent Computing Theories and Application, 15th International Conference, ICIC 2019, Nanchang, China, August 3–6, 2019, Proceedings, Part I, pp. 565–574. https://doi.org/10.1007/978-3-030-26763-6_54
  2. Introduction

    Nowadays, the types of measurable omics data sets are continuously increasing, e.g., gene expression, promoter methylation, histone modification, non-coding gene (including microRNA) expression, and genotype (SNP). However, integrated analysis of these omics data sets is not straightforward even when they are paired (i.e., measured in the same samples), because it is unclear how data sets with different numbers of variables and different magnitudes of values should be weighted.
  3. Traditional ways of integrating multiomics data sets

    Data: M samples ⨉ N1, N2, N3 variables.

    (A) Contraction: simply align the three matrices, which share the M rows, and generate a single M ⨉ (N1+N2+N3) matrix.

    [Figure: three matrices with M rows and N1, N2, and N3 columns joined into one M ⨉ (N1+N2+N3) matrix, shown with Y.]
  4. Pros and cons of contraction

    Pros: Easy to handle. There is no need to invent new methods specific to multiomics data; all standard methods applicable to single omics data sets can be applied to the generated matrix.

    Cons: When the numbers of variables and/or the magnitudes of values are imbalanced, the omics data with more variables and/or larger values might govern the result. In that case, there is little point in considering multiomics data sets.
  5. (B) Ensemble

    Analyze the individual omics data sets independently and integrate the outcomes, e.g., by voting.

    [Figure: three separate matrices of sizes M ⨉ N1, M ⨉ N2, and M ⨉ N3, shown with Y.]
  6. Pros and cons of ensemble

    Pros: Easy to handle. There is no need to invent new methods specific to multiomics data; all standard methods applicable to single omics data sets can be applied to the individual matrices.

    Cons: How to integrate the outcomes obtained by analyzing the individual omics data sets is arbitrary. There is no guarantee that the individual omics data sets are treated equally, since they have distinct numbers of variables and/or magnitudes of values.
  7. (C) DIABLO

    Generate a linear combination of pairwise matrix products.

    [Figure: three matrices of sizes M ⨉ N1, M ⨉ N2, and M ⨉ N3, shown with Y.]
  8. Pros and cons of DIABLO

    Pros: Taking pairwise products prevents any individual omics data set from governing the outcome.

    Cons: How to take the products (all pairs? pairs added to the individual matrices themselves? and so on) must be decided by a human in advance.
  9. Demonstration of DIABLO using the test data set in the R package for DIABLO

    Dimensions of each data set:

        ## $mRNA
        ## [1] 150 200
        ##
        ## $miRNA
        ## [1] 150 184
        ##
        ## $proteomics
        ## [1] 150 142

    Number of samples per class:

        ## Basal  Her2  LumA
        ##    45    30    75
  10. Discrimination performances using generated features

    [Figure: errors (axis ticks 0.05, 0.10, 0.15) against the number of components generated.]
  11. Discrimination performances using selected features

  12. Our method: Tensor decomposition based unsupervised feature extraction

    $x_{ij}$: expression of the $i$th mRNA in the $j$th sample
    $x_{kj}$: expression of the $k$th miRNA in the $j$th sample
    $x_{lj}$: expression of the $l$th protein in the $j$th sample

    Tensor: $x_{iklj} = x_{ij} \cdot x_{kj} \cdot x_{lj}$

    Apply tensor decomposition (the tensor version of singular value decomposition).
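  13. HOSVD (Higher Order Singular Value Decomposition)

    $$x_{i_1 i_2 i_3} = \sum_{l_1, l_2, l_3} G(l_1, l_2, l_3)\, u_{l_1 i_1} u_{l_2 i_2} u_{l_3 i_3}$$

    $1 \le l_1 \le 30{,}000$, $1 \le l_2 \le 10$, $1 \le l_3 \le 10$.

    $G(l_1, l_2, l_3)$: core tensor
    $u_{l_1 i_1}$, $u_{l_2 i_2}$, $u_{l_3 i_3}$: singular value vectors (orthogonal matrices)

    [Figure: the tensor $x_{i_1 i_2 i_3}$ decomposed into the core tensor $G$ and the matrices $u_{i_1 l_1}$, $u_{i_2 l_2}$, $u_{i_3 l_3}$.]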
  14. Linear discriminant analysis

    [Figure: $u_{1j}$ and $u_{4j}$ for the Basal, Her2, and LumA samples.]

    Error: 6.5%

                        Real
        Predicted   Basal  Her2  LumA
        Basal          42     4     0
        Her2            2    25     2
        LumA            1     1    73
  15. Discrimination performances using selected features

    [Figure: results for mRNA, miRNA, and protein across the Basal, Her2, and LumA classes.]
  16. Pros and cons of TD based unsupervised FE

    Pros: Fast (because no optimization is needed). Robust (independent of label information). Unsupervised (no need to construct a model in advance).

    Cons: There is no remedy if it does not work. It needs more memory: M ⨉ (N1+N2+N3) vs N1⨉N2⨉N3.
  17. I will publish a book on my method!

    Springer International, 13th Sep, 2019, 149.99 €, 321+XVIII pages
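The contraction approach on slide 3 of the transcript amounts to a column-wise concatenation of matrices that share their rows. A minimal NumPy sketch follows; the toy sizes and the random data are assumptions made for illustration only and are not taken from the talk.

```python
import numpy as np

# Toy sizes (assumed): M samples, and N1, N2, N3 variables per omics layer.
M, N1, N2, N3 = 6, 5, 4, 3
rng = np.random.default_rng(0)

# Three omics matrices sharing the same M rows (samples).
X1 = rng.normal(size=(M, N1))
X2 = rng.normal(size=(M, N2))
X3 = rng.normal(size=(M, N3))

# Contraction: align the matrices side by side into M x (N1 + N2 + N3).
X = np.hstack([X1, X2, X3])
print(X.shape)
```

Any single-omics method can then be run on `X`, which is also why the layer with the most columns or the largest values can dominate the result.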
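The ensemble approach on slide 5 integrates per-omics outcomes, for example by voting. The sketch below shows one possible voting rule, a per-sample majority vote; the function name `majority_vote` and the example predictions are hypothetical, and the talk does not specify which voting scheme is meant.

```python
from collections import Counter

def majority_vote(predictions_per_omics):
    """Combine per-omics class predictions, sample by sample, by majority vote."""
    n_samples = len(predictions_per_omics[0])
    combined = []
    for j in range(n_samples):
        votes = Counter(pred[j] for pred in predictions_per_omics)
        # Ties are resolved in favor of the label seen first,
        # which is one example of an arbitrary integration choice.
        combined.append(votes.most_common(1)[0][0])
    return combined

# Hypothetical predictions from three independently analyzed omics layers.
mrna_pred = ["Basal", "Her2", "LumA", "LumA"]
mirna_pred = ["Basal", "LumA", "LumA", "Her2"]
protein_pred = ["Her2", "Her2", "LumA", "LumA"]

consensus = majority_vote([mrna_pred, mirna_pred, protein_pred])
print(consensus)
```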
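The tensor construction on slide 12 and the HOSVD on slide 13 can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the author's implementation: the toy sizes, the random data, and the helper `unfold` are hypothetical, and the HOSVD is computed in the standard way from the SVD of each mode unfolding.

```python
import numpy as np

# Toy sizes (assumed): N1 mRNAs, N2 miRNAs, N3 proteins, M samples.
N1, N2, N3, M = 5, 4, 3, 6
rng = np.random.default_rng(0)
x_mrna = rng.normal(size=(N1, M))   # x_ij
x_mirna = rng.normal(size=(N2, M))  # x_kj
x_prot = rng.normal(size=(N3, M))   # x_lj

# Product tensor: x_iklj = x_ij * x_kj * x_lj
X = np.einsum("ij,kj,lj->iklj", x_mrna, x_mirna, x_prot)

def unfold(T, mode):
    """Matricize tensor T so that the given mode indexes the rows."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

# Singular value vectors: left singular vectors of each mode unfolding.
U = [np.linalg.svd(unfold(X, m), full_matrices=False)[0] for m in range(X.ndim)]

# Core tensor G: project X onto the singular value vectors of every mode.
G = np.einsum("iklj,ia,kb,lc,jd->abcd", X, *U)

# Multiplying the core tensor back by the vectors recovers X.
X_rec = np.einsum("abcd,ia,kb,lc,jd->iklj", G, *U)
print(np.allclose(X, X_rec))
```

In this layout the columns of `U[3]` are the singular value vectors attributed to samples, which is the kind of vector that a discriminant analysis such as the one on slide 14 takes as input.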
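The memory cost listed on slide 16 can be made concrete with the data set sizes reported on slide 9 (150 samples; 200 mRNAs, 184 miRNAs, 142 proteins). The arithmetic below is a rough estimate that assumes dense storage with 8-byte floating point values; it also counts the sample mode of the product tensor from slide 12, which the comparison on slide 16 leaves out.

```python
# Sizes reported on slide 9 of the transcript.
M = 150
N1, N2, N3 = 200, 184, 142

# Number of values in the contracted matrix: M x (N1 + N2 + N3).
contracted_values = M * (N1 + N2 + N3)

# Number of values in one N1 x N2 x N3 slice, and in the full
# N1 x N2 x N3 x M product tensor if it is stored densely.
slice_values = N1 * N2 * N3
tensor_values = slice_values * M

BYTES_PER_VALUE = 8  # assumption: float64 storage
print(contracted_values, slice_values, tensor_values)
print(tensor_values * BYTES_PER_VALUE / 1e9, "GB")
```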