Multiomics Data Analysis Using Tensor Decomposition Based Unsupervised Feature Extraction –Comparison with DIABLO–

Slide 1

Y-h. Taguchi. In: De-Shuang Huang, Vitoantonio Bevilacqua, Prashan Premaratne (Eds.), Intelligent Computing Theories and Application, 15th International Conference, ICIC 2019, Nanchang, China, August 3–6, 2019, Proceedings, Part I, pp. 565–574. https://doi.org/10.1007/978-3-030-26763-6_54

Slide 2

Introduction
Nowadays, the types of measurable omics data sets are continuously increasing, e.g., gene expression, promoter methylation, histone modification, expression of non-coding RNAs (including microRNAs), and genotype (SNPs). However, integrated analysis of these omics data sets is not straightforward, even when they are paired (i.e., measured on the same samples), because it is unclear how data sets with different numbers of variables and different magnitudes of values should be weighted.

Slide 3

Traditional ways of integrating multiomics data sets
Data: three omics data sets with M samples × N1, N2, N3 variables.
(A) Contraction: simply align the three matrices, which share the same M rows, to generate an M × (N1+N2+N3) matrix.
[Figure: M × N1, M × N2 and M × N3 matrices joined into one M × (N1+N2+N3) matrix, with sample labels Y]
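Contraction amounts to column-wise concatenation of the omics matrices. A minimal NumPy sketch, assuming each block is a samples × variables array whose rows are already in the same sample order; the function name `contract` is hypothetical:

```python
import numpy as np


def contract(*blocks):
    """Join omics matrices that share the same M rows (samples) column-wise."""
    m = blocks[0].shape[0]
    if any(b.shape[0] != m for b in blocks):
        raise ValueError("all omics blocks must have the same number of samples (rows)")
    # Result is M x (N1 + N2 + N3); row j still refers to sample j in every block
    return np.hstack(blocks)


# Block sizes taken from the DIABLO demonstration data (values here are placeholders)
x = contract(np.zeros((150, 200)), np.zeros((150, 184)), np.zeros((150, 142)))
print(x.shape)  # (150, 526)
```

Any single-omics method can then be run on `x`, which is exactly the convenience listed as the "pro" of contraction.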

Slide 4

Pros and cons of contraction
Pros: Easy to handle. There is no need to invent new methods specific to multiomics data; any standard method for single-omics data can be applied to the generated matrix.
Cons: When the numbers of variables and/or the magnitudes of values are imbalanced, the omics data set with more variables and/or larger values may dominate the result. In that case, there is little point in considering multiple omics data sets at all.
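The imbalance can be made concrete with a small calculation. Assuming every variable in block k is centered with variance s_k², block k contributes N_k · s_k² to the total variance of the contracted matrix, so variance-driven methods such as PCA see each block in that proportion. The helper below is a hypothetical illustration, and the 10× scale in the second call is a made-up example:

```python
def variance_shares(n_vars, scales):
    """Fraction of the contracted matrix's total variance contributed by each block.

    Assumes every variable in block k is centered with variance scales[k] ** 2,
    so block k contributes n_vars[k] * scales[k] ** 2 to the total variance.
    """
    contrib = [n * s ** 2 for n, s in zip(n_vars, scales)]
    total = sum(contrib)
    return [c / total for c in contrib]


# Unit-variance variables: shares simply follow the numbers of variables
print([round(v, 3) for v in variance_shares([200, 184, 142], [1, 1, 1])])
# [0.38, 0.35, 0.27]

# Hypothetical case: the first block is on a 10x larger scale and dominates
print([round(v, 3) for v in variance_shares([200, 184, 142], [10, 1, 1])])
# [0.984, 0.009, 0.007]
```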

Slide 5

(B) Ensemble: analyze each omics data set independently and integrate the outcomes, e.g., by voting.
[Figure: M × N1, M × N2 and M × N3 matrices, each analyzed separately against Y]
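Integration by voting can be sketched in a few lines of Python, assuming each omics data set has already produced one predicted label per sample. The function name and the per-omics predictions below are hypothetical toy values:

```python
from collections import Counter


def majority_vote(predictions_per_omics):
    """Combine per-omics class predictions sample by sample by majority vote.

    predictions_per_omics: one list of predicted labels per omics data set,
    all in the same sample order. Ties go to the label seen first, i.e. the
    earlier omics data set, which is itself an arbitrary rule.
    """
    combined = []
    for votes in zip(*predictions_per_omics):
        # most_common orders equal counts by first appearance
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Toy predictions for four samples (made up for illustration)
mrna = ["Basal", "Her2", "LumA", "Basal"]
mirna = ["Basal", "LumA", "LumA", "Her2"]
protein = ["Her2", "Her2", "LumA", "LumA"]
print(majority_vote([mrna, mirna, protein]))
# ['Basal', 'Her2', 'LumA', 'Basal']
```

The last sample is a three-way tie, so the outcome depends entirely on the order in which the omics data sets are listed.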

Slide 6

Pros and cons of ensemble
Pros: Easy to handle. There is no need to invent new methods specific to multiomics data; any standard method for single-omics data can be applied to each individual matrix.
Cons: How to integrate the outcomes of the individual omics analyses is arbitrary. There is no guarantee that treating the individual omics data sets equally is appropriate, since they have different numbers of variables and/or magnitudes of values.

Slide 7

(C) DIABLO: generate a linear combination of pairwise matrix products.
[Figure: M × N1, M × N2 and M × N3 matrices combined pairwise, with Y]
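DIABLO's criterion is usually written as a design-weighted sum of covariances between block components, cov(X_k a_k, X_l a_l), and for centered data each covariance equals a_k^T (X_k^T X_l) a_l / (M - 1), i.e. it is built from pairwise matrix products. The sketch below only evaluates such a criterion for given loading vectors; it does not reproduce DIABLO's sparse optimization or the mixOmics implementation, and all names and random data are hypothetical:

```python
import numpy as np


def pairwise_design_objective(blocks, loadings, design):
    """Design-weighted sum of covariances between block components.

    blocks:   list of (M x N_k) arrays sharing the same M samples
    loadings: list of length-N_k weight vectors a_k (assumed unit norm)
    design:   K x K matrix; design[k][l] weights the pair (k, l)
    """
    m = blocks[0].shape[0]
    # Center columns so that t_k . t_l / (M - 1) is a covariance
    centered = [x - x.mean(axis=0) for x in blocks]
    comps = [x @ a for x, a in zip(centered, loadings)]  # t_k = X_k a_k
    total = 0.0
    for k in range(len(blocks)):
        for l in range(k + 1, len(blocks)):  # each unordered pair once
            # Equals a_k^T (X_k^T X_l) a_l / (M - 1)
            total += design[k][l] * float(comps[k] @ comps[l]) / (m - 1)
    return total


rng = np.random.default_rng(0)
sizes = (200, 184, 142)
blocks = [rng.normal(size=(150, n)) for n in sizes]
loadings = []
for n in sizes:
    v = rng.normal(size=n)
    loadings.append(v / np.linalg.norm(v))

full = np.ones((3, 3)) - np.eye(3)  # every pair of omics data sets connected
mrna_mirna_only = np.zeros((3, 3))  # only the mRNA-miRNA pair connected
mrna_mirna_only[0, 1] = mrna_mirna_only[1, 0] = 1.0
print(pairwise_design_objective(blocks, loadings, full))
print(pairwise_design_objective(blocks, loadings, mrna_mirna_only))
```

The design matrix is the part that, as the next slide notes, must be chosen by a human before the analysis starts.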

Slide 8

Pros and cons of DIABLO
Pros: Taking pairwise products prevents any individual omics data set from dominating the outcome.
Cons: How to take the products (all pairs? pairs plus the individual matrices themselves? and so on) must be decided by a human in advance.

Slide 9

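Demonstration of DIABLO using the test data set included in DIABLO's R package
Dimensions (samples × variables) of each omics data set, followed by the number of samples in each class:
## $mRNA
## [1] 150 200
##
## $miRNA
## [1] 150 184
##
## $proteomics
## [1] 150 142
##
## Basal  Her2  LumA
##    45    30    75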

Slide 10

Discrimination performance using generated features
[Figure: errors versus number of components generated]

Slide 11

Discrimination performance using selected features

Slide 12

Our method: tensor decomposition based unsupervised feature extraction
$x_{ij}$: expression of the $i$th mRNA in the $j$th sample
$x_{kj}$: expression of the $k$th miRNA in the $j$th sample
$x_{lj}$: expression of the $l$th protein in the $j$th sample
Tensor: $x_{iklj} = x_{ij} \cdot x_{kj} \cdot x_{lj}$
Apply tensor decomposition (the tensor version of singular value decomposition).
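The tensor definition above is an element-wise product over the shared sample index j, which NumPy's `einsum` expresses directly. A minimal sketch, assuming each block is stored as variables × samples (matching the index order of $x_{ij}$, and transposed relative to the M × N layout used for contraction); the function name and the small random blocks are hypothetical:

```python
import numpy as np


def build_tensor(x_mrna, x_mirna, x_protein):
    """x_iklj = x_ij * x_kj * x_lj for blocks given as (variables x samples)."""
    if not (x_mrna.shape[1] == x_mirna.shape[1] == x_protein.shape[1]):
        raise ValueError("all blocks must share the same samples (columns)")
    # For each sample j, multiply every mRNA, miRNA and protein value together
    return np.einsum("ij,kj,lj->iklj", x_mrna, x_mirna, x_protein)


rng = np.random.default_rng(1)
x_mrna, x_mirna, x_protein = (rng.normal(size=(n, 6)) for n in (5, 4, 3))
x = build_tensor(x_mrna, x_mirna, x_protein)
print(x.shape)  # (5, 4, 3, 6)
```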

Slide 13

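HOSVD (Higher Order Singular Value Decomposition)
$$x_{i_1 i_2 i_3} = \sum_{l_1} \sum_{l_2} \sum_{l_3} G(l_1 l_2 l_3)\, u_{l_1 i_1} u_{l_2 i_2} u_{l_3 i_3}, \qquad 1 \le l_1 \le 30{,}000,\ 1 \le l_2 \le 10,\ 1 \le l_3 \le 10$$
$G(l_1 l_2 l_3)$: core tensor
$u_{l_1 i_1}, u_{l_2 i_2}, u_{l_3 i_3}$: singular vectors (orthogonal matrices)
[Figure: $x_{i_1 i_2 i_3}$ decomposed into the core tensor $G$ and the factor matrices $u_{i_1 l_1}$, $u_{i_2 l_2}$, $u_{i_3 l_3}$]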
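One common way to compute an HOSVD is to take the left singular vectors of each mode's unfolding and then project the tensor onto them to get the core. A minimal NumPy sketch for a 3-mode tensor, keeping all singular vectors so that the reconstruction is exact (truncating each factor to its leading columns gives an approximation instead). In the code, `U[i, l]` stores $u_{l i}$; the function names and toy tensor are hypothetical:

```python
import numpy as np


def unfold(x, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)


def hosvd(x):
    """Full HOSVD of a 3-mode tensor: returns core G and factors [U1, U2, U3]."""
    factors = []
    for mode in range(3):
        # Left singular vectors of the unfolding; columns are orthonormal
        u, _, _ = np.linalg.svd(unfold(x, mode), full_matrices=False)
        factors.append(u)
    u1, u2, u3 = factors
    # G(l1 l2 l3) = sum_{i1 i2 i3} x_{i1 i2 i3} u_{l1 i1} u_{l2 i2} u_{l3 i3}
    g = np.einsum("abc,al,bm,cn->lmn", x, u1, u2, u3)
    return g, factors


def reconstruct(g, factors):
    """x_{i1 i2 i3} = sum_{l1 l2 l3} G(l1 l2 l3) u_{l1 i1} u_{l2 i2} u_{l3 i3}."""
    u1, u2, u3 = factors
    return np.einsum("lmn,al,bm,cn->abc", g, u1, u2, u3)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3, 5))  # toy 3-mode tensor
g, factors = hosvd(x)
print(np.allclose(reconstruct(g, factors), x))  # True
```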

Slide 14

Linear discriminant analysis with $u_{1j}$ and $u_{4j}$
Confusion matrix (rows: predicted class, columns: real class):

Predicted \ Real   Basal   Her2   LumA
Basal                 42      4      0
Her2                   2     25      2
LumA                   1      1     73

Error: 6.5%
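As a check on the table's orientation, a short Python sketch can recompute the class sizes and the overall misclassification rate from the counts above (the function names are hypothetical). The column sums reproduce the class sizes of the DIABLO test data (45, 30, 75), and the off-diagonal entries give 10 of 150 samples, about 6.7%; the source does not say exactly how its 6.5% figure was computed, so the two may differ in definition or rounding:

```python
# Counts from the slide: rows = predicted class, columns = real class
# (order: Basal, Her2, LumA)
confusion = [
    [42, 4, 0],
    [2, 25, 2],
    [1, 1, 73],
]


def class_sizes(cm):
    """Column sums: number of samples in each real class."""
    return [sum(row[j] for row in cm) for j in range(len(cm))]


def error_rate(cm):
    """Fraction of samples off the diagonal (predicted != real)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return (total - correct) / total


print(class_sizes(confusion))           # [45, 30, 75]
print(round(error_rate(confusion), 4))  # 0.0667
```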

Slide 15

Discrimination performance using selected features
[Figure: selected mRNA, miRNA and protein features in Basal, Her2 and LumA samples]

Slide 16

Pros and cons of TD based unsupervised FE
Pros:
Fast (no optimization is involved)
Robust (independent of label information)
Unsupervised (no need to construct a model in advance)
Cons:
No alternative if it does not work
Needs more memory: M × (N1+N2+N3) for contraction vs. N1 × N2 × N3 for the tensor
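The memory cost can be made concrete with the block sizes from the DIABLO demonstration (M = 150, N1 = 200, N2 = 184, N3 = 142). A small Python sketch, assuming 8 bytes per value (double precision); the helper name is hypothetical:

```python
def cell_counts(m, n1, n2, n3):
    """Number of stored values: contracted matrix vs. N1 x N2 x N3 tensor."""
    return m * (n1 + n2 + n3), n1 * n2 * n3


matrix_cells, tensor_cells = cell_counts(150, 200, 184, 142)
print(matrix_cells, tensor_cells)  # 78900 5225600

# At 8 bytes per value: about 0.63 MB versus about 41.8 MB
print(matrix_cells * 8 / 1e6, tensor_cells * 8 / 1e6)
```

The tensor grows with the product of the numbers of variables rather than their sum, so the gap widens quickly as more variables are measured.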

Slide 17

I will publish a book on my method!
Springer International, 13th Sep 2019
149.99 €, 321 + XVIII pages