Multiomics Data Analysis Using Tensor Decomposition Based Unsupervised Feature Extraction –Comparison with DIABLO–

1 Multiomics Data Analysis Using Tensor Decomposition Based Unsupervised Feature Extraction –Comparison with DIABLO–
Y-h. Taguchi
In: De-Shuang Huang, Vitoantonio Bevilacqua, Prashan Premaratne (Eds.), Intelligent Computing Theories and Application, 15th International Conference, ICIC 2019, Nanchang, China, August 3–6, 2019, Proceedings, Part I, pp. 565–574. https://doi.org/10.1007/978-3-030-26763-6_54

2 Introduction
Nowadays, the types of measurable omics data sets are continuously increasing, e.g., gene expression, promoter methylation, histone modification, expression of non-coding genes (including microRNA) and genotype (SNP). However, integrated analysis of these omics data sets is not straightforward even when they are paired (i.e., measured in the same samples), because it is unclear how data sets with different numbers of variables and different magnitudes of values should be weighted.

3 Traditional ways of integrating multiomics data sets
M samples ⨉ N1, N2, N3 variables
(A) Contraction: simply align the three matrices, which share M rows, to generate an M ⨉ (N1+N2+N3) matrix.
[Figure: the M ⨉ N1, M ⨉ N2 and M ⨉ N3 matrices joined side by side into one M ⨉ (N1+N2+N3) matrix, analyzed against Y]
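As a minimal sketch of contraction, assuming the three omics are stored as NumPy arrays with samples as rows (the toy sizes and random values below are hypothetical), concatenation along the shared sample axis is all that is needed:

```python
import numpy as np

# Hypothetical toy data: M samples measured by three omics with N1, N2, N3 variables.
rng = np.random.default_rng(0)
M, N1, N2, N3 = 6, 5, 4, 3
X1 = rng.normal(size=(M, N1))  # e.g. mRNA
X2 = rng.normal(size=(M, N2))  # e.g. miRNA
X3 = rng.normal(size=(M, N3))  # e.g. protein

# Contraction: align the three matrices along their shared M rows.
X = np.hstack([X1, X2, X3])
print(X.shape)  # (6, 12)
```

Any single-omics method (PCA, a classifier, etc.) can then be run on `X` unchanged, which is exactly why this approach is easy to handle.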

4 Pros and cons of contraction
Pros: Easy to handle. There is no need to invent new methods specific to multiomics data; all standard methods applicable to single-omics data sets can be applied to the generated matrix.
Cons: When the numbers of variables and/or the magnitudes of values are imbalanced, the omics data with more variables and/or larger values might dominate the result. In that case, there is little point in treating the data as multiomics.
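The imbalance problem can be made concrete with a small sketch. Assuming hypothetical blocks where mRNA has many variables and protein has few variables but ten times larger values, the share of total variance each block contributes to the concatenated matrix shows which one a variance-based method such as PCA would mostly see:

```python
import numpy as np

rng = np.random.default_rng(1)
M = 50
# Hypothetical blocks: many mRNA variables, few miRNA variables,
# and few protein variables with much larger values.
blocks = {
    "mRNA": rng.normal(size=(M, 200)),
    "miRNA": rng.normal(size=(M, 20)),
    "protein": 10.0 * rng.normal(size=(M, 10)),
}
X = np.hstack(list(blocks.values()))

# Share of the total variance of the concatenated matrix carried by each block.
total = X.var(axis=0).sum()
shares = {name: B.var(axis=0).sum() / total for name, B in blocks.items()}
for name, s in shares.items():
    print(f"{name}: {s:.2f}")
```

Here the 10 protein variables carry most of the variance, and the 200 mRNA variables still outweigh the 20 miRNA ones, so both the number of variables and the scale of values decide which omics governs the result.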

5 (B) Ensemble
[Figure: the M ⨉ N1, M ⨉ N2 and M ⨉ N3 matrices each analyzed separately against Y]
Analyze the individual omics data independently and integrate the outcomes, e.g., by voting.
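A minimal sketch of the voting step, assuming each omics layer has already been analyzed by its own classifier (the per-omics predictions below are hypothetical), could look like this:

```python
from collections import Counter

def majority_vote(predictions_per_omics):
    """Combine per-omics class predictions sample by sample by majority vote.

    predictions_per_omics: one list of predicted labels per omics layer,
    all of the same length (one entry per sample). On a tie, the label
    encountered first among the tied ones wins (Counter.most_common order).
    """
    combined = []
    for sample_preds in zip(*predictions_per_omics):
        label, _ = Counter(sample_preds).most_common(1)[0]
        combined.append(label)
    return combined

# Hypothetical predictions for three samples from three separate classifiers.
mrna_pred = ["Basal", "Her2", "LumA"]
mirna_pred = ["Basal", "LumA", "LumA"]
protein_pred = ["Her2", "Her2", "LumA"]
combined = majority_vote([mrna_pred, mirna_pred, protein_pred])
print(combined)  # ['Basal', 'Her2', 'LumA']
```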

6 Pros and cons of ensemble
Pros: Easy to handle. There is no need to invent new methods specific to multiomics data; all standard methods applicable to single-omics data sets can be applied to the individual matrices.
Cons: How to integrate the outcomes obtained by analyzing the individual omics data is arbitrary. There is no guarantee that the individual omics data are treated equally, although they have distinct numbers of variables and/or magnitudes of values.
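The arbitrariness of the integration step can be illustrated with a hypothetical weighted vote: the same three per-omics predictions for one sample give different final labels depending only on the weights chosen by the analyst. The weights below are illustrative assumptions, not values from the slides.

```python
from collections import defaultdict

def weighted_vote(preds, weights):
    """Label with the largest summed weight among per-omics predictions for one sample."""
    score = defaultdict(float)
    for label, w in zip(preds, weights):
        score[label] += w
    return max(score, key=score.get)

# Hypothetical predictions for one sample from the mRNA, miRNA and protein classifiers.
preds = ["LumA", "Her2", "Her2"]
equal = weighted_vote(preds, [1.0, 1.0, 1.0])       # 'Her2' (2.0 vs 1.0)
mrna_heavy = weighted_vote(preds, [3.0, 1.0, 1.0])  # 'LumA' (3.0 vs 2.0)
print(equal, mrna_heavy)  # Her2 LumA
```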

7 (C) DIABLO
[Figure: the M ⨉ N1, M ⨉ N2 and M ⨉ N3 matrices combined through pairwise products, analyzed against Y]
Generate a linear combination of pairwise matrix products.
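A simplified sketch of the "pairwise matrix product" idea is shown below. It is not the `block.splsda` routine that mixOmics uses for DIABLO; it only forms the cross-product matrices $X_a^T X_b$ for the block pairs switched on in a hypothetical design matrix, and, for one pair, takes the leading singular vectors of $X_1^T X_2$, which are the unit-norm weight vectors maximizing the covariance between $X_1 w_1$ and $X_2 w_2$ (PLS-SVD). The toy data and the design are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 20
blocks = [rng.normal(size=(M, n)) for n in (5, 4, 3)]  # hypothetical X1, X2, X3
blocks = [B - B.mean(axis=0) for B in blocks]          # center each variable

# Hypothetical design: 1 = the pair of blocks is linked, 0 = not linked.
design = np.array([[0, 1, 1],
                   [1, 0, 1],
                   [1, 1, 0]])

# Cross-product matrices X_a^T X_b for every linked pair (a < b).
cross = {(a, b): blocks[a].T @ blocks[b]
         for a in range(3) for b in range(a + 1, 3) if design[a, b]}

# For the pair (X1, X2), the leading singular vectors of X1^T X2 maximize
# w1^T X1^T X2 w2 (proportional to cov(X1 w1, X2 w2)) with ||w1|| = ||w2|| = 1.
U, s, Vt = np.linalg.svd(cross[(0, 1)])
w1, w2 = U[:, 0], Vt[0]
print(cross[(0, 1)].shape, round(float(w1 @ cross[(0, 1)] @ w2), 6) == round(float(s[0]), 6))
```

Because every term involves two blocks at once, no single block can dominate simply by being large on its own, which is the advantage claimed on the next slide.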

8 Pros and cons of DIABLO
Pros: Prevents any individual omics data set from dominating the outcome, by taking pairwise products.
Cons: How to take the products (all pairs? should the individual matrices themselves be added? and so on) must be decided by a human in advance.
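To see how many choices must be made in advance, one can count designs under the simplifying assumption that each pair of blocks is either linked or not (mixOmics also accepts intermediate design values between 0 and 1, so the real space of choices is larger):

```python
from itertools import combinations

def binary_designs(k):
    """Number of symmetric 0/1 design matrices (zero diagonal) over k blocks."""
    n_pairs = len(list(combinations(range(k), 2)))
    return 2 ** n_pairs

print(binary_designs(3))  # 8  (three omics)
print(binary_designs(4))  # 64 (three omics plus an outcome block)
```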

9 Demonstration of DIABLO using the test data set in the R package (mixOmics) that implements DIABLO
## $mRNA
## [1] 150 200
##
## $miRNA
## [1] 150 184
##
## $proteomics
## [1] 150 142
##
## Basal  Her2  LumA
##    45    30    75

10 Discrimination performances using generated features
[Figure: error rate (axis from 0.05 to 0.15) versus number of components generated]

11 Discrimination performances using selected features

12 Our method: tensor decomposition based unsupervised feature extraction
$x_{ij}$: expression of the $i$th mRNA in the $j$th sample
$x_{kj}$: expression of the $k$th miRNA in the $j$th sample
$x_{lj}$: expression of the $l$th protein in the $j$th sample
Tensor: $x_{iklj} = x_{ij}\, x_{kj}\, x_{lj}$
Apply tensor decomposition (a tensor version of singular value decomposition).
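The tensor definition $x_{iklj} = x_{ij}\, x_{kj}\, x_{lj}$ is a per-sample outer product, which `numpy.einsum` expresses directly. A minimal sketch with hypothetical toy sizes (variables as rows, samples as columns, matching the index order $x_{ij}$):

```python
import numpy as np

rng = np.random.default_rng(0)
N1, N2, N3, M = 5, 4, 3, 6           # mRNAs, miRNAs, proteins, samples (toy sizes)
x_mrna = rng.normal(size=(N1, M))    # x_ij
x_mirna = rng.normal(size=(N2, M))   # x_kj
x_prot = rng.normal(size=(N3, M))    # x_lj

# x_iklj = x_ij * x_kj * x_lj : one outer product per sample j.
x = np.einsum("ij,kj,lj->iklj", x_mrna, x_mirna, x_prot)
print(x.shape)  # (5, 4, 3, 6)
```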

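13 HOSVD (Higher Order Singular Value Decomposition)
$x_{i_1 i_2 i_3} = \sum_{l_1 l_2 l_3} G(l_1 l_2 l_3)\, u_{l_1 i_1} u_{l_2 i_2} u_{l_3 i_3}$, with $1 \le l_1 \le 30{,}000$, $1 \le l_2 \le 10$, $1 \le l_3 \le 10$.
$G(l_1 l_2 l_3)$: core tensor
$u_{l_1 i_1}, u_{l_2 i_2}, u_{l_3 i_3}$: singular value vectors (orthogonal matrices)
[Figure: schematic of $x_{i_1 i_2 i_3}$ decomposed into the core tensor $G$ and the matrices $u_{i_1 l_1}$, $u_{i_2 l_2}$, $u_{i_3 l_3}$]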
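A minimal HOSVD sketch in NumPy, assuming the standard construction: each factor matrix holds the left singular vectors of the corresponding mode unfolding, and the core is obtained by multiplying every mode by the transposed factor. In the slide's notation, $u_{l_n i_n}$ corresponds to `us[n][i_n, l_n]` here. The toy tensor is hypothetical; no truncation is applied, so the reconstruction is exact.

```python
import numpy as np

def unfold(x, mode):
    """Mode-n unfolding: rows indexed by `mode`, columns by all other modes."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)

def mode_product(x, m, mode):
    """Multiply tensor x along `mode` by matrix m of shape (new_size, x.shape[mode])."""
    y = np.tensordot(m, x, axes=([1], [mode]))  # the new axis comes first
    return np.moveaxis(y, 0, mode)

def hosvd(x):
    """Return the core tensor G and factor matrices (columns = singular vectors)."""
    us = [np.linalg.svd(unfold(x, n), full_matrices=False)[0] for n in range(x.ndim)]
    g = x
    for n, u in enumerate(us):
        g = mode_product(g, u.T, n)  # G = x x_1 U1^T x_2 U2^T x_3 U3^T
    return g, us

def reconstruct(g, us):
    """Rebuild x from G and the factor matrices (the sum over l1, l2, l3)."""
    out = g
    for n, u in enumerate(us):
        out = mode_product(out, u, n)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5, 6))
g, us = hosvd(x)
print(np.allclose(reconstruct(g, us), x))  # True
```

Keeping only the first few columns of each factor (as the bounds on $l_2$ and $l_3$ suggest) would give a truncated, approximate decomposition instead.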

14 Linear discriminant analysis using $u_{1j}$ and $u_{4j}$
(rows: predicted; columns: real)
          Basal  Her2  LumA
Basal       42     4     0
Her2         2    25     2
LumA         1     1    73
Error: 6.5%
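The confusion matrix on this slide has predicted classes as rows and real classes as columns (its column sums, 45, 30 and 75, match the class sizes of the test data). A small sketch of how such a matrix and its error rate are tallied, using hypothetical labels rather than the slide's results:

```python
import numpy as np

def confusion_matrix(real, predicted, classes):
    """Rows: predicted class, columns: real class."""
    idx = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for r, p in zip(real, predicted):
        cm[idx[p], idx[r]] += 1
    return cm

def error_rate(cm):
    """Fraction of samples off the diagonal (misclassified)."""
    return (cm.sum() - np.trace(cm)) / cm.sum()

classes = ["Basal", "Her2", "LumA"]
real = ["Basal", "Basal", "Her2", "LumA", "LumA"]  # hypothetical labels
pred = ["Basal", "Her2", "Her2", "LumA", "LumA"]
cm = confusion_matrix(real, pred, classes)
print(cm)
print(error_rate(cm))  # 0.2
```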

15 Discrimination performances using selected features
[Figure: panels for mRNA, miRNA and protein; classes Basal, Her2, LumA]

16 Pros and cons of TD based unsupervised FE
Pros: Fast (no optimization is involved). Robust (independent of label information). Unsupervised (no need to construct a model in advance).
Cons: Nothing can be done if it does not work. Needs more memory: M ⨉ (N1+N2+N3) vs. N1 ⨉ N2 ⨉ N3.
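The memory cost can be checked with simple arithmetic, using the sizes of the DIABLO test data set from slide 9 and assuming 8-byte (float64) values. The last line additionally assumes the sample mode $j$ is kept, as in $x_{iklj}$:

```python
M, N1, N2, N3 = 150, 200, 184, 142  # samples, mRNAs, miRNAs, proteins (slide 9)

matrix_entries = M * (N1 + N2 + N3)  # contraction: M x (N1+N2+N3)
tensor_entries = N1 * N2 * N3        # tensor:      N1 x N2 x N3
full_tensor = N1 * N2 * N3 * M       # if the sample mode j is kept as well

bytes_per_value = 8                  # assuming float64 storage
print(matrix_entries, matrix_entries * bytes_per_value)  # 78900 631200
print(tensor_entries, tensor_entries * bytes_per_value)  # 5225600 41804800
print(full_tensor * bytes_per_value / 1e9)               # 6.27072 (GB)
```

Even for this small test set, the N1 ⨉ N2 ⨉ N3 tensor has about 66 times as many entries as the contracted matrix.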

17 I will publish a book on my method!
Springer International, 13th Sep, 2019, 149.99 €, 321 + XVIII pages
