Tensor-Decomposition-Based Unsupervised Feature Extraction Applied to Prostate Cancer Multiomics Data

Slide 1

Slide 1 text

Tensor-Decomposition-Based Unsupervised Feature Extraction Applied to Prostate Cancer Multiomics Data
Y-h. Taguchi, Department of Physics, Chuo University, Tokyo, Japan
Turki Turki, Department of Computer Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia
The 8th Annual Congress of the European Society for Translational Medicine on Novel Therapeutics in Solid Tumors (EUSTM-2021), 20-26 September 2021

Slide 2

Slide 2 text

This study was published last December.

Slide 3

Slide 3 text

I have published a book on this topic with Springer International. I would be glad if the audience would read it to learn the method. Y-h. Taguchi, Unsupervised Feature Extraction Applied to Bioinformatics: A PCA and TD Based Approach, Springer International (2020).

Slide 4

Slide 4 text

What is a tensor?
Scalar $x$: a single number
Vector $x_i$: a set of scalars arranged in a line
Matrix $x_{ij}$: a set of scalars arranged in a table (i.e., rows and columns)
Tensor $x_{ijk}$: a set of scalars arranged in an array with more than two indices
[Figure: a tensor $x_{ijk}$ with axes $i$, $j$, $k$]
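As a quick illustration (not part of the talk), NumPy arrays follow the same hierarchy: the number of indices needed to address one element is the array's `ndim`. The arrays below hold arbitrary toy values.

```python
import numpy as np

# Scalar, vector, matrix, and third-order tensor as NumPy arrays (toy values).
x = np.float64(3.0)                       # scalar x: no index
x_i = np.array([1.0, 2.0, 3.0])           # vector x_i: one index
x_ij = np.arange(6.0).reshape(2, 3)       # matrix x_ij: rows and columns
x_ijk = np.arange(24.0).reshape(2, 3, 4)  # tensor x_ijk: three indices

for name, a in [("scalar", x), ("vector", x_i), ("matrix", x_ij), ("tensor", x_ijk)]:
    print(name, np.ndim(a), np.shape(a))

# One element of the tensor is addressed by three indices (i, j, k).
print(x_ijk[1, 2, 3])
```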

Slide 5

Slide 5 text

A tensor is well suited to storing genomics data. For example, gene expression can be stored as $x_{ijk} \in \mathbb{R}^{N \times M \times K}$ for $N$ genes × $M$ persons × $K$ tissues ($i$: genes, $j$: persons, $k$: tissues).
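A minimal sketch of how such a tensor could be assembled, assuming (hypothetically) that expression is available as one genes × persons matrix per tissue. The sizes and random values are placeholders, not real data.

```python
import numpy as np

# Hypothetical sizes: N genes, M persons, K tissues (toy values).
N, M, K = 5, 4, 3
rng = np.random.default_rng(0)

# Assumed input: one N x M (genes x persons) expression matrix per tissue.
per_tissue = [rng.random((N, M)) for _ in range(K)]

# Stack along a new last axis so that x[i, j, k] is gene i, person j, tissue k.
x = np.stack(per_tissue, axis=2)
print(x.shape)

# Expression of gene 0 in person 1 across all K tissues.
print(x[0, 1, :])
```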

Slide 6

Slide 6 text

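What is tensor decomposition (TD)? TD expands a tensor as a series of products of vectors:
$$x_{ijk} \simeq \sum_{l_1=1}^{L_1}\sum_{l_2=1}^{L_2}\sum_{l_3=1}^{L_3} G(l_1 l_2 l_3)\, u_{l_1 i}\, u_{l_2 j}\, u_{l_3 k}$$
where $i$: genes, $j$: persons, $k$: tissues, $G(l_1 l_2 l_3)$ is the core tensor, and $u_{l_1 i}$, $u_{l_2 j}$, $u_{l_3 k}$ are the vectors attributed to genes, persons, and tissues, respectively.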
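The talk shows no code for TD. The sketch below assumes the higher-order singular value decomposition (HOSVD), one standard way to compute this Tucker-type decomposition; the helper names `unfold` and `hosvd` are our own. Column $l$ of each factor matrix plays the role of $u_{l \cdot}$, so $u_{l_1 i}$ is `u1[i, l1 - 1]` in 0-based code. Keeping all components reproduces $x_{ijk}$ exactly; keeping only the first $L_1, L_2, L_3$ columns gives the approximation ($\simeq$).

```python
import numpy as np

def unfold(x, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)

def hosvd(x):
    """HOSVD: left singular vectors of each unfolding, then the core tensor G."""
    factors = [np.linalg.svd(unfold(x, n), full_matrices=False)[0] for n in range(x.ndim)]
    g = x
    for n, u in enumerate(factors):
        # Project mode n onto its singular vectors (multiply mode n by u^T).
        g = np.moveaxis(np.tensordot(u.T, np.moveaxis(g, n, 0), axes=1), 0, n)
    return g, factors

rng = np.random.default_rng(0)
x = rng.random((6, 5, 4))  # toy x_ijk: genes x persons x tissues

G, (u1, u2, u3) = hosvd(x)
# x_ijk = sum over l1, l2, l3 of G(l1 l2 l3) u_{l1 i} u_{l2 j} u_{l3 k}
x_hat = np.einsum("abc,ia,jb,kc->ijk", G, u1, u2, u3)
print(np.allclose(x, x_hat))
```

Because each factor matrix has orthonormal columns, the core is obtained simply by projecting the data onto them, and the same factors rebuild the data.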

Slide 7

Slide 7 text

Advantages of tensor decomposition (TD): we can see
the dependence of $x_{ijk}$ on $i$ (genes) through $u_{l_1 i}$, which is used for gene selection;
the dependence of $x_{ijk}$ on $j$ (persons) through $u_{l_2 j}$, e.g., healthy control vs patient;
the dependence of $x_{ijk}$ on $k$ (tissues) through $u_{l_3 k}$, e.g., tissue specificity.
Thus, we can answer the question: which genes are differentially expressed between healthy controls and patients in a tissue-specific manner?

Slide 8

Slide 8 text

Interpretation: for some specific $l_2$, $u_{l_2 j}$ ($j$: samples) differs between healthy controls and patients; for some specific $l_3$, $u_{l_3 k}$ ($k$: tissues) represents tissue-specific expression.

Slide 9

Slide 9 text

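With these $l_2$ and $l_3$ fixed, choose the $l_1$ with the maximum $|G(l_1 l_2 l_3)|$. Genes $i$ at the two extremes of $u_{l_1 i}$ are tDEGs (tissue-specific differentially expressed genes): those at one extreme are expressed more highly in patients than in healthy controls, and those at the other extreme are expressed less. Which extreme is which depends on the signs of $G(l_1 l_2 l_3)$, $u_{l_2 j}$, and $u_{l_3 k}$; the slide illustrates the case $G(l_1 l_2 l_3) > 0$.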
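The selection logic on this slide can be sketched as follows, assuming a core tensor `G` indexed as `G[l1 - 1, l2 - 1, l3 - 1]` and a vector `u_l1` holding $u_{l_1 i}$, for example from an HOSVD. The function names `pick_l1` and `rank_genes` and all numbers are hypothetical; in practice genes would be selected with a statistical threshold rather than a fixed top-k.

```python
import numpy as np

def pick_l1(G, l2, l3):
    """Return (l1, sign of G) maximizing |G(l1 l2 l3)| for fixed l2, l3 (all 1-based)."""
    column = G[:, l2 - 1, l3 - 1]
    idx = int(np.argmax(np.abs(column)))
    return idx + 1, float(np.sign(column[idx]))

def rank_genes(u_l1, top=3):
    """0-based gene indices with the largest |u_{l1 i}|, strongest first (candidate tDEGs)."""
    return np.argsort(-np.abs(u_l1))[:top]

# Hypothetical 3 x 2 x 2 core tensor and u_{l1 i} vector, for illustration only.
G = np.zeros((3, 2, 2))
G[0, 1, 0] = 0.4
G[2, 1, 0] = -0.9
print(pick_l1(G, l2=2, l3=1))  # (3, -1.0)

u_l1 = np.array([0.05, -0.70, 0.10, 0.68, -0.02])
print(rank_genes(u_l1))        # [1 3 2]
```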

Slide 10

Slide 10 text

Epigenetic multiomics data set of prostate cancer analyzed in this study

Slide 11

Slide 11 text

The purpose: integrating epigenetic multiomics data (DNA methylation, histone modification, ChIP-seq, ATAC-seq, etc.) is always problematic because their causal relationships are unclear. This prevents us from developing a suitable model of how they are related. In this talk, I apply tensor decomposition (TD)-based unsupervised feature extraction (FE) to epigenetic multiomics data in a fully unsupervised manner.

Slide 12

Slide 12 text

Data (normal vs tumor): ChIP-seq, histone modification, and chromatin accessibility

Slide 13

Slide 13 text

The genome (24 chromosomes: chr1, chr2, ..., chrY) is divided into regions of 25,000 nucleotides, giving $N = 123{,}817 \approx 3 \times 10^9 / 25{,}000$ regions.
[Figure: ChIP-seq, histone modification, and chromatin accessibility data along the regions of each chromosome]
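The arithmetic behind $N$ and the mapping from a genomic position to its region can be sketched as below. Tiling each chromosome separately and rounding up its last partial region is our assumption about how the regions were defined, and the chromosome lengths are hypothetical; the exact $N$ depends on the genome assembly used.

```python
import math

REGION = 25_000  # region width in nucleotides, as on the slide

# Rough genome-wide count: ~3 x 10^9 nt / 25,000 nt per region.
print(3e9 / REGION)  # 120000.0

def n_regions(chrom_lengths):
    """Number of regions when each chromosome is tiled separately (last region may be partial)."""
    return sum(math.ceil(length / REGION) for length in chrom_lengths.values())

def region_index(position):
    """0-based index of the region containing a 0-based position on one chromosome."""
    return position // REGION

# Hypothetical chromosome lengths, for illustration only.
toy = {"chrA": 100_000, "chrB": 60_001}
print(n_regions(toy))        # 7  (4 + 3)
print(region_index(49_999))  # 1
```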

Slide 14

Slide 14 text

The epigenetic multiomics data set of prostate cancer is formatted as a tensor $x_{ijkm} \in \mathbb{R}^{N \times 24 \times 2 \times 3}$, where
$i$: the $i$th 25,000-nucleotide region
$j$: the $j$th epigenetic data
$k$: $k = 1$ normal, $k = 2$ tumor
$m$: the $m$th biological replicate
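A sketch of arranging the data into this four-way array, assuming (hypothetically) one length-$N$ value profile per combination of epigenetic data $j$, condition $k$, and replicate $m$. Indices are 0-based in code, so $k = 1$ (normal) becomes `k = 0`; the small $N$ and random values are placeholders.

```python
import numpy as np

N, J, K, M = 10, 24, 2, 3  # toy N (the study has N = 123,817)
rng = np.random.default_rng(1)

# Hypothetical input: one length-N region profile per (data j, condition k, replicate m).
profiles = {(j, k, m): rng.random(N) for j in range(J) for k in range(K) for m in range(M)}

x = np.empty((N, J, K, M))
for (j, k, m), values in profiles.items():
    x[:, j, k, m] = values  # k = 0: normal, k = 1: tumor (0-based)

print(x.shape)  # (10, 24, 2, 3)
```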

Slide 15

Slide 15 text

Applying TD to $x_{ijkm}$:
$$x_{ijkm} \simeq \sum_{l_1=1}^{L_1}\sum_{l_2=1}^{L_2}\sum_{l_3=1}^{L_3}\sum_{l_4=1}^{L_4} G(l_1 l_2 l_3 l_4)\, u_{l_1 j}\, u_{l_2 k}\, u_{l_3 m}\, u_{l_4 i}$$
$G$: weight of the contribution of individual terms to $x_{ijkm}$
$u_{l_1 j}$: the $l_1$th unit vector, representing the dependence on $j$ (epigenetics)
$u_{l_2 k}$: the $l_2$th unit vector, representing the dependence on $k$ (normal vs tumor)
$u_{l_3 m}$: the $l_3$th unit vector, representing the dependence on $m$ (biological replicates)
$u_{l_4 i}$: the $l_4$th unit vector, representing the dependence on $i$ (25,000-nucleotide regions)
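Assuming HOSVD as the decomposition algorithm (an assumption; the slide gives only the model), the factor order in this formula can be matched by transposing $x_{ijkm}$ to axis order $(j, k, m, i)$ before decomposing. The sketch uses a toy $N$ and random values. The region-mode unfolding is an $N \times 144$ matrix ($24 \times 2 \times 3 = 144$), so at most 144 vectors $u_{l_4 i}$ are needed, which keeps its SVD tractable even for $N = 123{,}817$.

```python
import numpy as np

def hosvd(x):
    """HOSVD: factor matrices from SVDs of each mode unfolding, then the core tensor."""
    factors = []
    for n in range(x.ndim):
        unfolding = np.moveaxis(x, n, 0).reshape(x.shape[n], -1)
        factors.append(np.linalg.svd(unfolding, full_matrices=False)[0])
    g = x
    for n, u in enumerate(factors):
        g = np.moveaxis(np.tensordot(u.T, np.moveaxis(g, n, 0), axes=1), 0, n)
    return g, factors

rng = np.random.default_rng(2)
N = 50                           # toy number of regions
x = rng.random((N, 24, 2, 3))    # x_ijkm with axes (i, j, k, m)

# Reorder to (j, k, m, i) so that l1, l2, l3, l4 index j, k, m, i as on the slide.
G, (u_j, u_k, u_m, u_i) = hosvd(np.transpose(x, (1, 2, 3, 0)))

# x_ijkm = sum of G(l1 l2 l3 l4) u_{l1 j} u_{l2 k} u_{l3 m} u_{l4 i}
x_hat = np.einsum("abcd,ja,kb,mc,id->ijkm", G, u_j, u_k, u_m, u_i)
print(G.shape, np.allclose(x, x_hat))
```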

Slide 16

Slide 16 text

First and second vectors attributed to $j$ (epigenetics): $u_{1j}$ ($l_1 = 1$) and $u_{2j}$ ($l_1 = 2$).
[Figure: $u_{1j}$ and $u_{2j}$ across the epigenetic data, grouped as open DNA, active marks, inactive marks, and prostate cancer activation]

Slide 17

Slide 17 text

The second vector attributed to $k$ (normal vs tumor), $u_{2k}$ ($l_2 = 2$), is distinct between normal and tumor. The first vector attributed to $m$ (biological replicates), $u_{1m}$ ($l_3 = 1$), is common among biological replicates.

Slide 18

Slide 18 text

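Next, we seek which $l_4$ is associated with $l_1 = 1$ (epigenetics), $l_2 = 2$ (normal vs tumor), and $l_3 = 1$ (biological replicates) by examining $G(1,2,1,l_4)$ as a function of $l_4$. The largest $|G(1,2,1,l_4)|$ is at $l_4 = 8$, so $u_{8i}$ is used to select 25,000-nucleotide regions.
[Figure: $G(1,2,1,l_4)$ plotted against $l_4$]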
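Choosing $l_4$ amounts to a one-dimensional argmax over the core tensor. A minimal sketch with 1-based indices as on the slide; the function name `select_l4` and the toy `G` are hypothetical.

```python
import numpy as np

def select_l4(G, l1=1, l2=2, l3=1):
    """1-based l4 maximizing |G(l1, l2, l3, l4)| for fixed 1-based l1, l2, l3."""
    return int(np.argmax(np.abs(G[l1 - 1, l2 - 1, l3 - 1, :]))) + 1

# Hypothetical core tensor, only to exercise the function.
G = np.zeros((2, 2, 1, 10))
G[0, 1, 0, 7] = -2.0  # 0-based index 7 corresponds to l4 = 8
G[0, 1, 0, 2] = 1.5
print(select_l4(G))   # 8
```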

Slide 19

Slide 19 text

Region ($i$) selection with $l_4 = 8$: P-values are attributed to the $i$th region using the cumulative $\chi^2$ distribution, assuming that $u_{l_4 i}$ obeys a Gaussian distribution (null hypothesis). The $P_i$ are corrected with the Benjamini-Hochberg criterion, and the 1,447 regions with adjusted $P_i$ less than 0.01 are selected.
[Figure: distribution $P(P_i)$ plotted against $1 - P_i$ over $[0, 1]$]
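A minimal sketch of this selection step using only NumPy and the standard library. We assume the usual form in TD-based unsupervised FE, $P_i = P_{\chi^2}\left[> (u_{l_4 i}/\sigma_{l_4})^2\right]$ with one degree of freedom and $\sigma_{l_4}$ taken as the standard deviation of $u_{l_4 i}$; for one degree of freedom this upper tail equals $\operatorname{erfc}(|z|/\sqrt{2})$. The Benjamini-Hochberg adjustment is written out by hand, and the input vector is simulated, not the study's $u_{8i}$.

```python
import math
import numpy as np

def chi2_1dof_sf(z):
    """P[chi^2 (1 dof) > z^2], equal to the two-sided Gaussian tail erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def bh_adjust(p):
    """Benjamini-Hochberg adjusted P-values (step-up, made monotone, capped at 1)."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

def select_regions(u, alpha=0.01):
    """Indices i with BH-adjusted P_i < alpha, assuming u_i ~ Gaussian(0, sigma) under the null."""
    sigma = np.std(u)
    p = np.array([chi2_1dof_sf(v / sigma) for v in u])
    return np.flatnonzero(bh_adjust(p) < alpha), p

# Simulated u_{l4 i}: Gaussian noise plus five hypothetical strong outliers.
rng = np.random.default_rng(3)
u = rng.normal(0.0, 1.0, 1000)
u[:5] = 8.0
selected, p = select_regions(u)
print(selected)
```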

Slide 20

Slide 20 text

The 1,447 selected genomic regions can discriminate between kinds of epigenetic data as well as between normal tissue and prostate tumor by linear discriminant analysis (error ≈ 37.5%).

Slide 21

Slide 21 text

Biological validations using Metascape

Slide 22

Slide 22 text

The 1,785 protein-coding genes contained in these 1,447 genomic regions were uploaded to Metascape.
DisGeNET category of Metascape
[Figure: enriched terms, including Prostate Neoplasms]

Slide 23

Slide 23 text

PaGenBase category of Metascape

Slide 24

Slide 24 text

TRRUST category of Metascape

Slide 25

Slide 25 text

Drug discovery (this part is not included in the paper)

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

The 1,785 protein-coding genes contained in these 1,447 genomic regions were uploaded to Enrichr.

Slide 28

Slide 28 text

Prostate adenocarcinoma: IC50 (smaller is better)

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

INK128 decreases the invasive potential of PC3 prostate cancer cells

Slide 31

Slide 31 text

The three top-ranked compounds, whose treatment upregulates the 1,785 genes contained in the 1,447 genomic regions, are promising drugs for the treatment of prostate cancer. Lower-ranked compounds are also worth investigating.

Slide 32

Slide 32 text

Conclusion
In this study, we applied the recently proposed TD-based unsupervised FE to the integrated analysis of prostate cancer multiomics data sets.
TD-based unsupervised FE selected genomic regions whose values correctly discriminate not only between kinds of epigenetic data but also between normal tissues and tumors.
The 1,785 genes contained in the selected regions are significantly related to prostate cancer.
TD-based unsupervised FE can also identify promising compounds for prostate cancer treatment.

Slide 33

Slide 33 text

The authors declare no conflict of interest.
My contact information:
E-mail: [email protected]
URL: https://researchmap.jp/Yh_Taguchi/
LinkedIn: https://www.linkedin.com/in/y-h-taguchi-164900b4/