Slide 1

Slide 1 text

BioCAsiaHKBIOINFO2023
TDbasedUFE and TDbasedUFEadv: Bioconductor packages to perform tensor decomposition based unsupervised feature extraction
Prof. Y-h. Taguchi, Department of Physics, Chuo University, Tokyo, Japan

Slide 2

Slide 2 text

We published a book on this method in 2019. The second edition is due in fall 2024.

Slide 3

Slide 3 text

TDbasedUFE TDbasedUFEadv

Slide 4

Slide 4 text

TDbasedUFE = Tensor Decomposition based Unsupervised Feature Extraction
TDbasedUFEadv = Tensor Decomposition based Unsupervised Feature Extraction advanced
What is a tensor? A tensor is an extension of a matrix that has more indices than just rows and columns.
What is tensor decomposition (TD)? Decomposition of a tensor into a sum of products of vectors / matrices / (smaller) tensors.
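As an illustration of what "a sum of products of vectors / matrices / (smaller) tensors" can look like in practice, here is a minimal sketch of one common TD, the higher-order singular value decomposition (HOSVD), written with NumPy. The slides do not state which TD the packages use internally, so treat the choice of HOSVD and the helper names (unfold, hosvd, reconstruct) as assumptions made for this example only.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize: bring axis `mode` to the front and flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `matrix` into axis `mode` of `tensor`."""
    out = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(out, 0, mode)

def hosvd(tensor):
    """Return (core, factors): a small core tensor and one matrix per axis."""
    factors = []
    for mode in range(tensor.ndim):
        # Left singular vectors of each unfolding give the factor for that axis.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u)
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Rebuild the tensor as the core multiplied by every factor matrix."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out

# Toy tensor: 30 "genes" x 4 "replicates" x 3 "stages" (synthetic numbers).
rng = np.random.default_rng(0)
x = rng.normal(size=(30, 4, 3))
core, factors = hosvd(x)
```

Each column of factors[0] is a vector indexed by the first axis (genes in the toy layout), which is the kind of vector a feature-selection step can then inspect.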

Slide 5

Slide 5 text

Fujita et al, PLOS ONE (2023)

Slide 6

Slide 6 text

Taguchi Y-h., Turki Turki, Application note: TDbasedUFE and TDbasedUFEadv: bioconductor packages to perform tensor decomposition based unsupervised feature extraction, Frontiers in Artificial Intelligence (2023)
Assuming a Gaussian distribution for $u_{\ell_1 i}$
DEG identification
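The Gaussian assumption can be turned into a selection rule roughly as follows: give each gene a P-value for how far its vector component lies from zero under a zero-mean Gaussian, correct the P-values for multiple testing, and keep genes below a threshold. The sketch below uses only the Python standard library. It estimates the standard deviation naively from all components and applies the Benjamini-Hochberg correction; both are simplifying assumptions for illustration (the packages may estimate the standard deviation differently), and the function names are hypothetical.

```python
import math

def gaussian_p_values(u):
    """Two-sided P-values for each component of u under a zero-mean Gaussian.

    Equivalent to the upper tail of a chi-squared distribution with one
    degree of freedom evaluated at (u_i / sigma)^2.
    """
    # Assumption: sigma is the plain root mean square of all components.
    sigma = math.sqrt(sum(x * x for x in u) / len(u))
    return [math.erfc(abs(x) / (sigma * math.sqrt(2.0))) for x in u]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def select_features(u, threshold=0.01):
    """Indices whose adjusted P-value falls below `threshold`."""
    adjusted = benjamini_hochberg(gaussian_p_values(u))
    return [i for i, p in enumerate(adjusted) if p < threshold]

# Synthetic vector: 198 small components plus two clear outliers.
u = [0.1 * (-1) ** i for i in range(198)] + [10.0, -10.0]
selected = select_features(u)
```

With this synthetic vector only the two outlying components are selected. Because outliers inflate a naive standard deviation, a real analysis would likely need a more robust estimate.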

Slide 7

Slide 7 text


Slide 8

Slide 8 text

PART I DEG identification
Data set: ACC.rnaseq in RTCGA.rnaseq (BioC 3.17)
Tensor element: ith gene of jth replicate of kth stage (“stage i” to “stage iv”)
Tensor axes: Genes, Replicates, Stages
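To make the genes, replicates, stages layout concrete, the following sketch arranges a genes x samples matrix into a three-index array with NumPy. It is not the package's own data-preparation code: the function name build_tensor is hypothetical, the data are synthetic, and taking the first few samples of each stage as replicates is an assumption, since the slides do not say how replicates were chosen.

```python
import numpy as np

def build_tensor(expression, stage_labels, stages, n_replicates):
    """Arrange a genes x samples matrix into genes x replicates x stages.

    For each stage, the first `n_replicates` samples carrying that label
    are used (an assumption made for this sketch).
    """
    slices = []
    for stage in stages:
        columns = [j for j, label in enumerate(stage_labels) if label == stage]
        if len(columns) < n_replicates:
            raise ValueError(f"stage {stage!r} has only {len(columns)} samples")
        slices.append(expression[:, columns[:n_replicates]])
    return np.stack(slices, axis=2)

# Synthetic example: 5 genes, 8 samples alternating between two stages.
expression = np.arange(5 * 8).reshape(5, 8)
labels = ["stage i", "stage ii"] * 4
tensor = build_tensor(expression, labels, ["stage i", "stage ii"], 3)
```

Here tensor[i, j, k] holds the value for gene i in replicate j of stage k, matching the index convention on the slide.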

Slide 9

Slide 9 text

Replicates j

Slide 10

Slide 10 text

Stages k

Slide 11

Slide 11 text

1,692 genes were selected with a threshold of 0.01 on the adjusted P-value, cf. 136 genes with DESeq2, which assumes four classes, each of which includes nine replicates.
● TDbasedUFE identified more genes than DESeq2.
● These two gene sets are distinct (if we select the top-ranked 1,682 genes by DESeq2, the overlap is as small as 279!)
● Enrichment analysis (the number of terms with adj. P<0.05)

                            GOBP  GOMF  GOCC  KEGG
1,692 genes (TDbasedUFE)     129   151   143   923
136 genes (DESeq2)             0     0     3    12

Slide 12

Slide 12 text

PART II Multiomics
Data set: miRNA expression, mRNA expression, DNA methylation of ACC in curatedTCGA (BioC 3.17)
miRNA, mRNA, methylation

Slide 13

Slide 13 text

miRNA, mRNA, methylation

Slide 14

Slide 14 text

23 out of 1,046 miRNAs, 1,016 out of 20,501 mRNAs, 7,295 out of 485,577 methylation sites
cf. DIABLO failed to converge
Amrit Singh et al., DIABLO: an integrative approach for identifying key molecular drivers from multi-omics assays, Bioinformatics, Volume 35, Issue 17, September 2019, Pages 3055–3062, cited 412 times (mixOmics in BioC 3.17)

Slide 15

Slide 15 text

Enrichment analysis:
23 out of 1,046 miRNAs: DIANA-mirpath
1,016 out of 20,501 mRNAs: Enrichr
7,295 out of 485,577 methylation sites: Enrichr
→ Many cancer-related pathways were identified.
TDbasedUFEadv can deal with more complicated setups (No time to explain!)

Slide 16

Slide 16 text

Conclusions
We have implemented two Bioconductor packages, TDbasedUFE and TDbasedUFEadv, which allow users to perform “TD based unsupervised FE” without detailed knowledge of TD.
TDbasedUFE outperformed two de facto standard packages, DESeq2 and DIABLO, when applied to TCGA data sets.