
TDbasedUFE and TDbasedUFEadv: bioconductor packages to perform tensor decomposition based unsupervised feature extraction

Y-h. Taguchi
October 17, 2023

Presentation at the joint Bioconductor Asia 2023 and Hong Kong Bioinformatics Symposium.
16th-17th Oct 2023
University of Hong Kong
https://biocasia2023.bioconductor.org/

Transcript

  1. BioCAsiaHKBIOINFO2023 1 TDbasedUFE and TDbasedUFEadv: bioconductor packages to perform tensor decomposition based unsupervised feature extraction

    Prof. Y-h. Taguchi, Department of Physics, Chuo University, Tokyo, Japan
  2. TDbasedUFE = Tensor Decomposition based Unsupervised Feature Extraction; TDbasedUFEadv = Tensor Decomposition based Unsupervised Feature Extraction advanced

    What is a tensor? A tensor is an extension of a matrix to more indices than just rows and columns. What is tensor decomposition (TD)? The decomposition of a tensor into a product of several vectors, matrices, or (smaller) tensors.
  3. Taguchi Y-h., Turki Turki, Application note: TDbasedUFE and TDbasedUFEadv: bioconductor packages to perform tensor decomposition based unsupervised feature extraction, Frontiers in Artificial Intelligence (2023)

    DEG identification: assuming a Gaussian distribution for u_{l1 i}
  4. PART I: DEG identification

    Data set: ACC.rnaseq in RTCGA.rnaseq (BioC 3.17). Tensor element: ith gene of jth replicate at kth stage (“stage i” to “stage iv”); modes: genes × replicates × stages.
  5. 1,692 genes were selected with an adjusted P-value threshold of 0.01, cf. 136 genes with DESeq2, which assumes four classes, each of which includes nine replicates.

    • TDbasedUFE identified more genes than DESeq2.
    • These two gene sets are distinct (if we select the top-ranked 1,682 genes by DESeq2, the overlap is as small as 279!).
    • Enrichment analysis (number of terms with adj. P < 0.05):

      Gene set                   GOBP  GOMF  GOCC  KEGG
      1,692 genes (TDbasedUFE)    129   151   143   923
      136 genes (DESeq2)            0     0     3    12
  6. PART II: Multiomics

    Data set: miRNA expression, mRNA expression, and DNA methylation of ACC in curatedTCGA (BioC 3.17).
  7. Selected features: 23 out of 1,046 miRNAs, 1,016 out of 20,501 mRNAs, and 7,295 out of 485,577 methylation sites. cf. DIABLO: DIABLO failed to converge.

    Amrit Singh et al., DIABLO: an integrative approach for identifying key molecular drivers from multi-omics assays, Bioinformatics, Volume 35, Issue 17, September 2019, Pages 3055–3062, cited 412 times (mixOmics in BioC 3.17)
  8. Enrichment analysis

    23 out of 1,046 miRNAs: DIANA-miRPath; 1,016 out of 20,501 mRNAs: Enrichr; 7,295 out of 485,577 methylation sites: Enrichr → Many cancer-related pathways were identified. TDbasedUFEadv can deal with more complicated setups (no time to explain!).
  9. Conclusions

    We have implemented two Bioconductor packages, TDbasedUFE and TDbasedUFEadv, which allow users to perform TD-based unsupervised feature extraction (FE) without detailed knowledge of TD. TDbasedUFE outperformed two de facto standard packages, DESeq2 and DIABLO, when applied to TCGA data sets.