TDbasedUFE and TDbasedUFEadv: bioconductor packages to perform tensor decomposition based unsupervised feature extraction

Y-h. Taguchi
October 17, 2023

Presentation at Joint Bioconductor Asia 2023 symposium and Hong Kong bioinformatic symposium.
16th-17th Oct 2023
University of Hong Kong
https://biocasia2023.bioconductor.org/

Transcript

  1. BioCAsiaHKBIOINFO2023 1
    TDbasedUFE and TDbasedUFEadv: bioconductor
    packages to perform tensor decomposition
    based unsupervised feature extraction
    Prof Y-h. Taguchi,
    Department of Physics, Chuo University, Tokyo, Japan

  2. BioCAsiaHKBIOINFO2023 2
    We published a book on this method in 2019.
    The 2nd edition is due in Fall 2024.

  3. BioCAsiaHKBIOINFO2023 3
    TDbasedUFE TDbasedUFEadv

  4. BioCAsiaHKBIOINFO2023 4
    TDbasedUFE = Tensor Decomposition based Unsupervised Feature Extraction
    TDbasedUFEadv = Tensor Decomposition based Unsupervised Feature Extraction advanced
    What is a tensor?
    A tensor is an extension of a matrix that has more indices than just rows and columns.
    What is tensor decomposition (TD)?
    Decomposition of a tensor into a sum of products of vectors / matrices / (smaller) tensors
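
The "sum of products of vectors" idea can be illustrated with a minimal sketch. It rebuilds a three-index tensor from weighted outer products of vectors, one vector per index. The function name `rank1_sum`, the toy numbers, and the choice of this particular form are assumptions made for illustration; the decomposition used inside the packages may differ (for example, it may involve a core tensor).

```python
from itertools import product


def rank1_sum(weights, us, vs, ws):
    """Rebuild x[i][j][k] = sum_l weights[l] * us[l][i] * vs[l][j] * ws[l][k]."""
    n_i, n_j, n_k = len(us[0]), len(vs[0]), len(ws[0])
    x = [[[0.0] * n_k for _ in range(n_j)] for _ in range(n_i)]
    for lam, u, v, w in zip(weights, us, vs, ws):
        # Each component contributes one outer product of three vectors.
        for i, j, k in product(range(n_i), range(n_j), range(n_k)):
            x[i][j][k] += lam * u[i] * v[j] * w[k]
    return x


# Toy example (hypothetical numbers): 2 genes x 2 replicates x 3 stages.
weights = [2.0, 1.0]
us = [[1.0, 0.0], [0.0, 1.0]]            # one vector over genes per component
vs = [[1.0, 1.0], [1.0, -1.0]]           # one vector over replicates per component
ws = [[1.0, 2.0, 3.0], [1.0, 0.0, 1.0]]  # one vector over stages per component
tensor = rank1_sum(weights, us, vs, ws)
print(tensor[0][0])
```

Reading the decomposition in the other direction is the point of the method: given the tensor, the vectors attached to each index summarize how genes, replicates, and stages each contribute.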

  5. BioCAsiaHKBIOINFO2023 5
    Fujita et al, PLOS ONE (2023)

  6. BioCAsiaHKBIOINFO2023 6
    Taguchi Y-h., Turki Turki, Application note: TDbasedUFE and TDbasedUFEadv:
    bioconductor packages to perform tensor decomposition based unsupervised feature
    extraction, Frontiers in Artificial Intelligence (2023)
    Assuming a Gaussian distribution for $u_{\ell_1 i}$
    DEG identification
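
A minimal sketch of how a Gaussian assumption can be turned into a gene selection. The slides state only the Gaussian assumption and, later, an adjusted P-value threshold; the specific steps here are assumptions for illustration: a chi-squared test with one degree of freedom on the squared standardized component, a plain standard deviation as the scale, and Benjamini-Hochberg adjustment. All function names are hypothetical and are not the packages' API.

```python
import math
from statistics import pstdev


def chi2_sf_1dof(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_features(u, threshold=0.01):
    """Indices whose component is too large to be explained by the Gaussian null."""
    sigma = pstdev(u)  # simplification: scale estimated from all components
    pvalues = [chi2_sf_1dof((value / sigma) ** 2) for value in u]
    adjusted = benjamini_hochberg(pvalues)
    return [i for i, p in enumerate(adjusted) if p < threshold]


# Toy data (hypothetical): 200 small components plus one clear outlier.
u = [0.01 * ((i % 7) - 3) for i in range(200)] + [1.0]
selected = select_features(u)
print(selected)
```

Estimating the scale from every component, outliers included, is the crudest possible choice; it is used here only to keep the sketch short.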

  7. BioCAsiaHKBIOINFO2023 7

  8. BioCAsiaHKBIOINFO2023 8
    PART I DEG identification
    Data set: ACC.rnaseq in RTCGA.rnaseq (BioC 3.17)
    Tensor element: expression of the ith gene in the jth replicate of the kth stage (“stage i” to “stage iv”)
    Tensor axes: genes (i), replicates (j), stages (k)
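
The genes × replicates × stages layout can be sketched as follows, starting from a genes × samples matrix with one stage label per sample. The function name `build_tensor`, the toy data, and the rule of keeping the first `n_replicates` samples of each stage are assumptions for illustration; the slides do not say how replicates were chosen.

```python
def build_tensor(matrix, stage_of_sample, stages, n_replicates):
    """Arrange matrix[i][s] (gene i, sample s) as tensor[i][j][k]
    (gene i, replicate j, stage k)."""
    columns = {stage: [] for stage in stages}
    for s, stage in enumerate(stage_of_sample):
        # Keep only the first n_replicates samples of each requested stage.
        if stage in columns and len(columns[stage]) < n_replicates:
            columns[stage].append(s)
    for stage in stages:
        if len(columns[stage]) < n_replicates:
            raise ValueError(f"not enough samples for {stage}")
    return [
        [[row[columns[stage][j]] for stage in stages] for j in range(n_replicates)]
        for row in matrix
    ]


# Toy example (hypothetical): 2 genes, 4 samples, 2 stages, 2 replicates each.
matrix = [[1, 2, 3, 4], [5, 6, 7, 8]]
stage_of_sample = ["stage i", "stage ii", "stage i", "stage ii"]
tensor = build_tensor(matrix, stage_of_sample, ["stage i", "stage ii"], 2)
print(tensor)
```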

  9. BioCAsiaHKBIOINFO2023 9
    [Figure: plot over replicates (index j)]

  10. BioCAsiaHKBIOINFO2023 10
    [Figure: plot over stages (index k)]

  11. BioCAsiaHKBIOINFO2023 11
    1,692 genes were selected with an adjusted P-value threshold of 0.01
    cf. 136 genes with DESeq2, which assumes four classes, each of which
    includes nine replicates.
    ● TDbasedUFE identified more genes than DESeq2.
    ● The two gene sets are distinct (if we select the top-ranked 1,682
    genes by DESeq2, the overlap is as small as 279!)
    ● Enrichment analysis (the number of terms with adj. P<0.05)
                                GOBP  GOMF  GOCC  KEGG
    1,692 genes (TDbasedUFE)     129   151   143   923
    136 genes (DESeq2)             0     0     3    12

  12. BioCAsiaHKBIOINFO2023 12
    PART II Multiomics
    Data set: miRNA expression, mRNA expression, DNA methylation
    of ACC in curatedTCGA (BioC 3.17)
    miRNA
    mRNA
    methylation

  13. BioCAsiaHKBIOINFO2023 13
    miRNA mRNA methylation

  14. BioCAsiaHKBIOINFO2023 14
    23 out of 1,046 miRNAs,
    1,016 out of 20,501 mRNAs,
    7,295 out of 485,577 methylation sites
    cf. DIABLO (mixOmics in BioC 3.17)
    DIABLO failed to converge
    Amrit Singh et al., DIABLO: an integrative approach for identifying
    key molecular drivers from multi-omics assays, Bioinformatics,
    Volume 35, Issue 17, September 2019, Pages 3055–3062 (cited 412 times)

  15. BioCAsiaHKBIOINFO2023 15
    Enrichment analysis:
    23 out of 1,046 miRNAs: DIANA-miRPath
    1,016 out of 20,501 mRNAs: Enrichr
    7,295 out of 485,577 methylation sites: Enrichr
    → Many cancer-related pathways were identified.
    TDbasedUFEadv can deal with more complicated setups
    (No time to explain!)

  16. BioCAsiaHKBIOINFO2023 16
    Conclusions
    We have implemented two Bioconductor packages, TDbasedUFE
    and TDbasedUFEadv, which allow users to perform “TD based
    unsupervised FE” without detailed knowledge of TD.
    TDbasedUFE outperformed two de facto standard packages, DESeq2
    and DIABLO, when applied to TCGA data sets.
