
Tensor Decomposition Based Integrated Analysis of Protein-Protein Interaction with Cancer Gene Expression Can Improve the Coincidence with Clinical Labels


Presentation at ICBCB2023
http://www.icbcb.org/

Y-h. Taguchi

April 24, 2023

Transcript

  1. Tensor Decomposition Based Integrated Analysis of Protein-Protein
    Interaction with Cancer Gene Expression Can Improve the
    Coincidence with Clinical Labels
    Y-h. Taguchi, Department of Physics, Chuo University
    Turki Turki, Department of Computer Science, King Abdulaziz University
    For material requests, contact via WeChat.


  2. https://doi.org/10.1101/2023.02.26.530076


  3. Motivation:
    Integrated analysis of protein-protein interaction (PPI) and gene expression
    Gene expression profiles: N genes × M samples
    PPI: N genes (proteins) × N genes (proteins)
    How can we integrate two matrices with distinct dimensions?


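  4. Using tensor decomposition (TD), here the Tucker decomposition:
    $$x_{ijk} \in \mathbb{R}^{N \times M \times K}, \quad x_{ijk} = \sum_{\ell_1=1}^{N} \sum_{\ell_2=1}^{M} \sum_{\ell_3=1}^{K} G(\ell_1 \ell_2 \ell_3)\, u_{\ell_1 i}\, u_{\ell_2 j}\, u_{\ell_3 k}$$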
    How can we make use of TD to
    integrate PPI and gene expression?
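    A minimal NumPy sketch of the Tucker decomposition above, computed by higher-order SVD (HOSVD, the algorithm named in the workflow schematic). Each factor matrix holds the left singular vectors of one unfolding of the tensor, and the core G is obtained by projecting x onto all three factors. The function names `unfold`, `hosvd` and `reconstruct` are hypothetical illustrations, not the API of the author's packages.

```python
import numpy as np


def unfold(x: np.ndarray, mode: int) -> np.ndarray:
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)


def hosvd(x: np.ndarray):
    """HOSVD of a 3-way tensor x_ijk.

    Returns the core G(l1 l2 l3) and square orthogonal factor matrices
    U1 (N x N), U2 (M x M), U3 (K x K); column l of each is u_{l i}, u_{l j}, u_{l k}.
    """
    factors = []
    for mode in range(x.ndim):
        # Left singular vectors of each unfolding give one factor matrix.
        u, _, _ = np.linalg.svd(unfold(x, mode), full_matrices=True)
        factors.append(u)
    u1, u2, u3 = factors
    # G(l1 l2 l3) = sum_ijk x_ijk u_{l1 i} u_{l2 j} u_{l3 k}
    g = np.einsum("ijk,ia,jb,kc->abc", x, u1, u2, u3)
    return g, u1, u2, u3


def reconstruct(g, u1, u2, u3) -> np.ndarray:
    """x_ijk = sum_{l1 l2 l3} G(l1 l2 l3) u_{l1 i} u_{l2 j} u_{l3 k}."""
    return np.einsum("abc,ia,jb,kc->ijk", g, u1, u2, u3)


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4, 3))  # synthetic N=5, M=4, K=3 tensor
g, u1, u2, u3 = hosvd(x)
print(g.shape, np.allclose(reconstruct(g, u1, u2, u3), x))  # (5, 4, 3) True
```

    Because all factor matrices are kept at full size and are orthogonal, the reconstruction is exact; truncating the columns would give a low-rank approximation instead.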


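  5. Apply singular value decomposition (SVD) to the gene expression profile $x_{ij} \in \mathbb{R}^{N \times M}$ and to the PPI matrix $n_{ii'} \in \mathbb{R}^{N \times N}$:
    $$x_{ij} = \sum_{l=1}^{L} \lambda'_l\, u'_{li}\, v_{lj} \quad \text{(gene expression)}$$
    $$n_{ii'} = \sum_{l=1}^{L} \lambda_l\, u_{li}\, u_{li'} \quad \text{(PPI)}$$
    Bundle $u_{li}$ and $u'_{li}$ to generate a tensor $x_{ilk} \in \mathbb{R}^{N \times L \times 2}$:
    $$x_{il1} = u_{li}, \quad x_{il2} = u'_{li}$$
    Apply TD to $x_{ilk}$:
    $$x_{ilk} = \sum_{\ell_1=1}^{N} \sum_{\ell_2=1}^{L} \sum_{\ell_3=1}^{2} G(\ell_1 \ell_2 \ell_3)\, \tilde{u}_{\ell_1 i}\, \tilde{u}_{\ell_2 l}\, \tilde{u}_{\ell_3 k}$$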
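    A minimal NumPy sketch of this step on synthetic data, assuming a symmetric 0/1 PPI adjacency matrix and L ≤ min(N, M). Only the gene-mode factor $\tilde{u}_{\ell_1 i}$ is computed, as the left singular vectors of the mode-1 unfolding of $x_{ilk}$, which is what HOSVD yields for that mode. The function names are hypothetical, and singular vectors are only defined up to sign, so a real implementation may fix signs differently.

```python
import numpy as np


def bundle_ppi_and_expression(x: np.ndarray, n: np.ndarray, L: int) -> np.ndarray:
    """Return the N x L x 2 tensor with x_il1 = u_li (PPI) and x_il2 = u'_li (expression).

    x : N x M gene expression matrix x_ij
    n : N x N symmetric PPI matrix n_ii'
    """
    u_expr, _, _ = np.linalg.svd(x, full_matrices=False)  # columns are u'_li
    u_ppi, _, _ = np.linalg.svd(n, full_matrices=False)   # columns are u_li
    t = np.empty((x.shape[0], L, 2))
    t[:, :, 0] = u_ppi[:, :L]   # k = 1: PPI
    t[:, :, 1] = u_expr[:, :L]  # k = 2: gene expression
    return t


def gene_factor(t: np.ndarray) -> np.ndarray:
    """Gene-mode HOSVD factor: left singular vectors of the mode-1 unfolding.

    Column l1 of the result is u~_{l1 i}; only min(N, 2L) columns are kept.
    """
    unfolded = t.reshape(t.shape[0], -1)  # N x (L * 2)
    u_tilde, _, _ = np.linalg.svd(unfolded, full_matrices=False)
    return u_tilde


rng = np.random.default_rng(1)
N, M, L = 50, 10, 5
x = rng.normal(size=(N, M))                   # synthetic expression profile
a = (rng.random((N, N)) < 0.1).astype(float)
n = np.triu(a, 1)
n = n + n.T                                   # symmetric 0/1 adjacency, no self-loops
t = bundle_ppi_and_expression(x, n, L)
u_tilde = gene_factor(t)
print(t.shape, u_tilde.shape)                 # (50, 5, 2) (50, 10)
```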


  6. [Figure: workflow schematic. Gene expression (N genes × M samples) and PPI (N genes × N genes) are each decomposed by SVD into L SVVs; the gene-side SVVs are bundled into an N × L × 2 tensor and decomposed by HOSVD; the resulting SVVs are attributed to the M samples and compared with class labeling.]


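  7. Recover the singular value vector (SVV) attributed to sample j as
    $$\tilde{v}_{lj} = \sum_{i=1}^{N} x_{ij}\, \tilde{u}_{li}$$
    Compare the coincidence of $v_{lj}$ or $\tilde{v}_{lj}$ with the class labels (categorical regression):
    $$v_{lj} = a_l + \sum_{s=1}^{S} b_{ls}\, \delta_{js} \quad \text{(SVD)}$$
    $$\tilde{v}_{lj} = a'_l + \sum_{s=1}^{S} b'_{ls}\, \delta_{js} \quad \text{(TD)}$$
    where $\delta_{js} = 1$ only when sample j belongs to class s, and 0 otherwise.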
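    A minimal sketch of the projection and the categorical regression, on synthetic data. With a single categorical predictor, the regression above is equivalent to a one-way ANOVA, so its overall F-test P value can be taken from SciPy's `stats.f_oneway`. The function names and the random orthonormal stand-in for $\tilde{u}_{li}$ are hypothetical; the slides do not say which software or test statistic was actually used.

```python
import numpy as np
from scipy import stats


def project_to_samples(x: np.ndarray, u_tilde: np.ndarray) -> np.ndarray:
    """v~_lj = sum_i x_ij u~_li; rows index l, columns index sample j."""
    return u_tilde.T @ x


def categorical_regression_pvalue(v: np.ndarray, labels: np.ndarray) -> float:
    """P value for v_j = a + sum_s b_s delta_js (one-way ANOVA F test)."""
    groups = [v[labels == s] for s in np.unique(labels)]
    return float(stats.f_oneway(*groups).pvalue)


rng = np.random.default_rng(2)
labels = np.array(["Alive"] * 4 + ["Dead"] * 4)  # hypothetical class labels
x = rng.normal(size=(20, 8))                     # synthetic N=20 genes, M=8 samples
x[:, labels == "Dead"] += 3.0                    # shift one class
u_tilde = np.linalg.qr(rng.normal(size=(20, 3)))[0]  # stand-in for u~_li (20 x 3)
v_tilde = project_to_samples(x, u_tilde)         # 3 x 8
pvalues = [categorical_regression_pvalue(v, labels) for v in v_tilde]
print(pvalues)
```

    Running the same function on the SVD vectors $v_{lj}$ and on $\tilde{v}_{lj}$ gives the paired P values that the following slides compare.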


  8. [Figure: workflow schematic repeated from slide 6.]


  9. (1) “PATIENT.VITAL STATUS”
    (2) “PATIENT.STAGE EVENT.PATHOLOGIC STAGE”
    (3) “PATIENT.STAGE EVENT.TNM CATEGORIES.PATHOLOGIC CATEGORIES.PATHOLOGIC M”
    (4) “PATIENT.STAGE EVENT.TNM CATEGORIES.PATHOLOGIC CATEGORIES.PATHOLOGIC T”
    (5) “PATIENT.STAGE EVENT.TNM CATEGORIES.PATHOLOGIC CATEGORIES.PATHOLOGIC N”
    (6) AUC for “PATIENT.VITAL STATUS”
    27 cancers with 5 kinds of classes


  10. Scatter plot of P values (in ascending order) obtained by categorical regression for SVD and TD for 27 cancers in class (1)
    Red triangles: significant
    Blue crosses: not significant
    TD (vertical axis) is more coincident with the classes than SVD (horizontal axis).
    [Figure: Class (1)]


  11. [Figures: Class (2), Class (3)]


  12. [Figures: Class (4), Class (5)]


  13. P values obtained by categorical regression for SVD and TD are compared with either the t test or the Wilcoxon test for classes (1) to (5).
    TD is more coincident with the classes than SVD, except for class (3).


  14. AUC for the discrimination task for class (1) in the 11 cancers with significant P values in categorical regression.
    TD (vertical axis) is almost always better than SVD.
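    The slides do not state how the AUC was computed. A common rank-based definition, equivalent to the Mann–Whitney U statistic divided by the number of positive-negative pairs, is sketched here, assuming the score is one sample-side SVV ($v_{lj}$ or $\tilde{v}_{lj}$) and the positive class is one vital-status level. The function name `auc` is hypothetical. Since SVVs have an arbitrary sign, one may also need to report max(AUC, 1 − AUC); that choice is an assumption, not something the slides specify.

```python
import numpy as np


def auc(scores: np.ndarray, is_positive: np.ndarray) -> float:
    """ROC AUC = probability that a random positive sample scores higher
    than a random negative one; ties count as 1/2."""
    pos = scores[is_positive]
    neg = scores[~is_positive]
    diff = pos[:, None] - neg[None, :]  # all positive-negative pairs
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size))


scores = np.array([0.1, 0.4, 0.35, 0.8])       # e.g. one row of v~_lj
is_dead = np.array([False, False, True, True])  # hypothetical class membership
print(auc(scores, is_dead))  # 0.75
```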


  15. Conclusions
    Integrated analysis of gene expression and PPI can improve the coincidence between SVVs and class labels, even though PPI does not include any class information.
    Analyses with TD can be performed with two Bioconductor packages that I developed:
    TDbasedUFE, doi:10.18129/B9.bioc.TDbasedUFE
    TDbasedUFEadv, doi:10.18129/B9.bioc.TDbasedUFEadv
