MATHEMATICAL FORMULATION AND APPLICATION OF KERNEL TENSOR DECOMPOSITION BASED UNSUPERVISED FEATURE EXTRACTION

Y-h. Taguchi
February 15, 2021

Seminar at IEEE Bangalore, 15th Feb 2021
https://events.vtools.ieee.org/m/260534
Link to the published paper: http://doi.org/10.1016/j.knosys.2021.106834

Transcript

  1. 1
    MATHEMATICAL FORMULATION AND APPLICATION
    OF KERNEL TENSOR DECOMPOSITION BASED
    UNSUPERVISED FEATURE EXTRACTION
    Y-h. Taguchi
    Department of Physics, Chuo University
    Tokyo, Japan
    Published in Knowledge-based systems (IF=5.9)
    https://doi.org/10.1016/j.knosys.2021.106834


  2. 2
    Purpose:
    Identify a small number of critical variables among a large number
    of variables (= p), based on a small number of samples (= n)
    (the so-called “large p small n” problem).
    → This is difficult because of how statistical tests behave:
    Small n → P-values are not small enough (not significant enough).
    Large p → the multiple comparison correction is strong (corrected
    P-values become larger).
    → No significant P-values remain at all.

  3. 3
    More advanced machine learning approaches
    (e.g., lasso, random forest):
    “large p small n” → overfitting.
    Selection that is over-optimized toward one specific small set of n
    samples results in “sample-specific variable selection”.
    → A different set of variables will be selected if another small
    set of samples is used.

  4. 4
    Try a synthetic example ($p \gg n$, i.e. $p/n \sim 10^2$)

  5. 5
    Synthetic data: $N$ variables, each with $M \times M$ measurements
    ($M^2$ samples per variable).
    $i \le N_1$: distinct between $j, k \le M/2$ (Gaussian, non-zero mean)
    and the others (Gaussian, zero mean).
    $i > N_1$: no distinction (Gaussian, zero mean).
    Task: Can we identify the $N_1$ variables correctly?
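
A minimal sketch of how such a synthetic tensor could be generated with NumPy. The function name, the seed and the reading that the non-zero mean applies only where both $j$ and $k$ are at most $M/2$ are assumptions made for illustration; the default sizes ($N = 10^3$, $N_1 = 10$, $M = 6$, $\mu = 2$, $\sigma = 1$) are the ones quoted with the t test results.

```python
import numpy as np

def make_synthetic(n_vars=1000, n_signal=10, m=6, mu=2.0, sigma=1.0, seed=0):
    """Return an (N, M, M) tensor of Gaussian noise with one shifted block.

    Assumption: the first `n_signal` variables have mean `mu` only where
    both j and k fall in the first half; everything else has zero mean.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma, size=(n_vars, m, m))
    half = m // 2
    x[:n_signal, :half, :half] += mu
    return x

x = make_synthetic()
signal_mean = x[:10, :3, :3].mean()   # block that carries the signal
noise_mean = x[10:].mean()            # variables with no distinction
print(x.shape)  # (1000, 6, 6)
```

With these sizes each variable has only 36 samples, while there are 1000 variables, which is what makes the selection task hard.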

  6. 6
    Strategy 1
    ● Apply a t test to each variable to test whether it is distinct
    between the two classes (i.e. $j, k \le M/2$ vs. the others).
    ● Correct the computed P-values for multiple comparisons with the
    Benjamini-Hochberg method.
    ● Select the variables with corrected P-values < 0.05.
    [Figure: the $(j, k)$ plane of size $M \times M$, divided at $M/2$.]
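
The correction step of Strategy 1 can be sketched as follows. This is a plain NumPy version of the Benjamini-Hochberg adjustment, written for illustration; the raw P-values in the example are made up, and in practice they would come from the per-variable t test (for instance `scipy.stats.ttest_ind`, which is not called here).

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the k-th smallest P-value by n / k.
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest P-value downwards, then cap at 1.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]       # hypothetical raw P-values
adj = benjamini_hochberg(raw)
selected = adj < 0.05                 # threshold used in Strategy 1
```

With 1000 variables instead of 4, the scaling factor n / k becomes large for the smallest P-values, which is why few variables survive the correction when n is small.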

  7. 7
                  $i > N_1$   $i \le N_1$
    P > 0.05        989.3        3.4
    P ≦ 0.05          0.7        6.6

    $N = 10^3$, $N_1 = 10$, $M = 6$; Gaussian distribution with $\mu$ (mean) $= 2$, $\sigma$ (SD) $= 1$.
    Averaged over 100 independent trials.

    Definition of the entries:

                      Fact N   Fact P
    Prediction N        TN       FN
    Prediction P        FP       TP

    Ideal result:

                      Fact N   Fact P
    Prediction N       990        0
    Prediction P         0       10

    Matthews correlation coefficient (MCC):
    $$\mathrm{MCC} = \frac{TP \times TN - FN \times FP}{\sqrt{(TN+FP)(FN+TP)(TN+FN)(FP+TP)}} \sim 0.77$$
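
As a quick check, the MCC can be computed directly from the four entries of the confusion matrix. The sketch below uses the standard MCC definition and the averaged t test counts from the table; the function name is chosen here for illustration. The MCC quoted on the slide may have been averaged over the individual trials, so agreement to two digits is not guaranteed for every table.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tn + fp) * (fn + tp) * (tn + fn) * (fp + tp))
    return (tp * tn - fn * fp) / denom if denom > 0 else 0.0

t_test = mcc(tp=6.6, tn=989.3, fp=0.7, fn=3.4)   # averaged t test table
perfect = mcc(tp=10, tn=990, fp=0, fn=0)         # ideal result
print(round(t_test, 2), perfect)  # 0.77 1.0
```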

  8. 8
    Lasso ($N_1 = 10$ given, since no P-values are computed)

                  $i > N_1$   $i \le N_1$
    P > 0.05        989.4        2.4
    P ≦ 0.05          0.6        7.6

    MCC ~ 0.84

    Random forest ($N_1 = 10$ given, since no P-values are computed)

                  $i > N_1$   $i \le N_1$
    P > 0.05        988.2        1.8
    P ≦ 0.05          1.8        8.2

    MCC ~ 0.81

  9. 9
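    Singular value decomposition (SVD)
    $$x_{ij} \simeq \sum_{l=1}^{L} u_{li} \lambda_l v_{lj}$$
    [Figure: the $N \times M$ matrix $x_{ij}$ as the product of $(u_{li})^T$ ($N \times L$), the diagonal matrix of $\lambda_l$ ($L \times L$) and $v_{lj}$ ($L \times M$).]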
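
The truncated sum above can be tried out with `numpy.linalg.svd`. This is a small sketch on random data (the matrix size and seed are arbitrary choices, not taken from the talk); it keeps the first few terms $u_{li} \lambda_l v_{lj}$ and measures what is lost.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 6))          # N = 100 variables, M = 6 samples

# u_cols[:, l] holds u_{l i}, lam[l] holds lambda_l, vt[l, :] holds v_{l j}.
u_cols, lam, vt = np.linalg.svd(x, full_matrices=False)

def reconstruct(rank):
    """Sum the first `rank` terms u_{l i} * lambda_l * v_{l j}."""
    return (u_cols[:, :rank] * lam[:rank]) @ vt[:rank, :]

full_error = np.abs(x - reconstruct(6)).max()      # all L = 6 terms
rank2_error = np.linalg.norm(x - reconstruct(2))   # Frobenius norm, L = 2
```

The squared Frobenius error of the rank-2 approximation equals the sum of the squared singular values that were dropped.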

  10. 10
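    HOSVD (Higher Order Singular Value Decomposition)
    Extension to tensors:
    $$x_{ijk} \simeq \sum_{l_1=1}^{L_1} \sum_{l_2=1}^{L_2} \sum_{l_3=1}^{L_3} G(l_1 l_2 l_3) u_{l_1 i} u_{l_2 j} u_{l_3 k}$$
    [Figure: the $N \times M \times K$ tensor $x_{ijk}$ expressed as the $L_1 \times L_2 \times L_3$ core tensor $G$ multiplied by $u_{l_1 i}$, $u_{l_2 j}$ and $u_{l_3 k}$.]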
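
A minimal HOSVD sketch in NumPy, assuming the textbook construction: the singular value vectors of each mode come from the SVD of the tensor unfolded along that mode, and the core tensor $G$ is the projection of the tensor onto them. The function names and the random test tensor are illustrative and are not the author's implementation.

```python
import numpy as np

def hosvd(x):
    """Return (core, factors); factors[n][:, l] is the l-th vector of mode n."""
    factors = []
    for mode in range(x.ndim):
        # Unfold: bring `mode` to the front and flatten the remaining modes.
        unfolded = np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)
        u, _, _ = np.linalg.svd(unfolded, full_matrices=False)
        factors.append(u)
    core = x
    for mode, u in enumerate(factors):
        # Project mode `mode` onto its singular value vectors.
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, mode)), 0, mode)
    return core, factors

def reconstruct(core, factors):
    """Contract the core tensor with the factor of every mode."""
    x = core
    for mode, u in enumerate(factors):
        x = np.moveaxis(np.tensordot(u, x, axes=(1, mode)), 0, mode)
    return x

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 4, 3))
core, factors = hosvd(x)
rebuilt = reconstruct(core, factors)
```

In the notation of the slide, `factors[0][i, l1]` plays the role of $u_{l_1 i}$ and `core[l1, l2, l3]` the role of $G(l_1 l_2 l_3)$, with zero-based indices.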

  11. 11
    [Figure: the synthetic data set as before: $N$ variables ($N_1$ of them distinct), $M \times M$ measurements, $M^2$ samples per variable, Gaussian with zero or non-zero mean.]
    HOSVD is applied to it:
    $$x_{ijk} \simeq \sum_{l_1=1}^{L_1} \sum_{l_2=1}^{L_2} \sum_{l_3=1}^{L_3} G(l_1 l_2 l_3) u_{l_1 i} u_{l_2 j} u_{l_3 k}$$

  12. 12
    [Figure: $u_{1j}$, $u_{1k}$ and $u_{1i}$ plotted against $j$, $k$ and $i$; the range $i \le N_1$ is marked.]

  13. 13
    [Figure: $u_{1i}$, with $i \le N_1$ marked.]

  14. 14
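    Assuming that $u_{1i}$ obeys a Gaussian distribution (null hypothesis), P-values are
    attributed to the individual variables ($i$) using the $\chi^2$ distribution:
    $$P_i = P_{\chi^2}\left[ > \left( \frac{u_{1i}}{\sigma_1} \right)^2 \right]$$
    [Figure: $-\log_{10} P_i$ for each variable, with $i \le N_1$ marked.]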
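
The P-value assignment can be sketched with the standard library alone. The sketch assumes one degree of freedom and takes $\sigma$ to be the standard deviation of the singular value vector; both are assumptions, since the slide does not state them. For one degree of freedom the upper tail of the $\chi^2$ distribution reduces to the complementary error function.

```python
import math
import numpy as np

def chi2_pvalues(u, sigma=None):
    """P_i = P_chi2[> (u_i / sigma)^2], assuming one degree of freedom."""
    u = np.asarray(u, dtype=float)
    if sigma is None:
        sigma = u.std()               # assumption: sigma is the SD of u
    z = np.abs(u) / sigma
    # For one degree of freedom, P(chi2 > z^2) = erfc(z / sqrt(2)).
    return np.array([math.erfc(v / math.sqrt(2.0)) for v in z])

u_demo = np.zeros(100)                # toy vector: one outlying component
u_demo[-1] = 1.0
p_demo = chi2_pvalues(u_demo)
p_two_sided = chi2_pvalues([1.959964], sigma=1.0)[0]   # close to 0.05
```

Components far from zero relative to $\sigma$ get very small P-values, while components near zero get P-values near 1.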

  15. 15
    Variables with adjusted $P_i < 0.05$ are selected.

                  $i > N_1$   $i \le N_1$
    P > 0.05        989.9        2.2
    P ≦ 0.05          0.1        7.8

    MCC ~ 0.88
    (t test: MCC ~ 0.77; lasso: MCC ~ 0.84; random forest: MCC ~ 0.81)

  16. 16
    We named this strategy “TD (tensor decomposition) based
    unsupervised FE (feature extraction)”. It is described in detail
    in my recently published book:
    Unsupervised Feature Extraction Applied to Bioinformatics,
    2020, Springer International.

  17. 17
    Advantages of TD based unsupervised FE
    1) It is well suited to feature selection in “large p small n”
    problems.
    2) In contrast to conventional feature selection methods (e.g.,
    lasso and random forest), no knowledge about the number of
    variables to be selected is required. Variables can be selected using
    P-values, as in conventional statistical tests.

  18. 18
    3) In contrast to conventional statistical tests (e.g., the t test), it
    works in “large p small n” problems, performing at least comparably to
    conventional feature selection methods that require the number of
    selected variables to be given (MCC ~ 0.88, versus MCC ~ 0.77 for the t test).
    4) TD based unsupervised FE is an unsupervised method, since it
    does not require knowledge about classes or labeling when the
    singular value vectors ($u_{l_1 i}$, $u_{l_2 j}$, $u_{l_3 k}$) are generated:
    $$x_{ijk} \simeq \sum_{l_1=1}^{L_1} \sum_{l_2=1}^{L_2} \sum_{l_3=1}^{L_3} G(l_1 l_2 l_3) u_{l_1 i} u_{l_2 j} u_{l_3 k}$$

  19. 19
    Application to a real example

  20. 21
    Data set GSE147507
    Gene expression of human lung cell lines with/without SARS-CoV-2
    infection.
    i: genes (21797)
    j: cell lines (j=1: Calu3, j=2: NHBE, j=3: A549 MOI 0.2, j=4: A549 MOI 2.0, j=5: A549 ACE2 expressed)
    (MOI: multiplicity of infection)
    k: k=1: Mock, k=2: SARS-CoV-2 infected
    m: three biological replicates

  21. 22
    $$x_{ijkm} \in \mathbb{R}^{21797 \times 5 \times 2 \times 3}$$
    $$x_{ijkm} \simeq \sum_{l_1=1}^{L_1} \sum_{l_2=1}^{L_2} \sum_{l_3=1}^{L_3} \sum_{l_4=1}^{L_4} G(l_1 l_2 l_3 l_4) u_{l_1 j} u_{l_2 k} u_{l_3 m} u_{l_4 i}$$
    $u_{l_1 j}$: $l_1$th dependence upon cell lines
    $u_{l_2 k}$: $l_2$th dependence upon with/without SARS-CoV-2 infection
    $u_{l_3 m}$: $l_3$th dependence upon biological replicates
    $u_{l_4 i}$: $l_4$th dependence upon genes
    $G$: weight of the individual terms

  22. 23
    Purpose: identify $l_1$, $l_2$, $l_3$ that are independent of cell
    lines and biological replicates ($u_{l_1 j}$ and $u_{l_3 m}$ take constant values
    regardless of $j$ and $m$) and dependent upon the presence or absence of
    SARS-CoV-2 infection ($u_{l_2 1} = -u_{l_2 2}$).

    Heavy “large p small n” problem
    Number of variables (= p): $21797 \sim 10^4$
    Number of samples (= n): $5 \times 2 \times 3 = 30 \sim 10$
    $p/n \sim 10^3$

  23. 24
    [Figure: singular value vectors for $l_1 = 1$ (cell lines), $l_2 = 2$ (with and without SARS-CoV-2 infection) and $l_3 = 1$ (biological replicates).]
    These are independent of cell lines and biological replicates,
    but dependent upon SARS-CoV-2 infection.

  24. 25
    For $l_1 = 1$, $l_2 = 2$, $l_3 = 1$: find the $l_4$ at which $|G|$ is the largest.
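
Once $l_1$, $l_2$ and $l_3$ are fixed, picking $l_4$ is a one-line search over the core tensor. The sketch below uses a toy core tensor whose values are invented purely for illustration (they are not taken from GSE147507); the function name and the one-based index convention are also choices made here.

```python
import numpy as np

def pick_gene_component(core, l1, l2, l3):
    """Return the one-based l4 with the largest |G(l1, l2, l3, l4)|."""
    weights = np.abs(core[l1 - 1, l2 - 1, l3 - 1, :])
    return int(np.argmax(weights)) + 1

# Toy core tensor of shape (L1, L2, L3, L4) = (5, 2, 3, 6), illustration only.
core = np.zeros((5, 2, 3, 6))
core[0, 1, 0, :] = [0.1, -0.3, 0.2, 0.05, -4.0, 0.5]
best_l4 = pick_gene_component(core, l1=1, l2=2, l3=1)
print(best_l4)  # 5
```

The absolute value matters because the sign of a singular value vector, and hence of the corresponding core element, is arbitrary.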

  25. 26
    Gene expression that is independent of cell lines and
    biological replicates, but dependent upon SARS-CoV-2
    infection, is associated with $u_{5i}$ ($l_4 = 5$).
    $$P_i = P_{\chi^2}\left[ > \left( \frac{u_{5i}}{\sigma_5} \right)^2 \right]$$
    The computed P-values are corrected for multiple comparisons
    with the Benjamini-Hochberg method.
    163 genes with corrected P-values < 0.01 are selected among 21,797
    genes.

  26. 27
    Multiple hits with known SARS-CoV-2 interacting human genes


  27. 28
    Comparisons with conventional methods:
    Since we do not know how many genes should be selected, lasso and
    random forest cannot be used. Instead we employed SAM and limma, which
    are algorithms specific to gene selection (adjusted P-values are used).

                           t test              SAM                 limma
                           P>0.01   P≦0.01    P>0.01   P≦0.01    P>0.01   P≦0.01
    Calu3                  21754    43         21797    0          335      3789
    NHBE                   21797    0          21797    0          342      3906
    A549 MOI 0.2           21797    0          21797    0          319      4391
    A549 MOI 2.0           21472    325        21797    0          208      4169
    A549 ACE2 expressed    21796    1          21797    0          182      4245

  28. 29
    Kernelization of TD based unsupervised FE

  29. 30
    Published in Knowledge-based systems (IF=5.9)
    https://doi.org/10.1016/j.knosys.2021.106834


  30. 31
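    Kernel tensor decomposition
    [Figure: the $N \times M \times K$ tensor $x_{ijk}$ and its HOSVD into $G$, $u_{l_1 i}$, $u_{l_2 j}$, $u_{l_3 k}$, shown next to a second copy $x_{ij'k'}$ of the same $N \times M \times K$ tensor.]
    Linear kernel:
    $$x_{jkj'k'} = \sum_i x_{ijk} x_{ij'k'}$$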
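
The linear kernel is a single contraction over $i$, which `numpy.einsum` expresses directly. A short sketch on a small made-up tensor (the function name and the example values are illustrative):

```python
import numpy as np

def linear_kernel_tensor(x):
    """x_{j k j' k'} = sum_i x_{i j k} x_{i j' k'} for x of shape (N, M, K)."""
    return np.einsum("ijk,ilm->jklm", x, x)

x = np.arange(24, dtype=float).reshape(2, 3, 4)   # N = 2, M = 3, K = 4
kern = linear_kernel_tensor(x)
print(kern.shape)  # (3, 4, 3, 4)
```

The result has shape $M \times K \times M \times K$ whatever the number of variables $N$ is, so the decomposition that follows works on a tensor whose size is set by the number of samples.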

  31. 32
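    $$x_{jkj'k'} \simeq \sum_{l_1=1}^{L_1} \sum_{l_2=1}^{L_2} \sum_{l_3=1}^{L_3} \sum_{l_4=1}^{L_4} G(l_1 l_2 l_3 l_4) u_{l_1 j} u_{l_2 k} u_{l_3 j'} u_{l_4 k'}$$
    [Figure: $x_{jkj'k'}$ expressed as the $L_1 \times L_2 \times L_3 \times L_4$ core tensor $G$ multiplied by $u_{l_1 j}$, $u_{l_2 k}$, $u_{l_3 j'}$ and $u_{l_4 k'}$.]
    Kernel trick:
    $x_{jkj'k'} \to k(x_{ijk}, x_{ij'k'})$: non-negative definite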

  32. 33
    Radial basis function (RBF) kernel:
    $$k(x_{ijk}, x_{ij'k'}) = \exp\left( -\alpha \sum_i (x_{ijk} - x_{ij'k'})^2 \right)$$
    Polynomial kernel:
    $$k(x_{ijk}, x_{ij'k'}) = \left( 1 + \sum_i x_{ijk} x_{ij'k'} \right)^d$$
    $k(x_{ijk}, x_{ij'k'})$ → tensor decomposition
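
Both kernels can be computed from the same $N \times M \times K$ tensor. The sketch below follows the two formulas term by term; the function names and the example values of $\alpha$ and $d$ are arbitrary illustrations, not settings from the talk.

```python
import numpy as np

def rbf_kernel_tensor(x, alpha):
    """k = exp(-alpha * sum_i (x_ijk - x_ij'k')^2), shape (M, K, M, K)."""
    flat = x.reshape(x.shape[0], -1)            # one column per (j, k) pair
    sq_norm = np.sum(flat ** 2, axis=0)
    sq_dist = sq_norm[:, None] + sq_norm[None, :] - 2.0 * flat.T @ flat
    sq_dist = np.maximum(sq_dist, 0.0)          # guard against round-off
    m, k = x.shape[1:]
    return np.exp(-alpha * sq_dist).reshape(m, k, m, k)

def polynomial_kernel_tensor(x, degree):
    """k = (1 + sum_i x_ijk x_ij'k')^degree, shape (M, K, M, K)."""
    linear = np.einsum("ijk,ilm->jklm", x, x)
    return (1.0 + linear) ** degree

x = np.arange(24, dtype=float).reshape(2, 3, 4)
rbf = rbf_kernel_tensor(x, alpha=0.5)
poly = polynomial_kernel_tensor(x, degree=2)
```

Either tensor can then be passed to the same decomposition as the linear kernel; only the construction of $x_{jkj'k'}$ changes.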

  33. 34
    Synthetic example: Swiss Roll
    $$x_{ijk} \in \mathbb{R}^{1000 \times 3 \times 10}$$
    1000: number of points (= n); 3: spatial dimension (= p); 10: number of Swiss Rolls

  34. 35
    SVD applied to single Swiss Roll


  35. 36
    TD applied to a bundle of 10 Swiss Rolls


  36. 37
    Kernel TD (with RBF) applied to a bundle of 10 Swiss Rolls

  37. 38
    Feature selection
    Linear kernel:
    TD of $x_{jkj'k'}$ → $u_{l_1 j}$, $u_{l_2 k}$
    $$u_{l_1 i} \propto \sum_{jk} x_{ijk} u_{l_1 j} u_{l_2 k}$$
    $$P_i = P_{\chi^2}\left[ > \left( \frac{u_{l_1 i}}{\sigma_{l_1}} \right)^2 \right]$$
    The computed P-values are corrected for multiple comparisons
    with the Benjamini-Hochberg method.
    Features with corrected P-values < 0.01 are selected.
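
The projection that recovers a per-variable vector from the sample-side singular value vectors is again a single contraction. In this sketch `u_j` and `u_k` stand for the chosen $u_{l_1 j}$ and $u_{l_2 k}$; the toy tensor and the constant vectors are invented for illustration.

```python
import numpy as np

def project_features(x, u_j, u_k):
    """u_i proportional to sum_{j,k} x_{ijk} u_j u_k for x of shape (N, M, K)."""
    return np.einsum("ijk,j,k->i", x, u_j, u_k)

x = np.ones((4, 2, 3))                 # N = 4 variables, M = 2, K = 3
x[0] *= 2.0                            # variable 0 is twice as large
u_j = np.ones(2) / np.sqrt(2.0)        # constant unit vector over j
u_k = np.ones(3) / np.sqrt(3.0)        # constant unit vector over k
u_i = project_features(x, u_j, u_k)
```

The resulting vector has one entry per variable and can be turned into P-values with the $\chi^2$ formula given on the slide.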

  38. 39
    RBF and polynomial kernels
    1) Exclude a specific $i$.
    2) Recompute $x_{jkj'k'}$.
    3) Apply TD: $x_{jkj'k'}$ → $u_{l_1 j} \times u_{l_2 k}$.
    4) Estimate the coincidence between $u_{l_1 j} \times u_{l_2 k}$ and the classification of $(k, j)$.
    5) Rank $i$ based upon the amount of decreased coincidence.
  39. 40
    Application to the SARS-CoV-2 data set
    The RBF kernel is applied and the 163 top-ranked genes are selected.
    [Figure: comparison of TD and KTD.]
  40. 41
    Conclusions
    TD based unsupervised FE is specialized for feature selection in
    “large p small n” problems.
    It performs comparably to conventional feature selection methods
    (lasso, random forest) and gives P-values, which lasso and
    random forest cannot.
    TD based unsupervised FE could select human genes related
    to SARS-CoV-2 infection even where other conventional gene
    selection methods (t test, SAM, limma) do not work well.

  41. 42
    TD based unsupervised FE was successfully “kernelized”.
    Kernel TD (KTD) based unsupervised FE could even
    outperform TD based unsupervised FE when applied to the
    identification of human genes related to SARS-CoV-2 infection.
    Other advanced variants of KTD based unsupervised FE are expected
    to be developed to tackle a wider range of problems, including
    genomic science/bioinformatics.