ISAIC2020 1
Principal Component Analysis, Tensor Decomposition, and
Kernel Tensor Decomposition Based Unsupervised Feature
Extraction Applied to Bioinformatics
Y-h. Taguchi,
Department of Physics,
Chuo University,
Tokyo 112-8551, Japan
Singular value decomposition (SVD)
$$x_{ij} \simeq \sum_{l=1}^{L} u_{li} \lambda_l v_{lj}$$
[Figure: x_ij (N × M) ≈ (u_li)^T (N × L) ⨉ λ_l (L × L, diagonal) ⨉ v_lj (L × M)]
N: number of genes (i)
M: number of samples (j)
xij: gene expression
Example
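As a rough illustration of the decomposition on this slide, the sketch below applies NumPy's SVD to a random matrix standing in for gene expression data. The sizes, the random input, and the variable names are assumptions made for the example, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, L = 100, 6, 3            # genes, samples, retained components (illustrative sizes)
x = rng.normal(size=(N, M))    # random stand-in for gene expression x_ij

# NumPy returns x = U @ diag(lam) @ Vt, singular values in descending order
U, lam, Vt = np.linalg.svd(x, full_matrices=False)

u = U[:, :L].T                 # u[l, i]: singular vectors attributed to genes
v = Vt[:L, :]                  # v[l, j]: singular vectors attributed to samples

# Rank-L approximation: x_ij ~ sum_l u_li * lam_l * v_lj
x_approx = np.einsum("li,l,lj->ij", u, lam[:L], v)
rel_error = np.linalg.norm(x - x_approx) / np.linalg.norm(x)
```

Keeping all M components instead of L reproduces x exactly, so `rel_error` only measures what the truncation discards.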
Interpretation…..
[Figure: v_lj plotted over j (samples: healthy controls vs. patients) and u_li plotted over i (genes)]
DEG: Differentially Expressed Genes
For some specific l, u_li marks two groups of DEGs: healthy controls < patients, and healthy controls > patients.
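Extension to tensor
HOSVD (Higher Order Singular Value Decomposition)
$$x_{ijk} \simeq \sum_{l_1=1}^{L_1} \sum_{l_2=1}^{L_2} \sum_{l_3=1}^{L_3} G(l_1 l_2 l_3) u_{l_1 i} u_{l_2 j} u_{l_3 k}$$
[Figure: x_ijk (N × M × K) approximated by the core tensor G (L1 × L2 × L3) multiplied by u_l1i, u_l2j, and u_l3k]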
N: number of genes (i)
M: number of samples (j)
K: number of tissues (k)
xijk: gene expression
Example
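The HOSVD formula can be sketched with plain NumPy: each factor matrix comes from an SVD of the tensor unfolded along one mode, and the core tensor G is obtained by projecting onto those factors. The helper names `unfold` and `hosvd` and the random input are assumptions for illustration. Note that the factors here are stored as `[index, l]`, the transpose of the slide's u_{l i} layout.

```python
import numpy as np

def unfold(t, mode):
    """Matricize tensor t so that `mode` indexes the rows."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def hosvd(t):
    """Return the core tensor G and one factor matrix per mode.

    factors[n][:, l] is the l-th singular vector of mode n.
    """
    factors = []
    for mode in range(t.ndim):
        U, _, _ = np.linalg.svd(unfold(t, mode), full_matrices=False)
        factors.append(U)
    G = t
    for mode, U in enumerate(factors):
        # Project mode `mode` onto its singular vectors
        G = np.moveaxis(np.tensordot(U.T, G, axes=(1, mode)), 0, mode)
    return G, factors

rng = np.random.default_rng(0)
N, M, K = 50, 6, 4                       # illustrative sizes
x = rng.normal(size=(N, M, K))           # random stand-in for x_ijk
G, (u1, u2, u3) = hosvd(x)

# x_ijk = sum G(l1 l2 l3) u_{l1 i} u_{l2 j} u_{l3 k}
x_rebuilt = np.einsum("abc,ia,jb,kc->ijk", G, u1, u2, u3)
```

Without truncation the reconstruction is exact. Dropping trailing components of each factor gives the approximate form written on the slide.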
Interpretation…..
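[Figure: u_l2j plotted over j (samples: healthy controls vs. patients) and u_l3k plotted over k (tissues)]
For some specific l2, u_l2j separates healthy controls from patients.
For some specific l3, u_l3k shows tissue-specific expression.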
[Figure: u_l1i plotted over i (genes)]
tDEG: tissue-specific Differentially Expressed Genes
With l2 and l3 fixed, take the l1 with max |G(l1 l2 l3)|.
If G(l1 l2 l3) > 0, u_l1i marks two groups of tDEGs: healthy controls < patients, and healthy controls > patients.
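The selection of l1 can be sketched as an argmax over one fibre of the core tensor. Everything below is a toy: the core tensor and u are random stand-ins, the indices are 0-based, and ranking genes by |u_l1i| with a top-5 cut-off is an assumption made for illustration, not a rule stated on the slide.

```python
import numpy as np

rng = np.random.default_rng(1)
G = rng.normal(size=(5, 3, 2))       # random stand-in for the core tensor G(l1, l2, l3)
u1 = rng.normal(size=(5, 20))        # random stand-in for u[l1, i] over 20 genes

l2, l3 = 1, 0                        # assumed to be chosen already (0-based)

# l1 with the largest |G(l1, l2, l3)| for the fixed l2 and l3
l1 = int(np.argmax(np.abs(G[:, l2, l3])))
sign = np.sign(G[l1, l2, l3])        # the sign of G decides how to read the direction

# Hypothetical gene ranking: largest absolute loading on the selected component
ranking = np.argsort(-np.abs(u1[l1]))
top_genes = ranking[:5]
```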
Extension to Kernel Trick
SVD → Principal Component Analysis (PCA)
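$$x_{ij} \simeq \sum_{l=1}^{L} u_{li} \lambda_l v_{lj}$$
$$x_{jj'} = \sum_{i=1}^{N} x_{ij} x_{ij'} \simeq \sum_{l=1}^{L} v_{lj} \lambda_l^2 v_{lj'}$$
[Figure: x_ij (N × M) ≈ (u_li)^T (N × L) ⨉ λ_l (L × L) ⨉ v_lj (L × M); the product of x_ij and x_ij' gives x_jj' (M × M) ≈ (v_lj)^T (M × L) ⨉ λ_l^2 (L × L) ⨉ v_lj' (L × M)]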
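The link between SVD and the M × M product matrix can be checked numerically. In this sketch, on random data with assumed sizes, the eigenvalues of x_jj' equal λ_l^2 and its eigenvectors equal v_l up to sign.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 200, 6                                    # illustrative sizes
x = rng.normal(size=(N, M))                      # random stand-in for x_ij

_, lam, Vt = np.linalg.svd(x, full_matrices=False)

gram = x.T @ x                                   # x_jj' = sum_i x_ij x_ij'
eigval, eigvec = np.linalg.eigh(gram)            # eigh returns ascending eigenvalues
order = np.argsort(eigval)[::-1]                 # reorder to descending, like the SVD
eigval, eigvec = eigval[order], eigvec[:, order]

# Eigenvalues equal lam_l^2; eigenvectors match v_l up to sign
same_values = np.allclose(eigval, lam**2)
same_vectors = np.allclose(np.abs(eigvec.T @ Vt.T), np.eye(M), atol=1e-6)
```

Because only the M × M matrix is diagonalized, the cost no longer depends on N once x_jj' is formed. That is what allows x_jj' to be swapped for a kernel.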
Kernel Trick
$x_{jj'} \rightarrow k(x_{ij}, x_{ij'})$: non-negative definite
Radial basis function (RBF) kernel:
$$k(x_{ij}, x_{ij'}) = \exp\left(-\alpha \sum_i (x_{ij} - x_{ij'})^2\right)$$
Polynomial kernel:
$$k(x_{ij}, x_{ij'}) = \left(1 + \sum_i x_{ij} x_{ij'}\right)^d$$
$k(x_{ij}, x_{ij'})$ → diagonalization
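A minimal sketch of the two kernels and the diagonalization step, assuming samples are the columns of x. The values of α and d, the random input, and the function names are illustrative assumptions. No kernel centering is applied, because the formulas on the slide do not include it.

```python
import numpy as np

def rbf_kernel(x, alpha):
    """k(x_j, x_j') = exp(-alpha * sum_i (x_ij - x_ij')^2) for columns j, j' of x."""
    sq = np.sum(x**2, axis=0)
    sq_dist = sq[:, None] + sq[None, :] - 2.0 * (x.T @ x)
    return np.exp(-alpha * np.maximum(sq_dist, 0.0))   # clip tiny negative round-off

def polynomial_kernel(x, d):
    """k(x_j, x_j') = (1 + sum_i x_ij x_ij')^d."""
    return (1.0 + x.T @ x) ** d

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 6))            # random stand-in: 100 genes, 6 samples
k_rbf = rbf_kernel(x, alpha=0.01)
k_poly = polynomial_kernel(x, d=2)

# Diagonalization of the M x M kernel matrix
eigval, eigvec = np.linalg.eigh(k_rbf)   # ascending eigenvalues
v = eigvec[:, ::-1].T                    # v[l, j], largest eigenvalue first
```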
Kernel Tensor decomposition
$$x_{jkj'k'} = \sum_i x_{ijk} x_{ij'k'}$$
$$x_{jkj'k'} \simeq \sum_{l_1=1}^{L_1} \sum_{l_2=1}^{L_2} \sum_{l_3=1}^{L_3} \sum_{l_4=1}^{L_4} G(l_1 l_2 l_3 l_4) u_{l_1 j} u_{l_2 k} u_{l_3 j'} u_{l_4 k'}$$
[Figure: x_ijk (N × M × K) decomposed into G, u_l1i, u_l2j, u_l3k; the product of x_ijk and x_ij'k' over i gives x_jkj'k', decomposed into G, u_l1j, u_l2k, u_l3j', u_l4k']
https://doi.org/10.1101/2020.10.09.333195
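The linear version of the kernel tensor can be sketched by contracting the gene index i and then applying HOSVD to the resulting four-mode tensor. The compact `hosvd` helper, the sizes, and the random data are assumptions for illustration; this is not the implementation from the preprint.

```python
import numpy as np

def hosvd(t):
    """Core tensor and per-mode factor matrices (factors[n][:, l] is the l-th vector)."""
    factors = []
    for mode in range(t.ndim):
        unfolded = np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)
        U, _, _ = np.linalg.svd(unfolded, full_matrices=False)
        factors.append(U)
    G = t
    for mode, U in enumerate(factors):
        G = np.moveaxis(np.tensordot(U.T, G, axes=(1, mode)), 0, mode)
    return G, factors

rng = np.random.default_rng(0)
N, M, K = 500, 6, 4                           # illustrative sizes
x = rng.normal(size=(N, M, K))                # random stand-in for x_ijk

# Linear kernel tensor: x_{jkj'k'} = sum_i x_ijk x_ij'k'
xk = np.einsum("ijk,iab->jkab", x, x)         # shape (M, K, M, K)

G, (u1, u2, u3, u4) = hosvd(xk)
xk_rebuilt = np.einsum("abcd,ja,kb,pc,qd->jkpq", G, u1, u2, u3, u4)
```

Since x_{jkj'k'} is symmetric under swapping (j, k) with (j', k'), the factors for j and j' agree up to sign, and likewise for k and k'. The N-sized mode is gone after the contraction, so the decomposition cost does not grow with the number of genes.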
Kernel Trick
$x_{jkj'k'} \rightarrow k(x_{ijk}, x_{ij'k'})$: non-negative definite
Radial basis function (RBF) kernel:
$$k(x_{ijk}, x_{ij'k'}) = \exp\left(-\alpha \sum_i (x_{ijk} - x_{ij'k'})^2\right)$$
Polynomial kernel:
$$k(x_{ijk}, x_{ij'k'}) = \left(1 + \sum_i x_{ijk} x_{ij'k'}\right)^d$$
$k(x_{ijk}, x_{ij'k'})$ → tensor decomposition
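One way to build the RBF kernel tensor is to flatten (j, k) into a single column index, compute the ordinary RBF matrix, and reshape it back to four modes. The function name, the value of α, and the random data below are assumptions for the sketch. Only the singular vectors for mode j are extracted, to keep it short.

```python
import numpy as np

def rbf_kernel_tensor(x, alpha):
    """k(x_jk, x_j'k') = exp(-alpha * sum_i (x_ijk - x_ij'k')^2), shape (M, K, M, K)."""
    n, m, k = x.shape
    flat = x.reshape(n, m * k)                       # columns indexed by (j, k)
    sq = np.sum(flat**2, axis=0)
    sq_dist = sq[:, None] + sq[None, :] - 2.0 * (flat.T @ flat)
    kern = np.exp(-alpha * np.maximum(sq_dist, 0.0))
    return kern.reshape(m, k, m, k)

rng = np.random.default_rng(0)
N, M, K = 100, 6, 4                                  # illustrative sizes
x = rng.normal(size=(N, M, K))                       # random stand-in for x_ijk

kt = rbf_kernel_tensor(x, alpha=1e-3)

# Mode-j unfolding (j is already the first axis), then its left singular vectors
u_j, _, _ = np.linalg.svd(kt.reshape(M, -1), full_matrices=False)
```

Here `u_j[:, l]` plays the role of u_{l1 j}. A polynomial kernel tensor follows the same flatten-and-reshape pattern with `(1 + flat.T @ flat) ** d`.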
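Synthetic example: Swiss Roll
$$x_{ijk} \in \mathbb{R}^{1000 \times 3 \times 10}$$
i: points (1000), j: spatial dimension (3), k: Swiss Rolls (⨉ 10)
[Figure: a single Swiss Roll, replicated 10 times]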
SVD applied to a single Swiss Roll
TD applied to a bundle of 10 Swiss Rolls
Kernel TD (with RBF) applied to a bundle of 10 Swiss Rolls
Large p small n problem
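N(μ, σ): normal distribution; μ: mean, σ: standard deviation
$$x_{ijk} \in \mathbb{R}^{N \times M \times M}, \quad x_{ijk} \sim \begin{cases} N(\mu, \sigma) & j, k \le \frac{M}{2},\ i \le N_1 \ll N \\ N(0, \sigma) & \text{otherwise} \end{cases}$$
[Figure: N genes measured in the 1st M experiments and the 2nd M experiments, giving M² samples]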
M << N
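The synthetic tensor can be generated along these lines. The sketch assumes that the second argument of the normal distribution is the standard deviation σ defined on the slide. The function name, the seed, and the parameter values in the call are illustrative.

```python
import numpy as np

def make_synthetic(N, N1, M, mu, sigma, seed=0):
    """x_ijk ~ N(mu, sigma) for i <= N1 and j, k <= M/2; N(0, sigma) otherwise."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma, size=(N, M, M))
    half = M // 2
    x[:N1, :half, :half] += mu       # shift the mean of the distinct block to mu
    return x

# Illustrative values with M << N and N1 << N
x = make_synthetic(N=1000, N1=10, M=6, mu=2.0, sigma=1.0)
signal_mean = x[:10, :3, :3].mean()
background_mean = x[10:].mean()
```

Only 10 × 3 × 3 of the 1000 × 6 × 6 entries carry the shifted mean, so the signal sits in a small corner of the tensor.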
[Figure: N genes × M experiments × M experiments; the block i ≤ N1, j, k ≤ M/2 has non-zero mean, the rest has zero mean]
i ≤ N1: distinct between j, k ≤ M/2 and others
i > N1: no distinction
Task: Can we get latent vectors coincident with the distinction?
$N = 10^3$, $N_1 = 10$, μ = 2, σ = 1, M = 6
P-values attributed to correlation coefficients between the distinction and latent variables (smaller is better)
KTD with the linear kernel is equivalent to KTD with the RBF kernel.
[Figure: P-values shown in the plot: 0.043 and 0.039]
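The evaluation can be sketched as follows: build the synthetic tensor, form the linear kernel tensor, take the singular vectors over j, and correlate each with a 0/1 label for j ≤ M/2. The seed and the label coding are assumptions, and the conversion from correlation coefficient to P-value is omitted, so this sketch does not reproduce the numbers on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
N, N1, M, mu, sigma = 1000, 10, 6, 2.0, 1.0
x = rng.normal(0.0, sigma, size=(N, M, M))
x[:N1, : M // 2, : M // 2] += mu                  # distinct block: i <= N1, j, k <= M/2

label = (np.arange(M) < M // 2).astype(float)     # 1 for j <= M/2, 0 otherwise

# Linear kernel tensor and the singular vectors of its mode-j unfolding
xk = np.einsum("ijk,iab->jkab", x, x)             # shape (M, M, M, M)
u_j, _, _ = np.linalg.svd(xk.reshape(M, -1), full_matrices=False)

# Pearson correlation of each latent vector u_j[:, l] with the label
corr = np.array([np.corrcoef(u_j[:, l], label)[0, 1] for l in range(M)])
best = int(np.argmax(np.abs(corr)))
```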
Application to real data: SARS-CoV-2 infection
$$x_{ijkm} \in \mathbb{R}^{21797 \times 5 \times 2 \times 3}$$
Data set: GSE147507
Three kinds of human lung cell lines infected by SARS-CoV-2
i: genes (21797)
j: cell lines; j=1: Calu3, j=2: NHBE, j=3: A549 (MOI 0.2), j=4: A549 (MOI 2.0), j=5: A549 (ACE2 expressed)
(MOI: multiplicity of infection)
k: k=1: mock, k=2: SARS-CoV-2 infected
m: three biological replicates
$$x_{ijkm} \simeq \sum_{l_1=1}^{L_1} \sum_{l_2=1}^{L_2} \sum_{l_3=1}^{L_3} \sum_{l_4=1}^{L_4} G(l_1 l_2 l_3 l_4) u_{l_1 j} u_{l_2 k} u_{l_3 m} u_{l_4 i}$$
$u_{l_1 j}$: $l_1$th cell line dependence
$u_{l_2 k}$: $l_2$th SARS-CoV-2 infection dependence (YES/NO)
$u_{l_3 m}$: $l_3$th biological replicate dependence
$u_{l_4 i}$: $l_4$th gene dependence
G: weights
Purpose: identify $l_1$, $l_2$, $l_3$ that are independent of cell lines and replicates ($u_{l_1 j}$ and $u_{l_3 m}$ are constant, independent of j and m) but dependent upon SARS-CoV-2 infection ($u_{l_2 1} = -u_{l_2 2}$).
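The search for l1, l2, l3 could be automated roughly as below. The numeric criteria are assumptions made for this sketch: "constant" is scored by the spread of a vector relative to its mean, and "u_{l2 1} = -u_{l2 2}" by how close the two entries sum to zero. The slide states the goal but not a scoring rule. The data are a toy stand-in with far fewer genes than GSE147507, and the indices are 0-based.

```python
import numpy as np

def mode_vectors(t, mode):
    """Left singular vectors of the mode-`mode` unfolding; column l is the l-th vector."""
    unfolded = np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)
    U, _, _ = np.linalg.svd(unfolded, full_matrices=False)
    return U

def most_constant(U):
    """Column with the smallest spread relative to its mean (assumed criterion)."""
    score = U.std(axis=0) / (np.abs(U.mean(axis=0)) + 1e-12)
    return int(np.argmin(score))

rng = np.random.default_rng(0)
N = 300                                            # toy size, not the 21797 genes on the slide
base = rng.normal(5.0, 1.0, size=(N, 1, 1, 1))     # per-gene baseline expression (toy)
# Axes: i (gene), j (cell line), k (mock / infected), m (replicate)
x = base + rng.normal(size=(N, 5, 2, 3))
x[:20, :, 1, :] += 3.0                             # toy infection effect on 20 genes

u_j, u_k, u_m = (mode_vectors(x, mode) for mode in (1, 2, 3))

l1 = most_constant(u_j)                            # independent of cell line
l3 = most_constant(u_m)                            # independent of replicate
# Infection-dependent component: the two entries of u_k[:, l2] sum to about zero
l2 = int(np.argmin(np.abs(u_k.sum(axis=0))))
```

With l1, l2, l3 fixed this way, the remaining step on the gene mode is to look for the l4 with the largest |G(l1 l2 l3 l4)|.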
[Figure: u_l1j over cell lines (l1 = 1), u_l2k over SARS-CoV-2 infection, yes or no (l2 = 2), and u_l3m over biological replicates (l3 = 1)]
These components are independent of cell lines and biological replicates but dependent upon SARS-CoV-2 infection.
Kernel TD with RBF distinguishes normal and infected cell lines more clearly than TD.
[Figure legend: infected, not infected]
l1 = 1, l2 = 2, l3 = 1
Which l4 has the largest |G|?
u_5i (l4 = 5) represents gene expression profiles that are independent of cell lines and biological replicates but altered by SARS-CoV-2 infection.
Many human genes known to interact with SARS-CoV-2 proteins are included.
Many human genes that are important for SARS-CoV-2 infection seem to be detected.
↓
Drug repositioning becomes possible by identifying compounds that affect the 163 selected genes.
↓
Fortunately, there are databases that list genes whose expression is altered by drug treatments.
Many reported candidate drugs targeting SARS-CoV-2 are detected.
DrugMatrix in Enrichr
Term                                          Overlap   P-value    Adjusted P-value
Ivermectin-7.5 mg/kg in CMC-Rat-Liver-1d-dn   12/277    2.98E-06   9.93E-06
Ivermectin-7.5 mg/kg in CMC-Rat-Liver-5d-dn   12/289    4.60E-06   1.44E-05
Ivermectin-7.5 mg/kg in CMC-Rat-Liver-3d-dn   11/285    2.29E-05   5.56E-05
Ivermectin-7.5 mg/kg in CMC-Rat-Liver-1d-up   10/323    3.28E-04   5.39E-04
Ivermectin-7.5 mg/kg in CMC-Rat-Liver-5d-up   8/311     4.06E-03   5.10E-03
Ivermectin-7.5 mg/kg in CMC-Rat-Liver-3d-up   8/315     4.38E-03   5.46E-03
Ivermectin was hit!
Summary
Our methods can identify effective
candidate compounds for COVID-19.
I published a monograph with Springer International in September 2019.