Drug candidate identification based on gene expression of treated cells using tensor decomposition-based unsupervised feature extraction for large-scale data

Y-h. Taguchi, Department of Physics, Chuo University, Tokyo, Japan
InCoB 2018, New Delhi, 25th Sep. 2018
Accepted in BMC Bioinformatics

Drug discovery (DD) = dose dependence
[Figure: effect plotted against dose (density)]
No high-throughput (HT) methods are available for DD ←→ gene expression = HT sequencing/microarray
Is HT DD possible from HT gene expression methods?

Data are available (LINCS): multiple cancer cell lines are treated with various drugs at multiple dose densities.
→ Problem: how can we screen these? Regression analysis between gene expression and dose density?
→ Too few observations (only a few dose densities) might prevent us from obtaining significant P-values after correcting them for multiple comparisons.
→ How about unsupervised methods?
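The multiple-comparison concern can be made concrete with a little arithmetic. The sketch below is illustrative only: the number of dose densities (6), the number of genes (20,000), the choice of an exact two-sided permutation test on a rank correlation, and the Bonferroni correction are all assumptions, not values taken from the slides.

```python
from math import factorial

n_doses = 6        # hypothetical number of dose densities per compound
n_genes = 20_000   # hypothetical number of genes tested
alpha = 0.05

# Exact two-sided permutation test on a rank correlation:
# only 2 of the n! orderings are perfectly monotone (up or down),
# so no gene can obtain a P-value below 2 / n!.
min_p = 2 / factorial(n_doses)

# Bonferroni-corrected per-gene significance threshold.
bonferroni_threshold = alpha / n_genes

print(f"smallest attainable P-value: {min_p:.2e}")
print(f"Bonferroni threshold:        {bonferroni_threshold:.2e}")
print("a gene can reach significance:", min_p <= bonferroni_threshold)
```

Under these assumed numbers even a perfectly monotone gene cannot pass the corrected threshold, which is the motivation for looking at unsupervised methods.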

[Figure: an N × M matrix X (numerical values; N features, M samples from categorical multiclasses) is processed by PCA. PC loadings are attributed to samples (PC1); PC scores are attributed to features (scatter plot, PC1 vs. PC2, features marked "+").]
No distinction between classes.
PCA, but embedding features instead of samples into lower dimensions.

Synthetic example
Layout: 10 samples + 10 samples; 90 features + 10 features
Distributions: N(0,1/2), N(μ,1/2), [N(m,1/2)+N(0,1/2)]/2, where N(μ,1/2) is the normal distribution with mean μ and SD 1/2
[Figure: features plotted on PC1 vs. PC2; +: top 10 outliers, m=2]
Thus, extracting outliers selects features distinct between two classes in an unsupervised way.
Accuracy (100 trials): 89.5% (m=2), 52.6% (m=1)
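The idea of the synthetic example can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the exact setup of the slide: the assumed design shifts 10 of 100 features by m in the second group of 10 samples, every entry has SD 1/2, and "outliers" are taken to be the features farthest from the origin in the PC1/PC2 score plane. Because the design is simpler than the one on the slide, it is not expected to reproduce the reported accuracies. The function names are hypothetical.

```python
import numpy as np

def make_synthetic(m, rng, n_per_class=10, n_null=90, n_diff=10, sd=0.5):
    """Assumed design: features x samples matrix, two classes of samples."""
    X = rng.normal(0.0, sd, size=(n_null + n_diff, 2 * n_per_class))
    # The first n_diff features are shifted by m in the second class only.
    X[:n_diff, n_per_class:] += m
    return X

def pca_outlier_features(X, n_select=10, n_pc=2):
    """Embed features (rows) with PCA and return the n_select outliers."""
    Xc = X - X.mean(axis=0, keepdims=True)        # center each sample column
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_pc] * s[:n_pc]               # PC scores attributed to features
    dist = np.linalg.norm(scores, axis=1)         # distance from the origin
    return np.argsort(dist)[::-1][:n_select]

rng = np.random.default_rng(0)
X = make_synthetic(m=2.0, rng=rng)
selected = pca_outlier_features(X)

# Class labels are used only here, to check the unsupervised selection.
n_recovered = int(np.sum(selected < 10))
print(f"{n_recovered} of 10 class-dependent features recovered")
```

No class label enters `pca_outlier_features`; the labels appear only when counting how many of the planted features were found.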

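By extending the matrix to a tensor, $x_{ijl}$, we can deal with data of "dose density (i) × compounds (j) × genes (l)".
→ Tensors can be decomposed:
$x_{ijl} \simeq \sum_{k_1,k_2,k_3} G_{k_1,k_2,k_3}\, x_{k_1 i}\, x_{k_2 j}\, x_{k_3 l}$
[Figure: the tensor $x_{ijl}$ (dose density × compounds × genes) is decomposed into a core tensor G and the vectors $x_{k_1 i}$ (dose density), $x_{k_2 j}$ (compounds), and $x_{k_3 l}$ (genes).]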
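A decomposition of this form can be computed in several ways. The sketch below uses the higher-order singular value decomposition (HOSVD), which is an assumption here since the slide does not name the algorithm. The tensor is random toy data, and the names `unfold` and `hosvd` are hypothetical helpers. In the code, `U1[i, k1]` plays the role of $x_{k_1 i}$, and likewise for `U2` and `U3`.

```python
import numpy as np

def unfold(T, mode):
    """Matricize T so that `mode` indexes the rows."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T):
    """HOSVD of a 3-way tensor: returns the core G and one factor per mode."""
    # Left singular vectors of each unfolding give the factor matrices.
    factors = [np.linalg.svd(unfold(T, m), full_matrices=False)[0]
               for m in range(3)]
    # Core tensor: project T onto the singular vectors of every mode.
    core = np.einsum("ijl,ia,jb,lc->abc", T, *factors, optimize=True)
    return core, factors

rng = np.random.default_rng(0)
T = rng.normal(size=(6, 8, 50))      # toy "dose density x compounds x genes"

G, (U1, U2, U3) = hosvd(T)

# x_ijl = sum over k1,k2,k3 of G[k1,k2,k3] * U1[i,k1] * U2[j,k2] * U3[l,k3]
T_hat = np.einsum("abc,ia,jb,lc->ijl", G, U1, U2, U3, optimize=True)
print("core shape:", G.shape)
print("max reconstruction error:", np.abs(T - T_hat).max())
```

With all components kept, the sum reproduces the tensor up to floating-point error; restricting the sum to a few leading components gives the approximation written with ≃ on the slide.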

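[Figure: $x_{ijl}$ (dose density × compounds × genes) is decomposed into $G_{k_1,k_2,k_3}$, $x_{k_1 i}$ (dose density), $x_{k_2 j}$ (compounds), and $x_{k_3 l}$ (genes). For the 2nd dose-density component, $G_{2,k_2,k_3}$ with $k_2 \le 6$ and $k_3 \le 6$ points to the $x_{k_2 j}$ and $x_{k_3 l}$ from which outlier compounds and outlier genes are selected.]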
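The selection step in this figure can be sketched on toy data. Everything specific below is an assumption made for illustration: the planted signal (3 compounds and 10 genes responding linearly to dose on top of a dose-independent baseline), the use of HOSVD, the rule of taking the largest $|G_{2,k_2,k_3}|$ among the first 6 components, and the z-score cut-off of 2 for calling outliers. The slides do not state how outliers are scored.

```python
import numpy as np

def unfold(T, mode):
    """Matricize T so that `mode` indexes the rows."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd_truncated(T, rank):
    """Leading `rank` singular vectors per mode and the matching core."""
    factors = [np.linalg.svd(unfold(T, m), full_matrices=False)[0][:, :rank]
               for m in range(3)]
    core = np.einsum("ijl,ia,jb,lc->abc", T, *factors, optimize=True)
    return core, factors

def outliers(vec, z_cut=2.0):
    """Indices whose |z-score| exceeds z_cut (an assumed, simple criterion)."""
    z = (vec - vec.mean()) / vec.std()
    return np.flatnonzero(np.abs(z) > z_cut)

# Toy tensor: dose density x compounds x genes (all numbers hypothetical).
rng = np.random.default_rng(0)
n_dose, n_comp, n_gene = 6, 30, 200
dose = np.linspace(-1.0, 1.0, n_dose)
baseline = rng.normal(5.0, 1.0, size=n_gene)       # dose-independent level
T = rng.normal(0.0, 0.1, size=(n_dose, n_comp, n_gene)) + baseline
T[:, :3, :10] += 2.0 * dose[:, None, None]         # planted dose dependence

core, (U_dose, U_comp, U_gene) = hosvd_truncated(T, rank=6)

k1 = 1   # 2nd dose-density component (the 1st is roughly constant here)
k2, k3 = np.unravel_index(np.argmax(np.abs(core[k1])), core[k1].shape)

outlier_compounds = outliers(U_comp[:, k2])
outlier_genes = outliers(U_gene[:, k3])
print("outlier compounds:", outlier_compounds.tolist())
print("outlier genes:", outlier_genes.tolist())
```

In this toy setting the first dose-density vector captures the constant baseline, so the dose-dependent pattern appears in the second one, and the outliers of the associated compound and gene vectors are the candidate compound-gene pairs.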

[Figure: genes selected for a compound by TD-based unsupervised FE are compared with the genes affected by single-gene perturbations (Gene A, Gene B, Gene C).]

Overall data analysis flow
→ Gene expression profiles with drug compound treatments
→ Identification of pairs of genes and compounds with dose dependence by tensor decomposition
→ Target protein identification by comparison with single-gene KO/KI experiments
→ Validation by comparison with known drug target proteins using Fisher's exact test
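The validation step amounts to asking whether the predicted targets overlap the known targets more than expected by chance. A minimal sketch of a one-sided Fisher's exact test, written with the standard library only, is given below. The function name and all counts are hypothetical and chosen purely for illustration; they are not results from the study.

```python
from math import comb

def fisher_overlap_p(n_universe, n_known, n_predicted, n_overlap):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Probability of seeing at least n_overlap known targets among
    n_predicted proteins drawn at random from n_universe proteins,
    of which n_known are known targets.
    """
    total = comb(n_universe, n_predicted)
    upper = min(n_known, n_predicted)
    tail = sum(
        comb(n_known, k) * comb(n_universe - n_known, n_predicted - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts for one cell line and one compound.
p = fisher_overlap_p(n_universe=20_000, n_known=100,
                     n_predicted=50, n_overlap=5)
print(f"P-value for the overlap: {p:.3g}")
```

A small P-value corresponds to a "significant overlap" between inferred and known drug target proteins.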

Results for 13 cancer cell lines (LINCS)
[Table: identification by tensor decomposition; target proteins inferred by comparison with KO/KI experiments ( )]

Evaluations
Comparisons with drug2gene.com and DSigDB
◦: significant overlap by Fisher's exact test
(1)-(13): cancer cell lines in the previous table

Conclusions
We have developed a tensor decomposition-based method that can identify genes with dose-dependent expression from the gene expression profiles of cells treated with drug compounds.
Drug target proteins are further inferred by comparison with single-gene KO/KI expression profiles.
The results overlap significantly with known drug target proteins.
