Tensor decomposition based unsupervised feature extraction applied to drug discovery from gene expression analysis

Y-h. Taguchi, Department of Physics, Chuo University, Tokyo, Japan

Three Research Projects
Project 1: Inference of drugs effective against cancers from gene expression profiles of drug-treated cancer cell lines
Project 2: Integrated analysis of gene expression profiles of drug-treated tissues and human patients
Project 3: Inference of drugs effective against Alzheimer's disease from mouse brain single-cell gene expression (without drug-treated gene expression)

Project 1: Inference of drugs effective against cancers from gene expression profiles of drug-treated cancer cell lines

Drug candidate identification based on gene expression of treated cells using tensor decomposition-based unsupervised feature extraction for large-scale data.
Y-h. Taguchi, BMC Bioinformatics volume 19, Article number: 388 (2019). Open access.

Drug discovery (DD) = dose dependence: the effect of a compound is measured as a function of dose (density).
No high-throughput (HT) methods are available for DD, whereas gene expression can be measured by HT methods (sequencing/microarray).
Is HT DD possible from HT gene expression measurements?

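By extending the matrix to a tensor, $x_{ijl}$, we can deal with data of "dose density ($i$) × compounds ($j$) × genes ($l$)".
→ Tensors can be decomposed:

$$x_{ijl} \simeq \sum_{k_1,k_2,k_3} G(k_1,k_2,k_3)\, u_{k_1 i}\, u_{k_2 j}\, u_{k_3 l}$$

Here $G$ is the core tensor, and $u_{k_1 i}$, $u_{k_2 j}$, and $u_{k_3 l}$ are the singular value vectors attributed to dose density, compounds, and genes, respectively.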
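The decomposition above can be sketched in a few lines of NumPy. This is only a minimal illustration, assuming the decomposition is a higher-order singular value decomposition (HOSVD), which these slides do not state explicitly. The tensor sizes, the random data, and the function names `unfold`, `hosvd`, and `reconstruct` are hypothetical.

```python
import numpy as np

def unfold(x, mode):
    """Matricize tensor x so that `mode` indexes the rows."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)

def hosvd(x):
    """Return the core tensor G and one factor matrix per mode.

    Column k of factors[m] is the k-th singular value vector of mode m.
    """
    factors = []
    for mode in range(x.ndim):
        u, _, _ = np.linalg.svd(unfold(x, mode), full_matrices=False)
        factors.append(u)
    core = x
    for mode, u in enumerate(factors):
        # Project mode `mode` onto its singular value vectors.
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, mode)), 0, mode)
    return core, factors

def reconstruct(core, factors):
    """Sum over G(k1,k2,k3) u_{k1 i} u_{k2 j} u_{k3 l}."""
    x = core
    for mode, u in enumerate(factors):
        x = np.moveaxis(np.tensordot(u, x, axes=(1, mode)), 0, mode)
    return x

# Hypothetical data: 4 dose densities (i) x 5 compounds (j) x 30 genes (l).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5, 30))
core, (u_dose, u_compound, u_gene) = hosvd(x)
x_hat = reconstruct(core, [u_dose, u_compound, u_gene])
max_error = float(np.max(np.abs(x - x_hat)))
```

When all components are kept, the reconstruction is exact up to rounding; keeping only a few leading components in each mode gives the approximate form written on the slide.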

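[Figure: decomposition of $x_{ijl}$ into the core tensor $G(k_1,k_2,k_3)$ and the singular value vectors $u_{k_1 i}$ (dose density), $u_{k_2 j}$ (compounds), and $u_{k_3 l}$ (genes). For the 2nd component, $G(2,k_2,k_3)$ with $k_2 \le 6$ and $k_3 \le 6$ is used to identify outlier compounds in $u_{k_2 j}$ and outlier genes in $u_{k_3 l}$.]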

[Figure: genes associated with a compound by TD based unsupervised FE (genes A, B, C) are compared with genes affected by single gene perturbation (Gene A, Gene B, Gene C).]

Overall data analysis flow
1. Gene expression profiles with drug compound treatments
2. Identification of pairs of genes and compounds with dose dependence by tensor decomposition
3. Target protein identification by comparison with single gene KO/KI experiments
4. Validation by comparison with known drug target proteins using Fisher's exact test
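The validation step of this flow relies on Fisher's exact test for the overlap between two gene sets. A minimal sketch of a one-sided version is shown here, written with the standard library only; the function name `fisher_exact_greater` and the counts in the example are hypothetical and are not taken from the study.

```python
from math import comb

def fisher_exact_greater(overlap, n_selected, n_known, n_universe):
    """One-sided Fisher's exact test P-value.

    Probability of observing at least `overlap` shared items when
    `n_selected` items are drawn at random from `n_universe` items,
    of which `n_known` are known targets (hypergeometric upper tail).
    """
    max_overlap = min(n_selected, n_known)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_known, k) * comb(n_universe - n_known, n_selected - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical counts: 20 proteins in total, 5 known targets,
# 5 inferred targets, 3 of which are known targets.
p_value = fisher_exact_greater(overlap=3, n_selected=5, n_known=5, n_universe=20)
```

A small P-value means the inferred targets overlap with the known targets more than expected by chance.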

Results for 13 cancer cell lines (LINCS)
[Table: for each cell line, the identification by tensor decomposition and the target proteins obtained by comparison with KO/KI experiments.]

Evaluations: comparisons with drug2gene.com and DSigDB
◦: significant overlap by Fisher's exact test
(1)-(13): cancer cell lines in the previous table

Project 2: Integrated analysis of gene expression profiles of drug-treated tissues and human patients

Identification of candidate drugs using tensor-decomposition-based unsupervised feature extraction in integrated analysis of gene expression between diseases and DrugMatrix datasets.
Y.-h. Taguchi, Scientific Reports volume 7, Article number: 13733 (2017). Open access.

A tensor is generated as the product of two matrices that share the gene index, and tensor decomposition is applied to it:

$$x_{ijl} = x_{ij} \times x_{il} \simeq \sum_{k_1,k_2,k_3} G(k_1,k_2,k_3)\, u_{k_1 i}\, u_{k_2 j}\, u_{k_3 l}$$

$i$: genes, $j$: patients vs healthy control, $l$: dose density
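How the two matrices are combined into one tensor can be illustrated as follows. This is a small sketch with hypothetical sizes and random values; the variable names `x_disease` and `x_drug` are assumptions introduced only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_samples, n_doses = 6, 4, 3  # hypothetical sizes

x_disease = rng.normal(size=(n_genes, n_samples))  # x_ij: genes x (patients vs healthy control)
x_drug = rng.normal(size=(n_genes, n_doses))       # x_il: genes x dose density

# x_ijl = x_ij * x_il: the shared gene index i is kept, j and l are combined.
x = np.einsum("ij,il->ijl", x_disease, x_drug)
```

The resulting tensor has the shape genes × samples × doses and can be passed to a tensor decomposition in the same way as a measured three-mode tensor.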

$$x_{j_1 j_2 j_3 i} = x_{j_1 j_2 i}\, x_{j_3 i} = \sum_{l_1,l_2,l_3,l_4} G(l_1,l_2,l_3,l_4)\, u_{l_1 j_1}\, u_{l_2 j_2}\, u_{l_3 j_3}\, u_{l_4 i}$$

$j_1$: compounds, $j_2$: time duration after treatment, $j_3$: patients vs healthy control, $i$: genes

Duration after treatment (days): the 1st to 4th singular value vectors

Heart disease: the 1st to 3rd singular value vectors

Compounds：the 2nd singular value vectors

Top ranked 10 vectors with larger absolute values, $l_1 = 2$

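Feature extraction
No real data are separated well, so a Gaussian distribution is assumed and outliers are detected.
P-values are attributed by the χ² distribution:

$$P_i = P_{\chi^2}\left[ > \sum_k \left( \frac{x_{ik}}{\sigma} \right)^2 \right]$$

where $P_{\chi^2}[> x]$ is the cumulative χ² probability that the argument is larger than $x$ and $\sigma$ is the standard deviation.
Features with Benjamini-Hochberg corrected $P_i < 0.01$ are selected.
[Figure: distribution $P(p)$ plotted against $1-p$, from 0 to 1.]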
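The outlier detection on this slide can be sketched with the standard library only. This is a hedged illustration, not the author's code: it assumes that σ is estimated separately for each component as the standard deviation over all features, and the synthetic data and the function names `chi2_sf`, `outlier_pvalues`, and `benjamini_hochberg` are hypothetical.

```python
import math
import random
import statistics

def chi2_sf(stat, df):
    """Upper tail probability of the chi-squared distribution (integer df)."""
    x = stat / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))
    while a < df / 2.0:
        # Recurrence Q(a + 1, x) = Q(a, x) + x^a e^{-x} / Gamma(a + 1)
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q

def outlier_pvalues(vectors):
    """vectors[i][k] is x_ik; returns P_i for every feature i."""
    # Assumption: sigma is the standard deviation of each component.
    sigmas = [statistics.pstdev(column) for column in zip(*vectors)]
    pvalues = []
    for row in vectors:
        stat = sum((v / s) ** 2 for v, s in zip(row, sigmas))
        pvalues.append(chi2_sf(stat, len(row)))
    return pvalues

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Synthetic example: 1000 Gaussian features, the first 5 replaced by outliers.
random.seed(0)
vectors = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(1000)]
for i in range(5):
    vectors[i] = [10.0, 10.0]
adjusted = benjamini_hochberg(outlier_pvalues(vectors))
selected = [i for i, p in enumerate(adjusted) if p < 0.01]
```

If SciPy is available, `scipy.stats.chi2.sf` can replace the hand-written `chi2_sf`.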

274 genes are selected:
Akt1 A2m Abcb10 Acads Accn3 Acox3 Acsl1 Acta1 Actg2 Actn1 Actr1b Acvr1c Adcy3 Adora3 Adra1b Adrb2 Agpat1 Agrn Ak3 Akap1 Alas1 Amhr2 Anxa2 Aoc3 Apob Apod Aqp4 Areg Arf4 Atf3 Atp1b1 Atp5a1 Atp6v1h Azgp1 B4galt7 Bag3 Bmpr1a Bpgm Btbd9 Btg2 Bves Bzw1 C1qa C3 Ca3 Canx Cast Ccl2 Ccnd2 Ccnl1 Ccr1 Cd36 Cd63 Cd74 Cdh23 Cebpg Ces1 Chchd4 Chdh Ciapin1 Cmklr1 Col5a1 Cryab Csda Csnk2b Csrp3 Ctf1 Cyb5b Cybb Dcps Ddit4l Dhrs1 Dlc1 Dnajc5 Dpp4 Dusp11 Ednra Eef2k Egfr Egr1 Eif2s2 Eif4a1 Ephx1 F8 Fabp5 Fbl Fbxo22 Fgf9 Flt1 Fndc5 Fos Fstl1 Fut8 Fyttd1 Gapdh Gatm Gbe1 Ghr Git2 Glul Gna12 Gnb1 Gnb2l1 Gnb3 Gosr1 Got1 Got2 Gpx3 Grwd1 Gstp1 Gucy1a3 Hapln1 Hmbs Hmgb1 Hrc Hspa5 Htr4 Idh3a Idh3g Il1rl1 Il6r Il6st Immt Ing4 Itga6 Itm2c Itpr3 Junb Kcmf1 Kcnj8 Kcnk3 Kcnt1 Kpna1 Lactb2 Laptm4b Lcat Lcp1 Ldha Lphn3 Lrp1 Lss Ltbp4 Man2c1 Map2k4 Map4k3 Mapk10 Mapk14 Mapk6 Mapk9 Mfn2 Mgat3 Mgp Mknk2 Mlycd Mme Mpp3 mrpl9 Msn Mterf Mtus1 Mvd Mxd3 Myc Myl2 Ncoa2 Ndfip1 Ndufs2 Nedd4l Nes Nexn Nf1 Nfatc4 Nfe2l2 Npr3 Nr0b2 Nr3c1 Nr3c2 Nsf Obscn Odz2 P2rx3 Pacsin2 Pccb Pdcl3 Pde4b Pdia4 Pdk2 Pdk4 Pdrg1 Pdxk Pggt1b Pi4k2a Pold4 Ppara Ppif Ppm1a Ppm1b Ppp1r14a Ppp1r14c Ppp2ca Ppp2r2d Prelp Psmb4 Psmc1 Psmd12 Ptgds Ptger2 Pvrl2 Pycr2 Rab15 Ramp2 Rbm10 Rela Rhoa Rplp1 Rps18 Rps20 Rps6 Rxrg Samm50 Sccpdh Schip1 Scn2b Sdhd Sephs2 Serpinh1 Sfrp4 Sharpin Sirt5 Slc25a4 Slc2a4 Slc38a2 Slc40a1 Slc5a1 Slc6a1 Sln Slpi Smad4 Smpd1 Sod1 Sox18 Spin2b Spp1 Stat3 Steap3 Stip1 Stx7 Suclg1 Synj1 Tarbp2 Tfam Tmem30a Tnfaip6 Tnfrsf12a Tnfrsf1a Tnni3 Tpm1 Tpsab1 Trpc4ap Ttn Txndc12 Txnip Uchl1 Uqcrc2 Usp14 Vdac2 Vezt Vim Vsnl1 Vtn Wbp4 Yme1l1 Ywhae Ywhah
→ Based on gene KO experiments, 556 (up)/449 (down) genes are selected

43 compounds:
Amitriptyline, Atropine, Baclofen, Bezafibrate, Caffeine, Calcitriol, Chlorambucil, Cimetidine, Citalopram, Clemastine, Clonazepam, Cyclophosphamide, D-Tubocurarine Chloride, Dexamethasone, Dexchlorpheniramine, Digitoxin, Diphenhydramine, Doxazosin, Ebastine, Fenofibrate, Fluphenazine, Gabapentin, Ifosfamide, Iproniazid, Lacidipine, Loratadine, Nevirapine, Nimodipine, Nitrendipine, Ofloxacin, Oxymetazoline, Paroxetine, Phenacemide, Phenytoin, Rosiglitazone, Sparteine, Stavudine, Valsartan, Vecuronium Bromide, Venlafaxine, Vinblastine, Vincristine, Zidovudine

Evaluation by SwissDock (cirrhosis): HnF4a vs Bezafibrate, $K_i$ = 0.13 μM

Evaluation by SwissDock (cirrhosis): CYPOR vs Bezafibrate, $K_i$ = 79 nM

Yin et al., "Systematic review and meta-analysis: bezafibrate in patients with primary biliary cirrhosis", Drug Des Devel Ther. 2015;9:5407-19.
CONCLUSION: Combination therapy improved liver biochemistry and the prognosis of PBC, but did not improve clinical symptoms or incidence of death. Attention should be paid to adverse events when using bezafibrate.

Project 3: Inference of drugs effective against Alzheimer's disease from mouse brain single-cell gene expression (without drug-treated gene expression)

Neurological Disorder Drug Discovery from Gene Expression with Tensor Decomposition.
Y-h. Taguchi, Turki Turki, Current Pharmaceutical Design, Volume 25, Issue 43, 2019. Open access.

Data & experiments (mice)
Two genotypes (APP_NL-F-G and C57Bl/6), two tissues (cortex and hippocampus), four ages (3, 6, 12, and 21 weeks), two sexes (male and female), and four 96-well plates (the number of cells).
Aim: understanding Alzheimer's disease

The tensor $x_{i j_1 j_2 j_3 j_4 j_5 j_6}$ represents the expression of the $i$th gene in the
$j_1$th cell (well),
$j_2$th genotype ($j_2 = 1$: APP_NL-F-G, $j_2 = 2$: C57Bl/6),
$j_3$th tissue ($j_3 = 1$: cortex, $j_3 = 2$: hippocampus),
$j_4$th age ($j_4 = 1$: three weeks, $j_4 = 2$: six weeks, $j_4 = 3$: twelve weeks, $j_4 = 4$: twenty-one weeks),
$j_5$th sex ($j_5 = 1$: female, $j_5 = 2$: male), and
$j_6$th plate.

$$x_{i j_1 j_2 j_3 j_4 j_5 j_6} = \sum_{l_1,l_2,l_3,l_4,l_5,l_6,l_7} G(l_1,l_2,l_3,l_4,l_5,l_6,l_7)\, u_{l_1 j_1}\, u_{l_2 j_2}\, u_{l_3 j_3}\, u_{l_4 j_4}\, u_{l_5 j_5}\, u_{l_6 j_6}\, u_{l_7 i}$$

Selected singular value vectors:
(A) $u_{l_1 j_1}$: 96 wells (cells), $l_1 = 1$
(B) $u_{l_2 j_2}$: genotype APP_NL-F-G vs C57Bl/6, $l_2 = 1$
(C) $u_{l_3 j_3}$: cortex vs hippocampus, $l_3 = 1$
(D) $u_{l_4 j_4}$: 3, 6, 12, 21 weeks, $l_4 = 2$
(E) $u_{l_5 j_5}$: female vs male, $l_5 = 1$
(F) $u_{l_6 j_6}$: 4 plates, $l_6 = 1$
→ $l_7 = 2$ is selected, because $G(1,1,1,2,1,1,l_7)$ has the largest absolute value there.
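The choice of the gene component from the core tensor can be sketched as follows. This is an illustration under assumptions: the core tensor here is random with one planted large element, its shape is hypothetical, and the helper `select_gene_component` is not from the original work.

```python
import numpy as np

def select_gene_component(core, fixed):
    """Return the 1-based l7 that maximizes |G(l1, ..., l6, l7)|.

    `fixed` holds the 1-based indices (l1, ..., l6) chosen for the
    sample modes; the gene mode is assumed to be the last axis.
    """
    index = tuple(l - 1 for l in fixed)
    profile = np.abs(core[index])  # |G| as a function of l7
    return int(np.argmax(profile)) + 1

# Hypothetical core tensor with a planted large element at l7 = 2.
rng = np.random.default_rng(0)
core = rng.normal(size=(2, 2, 2, 4, 2, 2, 5))
core[0, 0, 0, 1, 0, 0, 1] = 10.0

l7 = select_gene_component(core, fixed=(1, 1, 1, 2, 1, 1))
```

The gene singular value vector with the returned index is the one associated with the selected sample-side vectors.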

Attributing P-values to genes:

$$P_i = P_{\chi^2}\left[ > \left( \frac{u_{2i}}{\sigma_2} \right)^2 \right]$$

After correcting the P-values by the BH criterion, 401 genes with $P_i < 0.01$ are selected.
→ Evaluate how much these overlap with genes affected by known Alzheimer's drug treatments.
The 401 genes are uploaded to Enrichr.

Top ranked 10 compounds listed in the "LINCS L1000 Chem Pert up" category in Enrichr. The overlap is that between the selected 401 genes and the genes selected in individual experiments.
[Table: known Alzheimer's drugs are marked.]

Top ranked 10 compounds listed in the "DrugMatrix" category in Enrichr. The overlap is that between the selected 401 genes and the genes selected in individual experiments.
[Table: known Alzheimer's drugs are marked.]

Top ranked 10 compounds listed in the "Drug Perturbations from GEO up" category in Enrichr. The overlap is that between the selected 401 genes and the genes selected in individual experiments.
[Table: known Alzheimer's drugs are marked.]

Summary
We can infer drugs effective against diseases from gene expression profiles using TD based unsupervised FE.
I have published a monograph with Springer. I would be happy if you could buy it, although it is extremely expensive.
