Tensor decomposition based unsupervised feature extraction applied to drug discovery from gene expression analysis

Presentation at the 5th IITM-Tokyo Tech symposium, "Current trends in Bioinformatics: Big data analysis, machine learning and drug design", held on 6-7 March 2020 at IITM.
https://web.iitm.ac.in/bioinfo2/symposium2020/home

Y-h. Taguchi

March 07, 2020

Transcript

  1. Tensor decomposition based unsupervised feature extraction applied to drug discovery from gene expression analysis

    Y-h. Taguchi, Department of Physics, Chuo University, Tokyo, Japan. This slide deck can be obtained from the QR code.
  2. Three Research Projects

    Project 1: Inferring drugs effective against cancers from gene expression profiles of drug-treated cancer cell lines
    Project 2: Integrated analysis of gene expression profiles of drug-treated tissues and human patients
    Project 3: Inferring drugs effective against Alzheimer’s disease from single-cell gene expression of mouse brains (without gene expression profiles of drug-treated samples)
  3. Project 1: Inferring drugs effective against cancers from gene expression profiles of drug-treated cancer cell lines
  4. Drug candidate identification based on gene expression of treated cells using tensor decomposition-based unsupervised feature extraction for large-scale data

    Y-h. Taguchi, BMC Bioinformatics volume 19, Article number: 388 (2019). OPEN ACCESS
  5. Drug discovery (DD) = dose dependence [Figure: effect vs. dose (density)]

    No high-throughput (HT) methods are available for DD, whereas gene expression can be measured by HT methods (sequencing/microarray). Is HT DD possible using HT gene expression methods?
  6. By extending a matrix to a tensor, $x_{ijl}$, we can deal with data of "dose density ($i$) × compounds ($j$) × genes ($l$)". Tensors can be decomposed:

    $x_{ijl} \simeq \sum_{k_1,k_2,k_3} G_{k_1 k_2 k_3}\, u_{k_1 i}\, u_{k_2 j}\, u_{k_3 l}$

    [Figure: the tensor $x_{ijl}$ (dose density × compounds × genes) decomposed into the core tensor $G$ and the vectors $u_{k_1 i}$, $u_{k_2 j}$, $u_{k_3 l}$]
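The decomposition on this slide can be sketched with a higher-order SVD (HOSVD). The slide does not name the algorithm, so HOSVD is an assumption here, as are the tensor sizes and the helper names `unfold`, `hosvd` and `reconstruct`. Column $k$ of the mode-$n$ factor matrix plays the role of $u_{k i}$ (or $u_{k j}$, $u_{k l}$):

```python
import numpy as np


def unfold(x, mode):
    """Mode-n unfolding: put axis `mode` first and flatten all other axes."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)


def hosvd(x):
    """Higher-order SVD: an orthonormal factor matrix per mode plus a core tensor G."""
    # Column k of factors[n] is the k-th singular value vector of mode n.
    factors = [np.linalg.svd(unfold(x, n), full_matrices=False)[0]
               for n in range(x.ndim)]
    core = x
    for n, u in enumerate(factors):
        # Mode-n product with U_n^T, keeping the axis order unchanged.
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, n)), 0, n)
    return core, factors


def reconstruct(core, factors):
    """x_ijl = sum_{k1,k2,k3} G_{k1 k2 k3} u_{k1 i} u_{k2 j} u_{k3 l}."""
    x = core
    for n, u in enumerate(factors):
        x = np.moveaxis(np.tensordot(u, x, axes=(1, n)), 0, n)
    return x


# Hypothetical sizes: 5 dose densities (i) x 8 compounds (j) x 50 genes (l).
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8, 50))
core, factors = hosvd(x)
x_hat = reconstruct(core, factors)
```

Because each factor matrix spans its mode's unfolding, this reconstruction is exact; keeping only the leading columns of each factor matrix gives the approximate form ($\simeq$) shown on the slide.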
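  7. [Figure: $x_{ijl}$ is decomposed into $G_{k_1 k_2 k_3}$, $u_{k_1 i}$ (dose density), $u_{k_2 j}$ (compounds) and $u_{k_3 l}$ (genes). The 2nd component along dose density is used; for $k_2 \le 6$ and $k_3 \le 6$, the $u_{k_2 j}$ and $u_{k_3 l}$ connected through $G_{2,k_2,k_3}$ identify outlier compounds and outlier genes.]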
  8. [Figure: compounds × genes; genes selected by TD based unsupervised FE are compared with single-gene perturbations of gene A, gene B and gene C.]
  9. Overall data analysis flow

    Gene expression profiles with drug compound treatments
    → Identification of pairs of genes and compounds with dose dependence by tensor decomposition
    → Identification of target proteins by comparison with single-gene KO/KI experiments
    → Validation by comparison with known drug target proteins using Fisher’s exact test
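The slides do not spell out how the genes selected for a compound are matched against single-gene KO/KI experiments. As a hypothetical stand-in, this sketch ranks each perturbed gene by the Jaccard index between its affected genes and the compound's selected genes; the function names and all gene sets are illustrative only:

```python
def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two gene sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_targets(selected, perturbations):
    """Rank perturbed genes by how similar their affected genes are to `selected`."""
    scores = {gene: jaccard(selected, affected)
              for gene, affected in perturbations.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data: genes selected for one compound, and genes affected
# when gene A, gene B or gene C is knocked out / knocked in.
selected = {"G1", "G2", "G3", "G4"}
perturbations = {
    "GeneA": {"G1", "G2", "G3", "G9"},
    "GeneB": {"G5", "G6"},
    "GeneC": {"G2", "G7", "G8"},
}
ranking = rank_targets(selected, perturbations)
for gene, score in ranking:
    print(gene, round(score, 3))
```

The top-ranked perturbed gene is then a candidate target protein for that compound, which the next step validates against known drug targets.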
  10. Results for 13 cancer cell lines (LINCS)

    [Table columns: identification by tensor decomposition; target proteins identified by comparison with KO/KI experiments]
  11. Evaluations: comparisons with drug2gene.com and DSigDB

    ◦: significant overlap by Fisher’s exact test; (1)-(13): cancer cell lines in the previous table
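Testing whether two gene sets overlap more than expected by chance with Fisher’s exact test reduces to a hypergeometric tail probability. This minimal sketch assumes a one-sided (over-representation) test; the function name and the example set sizes are hypothetical:

```python
from math import comb


def fisher_overlap_p(k, n_a, n_b, n_total):
    """One-sided Fisher's exact test: P(overlap >= k) for gene sets of sizes
    n_a and n_b drawn from n_total genes (hypergeometric upper tail)."""
    upper = min(n_a, n_b)
    hits = sum(comb(n_a, x) * comb(n_total - n_a, n_b - x)
               for x in range(k, upper + 1))
    # Exact integer arithmetic until this single division.
    return hits / comb(n_total, n_b)


# Hypothetical example: 200 predicted targets, 100 known targets,
# 20,000 genes in total, 15 genes shared by both lists.
p = fisher_overlap_p(15, 200, 100, 20000)
print(f"P = {p:.3g}")
```

Python integers are arbitrary-precision, so the binomial coefficients stay exact even for genome-sized totals.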
  12. Project 2: Integrated analysis of gene expression profiles of drug-treated tissues and human patients
  13. Identification of candidate drugs using tensor-decomposition-based unsupervised feature extraction in integrated analysis of gene expression between diseases and DrugMatrix datasets

    Y.-h. Taguchi, Scientific Reports volume 7, Article number: 13733 (2017). OPEN ACCESS
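  14. Two matrices sharing the gene index $i$, $x_{ij}$ (patients vs. healthy controls) and $x_{il}$ (dose density), are combined into a tensor by taking their product, which is then decomposed:

    $x_{ijl} = x_{ij} \times x_{il} \simeq \sum_{k_1,k_2,k_3} G_{k_1 k_2 k_3}\, u_{k_1 i}\, u_{k_2 j}\, u_{k_3 l}$

    $i$: genes, $j$: patients vs. healthy controls, $l$: dose density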
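Forming $x_{ijl} = x_{ij} \times x_{il}$ is an element-wise product over the shared gene index with the other two indices crossed, which `np.einsum` expresses directly. The matrix sizes and random values below are hypothetical placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_samples, n_doses = 100, 6, 4        # hypothetical sizes
x_ij = rng.normal(size=(n_genes, n_samples))   # genes x (patients vs. healthy controls)
x_il = rng.normal(size=(n_genes, n_doses))     # genes x dose density

# For every gene i, x_ijl = x_ij * x_il: i is shared, j and l are crossed.
x_ijl = np.einsum("ij,il->ijl", x_ij, x_il)
print(x_ijl.shape)
```

This only works if both matrices list the same genes in the same row order, so the two datasets must be aligned on their common genes before the product is taken. The resulting tensor can then be decomposed as on slide 6.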
  15. $x_{j_1 j_2 j_3 i} = x_{j_1 j_2 i}\, x_{j_3 i} = \sum_{l_1,l_2,l_3,l_4} G(l_1,l_2,l_3,l_4)\, u_{l_1 j_1}\, u_{l_2 j_2}\, u_{l_3 j_3}\, u_{l_4 i}$

    $j_1$: compounds, $j_2$: time duration after treatment, $j_3$: patients vs. healthy controls, $i$: genes. [Figure: gene X as a target protein]
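The same construction extends to four indices: the drug-treatment tensor $x_{j_1 j_2 i}$ is multiplied by the patient matrix $x_{j_3 i}$ over the shared gene index. In an HOSVD-style decomposition (an assumption, as the slide does not name the method), the gene vectors $u_{l_4 i}$ are the left singular vectors of the gene-mode unfolding. Sizes and data are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n_cmp, n_time, n_pat, n_genes = 5, 3, 6, 40            # hypothetical sizes
x_drug = rng.normal(size=(n_cmp, n_time, n_genes))     # x_{j1 j2 i}: compounds x duration x genes
x_pat = rng.normal(size=(n_pat, n_genes))              # x_{j3 i}: patients/controls x genes

# x_{j1 j2 j3 i} = x_{j1 j2 i} * x_{j3 i}; the gene index i is shared.
x = np.einsum("abi,ci->abci", x_drug, x_pat)

# Gene-mode unfolding (genes x all other index combinations);
# column l4 of u_gene corresponds to u_{l4 i}.
x_gene = np.moveaxis(x, 3, 0).reshape(n_genes, -1)
u_gene, s, _ = np.linalg.svd(x_gene, full_matrices=False)
```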
  16. Duration after treatment (days): the 1st to 4th singular value vectors

  17. Heart disease: the 1st to 3rd singular value vectors

  18. Compounds: the 2nd singular value vectors

  19. Top 10 ranked vectors with larger absolute values, $l_1 = 2$
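  20. Feature extraction

    Real data are not separated well, so we assume a Gaussian distribution and detect outliers:
    $P_i = P_{\chi^2}\left[ > \sum_k \left(\frac{x_{ik}}{\sigma_k}\right)^2 \right]$
    P-values are computed from the $\chi^2$ distribution; genes with Benjamini-Hochberg corrected $P < 0.01$ are selected.
    [Figure: $P(p)$ plotted against $1-p$, from 0 to 1]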
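The outlier P-value $P_i = P_{\chi^2}[> \sum_k (x_{ik}/\sigma_k)^2]$ needs the upper tail of a $\chi^2$ distribution. This standard-library sketch uses the closed forms for integer degrees of freedom and assumes one degree of freedom per column $k$, with $\sigma_k$ taken as the sample standard deviation of column $k$ (both assumptions). As in the formula, the values are divided by $\sigma_k$ without subtracting a mean. Function names and the toy scores are hypothetical:

```python
import math
import statistics


def chi2_sf(x, df):
    """Upper-tail probability P[X > x] for a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= h / j
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) exp(-x/2) * sum_j x^(j-1/2) / (2j-1)!!
    total = math.erfc(math.sqrt(h))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-h)
    for j in range(1, (df + 1) // 2):
        total += term
        term *= x / (2 * j + 1)
    return total


def outlier_p_values(scores):
    """P_i = P_chi2[> sum_k (x_ik / sigma_k)^2], one degree of freedom per column k."""
    n_cols = len(scores[0])
    sigmas = [statistics.stdev([row[k] for row in scores]) for k in range(n_cols)]
    return [chi2_sf(sum((row[k] / sigmas[k]) ** 2 for k in range(n_cols)), n_cols)
            for row in scores]


# Hypothetical singular-vector values for five genes (two components each).
scores = [[0.1, -0.2], [-0.1, 0.15], [0.05, 0.0], [-0.05, -0.1], [3.0, 2.5]]
p = outlier_p_values(scores)
```

The gene with the largest normalized squared sum gets the smallest P-value; these raw P-values are then corrected by the Benjamini-Hochberg procedure before thresholding at 0.01.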
  21. 274 genes are selected Akt1 A2m Abcb10 Acads Accn3 Acox3

    Acsl1 Acta1 Actg2 Actn1 Actr1b Acvr1c Adcy3 Adora3 Adra1b Adrb2 Agpat1 Agrn Ak3 Akap1 Alas1 Amhr2 Anxa2 Aoc3 Apob Apod Aqp4 Areg Arf4 Atf3 Atp1b1 Atp5a1 Atp6v1h Azgp1 B4galt7 Bag3 Bmpr1a Bpgm Btbd9 Btg2 Bves Bzw1 C1qa C3 Ca3 Canx Cast Ccl2 Ccnd2 Ccnl1 Ccr1 Cd36 Cd63 Cd74 Cdh23 Cebpg Ces1 Chchd4 Chdh Ciapin1 Cmklr1 Col5a1 Cryab Csda Csnk2b Csrp3 Ctf1 Cyb5b Cybb Dcps Ddit4l Dhrs1 Dlc1 Dnajc5 Dpp4 Dusp11 Ednra Eef2k Egfr Egr1 Eif2s2 Eif4a1 Ephx1 F8 Fabp5 Fbl Fbxo22 Fgf9 Flt1 Fndc5 Fos Fstl1 Fut8 Fyttd1 Gapdh Gatm Gbe1 Ghr Git2 Glul Gna12 Gnb1 Gnb2l1 Gnb3 Gosr1 Got1 Got2 Gpx3 Grwd1 Gstp1 Gucy1a3 Hapln1 Hmbs Hmgb1 Hrc Hspa5 Htr4 Idh3a Idh3g Il1rl1 Il6r Il6st Immt Ing4 Itga6 Itm2c Itpr3 Junb Kcmf1 Kcnj8 Kcnk3 Kcnt1 Kpna1 Lactb2 Laptm4b Lcat Lcp1 Ldha Lphn3 Lrp1 Lss Ltbp4 Man2c1 Map2k4 Map4k3 Mapk10 Mapk14 Mapk6 Mapk9 Mfn2 Mgat3 Mgp Mknk2 Mlycd Mme Mpp3 mrpl9 Msn Mterf Mtus1 Mvd Mxd3 Myc Myl2 Ncoa2 Ndfip1 Ndufs2 Nedd4l Nes Nexn Nf1 Nfatc4 Nfe2l2 Npr3 Nr0b2 Nr3c1 Nr3c2 Nsf Obscn Odz2 P2rx3 Pacsin2 Pccb Pdcl3 Pde4b Pdia4 Pdk2 Pdk4 Pdrg1 Pdxk Pggt1b Pi4k2a Pold4 Ppara Ppif Ppm1a Ppm1b Ppp1r14a Ppp1r14c Ppp2ca Ppp2r2d Prelp Psmb4 Psmc1 Psmd12 Ptgds Ptger2 Pvrl2 Pycr2 Rab15 Ramp2 Rbm10 Rela Rhoa Rplp1 Rps18 Rps20 Rps6 Rxrg Samm50 Sccpdh Schip1 Scn2b Sdhd Sephs2 Serpinh1 Sfrp4 Sharpin Sirt5 Slc25a4 Slc2a4 Slc38a2 Slc40a1 Slc5a1 Slc6a1 Sln Slpi Smad4 Smpd1 Sod1 Sox18 Spin2b Spp1 Stat3 Steap3 Stip1 Stx7 Suclg1 Synj1 Tarbp2 Tfam Tmem30a Tnfaip6 Tnfrsf12a Tnfrsf1a Tnni3 Tpm1 Tpsab1 Trpc4ap Ttn Txndc12 Txnip Uchl1 Uqcrc2 Usp14 Vdac2 Vezt Vim Vsnl1 Vtn Wbp4 Yme1l1 Ywhae Ywhah → Based upon gene KO experiments, 556(up)/449(down) genes are selected
  22. Amitriptyline Atropine Baclofen Bezafibrate Caffeine Calcitriol Chlorambucil Cimetidine Citalopram Clemastine

    Clonazepam Cyclophosphamide D-Tubocurarine Chloride Dexamethasone Dexchlorpheniramine Digitoxin Diphenhydramine Doxazosin Ebastine Fenofibrate Fluphenazine Gabapentin Ifosfamide Iproniazid Lacidipine Loratadine Nevirapine Nimodipine Nitrendipine Ofloxacin Oxymetazoline Paroxetine Phenacemide Phenytoin Rosiglitazone Sparteine Stavudine Valsartan Vecuronium Bromide Venlafaxine Vinblastine Vincristine Zidovudine 43 compounds
  25. Evaluation by SwissDock (cirrhosis): HNF4A vs. Bezafibrate, Ki = 0.13 μM

  26. Evaluation by SwissDock (cirrhosis): CYPOR vs. Bezafibrate, Ki = 79 nM
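The slides report inhibition constants from SwissDock docking but do not state how they were derived. Assuming the standard thermodynamic relation $\Delta G = RT \ln K_i$ at 298.15 K (an assumption, not a documented SwissDock setting), this sketch converts between $K_i$ and binding free energy; the function names are hypothetical:

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_from_ki(ki_molar, temp_k=298.15):
    """Binding free energy in kcal/mol: dG = R T ln(Ki), Ki in mol/L."""
    return R_KCAL * temp_k * math.log(ki_molar)


def ki_from_delta_g(dg_kcal, temp_k=298.15):
    """Inverse relation: Ki = exp(dG / (R T)), in mol/L."""
    return math.exp(dg_kcal / (R_KCAL * temp_k))


# Ki values from the two slides above.
for target, ki in [("HNF4A", 0.13e-6), ("CYPOR", 79e-9)]:
    print(f"{target}: Ki = {ki:.2e} M -> dG ~ {delta_g_from_ki(ki):.2f} kcal/mol")
```

A smaller $K_i$ corresponds to a more negative $\Delta G$, so under this relation Bezafibrate is predicted to bind CYPOR somewhat more tightly than HNF4A.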

  27. Yin et al., “Systematic review and meta-analysis: bezafibrate in patients with primary biliary cirrhosis”, Drug Des Devel Ther. 2015;9:5407-19.

    CONCLUSION: Combination therapy improved liver biochemistry and the prognosis of PBC, but did not improve clinical symptoms or incidence of death. Attention should be paid to adverse events when using bezafibrate.
  28. Project 3: Inferring drugs effective against Alzheimer’s disease from single-cell gene expression of mouse brains (without gene expression profiles of drug-treated samples)
  29. Neurological Disorder Drug Discovery from Gene Expression with Tensor Decomposition

    Author(s): Y-h. Taguchi*, Turki Turki. Journal Name: Current Pharmaceutical Design, Volume 25, Issue 43, 2019. OPEN ACCESS
  30. Data & experiments (mice)

    Two genotypes (APP_NL-F-G and C57Bl/6), two tissues (cortex and hippocampus), four ages (3, 6, 12, and 21 weeks), two sexes (male and female), and four 96-well plates (wells correspond to cells). Aim: understanding Alzheimer’s disease.
  31. The tensor $x_{i j_1 j_2 j_3 j_4 j_5 j_6}$ represents the expression of the $i$th gene in the

    $j_1$th cell (well),
    $j_2$th genotype ($j_2 = 1$: APP_NL-F-G, $j_2 = 2$: C57Bl/6),
    $j_3$th tissue ($j_3 = 1$: cortex, $j_3 = 2$: hippocampus),
    $j_4$th age ($j_4 = 1$: 3 weeks, $j_4 = 2$: 6 weeks, $j_4 = 3$: 12 weeks, $j_4 = 4$: 21 weeks),
    $j_5$th sex ($j_5 = 1$: female, $j_5 = 2$: male),
    $j_6$th plate.
  32. $x_{i j_1 j_2 j_3 j_4 j_5 j_6} = \sum_{l_1,\ldots,l_7} G(l_1,l_2,l_3,l_4,l_5,l_6,l_7)\, u_{l_1 j_1}\, u_{l_2 j_2}\, u_{l_3 j_3}\, u_{l_4 j_4}\, u_{l_5 j_5}\, u_{l_6 j_6}\, u_{l_7 i}$

    (A) $u_{l_1 j_1}$: 96 wells (cells), $l_1 = 1$
    (B) $u_{l_2 j_2}$: genotype APP_NL-F-G vs. C57Bl/6, $l_2 = 1$
    (C) $u_{l_3 j_3}$: cortex vs. hippocampus, $l_3 = 1$
    (D) $u_{l_4 j_4}$: 3, 6, 12, 21 weeks, $l_4 = 2$
    (E) $u_{l_5 j_5}$: female vs. male, $l_5 = 1$
    (F) $u_{l_6 j_6}$: 4 plates, $l_6 = 1$
    → $l_7 = 2$, because $G(1,1,1,2,1,1,l_7)$ has the largest absolute value
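Choosing $l_7$ amounts to scanning one fiber of the core tensor. This sketch assumes an HOSVD-style decomposition (not named on the slide) and uses a small random 7-mode tensor with hypothetical sizes, with the gene axis placed last to match the index order of $G$. Slide indices are 1-based, so $G(1,1,1,2,1,1,l_7)$ becomes `core[0, 0, 0, 1, 0, 0, :]`:

```python
import numpy as np


def unfold(x, mode):
    """Mode-n unfolding: put axis `mode` first and flatten all other axes."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)


def hosvd(x):
    """Higher-order SVD: an orthonormal factor matrix per mode plus a core tensor."""
    factors = [np.linalg.svd(unfold(x, n), full_matrices=False)[0]
               for n in range(x.ndim)]
    core = x
    for n, u in enumerate(factors):
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, n)), 0, n)
    return core, factors


rng = np.random.default_rng(2)
# Axes: wells j1, genotype j2, tissue j3, age j4, sex j5, plate j6, genes i
# (hypothetical sizes; the real data have 96 wells and many more genes).
x = rng.normal(size=(6, 2, 2, 4, 2, 2, 30))
core, factors = hosvd(x)

# Fix l1..l6 as on the slide and pick the l7 with the largest |G|.
g_slice = core[0, 0, 0, 1, 0, 0, :]
l7 = int(np.argmax(np.abs(g_slice)))  # 0-based index
u_gene = factors[6][:, l7]            # u_{l7 i}, used to attribute P-values to genes
```

Singular vectors are defined only up to sign, so only the absolute value of $G$ matters for this choice.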
  33. Attributing P-values to genes

    $P_i = P_{\chi^2}\left[ > \left(\frac{u_{2i}}{\sigma_2}\right)^2 \right]$

    After correcting P-values by the BH criterion, 401 genes with $P_i < 0.01$ are selected. → To evaluate how these overlap with genes affected by treatment with known Alzheimer’s drugs, the 401 genes are uploaded to Enrichr.
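With a single vector, the $\chi^2$ tail has one degree of freedom and equals the two-sided Gaussian tail, $\operatorname{erfc}(|u_{2i}| / (\sigma_2 \sqrt{2}))$. This sketch adds the Benjamini-Hochberg correction and the $P < 0.01$ cut. It assumes $\sigma_2$ is the sample standard deviation of $u_{2i}$; the function names and the toy vector are hypothetical:

```python
import math
import statistics


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(u, threshold=0.01):
    """Indices i with BH-adjusted P_chi2[> (u_i / sigma)^2] < threshold (1 d.o.f.)."""
    sigma = statistics.stdev(u)
    p = [math.erfc(abs(v) / (sigma * math.sqrt(2))) for v in u]
    adjusted = bh_adjust(p)
    return [i for i, q in enumerate(adjusted) if q < threshold]


# Hypothetical u_{2i}: 200 small values plus two clear outliers at the end.
u = [0.01 * ((i % 7) - 3) for i in range(200)] + [0.5, -0.6]
selected = select_genes(u)
print(selected)
```

The selected indices play the role of the 401 genes on this slide, which are then passed to an enrichment tool such as Enrichr.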
  34. Top 10 ranked compounds listed in the “LINCS L1000 Chem Pert up” category in Enrichr. Overlap is between the 401 selected genes and the genes selected in individual experiments. [Label: known Alzheimer’s drugs]
  35. Top 10 ranked compounds listed in the “DrugMatrix” category in Enrichr. Overlap is between the 401 selected genes and the genes selected in individual experiments. [Label: known Alzheimer’s drugs]
  36. Top 10 ranked compounds listed in the “Drug Perturbations from GEO up” category in Enrichr. Overlap is between the 401 selected genes and the genes selected in individual experiments. [Label: known Alzheimer’s drugs]
  37. Summary

    We can infer drugs effective against diseases from gene expression profiles using TD based unsupervised FE. I have published a monograph with Springer. I would be happy if you could buy it, although it is extremely expensive.