Drug candidate identification based on gene expression of treated cells using tensor decomposition-based unsupervised feature extraction for large-scale data

Y-h. Taguchi
September 25, 2018

Oral presentation at InCob2018, Delhi, 25th Sep. 2018
Accepted in BMC Bioinformatics

Transcript

  1. Drug candidate identification based on gene expression of treated cells

    using tensor decomposition-based unsupervised feature extraction for large-scale data. Y-h. Taguchi, Department of Physics, Chuo University, Tokyo, Japan. InCob2018, New Delhi, 25th Sep. 2018. Accepted in BMC Bioinformatics.
  2. Drug discovery (DD) = dose dependence

    [Figure: effect vs. dose (density).] No high-throughput (HT) methods are available for DD, whereas gene expression can be measured with HT methods (sequencing/microarray). Is HT DD possible using HT gene expression measurements?
  3. Data are available (LINCS)

    Multiple cancer cell lines are treated with various drugs at multiple dose densities. → Problem: how can we screen these? Regression analysis between gene expression and dose density? → Too few observations (only a few dose densities) may prevent us from obtaining significant P-values after correcting for multiple comparisons. → How about unsupervised methods?
  4. PCA of an N × M matrix X (numerical values): N features, M samples in categorical multiclasses

    [Figure: PC loadings (PC1) plotted against samples; PC scores of features (+) in the PC1-PC2 plane, with no distinction between classes.] PCA, but embedding features instead of samples into a lower-dimensional space.
  5. Synthetic example: 10 + 10 samples (two classes), 90 + 10 features

    [Figure: data matrix with entries drawn from N(0,1/2), N(μ,1/2) and [N(m,1/2)+N(0,1/2)]/2, where N(μ,1/2) is the normal distribution with mean μ and SD 1/2; PC1-PC2 score plot of features, +: top 10 outliers, m = 2.] Extracting outliers thus selects features that are distinct between the two classes in an unsupervised way. Accuracy (100 trials): 89.5% (m = 2), 52.6% (m = 1).
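The synthetic experiment can be sketched in Python with NumPy. This is a minimal illustration under assumptions not stated on the slide: the 10 distinct features are shifted by m in the first class only, the 1/2 in N(·,1/2) is taken as the SD, samples (columns) are centered before an SVD-based PCA, and the top 10 outliers are the features farthest from the origin in the PC1-PC2 score plane. All function names are hypothetical, and the accuracy this sketch reports need not reproduce the values on the slide.

```python
import numpy as np

# Assumed layout: 10 distinct + 90 null features, 10 samples per class, SD 1/2.
N_DISTINCT, N_NULL, N_PER_CLASS, SD = 10, 90, 10, 0.5


def simulate(m, rng):
    """N x M matrix (features x samples); the first N_DISTINCT features differ between classes."""
    X = rng.normal(0.0, SD, size=(N_DISTINCT + N_NULL, 2 * N_PER_CLASS))
    X[:N_DISTINCT, :N_PER_CLASS] += m  # assumption: shift by m in the first class only
    return X


def top_outlier_features(X, n_top=10, n_pc=2):
    """Embed features (rows) by PCA and return the n_top rows farthest from the origin."""
    Xc = X - X.mean(axis=0, keepdims=True)  # center each sample (column) over features
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_pc] * S[:n_pc]  # PC scores attributed to features
    dist = np.linalg.norm(scores, axis=1)
    return np.argsort(dist)[::-1][:n_top]


def accuracy(m, trials=100, seed=0):
    """Fraction of selected features that are truly distinct, pooled over trials."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        selected = top_outlier_features(simulate(m, rng))
        hits += int(np.sum(selected < N_DISTINCT))
    return hits / (N_DISTINCT * trials)


if __name__ == "__main__":
    for m in (1.0, 2.0):
        print(f"m={m}: accuracy={accuracy(m):.3f}")
```

No class labels enter `top_outlier_features`; the labels are used only afterwards, in `accuracy`, to score the selection.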
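  6. By extending the matrix to a tensor, $x_{ijl}$, we can deal

    with data of “dose density (i) ⨉ compounds (j) ⨉ gene (l)”. → Tensors can be decomposed: $x_{ijl} \simeq \sum_{k_1,k_2,k_3} G(k_1,k_2,k_3)\, x_{k_1 i}\, x_{k_2 j}\, x_{k_3 l}$ [Figure: the tensor of dose density ⨉ compounds ⨉ gene decomposed into the core tensor G and the vectors $x_{k_1 i}$ (dose density), $x_{k_2 j}$ (compounds) and $x_{k_3 l}$ (gene).]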
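One common way to compute a decomposition of this form is the higher-order singular value decomposition (HOSVD); choosing HOSVD here is an assumption. In this minimal NumPy sketch the factor matrix for each mode holds the left singular vectors of that mode's unfolding, so `factors[0][i, k1]` plays the role of $x_{k_1 i}$, and the core tensor is $G(k_1,k_2,k_3) = \sum_{i,j,l} x_{ijl}\, x_{k_1 i}\, x_{k_2 j}\, x_{k_3 l}$. The tensor sizes are made up for illustration.

```python
import numpy as np


def unfold(T, mode):
    """Mode-n unfolding: rows indexed by the chosen mode, columns by the other modes."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def hosvd(T):
    """Three-way HOSVD: core tensor G and factor matrices [U1, U2, U3]."""
    factors = [np.linalg.svd(unfold(T, n), full_matrices=False)[0] for n in range(3)]
    # G(k1,k2,k3) = sum_{i,j,l} x_ijl U1[i,k1] U2[j,k2] U3[l,k3]
    G = np.einsum("ijl,ia,jb,lc->abc", T, *factors)
    return G, factors


def reconstruct(G, factors):
    """x_ijl = sum_{k1,k2,k3} G(k1,k2,k3) U1[i,k1] U2[j,k2] U3[l,k3]."""
    return np.einsum("abc,ia,jb,lc->ijl", G, *factors)


# Toy tensor: 4 dose densities x 5 compounds x 100 genes (random, illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5, 100))
G, (U_dose, U_comp, U_gene) = hosvd(x)
```

With all singular vectors kept, `reconstruct` returns the original tensor up to rounding; keeping only the leading columns of each factor gives a low-rank approximation whose singular vectors can then be inspected one by one.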
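  7. [Figure: the tensor $x_{ijl}$ (dose density ⨉ compounds ⨉ genes) is decomposed into $G(k_1,k_2,k_3)$, $x_{k_1 i}$ (dose density), $x_{k_2 j}$ (compounds) and $x_{k_3 l}$ (genes). The 2nd dose-density component ($k_1 = 2$) is linked through $G(2,k_2,k_3)$, with $k_2 \le 6$ and $k_3 \le 6$, to the compound vectors $x_{k_2 j}$ and the gene vectors $x_{k_3 l}$, from which outlier compounds and outlier genes are selected.]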
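How outliers are defined is not stated here. A minimal sketch, assuming a χ²-style criterion with Benjamini-Hochberg correction of the kind often used with TD-based unsupervised FE: standardize one factor vector (e.g. $x_{k_3 l}$ over genes l), treat it as normally distributed, compute two-sided P-values (equivalently the χ² tail with one degree of freedom on the squared z-score), adjust them, and keep entries with adjusted P below 0.01 (an assumed threshold). Function and variable names are hypothetical.

```python
import math

import numpy as np


def bh_adjust(p):
    """Benjamini-Hochberg adjusted P-values, returned in the original order."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: cumulative minimum taken from the largest P-value downward.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def select_outliers(vec, threshold=0.01):
    """Indices whose standardized value is an outlier under an assumed normal null."""
    vec = np.asarray(vec, dtype=float)
    z = (vec - vec.mean()) / vec.std()
    # Two-sided normal tail, identical to the chi-squared (1 dof) upper tail of z**2.
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return np.flatnonzero(bh_adjust(p) < threshold)


# Toy factor vector: 1000 null entries plus 5 planted outliers at the end.
rng = np.random.default_rng(0)
x_k3 = np.concatenate([rng.normal(size=1000), np.full(5, 20.0)])
selected = select_outliers(x_k3)
```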
  8. [Figure: for a compound, the genes selected by TD-based unsupervised FE (A, B, C) are compared with single-gene perturbation experiments on gene A, gene B and gene C (B, C).]
  9. Overall data analysis flow

    Gene expression profiles with drug compound treatments → identification of pairs of genes and compounds with dose dependence by tensor decomposition → identification of target proteins by comparison with single-gene KO/KI experiments → validation by comparison with known drug target proteins using Fisher's exact test.
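The validation step can be illustrated with a one-sided Fisher's exact test for over-representation, computed directly from the hypergeometric distribution. This is a sketch: the 2×2 table layout and the gene universe are not given here, so the arguments below (size of the gene universe, number of predicted targets, number of known targets, and their overlap) are assumptions about how the comparison is set up. The function names are hypothetical.

```python
from math import comb


def fisher_greater(overlap, n_predicted, n_known, n_universe):
    """One-sided Fisher's exact P-value: probability of an overlap >= the observed one.

    Under the null, the predicted set is a random draw of n_predicted genes from
    n_universe genes, n_known of which are known targets (hypergeometric model).
    """
    total = comb(n_universe, n_predicted)
    upper = min(n_predicted, n_known)
    tail = sum(
        comb(n_known, x) * comb(n_universe - n_known, n_predicted - x)
        for x in range(overlap, upper + 1)
    )
    return tail / total


def overlap_pvalue(predicted, known, universe):
    """Wrapper taking gene sets; both sets are restricted to the universe first."""
    universe = set(universe)
    predicted, known = set(predicted) & universe, set(known) & universe
    return fisher_greater(len(predicted & known), len(predicted), len(known), len(universe))
```

For example, `fisher_greater(2, 2, 2, 4)` returns 1/6: both predicted genes hit the two known targets in a four-gene universe, which happens in one of the six equally likely draws.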
  10. Results for 13 cancer cell lines (LINCS)

    [Table: for each cell line, the results of identification by tensor decomposition and the target proteins inferred by comparison with KO/KI experiments.]
  11. Evaluations: comparisons with drug2gene.com and DSigDB

    [Table: ◦: significant overlap by Fisher's exact test; (1)-(13): cancer cell lines in the previous table.]
  12. Conclusions

    We have developed a tensor decomposition-based method that identifies genes associated with dose-dependent gene expression profiles from the gene expression profiles of cells treated with drug compounds. Drug target proteins are further inferred by comparison with single-gene KO/KI expression profiles. The results overlap significantly with known drug target proteins.