Drug candidate identification based on gene expression of treated cells using tensor decomposition-based unsupervised feature extraction for large-scale data

Y-h. Taguchi
September 25, 2018

Oral presentation at InCob2018, Delhi, 25th Sep. 2018
Accepted in BMC Bioinformatics

Transcript

  1. Drug candidate identification based on gene
    expression of treated cells using tensor
    decomposition-based unsupervised feature
    extraction for large-scale data
    Y-h. Taguchi
    Department of Physics, Chuo University,
    Tokyo, Japan
    InCob2018, New Delhi, 25th Sep. 2018
    Accepted in BMC Bioinformatics

  2. Drug discovery (DD) = Dose dependence
    [Figure: effect plotted against dose (density)]
    No high-throughput (HT) methods are available
    ←→ gene expression = HT sequencing/microarray
    Is HT DD possible from HT gene expression
    methods?

  3. Data are available (LINCS)
    Multiple cancer cell lines are treated with
    various drugs at multiple dose densities
    → Problem: How can we screen these?
    Regression analysis between gene expression
    and dose density?
    → Too few observations (only a few dose densities)
    might prevent us from obtaining significant P-values
    after correcting for multiple comparisons.
    → How about unsupervised methods?
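
The multiple-comparison problem can be illustrated with a small back-of-the-envelope sketch. It swaps the regression mentioned on the slide for an exact permutation test on rank correlation, because that test has a simple closed-form lower bound on its P-value, and it uses Bonferroni correction for simplicity. The numbers of dose levels and genes (`n_doses = 6`, `n_genes = 20000`) are hypothetical values chosen for illustration, not figures taken from the slides.

```python
from math import factorial


def min_two_sided_p(n_doses: int) -> float:
    """Smallest two-sided P-value of an exact permutation test on rank
    correlation with n_doses distinct dose levels: only 2 of the n!
    orderings are perfectly monotone (increasing or decreasing)."""
    return 2.0 / factorial(n_doses)


def bonferroni_threshold(n_genes: int, alpha: float = 0.05) -> float:
    """Per-gene significance threshold after Bonferroni correction."""
    return alpha / n_genes


# Hypothetical sizes, for illustration only.
n_doses, n_genes = 6, 20000

best_p = min_two_sided_p(n_doses)
threshold = bonferroni_threshold(n_genes)

# Even a perfectly dose-dependent gene cannot pass the corrected threshold.
print(f"best achievable P = {best_p:.2e}, threshold = {threshold:.2e}")
print("can any gene be significant?", best_p <= threshold)
```

Under these assumptions, no gene can reach significance however clean its dose dependence is, which is the motivation for trying an unsupervised method instead.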

  4. N × M matrix X (numerical values): N features × M samples
    Samples: categorical, multiclass
    [Figure: PCA of X. PC loadings (PC1) are attributed to samples; PC scores are attributed to features, shown as "+" marks on a PC1 vs. PC2 scatter plot]
    No distinction between classes
    PCA, but embedding features instead of samples into lower dim.

  5. Synthetic example
    [Figure: data matrix of 90 + 10 features × 10 + 10 samples, with entries drawn from N(0,1/2) or [N(m,1/2)+N(0,1/2)]/2, where N(μ,1/2) is the normal distribution with mean μ and SD 1/2; PC1 vs. PC2 scatter plot for m=2, +: top 10 outliers]
    Thus, extracting outliers
    selects features distinct
    between two classes in
    an unsupervised way.
    Accuracy (100 trials):
    89.5% (m=2)
    52.6% (m=1)
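
A minimal sketch of this kind of experiment, under stated assumptions, might look as follows. The transcript does not preserve which cells of the matrix receive which distribution, so the layout here is a guess: the last 10 features differ from background only in the first 10 samples. Centering each sample over features, ranking features by distance from the origin in the PC1-PC2 plane, and the function name `top10_accuracy` are likewise assumptions, so the accuracies printed need not match the figures on the slide.

```python
import numpy as np


def top10_accuracy(m: float, seed: int = 0) -> float:
    """Fraction of the 10 planted features found among the top 10 outliers."""
    rng = np.random.default_rng(seed)
    n_features, n_samples, n_planted, sd = 100, 20, 10, 0.5

    # Background: every entry ~ N(0, 1/2) (1/2 is the SD).
    x = rng.normal(0.0, sd, size=(n_features, n_samples))

    # Assumed layout: the last 10 features follow [N(m,1/2)+N(0,1/2)]/2
    # in the first 10 samples only.
    block = (rng.normal(m, sd, size=(n_planted, 10))
             + rng.normal(0.0, sd, size=(n_planted, 10))) / 2.0
    x[-n_planted:, :10] = block

    # PCA that embeds features: features are the points, samples the variables.
    xc = x - x.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(xc, full_matrices=False)
    scores = u[:, :2] * s[:2]  # PC1 and PC2 scores of each feature

    # Top 10 outliers = features farthest from the origin in the PC1-PC2 plane.
    dist = np.hypot(scores[:, 0], scores[:, 1])
    top = np.argsort(dist)[-n_planted:]
    return float(np.mean(top >= n_features - n_planted))


if __name__ == "__main__":
    for m in (1.0, 2.0):
        acc = np.mean([top10_accuracy(m, seed) for seed in range(100)])
        print(f"m={m}: mean accuracy over 100 trials = {acc:.3f}")
```

No class labels are used anywhere in the selection; they enter only when scoring the result.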

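  6. By extending the matrix to a tensor, $x_{ijl}$, we can deal with
    data of “dose density (i) ⨉ compounds (j) ⨉ gene (l)”
    → Tensors can be decomposed.
    $$x_{ijl} \simeq \sum_{k_1,k_2,k_3} G_{k_1,k_2,k_3}\, x_{k_1 i}\, x_{k_2 j}\, x_{k_3 l}$$
    [Figure: the tensor $x_{ijl}$ (axes: dose density, compounds, gene) decomposed into the core tensor $G$ and the factors $x_{k_1 i}$, $x_{k_2 j}$, $x_{k_3 l}$]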
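
One common way to obtain a decomposition of this form is the higher-order singular value decomposition (HOSVD). The slide does not name the algorithm, so HOSVD is an assumption here, as are the function names `hosvd` and `reconstruct` and the toy tensor shape. In the sketch, `u1[i, a]` plays the role of $x_{k_1 i}$ with `a` standing for $k_1$, and likewise for `u2` and `u3`.

```python
import numpy as np


def hosvd(x: np.ndarray):
    """Full HOSVD of a 3-mode tensor x[i, j, l].

    Returns the core tensor G[a, b, c] and factor matrices u1[i, a],
    u2[j, b], u3[l, c] whose columns are singular vectors.
    """
    factors = []
    for mode in range(3):
        # Unfold: move `mode` to the front and flatten the other two modes.
        unfolded = np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)
        u, _, _ = np.linalg.svd(unfolded, full_matrices=False)
        factors.append(u)
    u1, u2, u3 = factors
    # Core tensor: project x onto the singular vectors of every mode.
    core = np.einsum("ijl,ia,jb,lc->abc", x, u1, u2, u3)
    return core, u1, u2, u3


def reconstruct(core, u1, u2, u3) -> np.ndarray:
    """x_ijl = sum over (a, b, c) of G_abc * u1_ia * u2_jb * u3_lc."""
    return np.einsum("abc,ia,jb,lc->ijl", core, u1, u2, u3)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy tensor: 6 dose densities x 8 compounds x 50 genes (hypothetical sizes).
    x = rng.normal(size=(6, 8, 50))
    core, u1, u2, u3 = hosvd(x)
    print(core.shape, np.allclose(reconstruct(core, u1, u2, u3), x))
```

Without truncation the reconstruction is exact; keeping only the leading components in each mode turns the equality into the approximation written on the slide.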

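  7. [Figure: the tensor $x_{ijl}$ (dose density × compounds × genes) is decomposed into the core tensor $G_{k_1,k_2,k_3}$ and the singular vectors $x_{k_1 i}$ (dose density), $x_{k_2 j}$ (compounds) and $x_{k_3 l}$ (genes). For the 2nd component, $G_{2,k_2,k_3}$ with $k_2 \le 6$ and $k_3 \le 6$ is shown; the corresponding $x_{k_2 j}$ and $x_{k_3 l}$ give the outlier compounds and outlier genes.]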
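
Picking outlier compounds or genes from a singular vector could be sketched as below. The slide does not say how an outlier is called, so the rule used here is an assumption: components are standardized, converted to P-values with a chi-squared distribution (one degree of freedom), and adjusted with the Benjamini-Hochberg procedure. The function name `select_outliers` and the cutoff `alpha` are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def select_outliers(vec: np.ndarray, alpha: float = 0.01) -> np.ndarray:
    """Indices of components of `vec` with BH-adjusted P-value below alpha.

    Assumption: non-outlier components are roughly Gaussian, so squared
    standardized components follow a chi-squared distribution (df = 1).
    """
    z = (vec - vec.mean()) / vec.std()
    p = chi2.sf(z ** 2, df=1)

    # Benjamini-Hochberg adjustment.
    n = p.size
    order = np.argsort(p)
    adj_sorted = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest P-value downwards.
    adj_sorted = np.minimum.accumulate(adj_sorted[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(adj_sorted, 1.0)

    return np.flatnonzero(adjusted < alpha)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy singular vector: 1000 genes, the first 5 planted as outliers.
    vec = rng.normal(size=1000)
    vec[:5] += 20.0
    print(select_outliers(vec))
```

The same function can be applied to a compound singular vector $x_{k_2 j}$ and to a gene singular vector $x_{k_3 l}$.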

  8. [Figure: genes selected for compounds by TD based unsupervised FE (A, B, C) are compared with genes affected by single gene perturbation (Gene A, Gene B, Gene C)]

  9. Overall data analysis flow
    Gene expression profiles with drug compound treatments
    → Identification of pairs of genes and compounds with dose
    dependence by tensor decomposition
    → Target protein identification by comparison with single
    gene KO/KI experiments
    → Validation by comparison with known drug target
    proteins using Fisher’s exact test
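
The validation step can be illustrated with a small example of Fisher's exact test on the overlap between predicted and known targets. The gene identifiers, set sizes and the helper name `overlap_test` are hypothetical and only show how the 2 × 2 table is built; `scipy.stats.fisher_exact` is used with a one-sided alternative because only enrichment of the overlap is of interest.

```python
from scipy.stats import fisher_exact


def overlap_test(predicted: set, known: set, universe: set):
    """One-sided Fisher's exact test for enrichment of `known` in `predicted`.

    Returns the 2 x 2 contingency table and the P-value.
    """
    both = len(predicted & known)
    only_predicted = len(predicted - known)
    only_known = len(known - predicted)
    neither = len(universe) - both - only_predicted - only_known
    table = [[both, only_predicted], [only_known, neither]]
    _, p_value = fisher_exact(table, alternative="greater")
    return table, p_value


if __name__ == "__main__":
    # Hypothetical data: 1000 genes, 50 predicted targets, 30 known targets,
    # 20 of which are shared.
    universe = set(range(1000))
    predicted = set(range(50))
    known = set(range(20)) | set(range(500, 510))
    table, p_value = overlap_test(predicted, known, universe)
    print(table, p_value)
```

A small P-value means the predicted targets contain more known targets than expected by chance.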

  10. Results for 13 cancer cell lines (LINCS)
    Identification by tensor decomposition
    Target proteins by comparison with KO/KI experiments
    [Table: results for the 13 cell lines, not preserved in the transcript]

  11. Evaluations
    Comparisons with drug2gene.com and DSigDB
    ○: significant overlap by Fisher’s exact test
    (1)-(13): cancer cell lines in the previous table

  12. Conclusions
    We have developed a tensor decomposition-based
    method that can identify genes with dose-dependent
    expression from gene expression profiles of cells
    treated with drug compounds.
    Drug target proteins are further inferred by
    comparison with single-gene KO/KI expression
    profiles.
    The results overlap significantly with
    known drug target proteins.
