2019-02-27_network_rewiring

Studying the dynamic rewiring of molecular networks - through Natural Language Processing, data integration and graph analysis

Presentation given by Sofie Van Landeghem at the Women in Tech event in Belgium, February 2019

Sofie Van Landeghem

February 27, 2019

Transcript

  1. Studying the dynamic rewiring of molecular networks (through natural language processing, data integration and graph analysis)

    Sofie Van Landeghem, Freelancer ML and NLP @ OxyKodit
  2. Overview

    Sofie Van Landeghem, http://www.oxykodit.com
    Text mining, machine learning, data mining, graph algorithms
    How do living organisms respond to changing conditions?
  3. Focus on pathways

    “... that a ... transcription factor, ARR1, activates the gene SHY2 ...” (PMID 19039136)
    “SHY2 also regulates the cytokinin biosynthesis enzyme IPT5 ...” (PMC 2688277)
    “... disruption of IPT1, IPT3, IPT5 and IPT7 resulted in significant reductions in cytokinin (CK) levels ...” (PMC 3280229)
    “Cytokinins: metabolism and function in plant adaptation to environmental stresses” (PMID 22236698)
  4. Extracting facts from text

    “Constitutive overexpression of KRP2 slightly above its endogenous level inhibited the mitotic cell cycle-specific CDKA;1 kinase complexes”
  5. Gene recognition

    “Constitutive overexpression of KRP2 slightly above its endogenous level inhibited the mitotic cell cycle-specific CDKA;1 kinase complexes”
    ➔ Challenge: Lexical variants & synonyms: KRP2, KRP-2, KIP-related protein 2, ICK2
    ➔ Solution: Dictionaries and/or Machine Learning (ML) to recognize lexical clues such as uppercasing, context words and abbreviations specifically mentioned in the text
  6. Event extraction

    “Constitutive overexpression of KRP2 slightly above its endogenous level inhibited the mitotic cell cycle-specific CDKA;1 kinase complexes”
    ➔ Challenge: English offers a wide variety of ways to express the same interaction, with different words & different grammar. E.g. “binding of X with Y” vs. “X and Y interact”
    ➔ Solution: Supervised ML to learn lexical, syntactic and grammatical patterns from annotated examples, generalize them, and use them for prediction on unseen text
  7. From text to graph

    “Constitutive overexpression of KRP2 slightly above its endogenous level inhibited the mitotic cell cycle-specific CDKA;1 kinase complexes”
    Structured representation
    Graph representation
  8. Pathway reconstruction

    Part of the human p53 signaling pathway (Van Landeghem, Björne, et al. 2013. PLoS One)
    p53 is a tumor suppressor gene that is mutated in a large proportion of human cancers
  9. Mining clinical trials

    Unstructured trial texts:
    ... There were 26 complete responses (16%) and 0 partial responses (0%) ...
    ... The median progression-free survival time was 65 months ...

    Structured data:

    | Measure | Value  |
    |---------|--------|
    | CR      | 16%    |
    | PR      | 0%     |
    | PFS     | 65 mo. |

    Unstructured trial texts:
    One-hundred sixty-seven patients were treated: 91 in arm A, 48 in arm B, and 28 in arm C. The RR was 10% in arm A and zero in arms B and C.

    Structured data:

    |       | Patients | RR   | Responders |
    |-------|----------|------|------------|
    | Arm A | 91       | 10%  | 9          |
    | Arm B | 48       | 0%   | 0          |
    | Arm C | 28       | 0%   | 0          |
    | All   | 167      | 5.4% | 9          |
  10. NLP completes the picture

    Interactions from structured databases, augmented with text mining information
    75% of all protein-protein interactions extracted from text were factually correct. Only 35% of them could be found in structured PPI databases.
    There is a need for (curated) NLP results to obtain a more complete picture!
    (Van Landeghem, et al. 2013. The Plant Cell)
  11. Part II: data mining

    “It was much nicer before people started storing all their data in the cloud.”
  12. Plant osmotic stress

    Arabidopsis plants exposed to 25 mM mannitol
    ➢ This induces osmotic stress similar to drought
    24 samples in total
    ➢ Time measurements: after 1.5h, 3h, 12h, 24h
    ➢ 3 biological replicates + control experiments
    (Skirycz, et al. 2011. Plant Cell)
    At each time point, the analysis identifies which genes are expressed differently compared with normal plants that were not exposed to mannitol (drought).
  13. Differential network @ 1.5h

    Showing the changes in regulation after 1.5 hours of drought stress
    These changes represent the plant’s coping mechanism with respect to its changing environment!
  14. Summary of all time points

    Compare the reference to all 4 time-specific networks
    Show rewiring that happens at 3 or more of the 4 time points
  15. Experimental results

    Figure labels: Hy5 mutants, Normal; Normal conditions, Drought
    Osmotic use-case: 3 genes were found to be interesting:
    • PIL5, HY5, TCH3
    • Phenotypic analyses on hy5 mutants under mannitol treatment
    • Confirmed that HY5 is involved in mannitol-responsive networks in growing Arabidopsis leaves
    (Van Landeghem, et al. 2016. BMC Bioinformatics)
  16. Conclusions

    Text mining, machine learning, data mining, graph algorithms
    Personalized medicine
    More successful agriculture
  17. Acknowledgments

    Gent University: Marieke Dubois, Stefanie De Bodt, Thomas Van Parys, Zuzanna Drebert, Yves Van de Peer, Dirk Inzé
    Turku University: Filip Ginter, Jari Björne, Tapio Salakoski
    J&J: Johannes Hermann, Francisco Talamas, Henry Lin
    Large-scale text mining resource for PubMed: http://evexdb.org
    Cytoscape app for differential network analyses: http://apps.cytoscape.org/apps/diffany
  18. Questions?

    There are no stupid questions – so let’s agree there are no stupid answers either.
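
The dictionary approach to gene recognition on slide 5 can be sketched roughly as follows. This is a minimal illustration, not the method used in the talk: the `SYNONYMS` table, the function names and the choice of KRP2 as the canonical symbol are assumptions made for this example. Only the four lexical variants come from the slide. Matching is case-insensitive here, which is a simplification, since case can carry meaning in gene nomenclature.

```python
import re

# Hypothetical mini-dictionary: canonical symbol -> known surface forms.
# The variants are the ones listed on the slide; picking "KRP2" as the
# canonical name is an assumption for this sketch.
SYNONYMS = {
    "KRP2": ["KRP2", "KRP-2", "KIP-related protein 2", "ICK2"],
}

def build_pattern(synonyms):
    """Compile one regex matching any surface form, longest form first."""
    lookup = {}
    for canonical, forms in synonyms.items():
        for form in forms:
            lookup[form.lower()] = canonical
    alternatives = sorted(lookup, key=len, reverse=True)
    # The lookarounds stop a match from starting or ending inside a longer token.
    pattern = re.compile(
        r"(?<![\w-])(" + "|".join(re.escape(a) for a in alternatives) + r")(?![\w-])",
        re.IGNORECASE,
    )
    return pattern, lookup

def find_gene_mentions(text, synonyms=SYNONYMS):
    """Return (matched text, canonical symbol, start offset) for each mention."""
    pattern, lookup = build_pattern(synonyms)
    return [
        (m.group(1), lookup[m.group(1).lower()], m.start(1))
        for m in pattern.finditer(text)
    ]

sentence = ("Constitutive overexpression of KRP2 slightly above its endogenous "
            "level inhibited the mitotic cell cycle-specific CDKA;1 kinase complexes")
mentions = find_gene_mentions(sentence)
```

A dictionary like this only covers variants someone has already listed, which is why the slide pairs it with machine learning for unseen spellings and abbreviations.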
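
The step from unstructured trial text to structured measures on slide 9 could look something like the sketch below. The two regular expressions and the function names are assumptions made for illustration; they only handle the exact phrasings quoted on the slide, whereas real trial reports vary far more. The second function reproduces the pooled response rate from the per-arm table.

```python
import re

# Hypothetical patterns for this sketch; real trial reports phrase these
# outcomes in many more ways than two regular expressions can cover.
RESPONSE_RE = re.compile(
    r"(\d+)\s+(complete|partial)\s+responses?\s+\((\d+(?:\.\d+)?)%\)"
)
PFS_RE = re.compile(
    r"median progression-free survival time was\s+(\d+(?:\.\d+)?)\s+months"
)

def extract_measures(text):
    """Pull CR/PR counts and percentages and median PFS (months) from text."""
    measures = {}
    for count, kind, percent in RESPONSE_RE.findall(text):
        key = "CR" if kind == "complete" else "PR"
        measures[key] = {"count": int(count), "percent": float(percent)}
    pfs = PFS_RE.search(text)
    if pfs:
        measures["PFS_months"] = float(pfs.group(1))
    return measures

def overall_response_rate(arms):
    """Pooled response rate in percent from per-arm (patients, responders)."""
    patients = sum(n for n, _ in arms.values())
    responders = sum(r for _, r in arms.values())
    return 100.0 * responders / patients

text = ("There were 26 complete responses (16%) and 0 partial responses (0%). "
        "The median progression-free survival time was 65 months.")
measures = extract_measures(text)

# Per-arm figures as given in the slide's table: (patients, responders).
arms = {"A": (91, 9), "B": (48, 0), "C": (28, 0)}
pooled = overall_response_rate(arms)
```

With the slide's numbers, 9 responders out of 167 patients gives the 5.4% shown in the "All" row.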
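
The comparison described on slides 13 and 14 (a reference network against four time-specific networks, keeping rewiring seen at 3 or more of the 4 time points) can be approximated with plain set operations. This is a simplified sketch under stated assumptions, not the Diffany algorithm: networks are modelled as sets of directed (regulator, target) edges, rewiring is reduced to edges gained or lost relative to the reference, and the node names are placeholders rather than data from the study.

```python
from collections import Counter

def rewired_edges(reference, condition):
    """Edges gained or lost in a condition network relative to the reference."""
    gained = {(edge, "gained") for edge in condition - reference}
    lost = {(edge, "lost") for edge in reference - condition}
    return gained | lost

def consensus_rewiring(reference, condition_networks, min_support=3):
    """Keep rewiring events seen in at least `min_support` condition networks."""
    counts = Counter()
    for network in condition_networks.values():
        # Each network contributes at most one vote per rewiring event.
        counts.update(rewired_edges(reference, network))
    return {event for event, n in counts.items() if n >= min_support}

# Toy data with placeholder node names (not taken from the study).
reference = {("A", "B"), ("B", "C"), ("C", "D")}
time_networks = {
    "1.5h": {("A", "B"), ("B", "C"), ("A", "D")},
    "3h":   {("A", "B"), ("A", "D")},
    "12h":  {("A", "B"), ("C", "D"), ("A", "D")},
    "24h":  {("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")},
}
consensus = consensus_rewiring(reference, time_networks)
```

In this toy example only the gained edge A to D appears at all four time points, so it is the single event that survives the 3-out-of-4 threshold; the two lost edges each occur at only two time points.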