Slide 1

Studying the dynamic rewiring of molecular networks (through natural language processing, data integration and graph analysis)
Sofie Van Landeghem
Freelancer ML and NLP @ OxyKodit

Slide 2

Sofie Van Landeghem http://www.oxykodit.com
Overview
Text mining, machine learning, data mining, graph algorithms
How do living organisms respond to changing conditions?

Slide 3

Progressing science

Slide 4

Focus on pathways
“... that a ... transcription factor, ARR1, activates the gene SHY2 ...” (PMID 19039136)
“SHY2 also regulates the cytokinin biosynthesis enzyme IPT5 ...” (PMC 2688277)
“... disruption of IPT1, IPT3, IPT5 and IPT7 resulted in significant reductions in cytokinin (CK) levels ...” (PMC 3280229)
“Cytokinins: metabolism and function in plant adaptation to environmental stresses” (PMID 22236698)

Slide 5

Part I: text mining

Slide 6

Extracting facts from text
“Constitutive overexpression of KRP2 slightly above its endogenous level inhibited the mitotic cell cycle-specific CDKA;1 kinase complexes”

Slide 7

Gene recognition
“Constitutive overexpression of KRP2 slightly above its endogenous level inhibited the mitotic cell cycle-specific CDKA;1 kinase complexes”
➔ Challenge: lexical variants and synonyms: KRP2, KRP-2, KIP-related protein 2, ICK2
➔ Solution: dictionaries and/or machine learning (ML) to recognize lexical clues such as uppercasing, context words and abbreviations explicitly introduced in the text
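The dictionary half of that solution can be sketched in a few lines of Python. This is a minimal illustration, not the system behind the talk: the synonym table is a hypothetical hand-written stand-in for a curated lexicon, and matching is plain case-insensitive string search with longest match first, so it does not model the ML component or context words.

```python
import re

# Hypothetical synonym table (illustrative only): each surface form maps to
# one canonical gene symbol. A real system would load a curated lexicon.
SYNONYMS = {
    "KRP2": "KRP2",
    "KRP-2": "KRP2",
    "KIP-related protein 2": "KRP2",
    "ICK2": "KRP2",
    "CDKA;1": "CDKA;1",
}


def recognize(sentence):
    """Return (mention, canonical symbol, start offset) triples.

    Matching is case-insensitive. Longer synonyms are tried first and a
    match may not overlap an earlier one, so a long name is never split
    into shorter partial hits.
    """
    taken = set()  # character offsets already covered by a match
    hits = []
    for surface in sorted(SYNONYMS, key=len, reverse=True):
        # The mention must not be glued to neighbouring word characters.
        pattern = r"(?<![\w-])" + re.escape(surface) + r"(?![\w-])"
        for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            span = set(range(m.start(), m.end()))
            if span & taken:
                continue
            taken |= span
            hits.append((m.group(0), SYNONYMS[surface], m.start()))
    return sorted(hits, key=lambda hit: hit[2])


sentence = ("Constitutive overexpression of KRP2 slightly above its endogenous "
            "level inhibited the mitotic cell cycle-specific CDKA;1 kinase complexes")
print(recognize(sentence))
```

Normalizing every variant to one canonical symbol is what lets later steps merge facts stated under different names.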

Slide 8

Event extraction
“Constitutive overexpression of KRP2 slightly above its endogenous level inhibited the mitotic cell cycle-specific CDKA;1 kinase complexes”
➔ Challenge: English offers many ways to express the same interaction, with different words and different grammar, e.g. “binding of X with Y” vs. “X and Y interact”
➔ Solution: supervised ML to learn lexical, syntactic and grammatical patterns from annotated examples, generalize them, and apply them to unseen text
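To make "learn patterns from annotated examples" concrete, here is a toy supervised classifier in Python. It assumes a hypothetical feature set (the words between the two entities, the words just outside them, and their distance) and uses a perceptron as a stand-in learner; the four training sentences are invented, and real event extractors also rely on syntactic features such as dependency paths.

```python
from collections import defaultdict


def features(tokens, i, j):
    """Hypothetical lexical features for the entity pair at positions i < j."""
    between = tokens[i + 1:j]
    feats = {"w=" + w.lower() for w in between}       # words between entities
    feats.add("dist=" + str(min(len(between), 5)))    # capped token distance
    if i > 0:
        feats.add("prev=" + tokens[i - 1].lower())    # word before entity 1
    if j + 1 < len(tokens):
        feats.add("next=" + tokens[j + 1].lower())    # word after entity 2
    return feats


def train_perceptron(examples, epochs=10):
    """Binary perceptron; examples are (feature set, label) with label +1/-1."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, label in examples:
            score = sum(weights[f] for f in feats)
            if label * score <= 0:  # wrong side of (or on) the boundary
                for f in feats:
                    weights[f] += label
    return weights


def predict(weights, feats):
    """+1 = interaction between the pair, -1 = no interaction."""
    return 1 if sum(weights.get(f, 0.0) for f in feats) > 0 else -1


# Invented annotated examples: (sentence, entity 1 index, entity 2 index, label).
ANNOTATED = [
    ("binding of X with Y", 2, 4, 1),
    ("X and Y interact", 0, 2, 1),
    ("X and Y were measured", 0, 2, -1),
    ("X was purified before Y", 0, 4, -1),
]
TRAIN = [(features(s.split(), i, j), y) for s, i, j, y in ANNOTATED]
model = train_perceptron(TRAIN)

# Generalizes to a new sentence with different entity names.
print(predict(model, features("the binding of P with Q".split(), 3, 5)))
```

The entity names never appear in the features, which is what lets the learned patterns transfer to unseen gene pairs.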

Slide 9

From text to graph
“Constitutive overexpression of KRP2 slightly above its endogenous level inhibited the mitotic cell cycle-specific CDKA;1 kinase complexes”
[Figure: structured representation and graph representation of this sentence]
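Once events are in a structured form, turning them into a graph is mostly bookkeeping. The sketch below assumes events are (source, relation, target) triples; the relation labels are illustrative assumptions, and the extra edges reuse the ARR1, SHY2 and IPT5 statements quoted on slide 4 to show how facts from different sentences link into a pathway.

```python
from collections import defaultdict, deque

# Events as (source, relation, target) triples. The relation labels are
# illustrative assumptions, not the output of a specific extraction tool.
EVENTS = [
    ("KRP2", "negative_regulation", "CDKA;1"),  # example sentence above
    ("ARR1", "positive_regulation", "SHY2"),    # PMID 19039136
    ("SHY2", "regulation", "IPT5"),             # PMC 2688277
]


def to_graph(events):
    """Directed graph as an adjacency map: source -> {target: set of relations}."""
    graph = defaultdict(dict)
    for src, rel, tgt in events:
        graph[src].setdefault(tgt, set()).add(rel)  # merge repeated evidence
    return graph


def downstream(graph, start):
    """All nodes reachable from start (breadth-first search)."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, {}):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


graph = to_graph(EVENTS)
print(sorted(downstream(graph, "ARR1")))
```

Reachability over such a graph is what turns isolated sentences into a chain like ARR1 → SHY2 → IPT5.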

Slide 10

Pathway reconstruction
[Figure: part of the human p53 signaling pathway]
p53 is a tumor suppressor gene that is mutated in a large proportion of human cancers.
Van Landeghem, Björne, et al. 2013. PLoS One

Slide 11

Mining clinical trials
Unstructured trial text:
“... There were 26 complete responses (16%) and 0 partial responses (0%) ... The median progression-free survival time was 65 months ...”
Structured data:
Measure | Value
CR      | 16%
PR      | 0%
PFS     | 65 mo.

Unstructured trial text:
“One-hundred sixty-seven patients were treated: 91 in arm A, 48 in arm B, and 28 in arm C. The RR was 10% in arm A and zero in arms B and C.”
Structured data:
Arm | Patients | RR   | Responders
A   | 91       | 10%  | 9
B   | 48       | 0%   | 0
C   | 28       | 0%   | 0
All | 167      | 5.4% | 9

CR = complete response, PR = partial response, PFS = progression-free survival, RR = response rate
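The arm-level numbers above can be checked with a short Python sketch. It assumes a simple regular expression for phrases like "91 in arm A"; the per-arm response rates are entered by hand rather than extracted, and responder counts are rounded to whole patients. The pooled rate is total responders over total patients, 9 / 167 ≈ 5.4%.

```python
import re

TEXT = ("One-hundred sixty-seven patients were treated: 91 in arm A, "
        "48 in arm B, and 28 in arm C. The RR was 10% in arm A and zero "
        "in arms B and C.")


def arm_sizes(text):
    """Extract {arm: patients} from phrases such as '91 in arm A'."""
    return {arm: int(n) for n, arm in re.findall(r"(\d+) in arm ([A-Z])\b", text)}


def pooled_response_rate(sizes, rates):
    """Per-arm responders (rounded to whole patients) and the pooled RR."""
    responders = {arm: round(sizes[arm] * rates[arm]) for arm in sizes}
    return responders, sum(responders.values()) / sum(sizes.values())


sizes = arm_sizes(TEXT)                      # {'A': 91, 'B': 48, 'C': 28}
rates = {"A": 0.10, "B": 0.0, "C": 0.0}      # entered by hand (assumption)
responders, pooled = pooled_response_rate(sizes, rates)
print(responders, f"{pooled:.1%}")
```

Summing the extracted arm sizes (91 + 48 + 28 = 167) is also a cheap consistency check against the number written out in words.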

Slide 12

NLP completes the picture
[Figure: interactions from structured databases, augmented with text-mining information]
75% of all protein-protein interactions extracted from text were factually correct. Only 35% of them could be found in structured PPI databases.
There is a need for (curated) NLP results to obtain a more complete picture!
Van Landeghem, et al. 2013. The Plant Cell

Slide 13

Part II: data mining
“It was much nicer before people started storing all their data in the cloud.”

Slide 14

Plant osmotic stress
Arabidopsis plants exposed to 25 mM mannitol
➢ This induces osmotic stress similar to drought
24 samples in total
➢ Time measurements: after 1.5 h, 3 h, 12 h and 24 h
➢ 3 biological replicates + control experiments
Skirycz, et al. 2011. The Plant Cell
At each time point, we determine which genes are differentially expressed relative to control plants that were not exposed to mannitol (i.e. not under simulated drought).
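The sample count follows from the design if each time point has 3 mannitol-treated and 3 control replicates: 4 × 2 × 3 = 24. The slide does not spell out the control layout, so treat the enumeration below as an assumption that happens to match the stated total.

```python
from itertools import product

# Assumed layout: every time point has 3 mannitol-treated and 3 control
# replicates. This matches the stated total of 24 samples.
TIME_POINTS_H = [1.5, 3, 12, 24]
CONDITIONS = ["mannitol", "control"]
REPLICATES = [1, 2, 3]

samples = [
    {"time_h": t, "condition": c, "replicate": r}
    for t, c, r in product(TIME_POINTS_H, CONDITIONS, REPLICATES)
]
print(len(samples))  # 4 time points x 2 conditions x 3 replicates

# Differential expression is assessed per time point: treated vs. control.
by_time = {t: [s for s in samples if s["time_h"] == t] for t in TIME_POINTS_H}
```

Grouping by time point mirrors the analysis described above: each group holds the treated and control replicates compared at that moment.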

Slide 15

Differential network @ 1.5 h
Changes in regulation after 1.5 hours of osmotic (drought-like) stress
These changes reflect how the plant copes with its changing environment!
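A differential network can be sketched as the edge-wise difference between a condition-specific network and the reference. This is a generic illustration, not the algorithm implemented in the Diffany app: networks are assumed to be dictionaries of signed edge weights, a missing edge counts as weight 0, and the genes G1 to G3 and the threshold `min_change` are hypothetical.

```python
def differential_network(reference, condition, min_change=0.5):
    """Edges whose signed weight changes by at least min_change.

    Networks are dicts {(source, target): weight}; a missing edge has
    weight 0. A positive delta means the edge is stronger under stress,
    a negative delta means it is weaker or lost.
    """
    diff = {}
    for edge in set(reference) | set(condition):
        delta = condition.get(edge, 0.0) - reference.get(edge, 0.0)
        if abs(delta) >= min_change:
            diff[edge] = delta
    return diff


# Hypothetical toy networks with placeholder genes G1..G3.
reference = {("G1", "G2"): 1.0, ("G2", "G3"): 0.8}
stress_1_5h = {("G1", "G2"): 1.0, ("G1", "G3"): 0.9}
print(differential_network(reference, stress_1_5h))
```

Unchanged edges such as G1 → G2 drop out, so only the rewiring (a gained G1 → G3 and a lost G2 → G3) remains.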

Slide 16

Summary of all time points
Compare the reference network to all 4 time-specific networks
Show the rewiring that occurs in at least 3 of the 4 time points
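The "at least 3 out of 4 time points" filter amounts to counting how often each edge appears across the time-specific differential networks. In this sketch each network is reduced to its set of rewired edges (hypothetical genes again), and the direction of change is ignored; requiring a consistent direction would be a stricter variant.

```python
from collections import Counter


def consensus_edges(diff_networks, min_support=3):
    """Edges rewired in at least min_support of the given networks.

    Each network is any iterable of edges (e.g. the keys of a dict of
    weight changes); the direction of change is ignored here.
    """
    counts = Counter()
    for net in diff_networks:
        counts.update(set(net))  # count each edge once per time point
    return {edge for edge, n in counts.items() if n >= min_support}


# Hypothetical rewired edges at 1.5 h, 3 h, 12 h and 24 h.
time_points = [
    {("G1", "G3"), ("G2", "G3")},
    {("G1", "G3")},
    {("G1", "G3"), ("G2", "G3")},
    {("G2", "G4")},
]
print(consensus_edges(time_points))
```

Only G1 → G3 survives: G2 → G3 is rewired at 2 time points and G2 → G4 at 1, so both fall below the threshold.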

Slide 17

Experimental results
[Figure: hy5 mutants vs. normal plants, under normal conditions and drought]
Osmotic use case: 3 genes were flagged as interesting:
• PIL5, HY5, TCH3
• Phenotypic analyses of hy5 mutants under mannitol treatment
• Confirmed that HY5 is involved in mannitol-responsive networks in growing Arabidopsis leaves
Van Landeghem, et al. 2016. BMC Bioinformatics

Slide 18

Conclusions
Text mining, machine learning, data mining, graph algorithms
Personalized medicine
More successful agriculture

Slide 19

Acknowledgments
Ghent University: Marieke Dubois, Stefanie De Bodt, Thomas Van Parys, Zuzanna Drebert, Yves Van de Peer, Dirk Inzé
Turku University: Filip Ginter, Jari Björne, Tapio Salakoski
J&J: Johannes Hermann, Francisco Talamas, Henry Lin
Large-scale text mining resource for PubMed: http://evexdb.org
Cytoscape app for differential network analyses: http://apps.cytoscape.org/apps/diffany

Slide 20

Questions?
There are no stupid questions, so let’s agree there are no stupid answers either.