Using Attribution-based Explainability to Guide Deep Molecular Optimization, Elix, CBI 2021

Using Attribution-based Explainability to Guide Deep Molecular Optimization
Pierre Wüthrich, Research Engineer, Elix, Inc.
27.10.2021

2 Background
• SOTA models for de novo molecular design are based on meta-heuristic methods
  ◦ Evolutionary strategies such as Genetic Algorithms (GAs)
    ▪ Genetic Expert Guided Learning (GEGL)
      • Guiding Deep Molecular Optimization with Genetic Exploration, Ahn et al., NeurIPS 2020
    ▪ Janus
      • Janus: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design, Nigam et al., arxiv.org/abs/2106.04011
• Why GAs? They are conceptually simple and perform as well as or better than deep learning models on molecular design tasks
Figure from the original GEGL paper: Guiding Deep Molecular Optimization with Genetic Exploration, Ahn et al., NeurIPS 2020

3 Motivation
• In this work, we specifically focus on the GEGL framework
  ◦ Leverages an expert-designed Graph-based Genetic expert
  ◦ It is not clear a priori which expert will perform best
• Hand-crafted Genetic experts have several drawbacks
  ◦ Might contain unwanted implicit biases
  ◦ Laborious and difficult to develop
➔ Our contribution:
  ◆ We propose a novel expert-free Genetic Expert (InFrag)
  ◆ We extend the GEGL framework to allow multiple GAs
Figure from the original GEGL paper: Guiding Deep Molecular Optimization with Genetic Exploration, Ahn et al., NeurIPS 2020

4 Methodology: Overall enhanced GEGL Framework (eGEGL)

5 Methodology: InFrag Genetic Expert (1)
• Which part of the molecule is responsible for the obtained property?
  ◦ The black-box fitness function cannot be directly leveraged to answer this question
• Instead, we pretrain a Graph Convolutional Network (GCN) on an initial dataset to act as a pseudo-scoring function by trying to predict the true score
• Class Activation Map (CAM) is used to generate atom-wise attributions
  ◦ Simple attribution-based interpretability method
  ◦ Dot product between the weights of the final layer and the hidden representations before the global pooling layer
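The CAM computation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation from the talk: the function name `cam_attributions`, the toy hidden vectors, and the choice of sum pooling with a bias-free linear output layer are all assumptions made for the example.

```python
def cam_attributions(hidden, w_out):
    """Per-atom CAM scores: dot product of each atom's hidden vector with w_out.

    hidden: list of per-atom hidden vectors taken before global pooling.
    w_out:  weights of the final linear layer (single regression output).
    """
    return [sum(h * w for h, w in zip(atom, w_out)) for atom in hidden]


# Toy values, not taken from the talk: 3 atoms with 2 hidden features each.
hidden = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
w_out = [1.0, -1.0]

attributions = cam_attributions(hidden, w_out)

# Assumption: sum pooling followed by a bias-free linear layer, so the
# predicted score equals the sum of the atom-wise attributions.
pooled = [sum(column) for column in zip(*hidden)]
predicted = sum(p * w for p, w in zip(pooled, w_out))
```

Under these assumptions the attributions decompose the prediction exactly, which is what makes the sign of an atom's attribution interpretable as pushing the predicted score up or down.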

6 Methodology: InFrag Genetic Expert (2)
• Once the atom-wise attributions are computed, we use them to divide the molecule into fragments:
  ◦ Cuts are only allowed along single bonds
• Each fragment is scored:
  ◦ If the sum of all attributions in the fragment is > 0, the fragment is considered relevant with respect to the predicted score and kept for further processing
  ◦ Otherwise, the fragment is discarded
• Fragments are kept in a Fragment library
  ◦ Limited in size
  ◦ Selection pressure via true molecular fitness score
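The fragment scoring rule and the size-limited library can be sketched as follows. The fragmentation step itself (cutting along single bonds) is skipped: fragments are given here as hypothetical groups of atom indices. The eviction policy is also an assumption, since the slide only states that the library is limited in size and that selection pressure comes from the true fitness score; this sketch simply drops the entry whose parent molecule has the lowest fitness.

```python
def relevant_fragments(fragments, attributions):
    """Keep fragments whose summed atom-wise attributions are positive."""
    return [
        name
        for name, atoms in fragments.items()
        if sum(attributions[i] for i in atoms) > 0
    ]


class FragmentLibrary:
    """Size-limited store; eviction by lowest parent fitness is an assumption."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.entries = {}  # fragment -> best true fitness seen for a parent

    def add(self, fragment, fitness):
        best = self.entries.get(fragment)
        if best is None or fitness > best:
            self.entries[fragment] = fitness
        if len(self.entries) > self.max_size:
            worst = min(self.entries, key=self.entries.get)
            del self.entries[worst]


# Toy data, not from the talk: five atoms split into three fragments.
attributions = [0.4, 0.3, -0.9, 0.2, -0.1]
fragments = {"CC": [0, 1], "O": [2], "CN": [3, 4]}
kept = relevant_fragments(fragments, attributions)

library = FragmentLibrary(max_size=2)
library.add("CC", 1.5)
library.add("CN", 0.7)
library.add("CCl", 2.0)  # exceeds max_size, so the lowest-fitness entry is evicted
```

Here the fragment "O" is discarded because its attribution sum is negative, and adding a third fragment evicts "CN", which came from the molecule with the lowest true fitness.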

7 Methodology: InFrag Genetic Expert (3)
• Fragment recombination:
  ◦ Random sampling of fragments from the Fragment library
  ◦ Translate to SELFIES => Random recombination => Translate back to SMILES
    ▪ Ensures validity
    ▪ Atom count is not necessarily maintained
• We make no assumptions about potential relationships between fragments
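A rough sketch of the recombination step at the SELFIES token level is shown below. The slide does not specify how fragments are recombined, so concatenating the sampled fragments in random order is an assumption, as are the function name `recombine` and the toy library. The fragments are assumed to be already encoded as SELFIES strings; in practice the conversion to and from SMILES would be done with the `selfies` package (`selfies.encoder` and `selfies.decoder`), which is not called here so that the example stays dependency-free.

```python
import random
import re

# Every SELFIES symbol is written in square brackets, e.g. "[C]" or "[=O]".
TOKEN = re.compile(r"\[[^\]]*\]")


def recombine(fragment_selfies, n_fragments, rng):
    """Sample fragments without replacement and join their tokens.

    rng.sample returns the chosen fragments in random order, so the
    concatenation order is random as well (an assumed recombination rule).
    """
    sampled = rng.sample(fragment_selfies, n_fragments)
    tokens = [tok for frag in sampled for tok in TOKEN.findall(frag)]
    return "".join(tokens)


# Toy fragment library, not from the talk.
library = ["[C][C]", "[O]", "[C][=O]", "[N][C]"]
child = recombine(library, 2, random.Random(0))
# The child string would then be translated back to SMILES with selfies.decoder.
```

Because any sequence of SELFIES symbols decodes to a valid molecule, no validity check is needed after concatenation. The decoder may drop or reinterpret symbols to satisfy valence rules, which is consistent with the note that the atom count is not necessarily maintained.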

8 Results: plogp Task
• We first ensured that the GCN is able to function as a pseudo-scoring function
• The GCN is indeed able to learn which molecules truly have a higher score
• Therefore, the atom-wise attributions generated by the CAM method can be considered reliable
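One way to check that a pseudo-scoring model ranks molecules like the true scoring function is a rank correlation between true and predicted scores. The slide does not say which metric was used, so Spearman's rho is an assumption here, and the scores below are made-up toy values. The sketch uses the simple closed form, which assumes there are no tied scores.

```python
def ranks(values):
    """Rank positions (0 = smallest); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def spearman(a, b):
    """Spearman rank correlation via 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Toy scores, not from the talk: predictions differ in value but not in order.
true_scores = [0.1, 2.3, 1.2, 3.8]
predicted_scores = [0.3, 1.9, 1.0, 3.1]
rho = spearman(true_scores, predicted_scores)
```

A rho close to 1 means the model orders molecules the same way as the true score, which is the property the attributions rely on, even if the absolute predicted values are off.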

9 Results: plogp Task
• On the basic plogp optimization task, the proposed Genetic Expert does not show decreased performance
• We also compared to another SELFIES-based expert which interpolates between two molecules

10 Results: plogp Task
• When limiting the number of optimization rounds, our proposed GE outperforms the other baselines
• The GCN is able to extract useful knowledge from the initial training dataset to quickly propose high-rewarding molecules
  ◦ Even outperforms previous methods in a single optimization round (e.g. ChemTS, GCPN, JT-VAE)
• Promising for use cases where obtaining additional samples is difficult or costly

11 Results: GuacaMol Goal-directed Benchmark
• Also performs competitively on the difficult GuacaMol goal-directed benchmarks
• Our results suggest that the choice of the genetic expert within the GEGL framework is not critical for performance
• We can substitute it with more efficient or simpler ones

12 Conclusion and Future Work
❖ Conclusion
  ➢ Our expert-free genetic expert is able to substitute the expert-designed GB-GA within GEGL
  ➢ Our expert is able to explore the chemical space more efficiently than the tested baselines when the number of optimization rounds is strongly restricted
❖ Future Work
  ➢ Try different XAI methods (e.g. Integrated Gradients)
  ➢ Add synthesizability constraints to the framework => Ensure that generated molecules are actually of interest
  ➢ Check the influence of the size of the initial training dataset for the GCN pseudo-scoring model
    ▪ How much data is necessary to maintain the observed performance gain?

13 About this research
❖ Please refer to the preprint available on ChemRxiv for more information and results (e.g. multi-expert setting):
  ➢ InFrag: Using Attribution-based Explainability to Guide Deep Molecular Optimization, DOI: 10.33774/chemrxiv-2021-qtq8d

14 Thank you very much for your attention!

Elix, Inc. http://ja.elix-inc.com/
