Slide 8
[Embedded paper front page: Gómez-Bombarelli et al., "Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules", ACS Cent. Sci. 2018, 4, 268−276. Abstract gist: a deep neural network trained on many thousands of existing molecules learns three coupled functions: an encoder that converts the discrete representation of a molecule into a real-valued continuous vector, a decoder that converts these continuous vectors back into discrete representations, and a predictor that estimates chemical properties from the latent continuous vector; the continuous representation allows new molecules to be generated automatically and optimized for desired properties.]
Gómez-Bombarelli+, 2018
… to generate drug-like molecules. [Gómez-Bombarelli et al., 2016b] employed a variational autoencoder to build a latent, continuous space where property optimization can be performed through surrogate optimization. Finally, [Kadurin et al., 2017] presented a GAN model for drug generation. Additionally, the approach presented in this paper has recently been applied to molecular design [Sanchez-Lengeling et al., 2017].
In the field of music generation, [Lee et al., 2017] built a SeqGAN model employing an efficient representation of multi-channel MIDI to generate polyphonic music. [Chen et al., 2017] presented Fusion GAN, a dual-learning GAN model that can fuse two data distributions. [Jaques et al., 2017] employ deep Q-learning with a cross-entropy reward to optimize the quality of melodies generated from an RNN.

In adversarial training, [Pfau and Vinyals, 2016] recontextualize GANs in the actor-critic setting. This connection is also explored with the Wasserstein-1 distance in WGANs [Arjovsky et al., 2017]. Minibatch discrimination and feature matching were used to promote diversity in GANs [Salimans et al., 2016]. Another approach to avoid mode collapse was shown with Unrolled GANs [Metz et al., 2016]. Issues and convergence of GANs have been studied in [Mescheder et al., 2017].
3 Background

In this section, we elaborate on the GAN and RL setting, based on SeqGAN [Yu et al., 2017]. $G_\theta$ is a generator, parametrized by $\theta$, trained to produce high-quality sequences $Y_{1:T} = (y_1, \ldots, y_T)$ of length $T$, and $D_\phi$ is a discriminator model, parametrized by $\phi$, trained to classify real and generated sequences. $G_\theta$ is trained to deceive $D_\phi$, and $D_\phi$ to classify correctly. Both models are trained in alternation, following a minimax game:

$$\min_\theta \max_\phi \; \mathbb{E}_{Y \sim p_{\text{data}}}\left[\log D_\phi(Y)\right] + \mathbb{E}_{Y \sim G_\theta}\left[\log\left(1 - D_\phi(Y)\right)\right] \quad (1)$$

The generator can also be viewed as an RL agent that maximizes the expected reward of the sequences it produces,

$$J(\theta) = \mathbb{E}\left[R(Y_{1:T}) \mid s_0, \theta\right] \quad (2)$$

where $R(Y_{1:T})$ is the reward for a complete sequence. Since the reward is only defined for finished sequences, the action-value $Q(Y_{1:t-1}, y_t)$ of an intermediate state must be estimated once the sequence is completed. In order to do so, we perform $N$-time Monte Carlo search with the canonical rollout policy $G_\theta$, represented as
$$\mathrm{MC}^{G_\theta}(Y_{1:t}; N) = \left\{Y^{1}_{1:T}, \ldots, Y^{N}_{1:T}\right\} \quad (3)$$

where $Y^{n}_{1:t} = Y_{1:t}$ and $Y^{n}_{t+1:T}$ is stochastically sampled via the policy $G_\theta$. Now $Q(s, a)$ becomes
$$Q(Y_{1:t-1}, y_t) =
\begin{cases}
\dfrac{1}{N} \displaystyle\sum_{n=1}^{N} R\left(Y^{n}_{1:T}\right) \text{ with } Y^{n}_{1:T} \in \mathrm{MC}^{G_\theta}(Y_{1:t}; N), & \text{if } t < T, \\
R(Y_{1:T}), & \text{if } t = T.
\end{cases} \quad (4)$$
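To make the estimator concrete, the following is a minimal Python sketch of the $N$-time Monte Carlo search of Eqs. (3)-(4). The `gen_step` and `reward_fn` interfaces are hypothetical stand-ins for the rollout policy $G_\theta$ and the reward $R$ (supplied by $D_\phi$ in SeqGAN); this illustrates the estimator, not the reference implementation.

```python
import torch

def q_estimate(gen_step, reward_fn, Y_prefix, N, T):
    """Monte Carlo estimate of Q for a partial sequence Y_prefix = Y_{1:t}.

    gen_step(prefix) -> 1-D tensor of next-token probabilities (rollout policy G_theta)
    reward_fn(seq)   -> float reward R(Y_{1:T}) for a complete sequence
    """
    t = len(Y_prefix)
    if t == T:                       # finished sequence: exact reward (Eq. 4, t = T)
        return reward_fn(Y_prefix)
    total = 0.0
    for _ in range(N):               # N-time Monte Carlo search (Eq. 3)
        rollout = list(Y_prefix)     # Y^n_{1:t} = Y_{1:t}
        while len(rollout) < T:      # sample Y^n_{t+1:T} stochastically from G_theta
            probs = gen_step(rollout)
            rollout.append(torch.multinomial(probs, 1).item())
        total += reward_fn(rollout)  # accumulate R(Y^n_{1:T})
    return total / N                 # average over the N rollouts (Eq. 4, t < T)
```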
An unbiased estimation of the gradient of $J(\theta)$ can be derived as

$$\nabla_\theta J(\theta) \simeq \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}_{y_t \sim G_\theta(y_t \mid Y_{1:t-1})} \left[ \nabla_\theta \log G_\theta(y_t \mid Y_{1:t-1}) \cdot Q(Y_{1:t-1}, y_t) \right] \quad (5)$$
Finally, in SeqGAN the reward function is provided by $D_\phi$.
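For completeness, here is a sketch of the corresponding generator update: one REINFORCE-style policy-gradient step implementing the estimator of Eq. (5), with $Q$ values obtained from a Monte Carlo routine such as the one above. The `gen.log_prob` interface and the optimizer wiring are assumptions for illustration, not part of the original method's code.

```python
def generator_pg_step(gen, optimizer, q_fn, batch):
    """One policy-gradient step on G_theta, following Eq. (5).

    gen.log_prob(prefix, token) -> differentiable log G_theta(y_t | Y_{1:t-1})
    q_fn(prefix_plus_yt)        -> float Q(Y_{1:t-1}, y_t), treated as a constant
    batch                       -> sequences sampled from G_theta
    """
    optimizer.zero_grad()
    loss = 0.0
    for Y in batch:
        T = len(Y)
        for t in range(1, T + 1):
            log_p = gen.log_prob(Y[:t - 1], Y[t - 1])  # log G_theta(y_t | Y_{1:t-1})
            q = q_fn(Y[:t])                            # Q(Y_{1:t-1}, y_t), no gradient
            loss = loss - log_p * q / T                # minimizing -J ascends Eq. (5)
    (loss / len(batch)).backward()                     # unbiased gradient estimate
    optimizer.step()
```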
4 ORGAN
Figure 1: Schema for ORGAN. Left: $D_\phi$ is trained as a classifier receiving as input a mix of real data and data generated by $G_\theta$. Right: …
Guimaraes+, 2017