The Forefront of AI Drug Discovery, with a Focus on Generative Models / Elix CBI 2019

Elix
October 22, 2019

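An overview of the various generative models used in AI drug discovery. These are the slides from a talk given at the CBI Society 2019 meeting (CBI学会2019).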

Transcript

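  1. Table of contents
     • Introduction
     • Core technologies
     • Fingerprint- and SMILES-based models
     • Graph-based models
     • How generative models are used
     • Evaluating the performance of generative models
     • Directions for future development
     • Elix Chem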
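  2. Most commonly used representations
     • Fingerprint
       • Many variants exist; ECFP is especially well known
       • Each bit corresponds to a particular substructure
       • Collisions can occur
       • Not invertible
     • SMILES
       • Represents a compound as a character string
       • Not unique: one compound can have several SMILES strings
       • Slightly different compounds can have very different SMILES strings (SMILES was not designed to express compound similarity)
     • Graph
       • Represents a compound as nodes and edges
       • Appears to be the natural representation
     https://arxiv.org/abs/1802.04364
     https://arxiv.org/abs/1903.04388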
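The fingerprint bullets above (bits tied to substructures, collisions, no inverse) can be illustrated with a toy hashed fingerprint. This is only a sketch and not ECFP: the functions `fragments` and `toy_fingerprint`, the use of character bigrams as stand-in "substructures", and the 8-bit width are all assumptions made for illustration.

```python
import zlib


def fragments(smiles, n=2):
    """Character n-grams of a SMILES string (a toy stand-in for substructures)."""
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


def toy_fingerprint(smiles, n_bits=8):
    """Hash every fragment into one of n_bits positions.

    crc32 is used because it is deterministic across runs,
    unlike Python's built-in hash() for strings.
    """
    return {zlib.crc32(f.encode("ascii")) % n_bits for f in fragments(smiles)}


aspirin = "CC(=O)OC1=CC=CC=C1C(=O)O"
frags = fragments(aspirin)
bits = toy_fingerprint(aspirin)

# More distinct fragments than bit positions: at least two fragments must
# share a bit (a collision), so the fragments cannot be recovered from the
# bits alone (the fingerprint is not invertible).
print(len(frags), "fragments folded into", len(bits), "bits")
```

Real fingerprints hash atom environments rather than characters, but the pigeonhole argument is the same.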
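  3. Generative Adversarial Networks (GANs)
     A type of generative model
     • Generator (G): generates fake images and tries to fool D
     • Discriminator (D): tries to tell real images from fake ones
     [Figure: Noise → G → fake (generated) image; fake image and real image (training set) → D → real or fake? Images: Karras et al. (2017)]
     (Slide footer: Restricted © Elix, Inc.)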
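As background to the G and D roles above: the slide does not show a formula, but the standard objective from the original GAN formulation (Goodfellow et al., 2014) is the minimax game below. The symbols $p_{\text{data}}$ (distribution of real samples) and $p_z$ (noise distribution) are notation introduced here.

```latex
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

D is trained to assign high probability to real samples and low probability to generated ones, while G is trained to make $D(G(z))$ large.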
  4. Kadurin et al.
     The cornucopia of meaningful leads: Applying deep AAEs for new molecule development in oncology
     • Inputs and outputs
       • Binary fingerprints (MACCS)
       • Log concentration (LCONC)
     • Hidden (latent) layer
       • Consists of 5 neurons
       • 1 neuron is the Growth Inhibition percentage (GI)
       • The remaining 4 are trained to approach a normal distribution
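  5. Kadurin et al.: experimental workflow
     1. Prepare a dataset and train
        • NCI-60, MCF-7
        • 6252 compounds
        • Data consisting of fingerprint, LCONC, and GI
     2. Sample from the model
        • Sample 640 vectors (virtual compounds)
     3. Extract
        • Keep those with LCONC < -5.0 M
        • This yields 32 vectors
     4. Search for compounds with similar features
        • Look up compounds with similar features in PubChem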
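The sample, filter, and search steps above can be sketched as follows. This is a minimal illustration on made-up data, not the authors' code: the bit sets, the compound names, and the use of Tanimoto similarity for the library search are assumptions; only the threshold LCONC < -5.0 comes from the slide.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_candidates(samples, lconc_threshold=-5.0):
    """Keep sampled vectors whose log concentration is below the threshold."""
    return [s for s in samples if s["lconc"] < lconc_threshold]


def nearest_in_library(query_bits, library, top_k=1):
    """Rank library compounds by similarity to a sampled fingerprint."""
    ranked = sorted(library.items(),
                    key=lambda item: tanimoto(query_bits, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]


# Hypothetical sampled vectors (virtual compounds) and a tiny stand-in library.
samples = [
    {"bits": {1, 4, 7}, "lconc": -6.2},
    {"bits": {2, 3}, "lconc": -4.1},
    {"bits": {0, 4, 7, 9}, "lconc": -5.5},
]
library = {
    "compound_A": {1, 4, 7, 8},
    "compound_B": {2, 3, 5},
    "compound_C": {0, 4, 9},
}

hits = select_candidates(samples)
matches = [nearest_in_library(s["bits"], library) for s in hits]
print(matches)  # [['compound_A'], ['compound_C']]
```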
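  6. Kadurin et al.: experimental results
     • PubChem: 72 million compounds
     • Compounds with features similar to the 32 generated vectors were extracted from PubChem
     • 69 compounds were obtained in the end
       • Several are already known as anticancer agents
       • 13 are covered by patents
       • Most are anthracyclines (currently the most effective anticancer agents)
     [Figure legend: green = PubChem, blue = training data, red = generated vectors (virtual compounds)]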
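  7. Segler et al.
     Generating Focussed Molecule Libraries for Drug Discovery with Recurrent Neural Networks
     • Generates compounds with an LSTM
     • Inputs and outputs are SMILES
     • Repeats the following steps (also called Hillclimb-MLE)
       1. Train the LSTM and sample from it
       2. Filter the samples with a target filtering model (this does not have to be a machine learning model)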
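The train, sample, filter loop above can be sketched with a toy generator. Everything here is an illustrative assumption rather than the method of Segler et al.: the "model" is a character frequency table instead of an LSTM, `score` is a hypothetical target filter, and the kept set is pooled with the new samples so that its mean score can never decrease.

```python
import random
from collections import Counter

ALPHABET = "CNO"


def fit(strings):
    """'Train' the toy model: character frequencies with add-one smoothing."""
    counts = Counter("".join(strings))
    total = sum(counts[c] + 1 for c in ALPHABET)
    return {c: (counts[c] + 1) / total for c in ALPHABET}


def sample(model, n, length, rng):
    """Draw n strings of the given length from the character distribution."""
    chars, weights = zip(*model.items())
    return ["".join(rng.choices(chars, weights=weights, k=length)) for _ in range(n)]


def score(s):
    """Hypothetical target filter: fraction of 'N' characters."""
    return s.count("N") / len(s)


def hillclimb(initial, rounds=5, n_samples=200, seed=0):
    rng = random.Random(seed)
    keep = len(initial)
    kept = list(initial)
    history = [sum(map(score, kept)) / keep]
    for _ in range(rounds):
        model = fit(kept)                                        # step 1: train
        candidates = sample(model, n_samples, len(kept[0]), rng)  # step 1: sample
        pool = kept + candidates
        kept = sorted(pool, key=score, reverse=True)[:keep]       # step 2: filter
        history.append(sum(map(score, kept)) / keep)
    return kept, history


kept, history = hillclimb(["CCOC", "CNCC", "OCCN", "CCCC"])
print(history)
```

Each round retrains on the filtered set, so the generator drifts toward strings that pass the filter.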
  8. Gomez-Bombarelli et al.: CVAE
     Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules
     • Generates compounds with an RNN + VAE
     • Inputs and outputs are SMILES
     • Searches for a latent code with a large value of the target property
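Searching for a latent code with a large target property can be sketched as gradient ascent in latent space. This is a toy under stated assumptions: `predicted_property` is a hand-written quadratic standing in for a trained property predictor, its optimum `target` is arbitrary, and the decoding of the latent code back to SMILES is omitted.

```python
def predicted_property(z, target=(1.0, -2.0)):
    """Toy differentiable property predictor, maximal at z == target."""
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, target))


def gradient(z, target=(1.0, -2.0)):
    """Analytic gradient of predicted_property with respect to z."""
    return [-2.0 * (zi - ti) for zi, ti in zip(z, target)]


def gradient_ascent(z0, step=0.1, n_steps=100):
    """Move the latent code uphill on the predicted property."""
    z = list(z0)
    for _ in range(n_steps):
        g = gradient(z)
        z = [zi + step * gi for zi, gi in zip(z, g)]
    return z


z_start = [0.0, 0.0]
z_opt = gradient_ascent(z_start)
print(predicted_property(z_start), predicted_property(z_opt))
```

With a real model the gradient would come from automatic differentiation through the predictor, and the optimized code would be passed to the decoder.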
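  9. Popova et al.: ReLeaSE
     https://arxiv.org/abs/1711.10907 Popova et al. (2017)
     • A SMILES-based generative model
     • Combined with reinforcement learning to optimize a target property
     • The reward is usually computed with a tool such as RDKit, but here it is computed by a SMILES-based predictive model
     • This makes it possible to optimize properties that cannot be computed with RDKit and similar tools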
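  10. All SMILES VAE
     Alperstein et al. (2019)
     • Graph-based models
       • Most have about 3 to 7 layers
       • Each layer propagates information over a distance of one hop
     • Molecules in ZINC250k
       • Average diameter is 11.1
       • Maximum diameter is 24
     • As a result, information cannot be propagated across the whole molecule
     • An RNN can follow information over long ranges
     • SMILES is not unique
       • Multiple SMILES strings are used as input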
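The argument above compares the number of message-passing layers (one hop per layer) with the graph diameter of a molecule. A minimal sketch of that comparison, using breadth-first search on an adjacency list; the 12-atom chain and the choice of 5 layers are illustrative assumptions, and the graph is assumed to be connected.

```python
from collections import deque


def eccentricity(adj, start):
    """Longest shortest-path distance from start, found by BFS."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(adj):
    """Graph diameter: the largest eccentricity over all nodes."""
    return max(eccentricity(adj, node) for node in adj)


def chain(n):
    """Adjacency list of an unbranched chain of n atoms."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


n_layers = 5
d = diameter(chain(12))
# With one hop per layer, a node only sees atoms at distance <= n_layers.
print("diameter:", d, "covered by", n_layers, "layers:", d <= n_layers)
```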
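  11. De Cao & Kipf: MolGAN
     A model that generates the whole graph in one shot. It also uses a GAN and reinforcement learning.
     • Using graph convolutions in the discriminator makes it order invariant
     • Optimizing each property appears to work well
     • However, uniqueness is very low, around 2% (in the goal-directed setting)
       • This is because GANs and RL have no constraint that encourages diverse outputs
     • Computation is fast because the graph is generated in one shot
     • Experiments use QM9; applying the model to larger graphs looks difficult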
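  12. Pölsterl & Wachinger: LF-MolGAN
     • Like MolGAN, generates the graph in one shot; this model adds constraints on valency
     • Never computes a reconstruction loss explicitly, which avoids the graph isomorphism problem
     • Unlike an ordinary GAN, the architecture includes an encoder, so it is easy to search the latent space for molecules with high similarity
     • Experiments use QM9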
  13. Li et al.
     Learning Deep Generative Models of Graphs
     • Works on graphs rather than SMILES, adding nodes and edges one after another
     • Better results than GrammarVAE and similar models
  14. Jin et al.: JT-VAE
     Junction Tree Variational Autoencoder for Molecular Graph Generation
     • The simple approach would be to add nodes one at a time
     • However, that can generate compounds that do not actually exist
     • Instead, the molecule is generated cluster by cluster
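  15. Jin et al.: JT-VAE
     [Figure annotations]
     • Decompose the molecule into a tree structure using predefined clusters
     • Encode with neural message passing
     • Encode with a GRU
     • Build a new tree structure from the embedding (adding nodes one at a time)
     • Generate the final compound using both the resulting graph embedding and the tree structure (this step is needed because there is freedom in how the clusters are combined)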
  16. You et al.: GCPN
     Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation
     • Generates a graph by adding edges one at a time
     • A model that combines a GAN with reinforcement learning
  17. Comparison of GAN and VAE
     GAN
     • Advantages
       • Good results when tuned well
       • No need to compute a reconstruction loss (avoids the graph isomorphism problem)
     • Disadvantages
       • Hyperparameter tuning is difficult
       • Mode collapse (the model keeps generating the same outputs)
     VAE
     • Advantages
       • Trains more stably than a GAN
       • Hyperparameter tuning is easier
       • Mode collapse is less likely
     • Disadvantages
       • Computing the reconstruction loss brings in the graph isomorphism problem
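For reference, the reconstruction loss mentioned above is the first term of the standard VAE training objective (the evidence lower bound, maximized during training). The slide does not show this formula; the notation $q_\phi$ (encoder), $p_\theta$ (decoder), and $p(z)$ (prior) is introduced here.

```latex
\mathcal{L}(\theta, \phi; x)
  = \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]}_{\text{reconstruction term}}
  - \underbrace{D_{\mathrm{KL}}\bigl(q_\phi(z \mid x) \,\|\, p(z)\bigr)}_{\text{regularization term}}
```

For graphs, evaluating $\log p_\theta(x \mid z)$ requires matching the generated graph to the input graph, and node ordering is arbitrary, which is where the graph isomorphism problem enters.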
  18. Comparison of fingerprint, SMILES, and graph approaches
     • Fingerprint-based
       • Hard to use because fingerprints are not invertible (and therefore rarely seen)
     • SMILES-based
       • Stable performance
       • Validity tends to be lower
       • Fragment-based generation is difficult
     • Graph-based (one-shot)
       • Fast
       • So far only small molecules with 9 or fewer heavy atoms have been generated
       • Low validity and uniqueness
     • Graph-based (recurrent)
       • High validity
       • Node and edge ordering is an issue
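  19. Arus-Pous et al.: Exploring the GDB-13 chemical space using deep generative models
     • GDB-13: a dataset of 975 million molecules built from up to 13 heavy atoms
     • Training used 1 million molecules, about 0.1% of the dataset
     • A simple model that feeds SMILES to a GRU
     • Sampling 2 billion molecules recovered 68.9% of GDB-13
     • The model also captured the characteristics of the compounds in GDB-13
     • Some types of molecules turned out to be hard to generate because of the SMILES notation (for example, molecules with many rings)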
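A quick arithmetic check of the figures above. All input numbers are taken from the slide; the derived counts are simple products and ratios, not values reported by the authors.

```python
gdb13_size = 975_000_000      # molecules in GDB-13
train_size = 1_000_000        # molecules used for training
sampled = 2_000_000_000       # molecules sampled from the model
recovered_fraction = 0.689    # share of GDB-13 recovered

train_fraction = train_size / gdb13_size
recovered_count = recovered_fraction * gdb13_size
samples_per_molecule = sampled / gdb13_size

print(f"training share: {train_fraction:.2%}")               # about 0.10%
print(f"recovered molecules: {recovered_count:,.0f}")        # about 672 million
print(f"samples per GDB-13 molecule: {samples_per_molecule:.2f}")
```

In other words, a model trained on roughly one molecule in a thousand reproduced about two thirds of the database after sampling about twice the database size.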
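  20. Molecular optimization
     • Search within the latent space
       • Gradient ascent
       • Bayesian optimization
     • Reinforcement learning
     • Hillclimb-MLE (train repeatedly on filtered samples)
     • Conditioning code (the condition is treated as an additional input)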
  21. Other uses (generating molecules highly similar to a given molecule)
     Drug Analogs from Fragment Based Long Short-Term Memory Generative Neural Networks, Awale et al. (2018)
     1. Pre-train an LSTM on data such as ChEMBL, DrugBank, and FDB17
     2. Fine-tune on a single molecule (experiments used 10 different molecules)
     3. Generate SMILES
        • Retain correct SMILES
        • Remove duplicates
        • Remove undesirable functional groups
     4. Select the molecules with high similarity
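  22. Brown et al.: GuacaMol, distribution-learning benchmark
     • Purpose of the distribution-learning benchmark
       • Evaluates whether the model reproduces the distribution well, reflecting the trends in the training data
       • A model that handles this task well should have captured the characteristics of compounds, which is expected to help with goal-directed tasks too
     • Validity
       • The fraction of generated compounds that are valid
       • Validity is checked with RDKit
     • Uniqueness
       • Checks for duplicates: the fraction of unique compounds
     • Novelty
       • The fraction of compounds that do not appear in the training data
     • Frechet ChemNet Distance (FCD)
       • A metric that uses features from ChemNet, trained for bioactivity prediction, to measure how close the generated distribution is to the training data distribution
       • For images, the Frechet Inception Distance (FID) is used to compare generative models; FCD is the counterpart for compounds
     • KL divergence
       • A measure of the difference between two probability distributions
       • Emphasizes physicochemical features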
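The validity, uniqueness, novelty, and KL divergence metrics above can be sketched in a few lines. This is an illustrative approximation, not the GuacaMol implementation: `is_valid` is a pluggable callable (the benchmark uses RDKit; the parenthesis check here is a toy stand-in), strings are compared as given (a real pipeline would canonicalize SMILES first), and the exact denominators are assumptions.

```python
import math


def balanced_parentheses(smiles):
    """Toy validity check; a real check would parse the SMILES with RDKit."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def distribution_metrics(generated, training, is_valid):
    valid = [m for m in generated if is_valid(m)]
    unique = set(valid)
    novel = unique - set(training)
    return {
        "validity": len(valid) / len(generated),
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def kl_divergence(p, q, eps=1e-10):
    """D_KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


generated = ["CCO", "CCO", "C(C", "CCN", "CCC"]
training = ["CCC", "CCCl"]
metrics = distribution_metrics(generated, training, balanced_parentheses)
print(metrics)
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```

In this example 4 of 5 strings pass the check, 3 of those 4 are distinct, and 2 of the 3 distinct strings are absent from the training list.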
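  23. Goal-directed benchmark (molecular optimization)
     • Purpose of the goal-directed benchmark
       • Evaluates models in a setting where a specific score is to be maximized
     • Similarity
       • How close the model can get to a target that was removed from the training data
     • Rediscovery
       • Like the above, but asks whether the model can generate exactly the same molecule rather than a similar one
       • Requires an exact match
     • Isomers
       • How many isomers the model can generate for a formula such as C7H8N2O2
       • Not directly related to drug discovery, but evaluates the flexibility of the model
     • Median molecules
       • Maximize similarity to several molecules at the same time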
  24. Measuring compound quality
     • Purpose of measuring compound quality
       • Compounds generated by de novo design algorithms in prior work may be unstable, highly reactive, hard to synthesize, or look wrong to a medicinal chemist
       • It is therefore necessary to check whether the compounds are reasonable
       • Encoding everything a medicinal chemist knows as rules is difficult
       • Here, rd_filter is applied
       • https://github.com/PatWalters/rd_filters
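  25. Experimental results: goal-directed benchmark
     • Best of Data Set
       • Picks the highest-scoring compound from the training data; the minimum baseline that any model must exceed
     • Graph GA
       • Best results
     • SMILES LSTM
       • Good results, nearly on par with Graph GA
     • Other models
       • Clearly worse than Graph GA and SMILES LSTM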
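  26. Experimental results: compound quality measurement
     • Compounds generated in the goal-directed tasks were quality-checked with rd_filter
     • SMILES LSTM gives clearly better results
     • SMILES LSTM is first pre-trained and only then maximizes each score. The pre-training phase probably let it learn the features that matter for a compound.
     • Graph GA, in contrast, does not do well. The likely problem is that it tries to maximize the score right away, without any prior knowledge.
     • Since SMILES LSTM and Graph GA were on par in the goal-directed benchmark, SMILES LSTM is the better choice.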
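  27. Pölsterl & Wachinger: LF-MolGAN
     • Validity, uniqueness, and novelty are widely used but are not very good metrics
     • A model that picks nodes and edges at random (while respecting valency) ends up looking good
     • They do not consider whether the generated molecules resemble the training data and are chemically meaningful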
  28. Multi-objective optimization (Guimaraes et al.: ORGAN)
     • Optimizes three properties by training alternately on druglikeness, synthesizability, and solubility
     • Optimizing all three gives results close to those obtained when each one is optimized alone
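Training alternately on several objectives can be sketched as a round-robin schedule over reward functions. This is only an illustration: the function `alternating_schedule` is hypothetical, and switching the objective once per epoch is an assumption, since the slide only says that training alternates.

```python
from itertools import cycle


def alternating_schedule(objectives, n_epochs):
    """Return which objective supplies the reward in each epoch."""
    names = cycle(objectives)
    return [next(names) for _ in range(n_epochs)]


objectives = ["druglikeness", "synthesizability", "solubility"]
schedule = alternating_schedule(objectives, 7)
for epoch, name in enumerate(schedule):
    # In a real training loop, the reward function for `name` would be
    # used to score generated molecules during this epoch.
    print(epoch, name)
```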