The Front Line of AI Drug Discovery, with a Focus on Generative Models / Elix CBI 2019

Elix
October 22, 2019

A summary of the various generative models used in AI drug discovery. These are the slides from a talk given at the CBI Society 2019 meeting.

Transcript

  1. The Front Line of AI Drug Discovery, with a Focus on Generative Models. Shinya Yuki, CEO, Elix, Inc. 2019/10/22. CBI Society Annual Meeting.

  2. Contents
    • Introduction
    • Core technologies
    • Fingerprint- and SMILES-based models
    • Graph-based models
    • Uses of generative models
    • Performance evaluation of generative models
    • Future directions
    • Elix Chem
  3. Introduction

  4. Restricted © Elix Inc. Molecular design. Sanchez-Lengeling et al. (2018)
    • Experiment / simulation
    • Predictive models
    • Generative models
    • There are roughly 10^60 drug-like molecules
  5. Commonly used representations: Fingerprint, SMILES, Graph. Mater & Coote (2019); Schwalbe-Koda & Gómez-Bombarelli (2019)
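  6. Especially common representations
    • Fingerprint
      • Many variants exist; ECFP is particularly well known
      • Each bit corresponds to a specific substructure
      • Collisions can occur
      • Not invertible
    • SMILES
      • Represents a compound as a character string
      • Not unique: one compound can have more than one SMILES string
      • Slightly different compounds can have very different SMILES (it was not designed to express similarity between compounds)
    • Graph
      • Represents a compound as nodes and edges
      • Feels like a natural representation
    https://arxiv.org/abs/1802.04364 https://arxiv.org/abs/1903.04388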
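
The fingerprint and SMILES caveats on this slide can be illustrated with a toy sketch. It is not ECFP: the fragment labels, the `toy_fingerprint` helper and the 8-bit length are all assumptions made for illustration, and only the Python standard library is used.

```python
import hashlib

def toy_fingerprint(fragments, n_bits=8):
    """Fold a collection of substructure labels into a fixed-length bit vector."""
    bits = [0] * n_bits
    for frag in fragments:
        digest = hashlib.sha256(frag.encode("utf-8")).hexdigest()
        bits[int(digest, 16) % n_bits] = 1  # several fragments may share one bit
    return bits

# Hypothetical substructure labels. A real fingerprint such as ECFP derives
# them from atom environments, not from hand-written strings.
fragments = ["C-C", "C=O", "C-N", "C-O", "c1ccccc1", "N-H", "O-H", "C#N", "S=O"]
fp = toy_fingerprint(fragments, n_bits=8)

# Nine fragments cannot occupy nine distinct positions in eight bits, so at
# least two collide, and the bit vector alone cannot tell us which fragments
# were present (the mapping is not invertible).
print(sum(fp), "bits set for", len(fragments), "fragments")

# SMILES is not unique: both strings below describe ethanol.
ethanol_smiles = ["CCO", "OCC"]
```

Real fingerprints use far more bits (often 1024 or 2048), which makes collisions rarer but does not remove them.
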
  7. Various predictive models. Wu et al. (2017). Graph-based models often give better results.

  8. Architectures underlying generative models. Sanchez-Lengeling & Aspuru-Guzik (2018)

  9. Various combinations. Schwalbe-Koda & Gómez-Bombarelli (2019)

  10. List of the latest generative models. Elton et al. (2019)

  11. List of commonly used public datasets. https://arxiv.org/abs/1903.04388 Elton et al. (2019)

  12. Core technologies

  13. Generative Adversarial Networks (GANs). Karras et al. (2018)

  14. Generative Adversarial Networks (GANs): a type of generative model
    • Generator (G): generates fake images and tries to fool D
    • Discriminator (D): tries to tell real images from fake ones
    • Diagram labels: Noise, G, D, "real or fake?", fake images (generated images), real images (training set)
    Karras et al. (2017)
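
For reference, the game between G and D described on this slide is usually written as the minimax objective below. The slide does not show the formula; the symbols $p_{\text{data}}$ (distribution of real images) and $p_z$ (noise distribution) are notation assumed here.

```latex
\min_G \max_D \;
  \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

D is rewarded for scoring real images high and generated images low, while G is rewarded when D scores its generated images as real.
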
  15. Generative Adversarial Networks (GANs)

  16. Autoencoders

  17. Autoencoders

  18. Variational Autoencoders (VAEs). Loss terms: reconstruction; deviation from a normal distribution
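
The two labels on this slide correspond to the two terms of the usual VAE training loss. The formula is not on the slide; the encoder $q_\phi$, decoder $p_\theta$ and the standard normal prior are the conventional notation, assumed here.

```latex
\mathcal{L}(x) =
  \underbrace{-\,\mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]}_{\text{reconstruction}}
  + \underbrace{D_{\mathrm{KL}}\bigl(q_\phi(z \mid x) \,\|\, \mathcal{N}(0, I)\bigr)}_{\text{deviation from a normal distribution}}
```
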

  19. Recurrent Neural Networks (RNNs). Segler et al. (2017)

  20. Graph Representations. Peter et al. (2018) https://www.businessinsider.com/explainer-what-exactly-is-the-social-graph-2012-3

  21. Graph Neural Networks. Peter et al. (2018)

  22. Graph Neural Networks. Peter et al. (2018)

  23. Graph Convolutional Networks: 2D Convolution, Graph Convolution. Wu et al. (2019)
  24. Reinforcement Learning (RL). Sutton & Barto (2018); Mnih et al. (2015)

  25. Reinforcement Learning (RL). Sutton & Barto (2018); Mnih et al. (2015). Reward examples: QED, logP
  26. Transfer Learning
    • A very large unlabeled dataset and a small amount of labeled data
    • Compute values such as logP with RDKit and pre-train on them
    Goh et al. (2017)
  27. Fingerprint- and SMILES-based models

  28. Molecule representation: Fingerprint, SMILES, Graph. Mater & Coote (2019); Schwalbe-Koda & Gómez-Bombarelli (2019)
  29. Kadurin et al. The cornucopia of meaningful leads: Applying deep AAEs for new molecule development in oncology
    • Inputs and outputs
      • Binary fingerprints (MACCS)
      • Log concentration (LCONC)
    • Middle (latent) layer
      • Consists of 5 neurons
      • One is the Growth Inhibition percentage (GI)
      • The remaining 4 are trained to approach a normal distribution
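  30. Kadurin et al.: experimental workflow
    • Prepare a dataset and train
      • NCI-60, MCF-7
      • 6252 compounds
      • Each record consists of a fingerprint, LCONC, and GI
    • Draw samples from the model
      • Sample 640 vectors (virtual compounds)
      • Keep those with LCONC < -5.0 M
      • This yields 32 vectors
    • Search for compounds with similar features
      • Retrieve compounds with similar features from PubChem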
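  31. Kadurin et al.: experimental results
    • PubChem: 72 million compounds
    • Compounds with features similar to the 32 generated vectors were extracted from PubChem
    • 69 compounds were obtained in the end
    • Several are already known as anticancer drugs
    • 13 are patented
    • Most are anthracyclines (currently the most effective anticancer drugs)
    • Plot legend: green = PubChem, blue = training data, red = generated vectors (virtual compounds)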
  32. Molecule representation: Fingerprint, SMILES, Graph. Mater & Coote (2019); Schwalbe-Koda & Gómez-Bombarelli (2019)
  33. Segler et al. Generating Focussed Molecule Libraries for Drug Discovery with Recurrent Neural Networks
    • Generates compounds with an LSTM
    • Input and output are SMILES
    • Repeats the following steps (also called Hillclimb-MLE)
      1. Train the LSTM and sample from it
      2. Filter the samples with a target filtering model (this need not be machine learning)
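
The sample, filter, retrain loop above can be sketched on a toy problem. This is an illustration under strong assumptions, not the authors' setup: the "model" is a single categorical distribution over three tokens instead of an LSTM, the strings are not valid SMILES, and the scoring function (count of `C`) stands in for a target filtering model.

```python
import random
from collections import Counter

ALPHABET = "CNO"  # toy token set, not a real SMILES vocabulary

def sample(probs, length, rng):
    """Draw one string from the toy model (independent tokens)."""
    tokens, weights = zip(*sorted(probs.items()))
    return "".join(rng.choices(tokens, weights=weights, k=length))

def fit_mle(strings):
    """Maximum-likelihood token frequencies, with add-one smoothing."""
    counts = Counter("".join(strings))
    total = sum(counts.values())
    return {t: (counts[t] + 1) / (total + len(ALPHABET)) for t in ALPHABET}

def hillclimb_mle(score, rounds=5, n_samples=200, keep=20, length=8, seed=0):
    rng = random.Random(seed)
    probs = {t: 1 / len(ALPHABET) for t in ALPHABET}  # start from a uniform model
    for _ in range(rounds):
        samples = [sample(probs, length, rng) for _ in range(n_samples)]  # 1. sample
        best = sorted(samples, key=score, reverse=True)[:keep]            # 2. filter
        probs = fit_mle(best)                                             # refit on survivors
    return probs

# Hypothetical objective: prefer strings with many "C" tokens.
final_probs = hillclimb_mle(score=lambda s: s.count("C"))
print(final_probs)
```

Each round shifts probability mass toward whatever the filter keeps, which is the essence of the hill-climbing behaviour.
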
  34. Gómez-Bombarelli et al., CVAE. Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules
    • Generates compounds with an RNN + VAE
    • Input and output are SMILES
    • Searches for a latent code with a large value of the target property
  35. Kusner et al., GVAE: Grammar Variational Autoencoder (Encoder, Decoder). Generation takes the grammar (a context-free grammar) into account.
  36. Yang et al., ChemTS. Generates SMILES with MCTS and an RNN. Optimizes penalized logP.
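
The slide does not define penalized logP. In the molecular generation literature it is commonly written along the following lines; the exact ring term and any normalization vary between papers, so this is an assumed general form rather than the definition used in this work.

```latex
J(m) = \log P(m) \;-\; \mathrm{SA}(m) \;-\; \mathrm{ring}(m)
```

Here $\log P(m)$ is the octanol-water partition coefficient, $\mathrm{SA}(m)$ is the synthetic accessibility score, and $\mathrm{ring}(m)$ penalizes rings larger than six atoms.
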

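  37. Popova et al., ReLeaSE. https://arxiv.org/abs/1711.10907 Popova et al. (2017)
    • SMILES-based generative model
    • Combined with reinforcement learning to optimize target properties
    • The reward is usually computed with tools such as RDKit, but here it is computed by a SMILES-based predictive model
    • This makes it possible to optimize properties that tools such as RDKit cannot compute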
  38. Guimaraes et al., ORGAN
    • Built on SeqGAN, an RNN-based GAN for sequential data
    • Introduces reinforcement learning to optimize properties such as druglikeness
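  39. All SMILES VAE. Alperstein et al. (2019)
    • Graph-based models
      • Most have roughly 3 to 7 layers
      • Each layer propagates information across a distance of one (one hop)
    • Molecules in ZINC250k
      • Average diameter is 11.1
      • Maximum diameter is 24
    • Information cannot be propagated across the whole molecule
    • An RNN can track long-range information
    • SMILES is not unique
      • Multiple SMILES strings are used as input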
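
The depth argument above can be checked with a small sketch. It assumes a simple chain of 12 atoms (graph diameter 11, close to the ZINC250k average quoted on the slide) and counts which atoms can influence atom 0 after a given number of message-passing layers; `receptive_field` is a hypothetical helper, not part of any library.

```python
def receptive_field(adjacency, start, n_layers):
    """Nodes whose information can reach `start` after n_layers of message passing."""
    reached = {start}
    for _ in range(n_layers):
        # One layer lets every reached node pull information from its neighbours.
        reached |= {nbr for node in reached for nbr in adjacency[node]}
    return reached

# A chain of 12 atoms: atom i is bonded to atoms i-1 and i+1.
n_atoms = 12
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < n_atoms] for i in range(n_atoms)}

for layers in (3, 7, 11):
    seen = receptive_field(chain, start=0, n_layers=layers)
    print(layers, "layers ->", len(seen), "of", n_atoms, "atoms visible from atom 0")
```

With 3 or 7 layers, atom 0 never receives information from the far end of the chain; only at 11 layers does it see the whole molecule.
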
  40. Graph-based models (one-shot type)

  41. Molecule representation: Fingerprint, SMILES, Graph. Mater & Coote (2019); Schwalbe-Koda & Gómez-Bombarelli (2019)
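  42. De Cao & Kipf, MolGAN: a model that generates the graph in one shot; it also uses a GAN and reinforcement learning
    • Using graph convolution in the discriminator makes it order invariant
    • Optimizing individual properties appears to work well
    • However, uniqueness is very low, around 2% (in the goal-directed setting)
      • Neither the GAN nor RL has a constraint that encourages diverse outputs
    • Computation time is short because the graph is generated in one shot
    • Experiments use QM9; applying the model to larger graphs looks difficult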
  43. Pölsterl & Wachinger, LF-MolGAN
    • Generates the graph in one shot, like MolGAN, but introduces valency constraints
    • Does not compute a reconstruction loss explicitly, which avoids the graph isomorphism problem
    • Unlike an ordinary GAN, the architecture includes an encoder, so it is easy to search the latent space for highly similar molecules
    • Experiments use QM9
  44. Graph-based models (recurrent type)

  45. Li et al. Learning Deep Generative Models of Graphs
    • Adds nodes and edges one at a time as a graph, rather than generating SMILES
    • Better results than GrammarVAE and similar models
  46. Jin et al., JT-VAE. Junction Tree Variational Autoencoder for Molecular Graph Generation
    • The simplest approach would be to add nodes one by one
    • However, this can generate compounds that do not actually exist
    • So the molecule is generated cluster by cluster instead
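  47. Jin et al., JT-VAE (diagram annotations)
    • Decompose the molecule into a tree structure using predefined clusters
    • Build a new tree structure from the embedding (adding nodes one at a time)
    • Encode with neural message passing
    • Generate the final compound using both the resulting graph embedding and the tree structure (this step is needed because there is freedom in how the clusters are combined)
    • Encode with a GRU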
  48. You et al., GCPN. Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation
    • Generates a graph by adding edges one at a time
    • Combines a GAN with reinforcement learning
  49. Li et al., MolMP / MolRNN
    • A type of model that conditions on target properties and similar inputs
    • Takes a conditional code such as QED or SAscore as input

  50. Comparison of GAN and VAE
    • GAN
      • Advantages
        • Good results when tuned well
        • No need to compute a reconstruction loss (avoids the graph isomorphism problem)
      • Disadvantages
        • Hyperparameter tuning is difficult
        • Mode collapse (the model keeps generating the same outputs)
    • VAE
      • Advantages
        • Runs more stably than a GAN
        • Hyperparameter tuning is easier
        • Mode collapse is less likely
      • Disadvantages
        • Computing the reconstruction loss brings in the graph isomorphism problem
  51. Comparison of Fingerprint, SMILES, and Graph
    • Fingerprint-based
      • Hard to use because fingerprints are not invertible (and therefore rarely seen)
    • SMILES-based
      • Stable performance
      • Validity tends to be low
      • Fragment-based generation is difficult
    • Graph-based (one-shot type)
      • Fast
      • So far only small molecules with 9 or fewer heavy atoms have been generated
      • Validity and uniqueness are low
    • Graph-based (recurrent type)
      • High validity
      • Node and edge ordering is an issue
  52. Uses of generative models

  53. Molecule generation: Distribution Learning, Predefined Scaffold, Molecule Optimization

  54. Distribution Learning. https://github.com/NVlabs/ffhq-dataset Karras et al. (2018). Panels: training data, generated data.

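  55. Arús-Pous et al. Exploring the GDB-13 chemical space using deep generative models
    • GDB-13: a dataset of 975 million molecules made up of at most 13 heavy atoms
    • Trained on 1 million molecules, about 0.1% of the dataset
    • A simple model that feeds SMILES to a GRU
    • Sampling 2 billion molecules recovered 68.9% of GDB-13
    • The model also captured the characteristics of the compounds in GDB-13
    • Some types of molecules turned out to be hard to generate because of the SMILES notation (for example, molecules with many rings)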
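
The recovery figure on this slide is a coverage ratio: the share of the reference database that appears at least once among the generated samples. A minimal sketch follows; the `coverage` helper and the tiny example sets are illustrative assumptions, and in practice both sides would first be converted to canonical SMILES.

```python
def coverage(generated, reference):
    """Fraction of the reference set that appears at least once in the samples."""
    reference = set(reference)
    return len(set(generated) & reference) / len(reference)

# Sanity check of the slide's numbers: 0.1% of 975 million is about 1 million.
gdb13_size = 975_000_000
training_size = gdb13_size // 1000
print(training_size)  # 975000

# Toy example with placeholder identifiers instead of molecules.
reference = ["a", "b", "c", "d"]
generated = ["a", "b", "b", "x"]  # duplicates and out-of-set samples do not add coverage
print(coverage(generated, reference))  # 0.5
```
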
  56. Molecular optimization. Choi et al. (2017)

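  57. Molecular optimization
    • Search within the latent space
      • Gradient ascent
      • Bayesian optimization
    • Reinforcement learning
    • Hillclimb-MLE (train repeatedly on filtered samples)
    • Conditioning code (conditions are also treated as input)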
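
Of the strategies listed, gradient ascent in the latent space is the easiest to sketch. The example below is a toy under stated assumptions: the latent space is two-dimensional, and `predicted_property` is a hand-written quadratic surrogate standing in for a learned property predictor. A real pipeline would also decode the optimized latent vector back into a molecule.

```python
def predicted_property(z, target):
    """Toy surrogate for a property predictor: largest when z equals target."""
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, target))

def gradient(z, target):
    """Analytic gradient of the toy surrogate with respect to z."""
    return [-2 * (zi - ti) for zi, ti in zip(z, target)]

def gradient_ascent(z0, target, lr=0.1, steps=50):
    z = list(z0)
    for _ in range(steps):
        g = gradient(z, target)
        z = [zi + lr * gi for zi, gi in zip(z, g)]  # move uphill on the property
    return z

z_start = [0.0, 0.0]
z_peak = [1.0, -2.0]  # hypothetical location of the best property value
z_opt = gradient_ascent(z_start, z_peak)
print(predicted_property(z_start, z_peak), "->", predicted_property(z_opt, z_peak))
```
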
  58. Molecular optimization (starting from a specific substructure). Optimizes penalized logP.

  59. Other uses (generating molecules highly similar to a given molecule). Drug Analogs from Fragment Based Long Short-Term Memory Generative Neural Networks. Awale et al. (2018)
    1. Pre-train an LSTM on data such as ChEMBL, DrugBank, and FDB17
    2. Fine-tune on a single molecule (experiments used 10 different molecules)
    3. Generate SMILES
      • Retain correct SMILES
      • Remove duplicates
      • Remove undesirable functional groups
    4. Select the molecules with high similarity
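
Steps 3 and 4 amount to a filtering pipeline followed by a similarity ranking. The sketch below shows one possible shape for it. Everything passed in is a placeholder assumption: `is_valid`, `is_undesirable` and `featurize` would normally be backed by a cheminformatics toolkit, and Tanimoto similarity on feature sets is used here as a common choice, not necessarily the measure used in the paper.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def postprocess(generated, query_features, featurize, is_valid, is_undesirable, top_k=3):
    seen, kept = set(), []
    for smi in generated:
        if not is_valid(smi):        # retain correct SMILES
            continue
        if smi in seen:              # remove duplicates
            continue
        seen.add(smi)
        if is_undesirable(smi):      # remove undesirable functional groups
            continue
        kept.append(smi)
    # Step 4: rank the survivors by similarity to the query molecule.
    kept.sort(key=lambda s: tanimoto(featurize(s), query_features), reverse=True)
    return kept[:top_k]

# Toy stand-ins, for illustration only.
toy_is_valid = lambda s: s.count("(") == s.count(")")  # balanced parentheses
toy_is_undesirable = lambda s: "Br" in s               # pretend Br is unwanted
toy_featurize = set                                    # characters as "features"

samples = ["CCO", "CCO", "CC(", "CCBr", "CCN", "OCC"]
selected = postprocess(samples, toy_featurize("CCO"), toy_featurize,
                       toy_is_valid, toy_is_undesirable, top_k=2)
print(selected)  # ['CCO', 'OCC']
```

Note that string-level deduplication does not notice that `CCO` and `OCC` are the same molecule; canonicalizing the SMILES first would.
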
  60. Other uses (generating molecules highly similar to a given molecule). Drug Analogs from Fragment Based Long Short-Term Memory Generative Neural Networks. Awale et al. (2018)
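  61. Performance evaluation of generative models

  62. Why generative models are hard to evaluate. Karras et al. (2018)
    • It is easy to see that samples look good qualitatively, but hard to evaluate them quantitatively
    • For compounds, even qualitative evaluation is harder than for face images and the like

  63. Benchmarks for generative models
    • Each paper uses different datasets (ChEMBL, ZINC, QM9, etc.) and different metrics, which makes comparison difficult
    • The set of metrics used for comparison is also insufficient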

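  64. Brown et al., GuacaMol: distribution-learning benchmark
    • Purpose of the distribution-learning benchmark
      • Evaluates how well the model reproduces the distribution of the training data, reflecting its tendencies
      • A model that does well on this task should have captured the characteristics of compounds, which is expected to help with goal-directed tasks as well
    • Validity
      • The fraction of generated compounds that are valid
      • Validity is checked with RDKit
    • Uniqueness
      • Checks for duplicates; the fraction of unique compounds
    • Novelty
      • The fraction of compounds that do not appear in the training data
    • Frechet ChemNet Distance (FCD)
      • Uses features from ChemNet, a network trained for bioactivity prediction, to measure how close the generated distribution is to the training data
      • For images, the Frechet Inception Distance (FID) is used to compare generative models; FCD is its counterpart for compounds
    • KL Divergence
      • A measure of the difference between two probability distributions
      • Emphasizes physicochemical properties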
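
The three ratio metrics and the KL divergence can be sketched in a few lines. Several details are assumptions for illustration: uniqueness is taken over valid molecules and novelty over unique valid molecules, `toy_canonicalize` is a stand-in for an RDKit-based validity check and canonicalization, and the KL divergence is computed for discrete histograms with a small epsilon for numerical safety.

```python
import math

def distribution_metrics(generated, training, canonicalize):
    """Validity, uniqueness and novelty of a list of generated strings.

    `canonicalize` returns a canonical string, or None for an invalid input.
    `training` is assumed to be canonical already.
    """
    valid = [c for c in map(canonicalize, generated) if c is not None]
    unique = set(valid)
    novel = unique - set(training)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }

def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) for two discrete distributions given as aligned lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def toy_canonicalize(smiles):
    # Placeholder check: reject unbalanced parentheses, otherwise keep the string.
    return smiles if smiles.count("(") == smiles.count(")") else None

metrics = distribution_metrics(["CCO", "CCO", "CCN", "CC("], ["CCO"], toy_canonicalize)
print(metrics)
print(kl_divergence([0.9, 0.1], [0.5, 0.5]))
```

A random sampler that copies training molecules scores 100% validity and zero novelty under these definitions, which is why the metrics need to be read together.
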
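  65. Goal-directed benchmark (molecular optimization)
    • Purpose of the goal-directed benchmark
      • Evaluates models in a setting where a specific score is maximized
    • Similarity
      • How close the model can get to a target that was removed from the training data
    • Rediscovery
      • Similar to the above, but asks whether the model can generate exactly the same molecule
      • This one requires an exact match
    • Isomers
      • How many isomers the model can generate for a formula such as C7H8N2O2
      • Not directly related to drug discovery, but it evaluates the flexibility of the model
    • Median molecules
      • Maximize similarity to several molecules at the same time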
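
For the isomer task, the core check is whether a candidate has the same atom counts as the target formula. The sketch below is a simplified yes/no check on formula strings, assuming formulas without brackets or charges; in practice the formula of a generated molecule would be computed from its structure with a cheminformatics toolkit, and the benchmark's actual scoring may be graded rather than binary.

```python
import re
from collections import Counter

def parse_formula(formula):
    """Parse a simple molecular formula such as 'C7H8N2O2' into atom counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return dict(counts)

def is_isomer(candidate_formula, target_formula):
    """True when both formulas contain exactly the same atoms."""
    return parse_formula(candidate_formula) == parse_formula(target_formula)

target = "C7H8N2O2"
print(parse_formula(target))           # {'C': 7, 'H': 8, 'N': 2, 'O': 2}
print(is_isomer("C7H8O2N2", target))   # True: same atoms, different order
print(is_isomer("C7H8N2O", target))    # False: one oxygen short
```
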
  66. Measuring Compound Quality
    • Purpose
      • Compounds generated by earlier de novo design algorithms may be unstable, highly reactive, hard to synthesize, or look wrong to a medicinal chemist
      • It is therefore necessary to check whether the generated compounds are reasonable
      • Turning everything a medicinal chemist knows into rules and checking against them is difficult
      • Here, rd_filter is applied
      • https://github.com/PatWalters/rd_filters
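  67. Results: distribution-learning benchmark
    • Random sampler: it only draws molecules from ChEMBL, so no compound is broken and validity is 100%; novelty, however, is zero
    • SMILES LSTM: good overall
    • Graph MCTS: fairly good, but poor on KL and FCD
    • AAE: good except for FCD
    • ORGAN: poor overall
    • VAE: good overall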
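  68. Results: goal-directed benchmark
    • Best of Data Set
      • The result of picking the highest-scoring compound from the training data; a baseline that a model must at least exceed
    • Graph GA
      • Best results
    • SMILES LSTM
      • Good results, almost on par with Graph GA
    • Other models
      • Clearly worse than Graph GA and SMILES LSTM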
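  69. Results: compound quality measurement
    • Compounds generated in the goal-directed tasks were quality-checked with rd_filter
    • SMILES LSTM gave clearly good results
    • SMILES LSTM is first pre-trained and then maximizes each score. The pre-training phase probably taught it the features that matter for compounds.
    • Graph GA, in contrast, did not do very well. The likely problem is that it tries to maximize the score directly, without any prior knowledge.
    • Since SMILES LSTM and Graph GA performed equally on the goal-directed benchmark, SMILES LSTM is the better choice.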
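  70. Pölsterl & Wachinger, LF-MolGAN
    • Validity, uniqueness, and novelty are widely used but are not very good metrics
    • A model that picks nodes and edges at random (while respecting valency) ends up looking good
    • They do not consider whether the generated molecules resemble the training data and are chemically meaningful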
  71. Future directions

  72. Multi-objective optimization: Guimaraes et al., ORGAN
    • Optimizes three properties by training alternately on druglikeness, synthesizability, and solubility
    • Optimizing all three gives results close to optimizing each one on its own
  73. Multi-objective optimization: Zhou et al., MolDQN
    • A generative model that performs optimization with a DQN
    • Experiments optimize similarity and QED (drug-likeness) at the same time
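
One common way to optimize two objectives at once is to scalarize them into a single reward with a weighted sum. The slide does not say how the two terms are combined, so the function below, its name and the `weight` parameter are assumptions for illustration.

```python
def combined_reward(similarity, qed, weight=0.5):
    """Weighted sum of two objectives, each assumed to lie in [0, 1]."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * similarity + (1.0 - weight) * qed

# Sweeping the weight trades one objective off against the other.
for w in (0.0, 0.5, 1.0):
    print(w, combined_reward(similarity=0.8, qed=0.6, weight=w))
```

With `weight=1.0` only similarity counts, with `weight=0.0` only QED counts, and values in between trace out the trade-off.
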
  74. No dataset (pure RL): MolDQN, Zhou et al.
    • Reinforcement learning makes it possible to train without a dataset
    • Because there is no pre-training, a broad search is possible
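  75. Taking the synthesis route into account: Bradshaw et al., Molecule Chef (Encoder, Decoder)
    • A model that also considers the synthesis route; it outputs both the reactants and the product
    • A graph neural network produces the reactant embeddings
    • Reactants are output one after another, chosen from a set of known reactants
    • A reaction predictor then turns them into the product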
  76. Elix, Inc. https://elix-inc.com