The Frontier of AI Drug Discovery Centered on Generative Models / Elix CBI 2019
Elix
October 22, 2019
A summary of the various generative models used in AI drug discovery. These are the slides from a talk given at the CBI Society Annual Meeting 2019.
Transcript
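Slide 1: The Frontier of AI Drug Discovery Centered on Generative Models
Elix, Inc., CEO Shinya Yuki
2019/10/22, CBI Society Annual Meeting

Slide 2: Contents
• Introduction
• Core technologies
• Fingerprint- and SMILES-based models
• Graph-based models
• How generative models are used
• Performance evaluation of generative models
• Directions for future development
• Elix Chem

Slide 3: Introduction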
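Slide 4: Molecular design
Experiment / simulation, predictive models, generative models
Drug-like molecules: ~10^60
Sanchez-Lengeling et al. (2018)
(Slide footer: Restricted Elix Inc.)

Slide 5: Commonly used representations
Fingerprint, SMILES, Graph
Mater & Coote (2019); Schwalbe-Koda & Gómez-Bombarelli (2019)

Slide 6: Especially common representations
• Fingerprint
  • Many kinds exist; ECFP is especially well known
  • Each bit corresponds to a specific substructure
  • Collisions can occur
  • Not invertible
• SMILES
  • Represents a compound as a character string
  • A single compound does not have a unique SMILES string
  • Compounds that differ only slightly can have very different SMILES strings (SMILES was not designed to express compound similarity)
• Graph
  • Represents a compound as nodes and edges
  • Seems like a natural representation
https://arxiv.org/abs/1802.04364
https://arxiv.org/abs/1903.04388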
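The fingerprint properties listed on slide 6 (bits tied to substructures, possible collisions, no way back to the molecule) can be illustrated with a toy sketch. This is not ECFP: the "substructures" here are simply character n-grams of a SMILES string, and `toy_fingerprint`, `count_fragments`, `n_bits`, and `max_len` are hypothetical names chosen for illustration only.

```python
import hashlib


def fragments(smiles: str, max_len: int = 3) -> set:
    """All distinct substrings of length 1..max_len (toy 'substructures')."""
    return {
        smiles[start:start + size]
        for size in range(1, max_len + 1)
        for start in range(len(smiles) - size + 1)
    }


def count_fragments(smiles: str, max_len: int = 3) -> int:
    """Number of distinct toy substructures that get hashed."""
    return len(fragments(smiles, max_len))


def toy_fingerprint(smiles: str, n_bits: int = 16, max_len: int = 3) -> frozenset:
    """Hash each toy substructure to one of n_bits positions.

    Assumption: a real fingerprint such as ECFP hashes atom environments,
    not raw SMILES substrings. Only the hashing idea is shown here.
    """
    on_bits = set()
    for fragment in fragments(smiles, max_len):
        digest = hashlib.sha256(fragment.encode("utf-8")).digest()
        on_bits.add(int.from_bytes(digest[:4], "big") % n_bits)
    return frozenset(on_bits)


aspirin = "CC(=O)Oc1ccccc1C(=O)O"
fp = toy_fingerprint(aspirin, n_bits=8)

# More fragments than bits means several fragments must share a bit
# (a collision), so the bit vector cannot be inverted back to the molecule.
print(count_fragments(aspirin), "fragments mapped onto", len(fp), "set bits")

# "CCO" and "OCC" are both valid SMILES for ethanol, yet differ as strings,
# which is the non-uniqueness mentioned on the slide.
print("CCO" == "OCC")
```

With only 8 bits and more than 8 distinct fragments, at least two fragments necessarily land on the same bit, which is the collision effect in miniature.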
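Slide 7: Various predictive models
Graph-based models often give better results.
Wu et al. (2017)

Slide 8: Base architectures for generative models
Sanchez-Lengeling & Aspuru-Guzik (2018)

Slide 9: Various combinations
Schwalbe-Koda & Gómez-Bombarelli (2019)

Slide 10: List of recent generative models
Elton et al. (2019)

Slide 11: List of commonly used public datasets
https://arxiv.org/abs/1903.04388
Elton et al. (2019)

Slide 12: Core technologies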
Slide 13: Generative Adversarial Networks (GANs)
Karras et al. (2018)

Slide 14: Generative Adversarial Networks (GANs)
A type of generative model.
• Generator (G): generates fake images and tries to fool D
• Discriminator (D): tries to tell real images from fake images
Diagram: Noise → G → fake image (generated image); real image (training set); D decides "real or fake?"
Karras et al. (2017)
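For reference, the game between G and D described on slide 14 is usually written as the minimax objective from the original GAN formulation (Goodfellow et al., 2014). The symbols $p_{\text{data}}$ (training distribution) and $p_z$ (noise distribution) are notation introduced here, not taken from the slides.

```latex
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

D is rewarded for scoring real samples high and generated samples low, while G is rewarded when D scores its samples high.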
Slide 15: Generative Adversarial Networks (GANs)

Slide 16: Autoencoders

Slide 17: Autoencoders

Slide 18: Variational Autoencoders (VAEs)
Loss terms: reconstruction; deviation from the normal distribution
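The two loss terms named on slide 18 can be written out as a minimal sketch. It assumes a diagonal Gaussian encoder and a squared-error reconstruction term; the function names and the `beta` weight are illustrative, not from the slides.

```python
import math
from typing import Sequence


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ).

    This is the 'deviation from the normal distribution' term.
    """
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def squared_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Reconstruction term (an assumption; SMILES VAEs typically use cross-entropy)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def vae_loss(x, x_hat, mu, log_var, beta: float = 1.0) -> float:
    """Reconstruction error plus beta times the KL term."""
    return squared_error(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)


# A perfect reconstruction with a latent code that already matches N(0, I)
# incurs no loss; shifting the latent mean away from 0 is penalized.
print(vae_loss([1.0, 2.0], [1.0, 2.0], mu=[0.0, 0.0], log_var=[0.0, 0.0]))
print(vae_loss([1.0], [0.0], mu=[1.0], log_var=[0.0]))
```

The KL term is what keeps the latent space close to a standard normal, so that sampling from N(0, I) at generation time yields codes the decoder has seen.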
Slide 19: Recurrent Neural Networks (RNNs)
Segler et al. (2017)

Slide 20: Graph Representations
Peter et al. (2018)
https://www.businessinsider.com/explainer-what-exactly-is-the-social-graph-2012-3

Slide 21: Graph Neural Networks
Peter et al. (2018)

Slide 22: Graph Neural Networks
Peter et al. (2018)

Slide 23: Graph Convolutional Networks
2D Convolution; Graph Convolution
Wu et al. (2019)
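As a rough illustration of the graph convolution idea on slide 23, the sketch below averages each node's features with those of its neighbors. It is a simplification with no learned weights or nonlinearity, and `graph_conv_step` and the toy ethanol graph are assumptions made for illustration.

```python
from typing import Dict, List


def graph_conv_step(
    features: Dict[int, List[float]],
    neighbors: Dict[int, List[int]],
) -> Dict[int, List[float]]:
    """One round of mean aggregation over each node and its neighbors."""
    updated = {}
    for node, own in features.items():
        group = [own] + [features[n] for n in neighbors[node]]
        updated[node] = [sum(column) / len(group) for column in zip(*group)]
    return updated


# Heavy-atom graph of ethanol, C-C-O, with one-hot features [is_C, is_O].
features = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
neighbors = {0: [1], 1: [0, 2], 2: [1]}

one_step = graph_conv_step(features, neighbors)
two_steps = graph_conv_step(one_step, neighbors)

# After one step the terminal carbon (node 0) has not "seen" the oxygen,
# which is two bonds away; after two steps it has.
print(one_step[0], two_steps[0])
```

Each step moves information by one bond, so the number of stacked layers bounds how far apart two atoms can be and still influence each other.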
Slide 24: Reinforcement Learning (RL)
Sutton & Barto (2018); Mnih et al. (2015)

Slide 25: Reinforcement Learning (RL)
e.g. QED, logP
Sutton & Barto (2018); Mnih et al. (2015)

Slide 26: Transfer Learning
• Very large unlabeled dataset
• Small amount of labeled data
• Compute logP and similar properties with RDKit, then pre-train
Goh et al. (2017)

Slide 27: Fingerprint- and SMILES-based models

Slide 28: Molecule representation
Fingerprint, SMILES, Graph
Mater & Coote (2019); Schwalbe-Koda & Gómez-Bombarelli (2019)

Slide 29: Kadurin et al.
The cornucopia of meaningful leads: Applying deep AAEs for new molecule development in oncology
• Input and output
  • Binary fingerprints (MACCS)
  • Log concentration (LCONC)
• Middle layer
  • Consists of 5 neurons
  • One is the Growth Inhibition percentage (GI)
  • The remaining four are trained to approach a normal distribution
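Slide 30: Kadurin et al.
Experimental flow:
1. Prepare a dataset and train
  • NCI-60, MCF-7
  • 6252 compounds
  • Data consisting of fingerprint, LCONC, and GI
2. Sample from the model
  • Sample 640 vectors (virtual compounds)
3. Extract
  • Keep those with LCONC < -5.0 M
  • 32 vectors are obtained
4. Search for compounds with similar features
  • Search PubChem for compounds with similar features

Slide 31: Kadurin et al.
Experimental results:
• PubChem: 72 million compounds
• Compounds with features similar to the 32 generated vectors were extracted from PubChem
• 69 compounds were finally obtained
• Several are already known as anticancer agents
• 13 are patented
• Most are anthracyclines (currently the most effective anticancer agents)
Plot legend: PubChem; training data (blue); generated vectors (virtual compounds)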
Slide 32: Molecule representation
Fingerprint, SMILES, Graph
Mater & Coote (2019); Schwalbe-Koda & Gómez-Bombarelli (2019)

Slide 33: Segler et al.
Generating Focussed Molecule Libraries for Drug Discovery with Recurrent Neural Networks
• Generates compounds with an LSTM
• Input and output are SMILES
• Repeats the following (called Hillclimb-MLE)
  1. Train the LSTM and sample from it
  2. Filter with a target filtering model (this need not be a machine learning model)
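The sample, filter, and retrain loop on slide 33 can be sketched with a deliberately tiny stand-in. Here the "generator" is a per-token categorical distribution rather than an LSTM, and `toy_score` (count of carbon tokens) stands in for the target filtering model; every name and parameter below is a hypothetical choice for illustration.

```python
import random
from collections import Counter

ALPHABET = "CNOF"


def sample(probs: dict, length: int, rng: random.Random) -> str:
    """Draw one token string from the current toy generator."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return "".join(rng.choices(tokens, weights=weights, k=length))


def fit_mle(strings: list, smoothing: float = 1.0) -> dict:
    """Maximum-likelihood token frequencies, with additive smoothing."""
    counts = Counter("".join(strings))
    total = sum(counts[t] + smoothing for t in ALPHABET)
    return {t: (counts[t] + smoothing) / total for t in ALPHABET}


def toy_score(s: str) -> int:
    """Stand-in for the target filtering model: more 'C' tokens is better."""
    return s.count("C")


def hillclimb_mle(rounds=5, n_samples=200, keep=20, length=8, seed=0) -> dict:
    rng = random.Random(seed)
    probs = {t: 1.0 / len(ALPHABET) for t in ALPHABET}  # untrained generator
    for _ in range(rounds):
        batch = [sample(probs, length, rng) for _ in range(n_samples)]  # 1. sample
        best = sorted(batch, key=toy_score, reverse=True)[:keep]        # 2. filter
        probs = fit_mle(best)                                           # refit on survivors
    return probs


final_probs = hillclimb_mle()
print(final_probs)
```

Each round shifts probability mass toward whatever the filter favors, which is the hill-climbing behaviour the name refers to; in the paper the refit step is fine-tuning of the LSTM rather than recounting token frequencies.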
Slide 34: Gómez-Bombarelli et al., CVAE
Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules
• Generates compounds with an RNN + VAE
• Input and output are SMILES
• Finds latent codes for which the target property is large

Slide 35: Kusner et al., GVAE
Grammar Variational Autoencoder
Encoder, Decoder
Generation takes the grammar (a context-free grammar) into account.

Slide 36: Yang et al., ChemTS
Generates SMILES with MCTS and an RNN.
Optimizes penalized logP.
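Slide 37: Popova et al., ReLeaSE
https://arxiv.org/abs/1711.10907
Popova et al. (2017)
• SMILES-based generative model
• Combined with reinforcement learning to optimize a target property
• The reward is usually computed with RDKit, but here it is computed by a SMILES-based predictive model
• This makes it possible to optimize properties that RDKit cannot compute

Slide 38: Guimaraes et al., ORGAN
• Built on SeqGAN, an RNN-based GAN for sequential data
• Introduces reinforcement learning to optimize properties such as druglikeness

Slide 39: All SMILES VAE
• Graph-based models
  • Most use roughly 3 to 7 layers
  • Each layer propagates information over a distance of 1
• Molecules in ZINC250k
  • Average diameter is 11.1
  • Maximum diameter is 24
• Information therefore cannot be propagated across the whole molecule
• An RNN can carry information over long ranges
• SMILES strings are not unique
  • Multiple SMILES strings are used as input
Alperstein et al. (2019)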
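Slide 40: Graph-based models (one-shot type)

Slide 41: Molecule representation
Fingerprint, SMILES, Graph
Mater & Coote (2019); Schwalbe-Koda & Gómez-Bombarelli (2019)

Slide 42: De Cao & Kipf, MolGAN
A model that generates the graph in one shot, using a GAN and reinforcement learning.
• Using graph convolution in the discriminator makes it order invariant
• Optimizing each property appears to work well
• However, uniqueness is very low, around 2% (in the goal-directed case)
  • Neither the GAN nor the RL objective constrains the output to be diverse
• Because the graph is generated in one shot, computation time is short
• Experiments use QM9; applying the model to larger graphs looks difficult

Slide 43: Pölsterl & Wachinger, LF-MolGAN
• Generates the graph in one shot, like MolGAN; this model introduces constraints on valency
• Does not compute the reconstruction loss explicitly, which avoids the graph isomorphism problem
• Unlike an ordinary GAN, the architecture also includes an encoder, so it is easy to search the latent space for molecules with high similarity
• Experiments use QM9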
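Slide 44: Graph-based models (recurrent type)

Slide 45: Li et al.
Learning Deep Generative Models of Graphs
• Adds nodes and edges one after another as a graph, rather than generating SMILES
• Better results than GrammarVAE and similar models

Slide 46: Jin et al., JT-VAE
Junction Tree Variational Autoencoder for Molecular Graph Generation
• A simple approach would be to add nodes one at a time
• However, this can generate compounds that do not actually exist
• So generation proceeds cluster by cluster

Slide 47: Jin et al., JT-VAE
• Decompose the molecule into a tree structure using predefined clusters
• Encode with neural message passing
• Encode with a GRU
• Build a new tree structure from the embedding (adding nodes one at a time)
• Generate the final compound from both the resulting graph embedding and the tree structure (this step is needed because there is freedom in how the clusters are combined)

Slide 48: You et al., GCPN
Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation
• Generates the graph by adding edges one at a time
• A model combining a GAN and reinforcement learning

Slide 49: Li et al., MolMP and MolRNN
• Feeds in conditional codes such as QED and SA score
• A type of model that conditions on target properties and similar quantities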
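Slide 50: Comparison of GAN and VAE
GAN
• Advantages
  • Good results when tuned well
  • No need to compute a reconstruction loss (avoids the graph isomorphism problem)
• Disadvantages
  • Hyperparameter tuning is difficult
  • Mode collapse (the model keeps generating the same things)
VAE
• Advantages
  • Runs more stably than a GAN
  • Hyperparameter tuning is easier
  • Mode collapse is less likely
• Disadvantages
  • Computing the reconstruction loss brings in the graph isomorphism problem

Slide 51: Comparison of fingerprint, SMILES, and graph
• Fingerprint-based
  • Hard to work with because fingerprints are not invertible (and therefore rarely seen)
• SMILES-based
  • Stable performance
  • Validity tends to be low
  • Fragment-based generation is difficult
• Graph-based (one-shot type)
  • Fast
  • Has only produced small molecules with 9 or fewer heavy atoms
  • Validity and uniqueness are low
• Graph-based (recurrent type)
  • High validity
  • The ordering of nodes and edges is a problem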
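Slide 52: How generative models are used

Slide 53: Molecule generation
Distribution Learning; Predefined Scaffold; Molecule Optimization

Slide 54: Distribution Learning
Training data; generated data
https://github.com/NVlabs/ffhq-dataset
Karras et al. (2018)

Slide 55: Arús-Pous et al., Exploring the GDB-13 chemical space using deep generative models
• GDB-13: a dataset of 975 million molecules composed of up to 13 heavy atoms
• Trained on 1 million molecules, about 0.1% of the dataset
• A simple model that feeds SMILES to a GRU
• Sampling 2 billion molecules recovered 68.9% of GDB-13
• The model captured the characteristics of the compounds in GDB-13
• It also showed that some types of molecules are hard to generate because of the SMILES notation (for example, molecules containing many rings)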
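Slide 56: Molecular optimization
Choi et al. (2017)

Slide 57: Molecular optimization
• Search the latent space
  • Gradient ascent
  • Bayesian optimization
• Reinforcement learning
• Hillclimb-MLE (train by repeated filtering)
• Conditioning code (the condition is treated as an input)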
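The "search the latent space by gradient ascent" option on slide 57 can be sketched as follows. The property predictor is a hypothetical quadratic with a known optimum `TARGET`, used only so the behaviour is easy to check; in practice the gradient would come from a trained predictor on top of a VAE latent space.

```python
from typing import Callable, List, Sequence

TARGET = [1.0, -2.0]  # hypothetical latent point with the best property value


def toy_property(z: Sequence[float]) -> float:
    """Stand-in property predictor: highest (0.0) at TARGET."""
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, TARGET))


def toy_property_grad(z: Sequence[float]) -> List[float]:
    """Analytic gradient of toy_property with respect to z."""
    return [-2.0 * (zi - ti) for zi, ti in zip(z, TARGET)]


def gradient_ascent(
    z0: Sequence[float],
    grad_fn: Callable[[Sequence[float]], List[float]],
    lr: float = 0.1,
    steps: int = 100,
) -> List[float]:
    """Move the latent code uphill along the property gradient."""
    z = list(z0)
    for _ in range(steps):
        grad = grad_fn(z)
        z = [zi + lr * gi for zi, gi in zip(z, grad)]
    return z


z_start = [0.0, 0.0]
z_best = gradient_ascent(z_start, toy_property_grad)
print(toy_property(z_start), toy_property(z_best))
```

The optimized code `z_best` would then be passed through the decoder to obtain a candidate molecule; that decoding step is outside this sketch.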
Slide 58: Molecular optimization (starting from a specific substructure)
Optimizes penalized logP.

Slide 59: Other uses (generating molecules highly similar to a given molecule)
Drug Analogs from Fragment Based Long Short-Term Memory Generative Neural Networks
1. Pre-train an LSTM on data from ChEMBL, DrugBank, and FDB17
2. Then fine-tune on a single molecule (experiments cover 10 different molecules)
3. Generate SMILES
  • Retain correct SMILES
  • Remove duplicates
  • Remove undesirable functional groups
4. Select the molecules with high similarity
Awale et al. (2018)

Slide 60: Other uses (generating molecules highly similar to a given molecule)
Drug Analogs from Fragment Based Long Short-Term Memory Generative Neural Networks
Awale et al. (2018)
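Slide 61: Performance evaluation of generative models

Slide 62: Why evaluating generative models is hard
Karras et al. (2018)
• One can tell qualitatively that results look good, but quantitative evaluation is difficult
• For compounds, even qualitative evaluation is harder than for face images and the like

Slide 63: Benchmarks for generative models
• Each paper uses different datasets (ChEMBL, ZINC, QM9, and so on) and different metrics, which makes comparison difficult
• The range of metrics used for comparison is also insufficient

Slide 64: Brown et al., GuacaMol: distribution-learning benchmark
• Purpose of the distribution-learning benchmark
  • Evaluates how well the model reproduces the distribution of the training data
  • A model that does this task well should have captured the characteristics of compounds, which is expected to help with goal-directed tasks too
• Validity
  • What fraction of the generated compounds are valid
  • Validity is checked with RDKit
• Uniqueness
  • Checks for duplicates: the fraction of unique compounds
• Novelty
  • The fraction of compounds not present in the training data
• Frechet ChemNet Distance (FCD)
  • A metric that uses features from ChemNet, trained on bioactivity prediction, to measure how close the generated distribution is to the training data distribution
  • For images, the Frechet Inception Distance (FID) is used to compare generative models; FCD is its counterpart for compounds
• KL Divergence
  • A metric for measuring the difference between two probability distributions
  • Focuses on physicochemical features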
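The validity, uniqueness, and novelty metrics from slide 64 can be sketched in a few lines. The slide checks validity with RDKit; to stay dependency-free this sketch takes the validity check as a parameter and defaults to a toy parenthesis-balance test, which is an assumption and far weaker than a real chemistry check. The choice of denominators (uniqueness over valid molecules, novelty over unique valid molecules) is also an assumption of this sketch.

```python
from typing import Callable, Dict, Iterable, List


def toy_is_valid(smiles: str) -> bool:
    """Toy stand-in for an RDKit parse: non-empty with balanced parentheses."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(smiles) and depth == 0


def distribution_metrics(
    generated: List[str],
    training: Iterable[str],
    is_valid: Callable[[str], bool] = toy_is_valid,
) -> Dict[str, float]:
    """Compute validity, uniqueness, and novelty as fractions in [0, 1]."""
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


generated = ["CCO", "CCO", "CC(C", "c1ccccc1", "CC(=O)O"]
training = ["CCO"]
metrics = distribution_metrics(generated, training)
print(metrics)
```

A real evaluation would compare canonical forms of the molecules rather than raw strings, since one compound can have several SMILES strings.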
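Slide 65: Goal-directed benchmark (molecular optimization)
• Purpose of the goal-directed benchmark
  • Evaluates models in a setting where a specific score is to be maximized
• Similarity
  • How close the model can get to a target that was removed from the training data
• Rediscovery
  • Like the above, but asks whether exactly the same molecule can be generated, not merely a similar one
  • This requires an exact match
• Isomers
  • For a molecular formula such as C7H8N2O2, how many isomers the model can generate
  • Not directly related to drug discovery, but evaluates the flexibility of the model
• Median molecules
  • Simultaneously maximize similarity to several molecules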
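The difference between the similarity and rediscovery tasks on slide 65 can be made concrete with a small sketch. It assumes molecules are already represented as sets of fingerprint on-bits and uses Tanimoto similarity, a common choice that the slide itself does not specify; the function names are hypothetical.

```python
from typing import Iterable, List, Set


def tanimoto(a: Iterable[int], b: Iterable[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    a, b = set(a), set(b)
    union = a | b
    # Convention chosen here: two empty fingerprints have similarity 0.0.
    return len(a & b) / len(union) if union else 0.0


def best_similarity(candidates: List[Set[int]], target: Set[int]) -> float:
    """Similarity task: how close does the best candidate get to the target?"""
    return max(tanimoto(c, target) for c in candidates)


def rediscovered(candidates: List[Set[int]], target: Set[int]) -> bool:
    """Rediscovery task: is there an exact match, not merely a close one?"""
    return any(set(c) == set(target) for c in candidates)


target = {1, 2, 3}
candidates = [{2, 3, 4}, {1, 2}, {7, 8}]

print(best_similarity(candidates, target))   # close, but not identical
print(rediscovered(candidates, target))
print(rediscovered(candidates + [{1, 2, 3}], target))
```

Equal fingerprints do not guarantee identical molecules, since fingerprints are not invertible, so an actual rediscovery check would compare canonical structures rather than bit sets.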
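Slide 66: Measuring Compound Quality
• Purpose of measuring compound quality
  • Compounds generated by de novo design algorithms in prior work may be unstable, highly reactive, or hard to synthesize, or may look wrong to a medicinal chemist
  • It is therefore necessary to check whether the compounds are reasonable
  • Turning all of a medicinal chemist's knowledge into rules for checking is difficult
  • Here rd_filters is applied
  • https://github.com/PatWalters/rd_filters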
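Slide 67: Experimental results: distribution-learning benchmark
• Random sampler: it only draws from ChEMBL, so there are no broken compounds and validity is 100%; novelty, however, is zero
• SMILES LSTM: good overall
• Graph MCTS: fairly good, but KL and FCD are poor
• AAE: good except for FCD
• ORGAN: poor overall
• VAE: good overall

Slide 68: Experimental results: goal-directed benchmark
• Best of Data Set
  • The result of picking the highest-scoring compounds from the training data; the minimum bar a model must clear
• Graph GA
  • Best results
• SMILES LSTM
  • Good results, nearly on par with Graph GA
• Other models
  • Clearly worse than Graph GA and SMILES LSTM

Slide 69: Experimental results: compound quality measurement
• Compounds generated in the goal-directed tasks were quality-checked with rd_filters
• SMILES LSTM gives clearly better results
• SMILES LSTM is first pre-trained and then maximizes each score; it likely learned the features that matter for compounds during the pre-training phase
• Graph GA, by contrast, does not do very well; the problem seems to lie in trying to maximize the score right away without any prior knowledge
• Since SMILES LSTM and Graph GA performed comparably on the goal-directed benchmark, SMILES LSTM is the better choice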
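Slide 70: Pölsterl & Wachinger, LF-MolGAN
• Validity, uniqueness, and novelty are widely used but are not very good metrics
  • A model that picks nodes and edges at random (while respecting valency) ends up looking good
  • They do not consider whether the generated molecules resemble the training data and are chemically meaningful

Slide 71: Directions for future development

Slide 72: Multi-objective optimization: Guimaraes et al., ORGAN
• Optimizes three properties by training alternately on druglikeness, synthesizability, and solubility
• Even when all three are optimized, the results are close to those obtained when each is optimized alone

Slide 73: Multi-objective optimization: Zhou et al., MolDQN
• A generative model that performs optimization with DQN
• Runs experiments that optimize similarity and QED (drug-likeness) simultaneously

Slide 74: No dataset (pure RL): MolDQN, Zhou et al.
• Learns without a dataset by using reinforcement learning
• Because there is no pre-training, a broad search is possible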
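Slide 75: Taking synthesis routes into account: Bradshaw et al., Molecule Chef
Encoder, Decoder
• A model that also considers the synthesis route; it outputs both the reactants and the product
• Reactants are output one after another, and are chosen from known reactants
• A reaction predictor then turns them into the product
• Reactant embeddings are obtained with a graph neural network

Slide 76: Elix, Inc.
https://elix-inc.com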