Oral Presentation
October 27th, 2021

Haris Hasić 1,2 (email@example.com, firstname.lastname@example.org)
Takashi Ishida 1 (email@example.com)

1 Ishida Laboratory, Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan
2 Elix Inc., Tokyo, Japan

Chem-Bio Informatics Society Annual Meeting 2021, Online
Slide 3 Chemical Synthesis
Project Introduction
▪ The artificial execution of chemical reactions in order to obtain one or more target chemical compounds.
[Figure: Chemical synthesis of Zolpidem (Reaxys** ID: 53518301), from the reactant compounds to the target compound. Target: Zolpidem (PubChem* ID: 5732).]
* PubChem Chemical Compound Database. https://pubchem.ncbi.nlm.nih.gov/. Accessed on: October 8th, 2021.
** Reaxys Chemical Reaction Database. https://www.reaxys.com/. Accessed on: October 8th, 2021.
Slide 4 Retrosynthesis
Project Introduction
▪ A strategy for planning chemical synthesis by analysing target compounds and potential chemical reactions in reverse.
[Figure: Retrosynthesis of Zolpidem (Reaxys** ID: 53518301), from the target compound through synthon structures to the precursor compounds. Target: Zolpidem (PubChem* ID: 5732).]
1. Where to perform disconnections?
2. Which chemical reaction is assumed?
3. How to determine precursor compounds?
Answer: Knowledge and Experience
Slide 5 Computer-aided Synthesis
Project Introduction
▪ The two main tasks of computer-aided synthesis are:
1. Synthesis Route Planning – The rule-based procedure for solving complex target compounds.
2. Single-step Retrosynthesis – The disconnection suggestion procedure at each step.
[Figure: Synthesis route tree. Target Compound (chemical compound structure with desirable properties); Solved (procurable compounds); Unsolved (non-solvable, non-procurable compounds).]
The success of synthesis route planning depends most of all on an efficient single-step retrosynthesis procedure.
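The relationship between the two tasks can be sketched in a few lines: route planning repeatedly calls a single-step retrosynthesis procedure until every leaf compound is procurable. The sketch below is a naive depth-first search written only for illustration, not the planning method of any particular tool. The names `plan_route`, `toy_single_step`, `TOY_RULES` and `PROCURABLE` are hypothetical, and single letters stand in for real compounds.

```python
from typing import Callable, List, Optional, Set, Tuple

# Hypothetical single-step "black box": target -> ranked list of precursor sets.
SingleStep = Callable[[str], List[List[str]]]
Route = List[Tuple[str, List[str]]]


def plan_route(target: str, single_step: SingleStep,
               procurable: Set[str], max_depth: int = 5) -> Optional[Route]:
    """Return the disconnection steps that solve `target`, or None if unsolved."""
    if target in procurable:
        return []  # solved: the compound can be procured, no step is needed
    if max_depth == 0:
        return None  # give up on routes that are too long (also stops cycles)
    for precursors in single_step(target):  # try suggestions in ranked order
        route: Route = [(target, precursors)]
        for precursor in precursors:
            sub_route = plan_route(precursor, single_step, procurable, max_depth - 1)
            if sub_route is None:
                break  # this suggestion leads to an unsolved compound
            route.extend(sub_route)
        else:
            return route  # every precursor of this suggestion was solved
    return None  # no suggestion produced a fully solved route


# Toy data (assumption): letters stand in for compounds.
TOY_RULES = {"T": [["X", "Y"], ["A", "B"]], "A": [["C"]]}
PROCURABLE = {"B", "C"}


def toy_single_step(target: str) -> List[List[str]]:
    return TOY_RULES.get(target, [])


print(plan_route("T", toy_single_step, PROCURABLE))
```

In this toy setting the first suggestion for `T` fails because `X` cannot be solved, so the search falls back to the second suggestion. This shows why the quality and ranking of the single-step suggestions drive the cost of route planning.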
Slide 6 Single-step Retrosynthesis
Project Introduction
▪ Top-N Accuracy – Aggregated accuracy that reflects the probability of the ground truth precursor compound combination being found within the first N suggestions.
▪ Reaction Templates – Collections of sub-graph patterns that specify the relationships of structural changes in the participating compounds.
▪ Reaction SMILES – Simplified molecular-input line-entry system adapted for chemical reactions.
[Figure: Example for Reaxys** ID: 53518301 with the SMILES strings CC1=CC=C(C=C1)C1=CN2C=C(C)C=CC2=N1, CC1=CC=C(N)N=C1 and CC1=CC=C(C=C1)C(=O)CBr. A Black Box (e.g., Machine Learning Model, Rule-based System, etc.) produces suggestions #1 … #N.]
How to properly evaluate?
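The Top-N accuracy definition above can be made concrete with a short sketch. It is an illustration only, not the REFEREE implementation. The function names `precursor_set` and `top_n_accuracy` are hypothetical, the ranked suggestion lists are made-up toy data, and plain string comparison assumes that all SMILES were canonicalized beforehand (e.g., with RDKit). The reaction SMILES is assembled from the three SMILES strings on the slide, assuming the first is the target and the other two are its precursors.

```python
from typing import FrozenSet, List, Sequence


def precursor_set(smiles: str) -> FrozenSet[str]:
    """Split dot-separated precursor SMILES into an order-independent set."""
    return frozenset(part for part in smiles.split(".") if part)


def top_n_accuracy(suggestions: Sequence[Sequence[str]],
                   ground_truths: Sequence[str], n: int) -> float:
    """Fraction of targets whose ground truth appears in the first n suggestions."""
    if not ground_truths:
        raise ValueError("at least one ground truth is required")
    hits = 0
    for ranked, truth in zip(suggestions, ground_truths):
        wanted = precursor_set(truth)
        if any(precursor_set(candidate) == wanted for candidate in ranked[:n]):
            hits += 1
    return hits / len(ground_truths)


# Reaction SMILES have the form "reactants>agents>products".
reaction = ("CC1=CC=C(N)N=C1.CC1=CC=C(C=C1)C(=O)CBr"
            ">>CC1=CC=C(C=C1)C1=CN2C=C(C)C=CC2=N1")
reactants, agents, product = reaction.split(">")

# Ground truths: the slide example plus one toy esterification (assumption).
truths: List[str] = [reactants, "CC(=O)O.CCO"]

# Made-up ranked suggestions, one list per target (toy data, not model output).
suggestions = [
    ["CC1=CC=C(C=C1)C(=O)CCl.CC1=CC=C(N)N=C1",   # rank 1: wrong halide
     "CC1=CC=C(C=C1)C(=O)CBr.CC1=CC=C(N)N=C1"],  # rank 2: ground truth, reordered
    ["CC(=O)Cl.CCO", "CC(=O)OC(C)=O.CCO"],        # ground truth not suggested
]

for n in (1, 2):
    print(f"Top-{n} accuracy: {top_n_accuracy(suggestions, truths, n)}")
```

With this toy data the Top-1 accuracy is 0.0 and the Top-2 accuracy is 0.5. The second target also shows a limitation of the metric: chemically plausible alternatives count as misses when they differ from the single recorded ground truth.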
Slide 7 Computer-aided Synthesis Research Landscape
Project Introduction
▪ The current research landscape for computer-aided synthesis is heavily fragmented.
Chemistry Functionalities: RDKit, MolVS, RDChiral (Coley, 2018), etc.
Chemical Compound and Reaction Datasets: USPTO (Lowe, 2012), ChEMBL29 (Gaulton, 2017), ZINC15 (Sterling and Irwin, 2015), etc.
Single-step Retrosynthesis Approaches: Neuralsym (Segler and Waller, 2018), GLN (Dai, 2020), etc.
Evaluation Metrics: Top-N, Round-trip, Coverage, Diversity (Schwaller, 2019)
Slide 8 RetroSynthWAVE Project
Project Introduction
▪ The RetroSynthWAVE project encapsulates the following:
✓ Wide array of retrosynthesis helpers and accessory tools…
✓ Aggregation of open-source chemical compound and reaction datasets…
✓ Various tools for the re-implementation of impactful retrosynthesis approaches…
✓ Evaluation metrics for new and existing retrosynthesis approaches…
Slide 11 RetroSynthWAVE: HANA Chemistry Module
HANA → Helpers ANd Accessories
▪ Goal: Unify frequently used synthesis-related functionalities in a single software package.
▪ Motivation: The current synthesis research landscape is very fragmented. Although many high-quality open-source libraries exist, there is no encompassing solution.
▪ Implementation: Lightweight, easy-to-use utility classes developed as wrappers for all synthesis-related functionalities from popular open-source libraries and repositories.
Slide 14 RetroSynthWAVE: COCORO Dataset Module
COCORO → COllection of Chemical COmpound and ReactiOn Data
▪ Goal: Unify available, open-source chemical compound and chemical reaction datasets.
▪ Motivation: Chemical reaction data is very scarce in the current research landscape, so unifying and curating all available data sources is necessary for research progress.
▪ Implementation: Easy-to-use PyTorch dataset classes that make downloading, cleaning, curation and featurization straightforward for users of any skill level.
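As an illustration of what such a dataset class might look like, the sketch below implements the map-style dataset protocol (`__len__` and `__getitem__`) over reaction SMILES strings. It is not the COCORO API: the class name `ReactionSmilesDataset`, the cleaning rules and the character-level featurization are all assumptions chosen for brevity. Downloading is omitted, and the class avoids importing PyTorch so that it runs anywhere; in a PyTorch setting it would subclass `torch.utils.data.Dataset` and return tensors.

```python
from typing import Dict, List, Sequence


class ReactionSmilesDataset:
    """Map-style dataset over cleaned, character-encoded reaction SMILES."""

    def __init__(self, raw_records: Sequence[str]) -> None:
        self.reactions: List[str] = self._clean(raw_records)
        characters = sorted({ch for rxn in self.reactions for ch in rxn})
        # Index 0 is reserved for padding, so real characters start at 1.
        self.vocab: Dict[str, int] = {ch: i + 1 for i, ch in enumerate(characters)}
        self.max_len = max((len(rxn) for rxn in self.reactions), default=0)

    @staticmethod
    def _clean(raw_records: Sequence[str]) -> List[str]:
        """Strip whitespace, drop malformed records and exact duplicates."""
        seen = set()
        cleaned = []
        for record in raw_records:
            rxn = record.strip()
            if rxn.count(">") != 2:
                continue  # not of the form "reactants>agents>products"
            reactants, _agents, products = rxn.split(">")
            if not reactants or not products or rxn in seen:
                continue
            seen.add(rxn)
            cleaned.append(rxn)
        return cleaned

    def __len__(self) -> int:
        return len(self.reactions)

    def __getitem__(self, index: int) -> List[int]:
        """Return the reaction as vocabulary indices, zero-padded to max_len."""
        encoded = [self.vocab[ch] for ch in self.reactions[index]]
        return encoded + [0] * (self.max_len - len(encoded))


# Toy records (assumption): one padded duplicate and two malformed entries.
raw = [" CC(=O)O.CCO>>CC(=O)OCC ", "CC(=O)O.CCO>>CC(=O)OCC",
       "not a reaction", ">>CCO", "CCO>>CC=O"]
dataset = ReactionSmilesDataset(raw)
print(len(dataset), dataset.reactions)
```

A real curation step would also validate and canonicalize each compound with a chemistry toolkit; the string checks here only catch structurally malformed records.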
Slide 17 RetroSynthWAVE: CO-OP Implementation Module
CO-OP → COllection Of Popular Approaches
▪ Goal: Develop easy-to-use tools for the re-implementation of impactful approaches.
▪ Purpose: For this field to advance, it is necessary to share concepts and ideas. Thus, it is vital that all users, regardless of skill, can collaborate on new approaches.
▪ Approach: Easy-to-use collection of frequently used (PyTorch and PyTorch Geometric) elements for constructing retrosynthesis approaches. Over time, it is planned to be expanded with a library of re-implemented approaches as a result of collaborative efforts.
Slide 19 RetroSynthWAVE: REFEREE Evaluation Module
REFEREE → Retroactive EFficiency MEtRics Evaluation FramEwork
▪ Goal: Develop a retroactively applicable evaluation framework for single-step retrosynthesis approaches that is independent of their architecture.
▪ Purpose: No such framework currently exists and the Top-N accuracy metric is flawed.
▪ Approach: Evaluation metric framework utility classes that are applied directly to the suggestions made by an approach, rather than to the approach itself, whose evaluation is not generalizable in most cases.
Slide 21 Conclusion
Conclusion and Future Work
▪ The goal of the RetroSynthWAVE project is to develop a systematic solution for a quick and efficient start in chemical synthesis research.
▪ The project consists of four different modules that can be used individually or together as a software stack.
▪ It is a way for Elix Inc. to support and promote cooperative research topics that depend heavily on teamwork rather than individual effort.

Module                     GitHub Repository Link
RetroSynthWAVE             https://github.com/elix-tech/retro_synth_wave
RetroSynthWAVE: HANA       https://github.com/elix-tech/rsw_hana
RetroSynthWAVE: COCORO     https://github.com/elix-tech/rsw_cocoro
RetroSynthWAVE: CO-OP      https://github.com/elix-tech/rsw_co-op
RetroSynthWAVE: REFEREE    https://github.com/elix-tech/rsw_referee
Slide 22 Future Work and Release Schedule
Conclusion and Future Work
▪ Currently, the development status for each individual module is as follows:

Module                     Development Stage
RetroSynthWAVE: HANA       Testing
RetroSynthWAVE: COCORO     Testing
RetroSynthWAVE: CO-OP      Development
RetroSynthWAVE: REFEREE    Research and Development

▪ The currently projected launch date for the alpha version of the first software modules is the end of November.
▪ The currently projected launch date for the full stack is the start of January.