Slide 1

Slide 1 text

RetroSynthWAVE: An Open-source Software Platform for Efficient Chemical Synthesis Research
Oral Presentation, October 27th, 2021
Chem-Bio Informatics Society Annual Meeting 2021, Online
Haris Hasić (1, 2), Takashi Ishida (1)
1 Ishida Laboratory, Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan
2 Elix Inc., Tokyo, Japan

Slide 2

Slide 2 text

Presentation Contents
1. Project Introduction
   ▪ Chemical Synthesis
   ▪ Retrosynthesis
   ▪ Computer-aided Synthesis
   ▪ Single-step Retrosynthesis
   ▪ Computer-aided Synthesis Research Landscape
   ▪ RetroSynthWAVE Project
2. Chemistry Module
3. Dataset Module
4. Implementation Module
5. Evaluation Module
6. Conclusion and Future Work

Slide 3

Slide 3 text

Chemical Synthesis (Project Introduction)
▪ Chemical synthesis is the artificial execution of chemical reactions in order to obtain one or more target chemical compounds.
[Figure: Chemical synthesis of Zolpidem (Reaxys** ID: 53518301), showing the reactant compounds and the target compound. Target: Zolpidem, PubChem* ID: 5732.]
* PubChem Chemical Compound Database. https://pubchem.ncbi.nlm.nih.gov/. Accessed on October 8th, 2021.
** Reaxys Chemical Reaction Database. https://www.reaxys.com/. Accessed on October 8th, 2021.

Slide 4

Slide 4 text

Retrosynthesis (Project Introduction)
▪ Retrosynthesis is a strategy for planning chemical synthesis by analysing target compounds and potential chemical reactions in reverse.
[Figure: Retrosynthesis of Zolpidem (Reaxys** ID: 53518301), showing the target compound, the synthon structures and the precursor compounds. Target: Zolpidem, PubChem* ID: 5732.]
Questions raised at each step:
1. Where to perform disconnections?
2. Which chemical reaction is assumed?
3. How to determine precursor compounds?
Answer: Knowledge and experience.
* PubChem Chemical Compound Database. https://pubchem.ncbi.nlm.nih.gov/. Accessed on October 8th, 2021.
** Reaxys Chemical Reaction Database. https://www.reaxys.com/. Accessed on October 8th, 2021.

Slide 5

Slide 5 text

Computer-aided Synthesis (Project Introduction)
▪ The two main tasks of computer-aided synthesis are:
1. Synthesis Route Planning: the rule-based procedure for solving complex target compounds.
2. Single-step Retrosynthesis: the disconnection suggestion procedure at each step.
[Figure legend: Target Compound (chemical compound structure with desirable properties), Solved (procurable compounds), Unsolved (non-solvable, non-procurable compounds).]
▪ The most important factor for successful synthesis route planning is an efficient single-step retrosynthesis procedure.

Slide 6

Slide 6 text

Single-step Retrosynthesis (Project Introduction)
▪ Top-N Accuracy: aggregated accuracy that reflects the probability of the ground-truth precursor compound combination being found within the first N suggestions.
▪ Reaction Templates: collections of sub-graph patterns that specify the relationships of structural changes in the participating compounds.
▪ Reaction SMILES: the simplified molecular-input line-entry system (SMILES) adapted for chemical reactions.
[Figure: example for Reaxys** ID: 53518301. SMILES strings shown: CC1=CC=C(C=C1)C1=CN2C=C(C)C=CC2=N1, CC1=CC=C(N)N=C1 and CC1=CC=C(C=C1)C(=O)CBr. A black box (e.g., machine learning model, rule-based system, etc.) produces suggestions #1 … #N.]
Open question: How to properly evaluate?
** Reaxys Chemical Reaction Database. https://www.reaxys.com/. Accessed on October 8th, 2021.
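The Top-N accuracy defined on this slide can be made concrete with a short, library-free sketch. This is an illustration under stated assumptions, not the project's evaluation code: the function names are hypothetical, the second example target uses made-up placeholder compounds, and all SMILES strings are assumed to be canonicalized already (in practice they would first be canonicalized, e.g. with RDKit, so that string equality means compound equality).

```python
from typing import FrozenSet, Iterable, List, Sequence


def as_precursor_set(precursors: Iterable[str]) -> FrozenSet[str]:
    """Order-independent representation of one precursor combination."""
    return frozenset(s.strip() for s in precursors if s.strip())


def top_n_accuracy(
    ground_truths: Sequence[Sequence[str]],
    ranked_suggestions: Sequence[Sequence[Sequence[str]]],
    n: int,
) -> float:
    """Fraction of targets whose ground-truth precursor combination
    appears among the first n suggestions (assumes canonical SMILES)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(ground_truths) != len(ranked_suggestions):
        raise ValueError("one suggestion list is required per target")
    if not ground_truths:
        return 0.0

    hits = 0
    for truth, suggestions in zip(ground_truths, ranked_suggestions):
        truth_set = as_precursor_set(truth)
        # Only the n best-ranked suggestions count towards a hit.
        if any(as_precursor_set(s) == truth_set for s in suggestions[:n]):
            hits += 1
    return hits / len(ground_truths)


# Target 1 uses the two precursor SMILES shown on this slide.
# Target 2 uses placeholder compounds chosen only for illustration.
ground_truths: List[List[str]] = [
    ["CC1=CC=C(N)N=C1", "CC1=CC=C(C=C1)C(=O)CBr"],
    ["CCO", "CC(=O)O"],
]
ranked_suggestions: List[List[List[str]]] = [
    [["CCO"], ["CC1=CC=C(C=C1)C(=O)CBr", "CC1=CC=C(N)N=C1"]],  # correct at rank 2
    [["CC(=O)O", "CCO"]],                                      # correct at rank 1
]

top_1 = top_n_accuracy(ground_truths, ranked_suggestions, n=1)
top_2 = top_n_accuracy(ground_truths, ranked_suggestions, n=2)
print(top_1, top_2)
```

The comparison uses sets so that the order in which precursors are listed does not matter. Note that the metric only checks for an exact match with the recorded ground truth, so a chemically plausible alternative suggestion still counts as a miss.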

Slide 7

Slide 7 text

Computer-aided Synthesis Research Landscape (Project Introduction)
▪ The current research landscape for computer-aided synthesis is heavily fragmented.
▪ Chemistry Functionalities: RDKit, MolVS, RDChiral (Coley, 2018), etc.
▪ Chemical Compound and Reaction Datasets: USPTO (Lowe, 2012), ChEMBL29 (Gaulton, 2017), ZINC15 (Sterling and Irwin, 2015), etc.
▪ Single-step Retrosynthesis Approaches: Neuralsym (Segler and Waller, 2018), GLN (Dai, 2020), etc.
▪ Evaluation Metrics: Top-N, Round-trip, Coverage, Diversity (Schwaller, 2019)
Open question: How to properly evaluate?

Slide 8

Slide 8 text

RetroSynthWAVE Project (Project Introduction)
▪ The RetroSynthWAVE project encapsulates the following:
✓ Wide array of retrosynthesis helpers and accessory tools…
✓ Aggregation of open-source chemical compound and reaction datasets…
✓ Various tools for the re-implementation of impactful retrosynthesis approaches…
✓ Evaluation metrics for new and existing retrosynthesis approaches…

Slide 9

Slide 9 text

RetroSynthWAVE Project (Project Introduction)
▪ Chemistry Functionalities: RDKit, MolVS, RDChiral (Coley, 2018), etc.
▪ Chemical Compound and Reaction Datasets: USPTO (Lowe, 2012), ChEMBL29 (Gaulton, 2017), ZINC15 (Sterling and Irwin, 2015), etc.
▪ Single-step Retrosynthesis Approaches: Neuralsym (Segler and Waller, 2018), GLN (Dai, 2020), etc.
▪ Evaluation Metrics: Top-N, Round-trip, Coverage, Diversity (Schwaller, 2019)

Slide 10

Slide 10 text

Presentation Contents
1. Project Introduction
2. Chemistry Module
   ▪ Module Introduction
   ▪ Application Examples
3. Dataset Module
4. Implementation Module
5. Evaluation Module
6. Conclusion and Future Work

Slide 11

Slide 11 text

RetroSynthWAVE: HANA (Chemistry Module)
HANA → Helpers ANd Accessories
▪ Goal: Unify frequently used synthesis-related functionalities in a single software package.
▪ Motivation: The current synthesis research landscape is very fragmented. Although many high-quality open-source libraries exist, there is no encompassing solution.
▪ Implementation: Lightweight, easy-to-use utility classes developed as wrappers for all synthesis-related functionalities from popular open-source libraries and repositories.

Slide 12

Slide 12 text

Application Examples (Chemistry Module)

from rsw_hana.chemical_compounds import CompoundVisualizationUtils
from rsw_hana.chemical_reactions import ReactionRepresentationUtils, ReactionVisualizationUtils
from rsw_hana.chemical_reactions import ReactionCoreUtils, ReactionCoreAnalysisUtils

# Reaction SMILES of interest (elided on the slide).
reaction_smiles = "..."

# Identify the reaction cores and split the reaction into its roles.
reaction_cores = ReactionCoreUtils.get_reaction_core_utils(reaction_smiles)
reactants, _, products = ReactionRepresentationUtils.parse_reaction_roles(reaction_smiles)

# Draw the reaction with the reaction cores highlighted.
ReactionVisualizationUtils.draw_reaction(reaction_smiles, highlight_atoms=[reaction_cores])

# Extract the reactant and product fragments, then draw each reactant fragment.
reactant_fragments, product_fragments = ReactionCoreAnalysisUtils.extract_fragments_from_reaction(reaction_smiles)

for reactant_fragment in reactant_fragments:
    CompoundVisualizationUtils.draw_compound(reactant_fragment)
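The role-parsing step in the example above can be approximated without any library. The following is a minimal sketch and not the rsw_hana implementation: `split_reaction_smiles` is a hypothetical helper, and the reaction SMILES is assembled here from the three SMILES strings shown on Slide 6, assuming the two smaller compounds are the reactants and the bicyclic compound is the product. It relies only on the general reaction SMILES layout `reactants>agents>products`, with compounds on each side separated by `.`.

```python
from typing import List, Tuple


def split_reaction_smiles(reaction_smiles: str) -> Tuple[List[str], List[str], List[str]]:
    """Split 'reactants>agents>products' into three lists of compound SMILES.

    Hypothetical helper for illustration. It performs no chemical
    validation and treats every '.'-separated component as a separate
    compound, so multi-component species such as salts are split apart.
    """
    sides = reaction_smiles.strip().split(">")
    if len(sides) != 3:
        raise ValueError("expected exactly two '>' separators in a reaction SMILES")

    def split_compounds(side: str) -> List[str]:
        return [compound for compound in side.split(".") if compound]

    reactants, agents, products = (split_compounds(side) for side in sides)
    return reactants, agents, products


# Assumed reaction SMILES, built from the strings shown on Slide 6.
reaction_smiles = (
    "CC1=CC=C(N)N=C1.CC1=CC=C(C=C1)C(=O)CBr"
    ">>"
    "CC1=CC=C(C=C1)C1=CN2C=C(C)C=CC2=N1"
)

reactants, agents, products = split_reaction_smiles(reaction_smiles)
print(reactants)
print(agents)
print(products)
```

A library-backed parser would additionally validate each compound and handle atom mapping, which is the kind of functionality the wrapper classes above are meant to bundle.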

Slide 13

Slide 13 text

Presentation Contents
1. Project Introduction
2. Chemistry Module
3. Dataset Module
   ▪ Module Introduction
   ▪ Available Datasets
4. Implementation Module
5. Evaluation Module
6. Conclusion and Future Work

Slide 14

Slide 14 text

RetroSynthWAVE: COCORO (Dataset Module)
COCORO → COllection of Chemical COmpound and ReactiOn Data
▪ Goal: Unify available, open-source chemical compound and chemical reaction datasets.
▪ Motivation: Chemical reaction data is very scarce in the current research landscape, so unifying and curating all available data sources is necessary for research progress.
▪ Implementation: Easy-to-use PyTorch dataset classes which make downloading, cleaning, curation and featurization trivial for users of any skill level.
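To illustrate what a dataset class of this kind might look like, here is a minimal sketch under stated assumptions. It is not the COCORO API: the class name, the cleaning rules and the character-level featurization are all hypothetical, and the download step is omitted. The class only implements the `__len__`/`__getitem__` protocol used by map-style PyTorch datasets, so it runs without PyTorch installed.

```python
from typing import Dict, List, Sequence


class ReactionSmilesDataset:
    """Hypothetical map-style dataset over reaction SMILES strings.

    Cleaning (assumed rules): strip whitespace, drop rows that do not
    contain exactly two '>' separators, drop exact duplicates.
    Featurization (assumed): character-level integer encoding.
    """

    def __init__(self, raw_rows: Sequence[str]) -> None:
        seen = set()
        self.rows: List[str] = []
        for row in raw_rows:
            row = row.strip()
            if row.count(">") != 2 or row in seen:
                continue
            seen.add(row)
            self.rows.append(row)

        # Index 0 is left free so it can serve as a padding value later.
        characters = sorted({ch for row in self.rows for ch in row})
        self.vocabulary: Dict[str, int] = {
            ch: index for index, ch in enumerate(characters, start=1)
        }

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, index: int) -> List[int]:
        return [self.vocabulary[ch] for ch in self.rows[index]]


# Toy input with a duplicate, a malformed row and an empty row.
dataset = ReactionSmilesDataset(["CCO>>CC=O", " CCO>>CC=O ", "not a reaction", ""])
print(len(dataset))
print(dataset[0])
```

In a PyTorch setting, a class like this would typically subclass `torch.utils.data.Dataset` and return tensors, so that it can be passed to a `DataLoader` for batching.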

Slide 15

Slide 15 text

Available Datasets (Dataset Module)
▪ Currently, the following datasets are available:

Dataset Category    | Dataset Name   | Supported Versions                             | Row Count
Chemical Compounds  | ChEMBL         | [v_25, v_26, v_27, v_28, v_29]                 | ~ 2M
Chemical Compounds  | ZINC15         | [v_250k, v_1M, v_10M, v_270M, v_moses, v_raw]  | ~ 1.5B
Chemical Reactions  | USPTO          | [v_15k, v_50k, v_mit, v_schneider, v_raw]      | ~ 3M
Chemical Reactions  | ORD            | [v_202110, v_202110]                           | ~ 3M
Chemical Reactions  | RheaDB         | [v_118, v_119]                                 | ~ 15K
Chemical Reactions  | (Kraut, 2013)  | -                                              | ~ 500
Chemical Reactions  | (Wei, 2016)    | -                                              | ~ 200

Slide 16

Slide 16 text

Presentation Contents
1. Project Introduction
2. Chemistry Module
3. Dataset Module
4. Implementation Module
   ▪ Module Introduction
5. Evaluation Module
6. Conclusion and Future Work

Slide 17

Slide 17 text

RetroSynthWAVE: CO-OP (Implementation Module)
CO-OP → COllection Of Popular Approaches
▪ Goal: Develop easy-to-use tools for the re-implementation of impactful approaches.
▪ Purpose: For this field to advance, it is necessary to share concepts and ideas. Thus, it is vital that all users, regardless of skill, can collaborate on new approaches.
▪ Approach: An easy-to-use collection of frequently used PyTorch and PyTorch Geometric elements for constructing retrosynthesis approaches. Over time, the plan is to expand it with a library of re-implemented approaches produced through collaborative efforts.

Slide 18

Slide 18 text

Presentation Contents
1. Project Introduction
2. Chemistry Module
3. Dataset Module
4. Implementation Module
5. Evaluation Module
   ▪ Module Introduction
6. Conclusion and Future Work

Slide 19

Slide 19 text

RetroSynthWAVE: REFEREE (Evaluation Module)
REFEREE → Retroactive EFficiency MEtRics Evaluation FramEwork
▪ Goal: Develop a retroactively applicable evaluation framework for single-step retrosynthesis approaches that is independent of their architecture.
▪ Purpose: No such framework currently exists and the Top-N accuracy metric is flawed.
▪ Approach: Evaluation metric utility classes that are applied directly to the suggestions made by an approach, rather than to the approach itself, which is not generalizable in most cases.

Slide 20

Slide 20 text

Presentation Contents
1. Project Introduction
2. Chemistry Module
3. Dataset Module
4. Implementation Module
5. Evaluation Module
6. Conclusion and Future Work
   ▪ Conclusion
   ▪ Future Work and Release Schedule

Slide 21

Slide 21 text

Conclusion (Conclusion and Future Work)
▪ The goal of the RetroSynthWAVE project is to develop a systematic solution for a quick and efficient start in chemical synthesis research.
▪ The project consists of four modules that can be used individually or together as a software stack.
▪ It is a way for Elix Inc. to support and promote co-operation on topics that depend heavily on teamwork rather than individual effort.

Module                   | GitHub Repository Link
RetroSynthWAVE           | https://github.com/elix-tech/retro_synth_wave
RetroSynthWAVE: HANA     | https://github.com/elix-tech/rsw_hana
RetroSynthWAVE: COCORO   | https://github.com/elix-tech/rsw_cocoro
RetroSynthWAVE: CO-OP    | https://github.com/elix-tech/rsw_co-op
RetroSynthWAVE: REFEREE  | https://github.com/elix-tech/rsw_referee

Slide 22

Slide 22 text

Future Work and Release Schedule (Conclusion and Future Work)
▪ Currently, the development status of each individual module is as follows:

Module                   | Development Stage
RetroSynthWAVE: HANA     | Testing
RetroSynthWAVE: COCORO   | Testing
RetroSynthWAVE: CO-OP    | Development
RetroSynthWAVE: REFEREE  | Research and Development

▪ The currently projected launch date for the alpha version of the first software modules is the end of November.
▪ The currently projected launch date for the full stack is the start of January.