RetroSynth WAVE: An Open source Software Platfo...

Elix
October 27, 2021

RetroSynth WAVE: An Open source Software Platform for Efficient Chemical Synthesis Research, Tokyo Institute of Technology, Elix, CBI 2021


Transcript

  1. RetroSynthWAVE: An Open-source Software Platform for Efficient Chemical Synthesis Research

    Oral Presentation, October 27th, 2021
    Haris Hasić 1,2 and Takashi Ishida 1
    1 Ishida Laboratory, Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan
    2 Elix Inc., Tokyo, Japan
    Chem-Bio Informatics Society Annual Meeting 2021, Online
  2. Chem-Bio Informatics Society Annual Meeting 2021, Online 27 October, 2021

    Presentation Contents
    1. Project Introduction
       ▪ Chemical Synthesis
       ▪ Retrosynthesis
       ▪ Computer-aided Synthesis
       ▪ Single-step Retrosynthesis
       ▪ Computer-aided Synthesis Research Landscape
       ▪ RetroSynthWAVE Project
    2. Chemistry Module
    3. Dataset Module
    4. Implementation Module
    5. Evaluation Module
    6. Future Work and Conclusion
  3. Project Introduction: Chemical Synthesis

    ▪ Chemical synthesis is the artificial execution of chemical reactions in order to obtain one or more target chemical compounds.
    Example: Chemical synthesis of Zolpidem (PubChem* ID: 5732, Reaxys** ID: 53518301), shown as reactant compounds leading to the target compound.
    * PubChem Chemical Compound Database. https://pubchem.ncbi.nlm.nih.gov/. Accessed on: October 8th, 2021.
    ** Reaxys Chemical Reaction Database. https://www.reaxys.com/. Accessed on: October 8th, 2021.
  4. Project Introduction: Retrosynthesis

    ▪ Retrosynthesis is a strategy for planning chemical synthesis by analysing target compounds and potential chemical reactions in reverse.
    Example: Retrosynthesis of Zolpidem (PubChem* ID: 5732, Reaxys** ID: 53518301), going from the target compound through synthon structures to precursor compounds.
    1. Where to perform disconnections?
    2. Which chemical reaction is assumed?
    3. How to determine precursor compounds?
    Answer: Knowledge and Experience
    * PubChem Chemical Compound Database. https://pubchem.ncbi.nlm.nih.gov/. Accessed on: October 8th, 2021.
    ** Reaxys Chemical Reaction Database. https://www.reaxys.com/. Accessed on: October 8th, 2021.
  5. Project Introduction: Computer-aided Synthesis

    ▪ The two main tasks of computer-aided synthesis are:
    1. Synthesis Route Planning – The rule-based procedure for solving complex target compounds.
    2. Single-step Retrosynthesis – The disconnection suggestion procedure at each step.
    Figure labels: Target Compound (chemical compound structure with desirable properties), Solved (procurable compounds), Unsolved (non-solvable, non-procurable compounds).
    To maximize the success of synthesis route planning, the most important factor is an efficient single-step retrosynthesis procedure.
  6. Project Introduction: Single-step Retrosynthesis

    ▪ Top-N Accuracy – Aggregated accuracy that reflects the probability of the ground truth precursor compound combination being found within the first N suggestions.
    Reaction Templates – Collections of sub-graph patterns that specify the relationships of structural changes in the participating compounds.
    Reaction SMILES – Simplified molecular-input line-entry system adapted for chemical reactions.
    Figure (Reaxys** ID: 53518301): the compound SMILES CC1=CC=C(C=C1)C1=CN2C=C(C)C=CC2=N1, CC1=CC=C(N)N=C1 and CC1=CC=C(C=C1)C(=O)CBr, and a Black Box (e.g., Machine Learning Model, Rule-based System, etc.) that returns ranked suggestions #1 … #N.
    How to properly evaluate?
    ** Reaxys Chemical Reaction Database. https://www.reaxys.com/. Accessed on: October 8th, 2021.
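The Top-N accuracy defined on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the project's implementation: the names `precursor_set` and `top_n_accuracy` are hypothetical, and precursor combinations are compared as order-independent sets of SMILES strings, which assumes that every string has already been canonicalized by a cheminformatics toolkit.

```python
from typing import FrozenSet, Sequence


def precursor_set(combination: str) -> FrozenSet[str]:
    """Split a dot-separated precursor combination into an order-independent set.

    Assumption: each SMILES string is already canonical, so plain string
    equality is enough to compare compounds.
    """
    return frozenset(part for part in combination.split(".") if part)


def top_n_accuracy(
    ground_truth: Sequence[str],
    ranked_suggestions: Sequence[Sequence[str]],
    n: int,
) -> float:
    """Fraction of targets whose true precursors appear in the first n suggestions."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(ground_truth) != len(ranked_suggestions):
        raise ValueError("one ranked suggestion list is required per target")
    if not ground_truth:
        return 0.0

    hits = 0
    for truth, suggestions in zip(ground_truth, ranked_suggestions):
        expected = precursor_set(truth)
        # Only the first n ranked suggestions count towards Top-N.
        if any(precursor_set(s) == expected for s in suggestions[:n]):
            hits += 1
    return hits / len(ground_truth)


# Toy data with placeholder compound names instead of real SMILES strings.
truths = ["A.B", "C"]
suggestions = [["B.A", "X"], ["Y", "C"]]
top_1 = top_n_accuracy(truths, suggestions, n=1)
top_2 = top_n_accuracy(truths, suggestions, n=2)
```

Because the metric only looks at the ranked output, it can be applied to any black box, whether a machine learning model or a rule-based system.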
  7. Project Introduction: Computer-aided Synthesis Research Landscape

    ▪ The current research landscape for computer-aided synthesis is heavily fragmented.
    Chemistry Functionalities: RDKit, MolVS, RDChiral (Coley, 2018), etc.
    Chemical Compound and Reaction Datasets: USPTO (Lowe, 2012), ChEMBL29 (Gaulton, 2017), ZINC15 (Sterling and Irwin, 2015), etc.
    Single-step Retrosynthesis Approaches: Neuralsym (Segler and Waller, 2018), GLN (Dai, 2020), etc.
    Evaluation Metrics: Top-N, Round-trip, Coverage, Diversity (Schwaller, 2019)
  8. Project Introduction: RetroSynthWAVE Project

    ▪ The RetroSynthWAVE project encapsulates the following:
    ✓ Wide array of retrosynthesis helpers and accessory tools…
    ✓ Aggregation of open-source chemical compound and reaction datasets…
    ✓ Various tools for the re-implementation of impactful retrosynthesis approaches…
    ✓ Evaluation metrics for new and existing retrosynthesis approaches…
  9. Project Introduction: RetroSynthWAVE Project

    Chemistry Functionalities: RDKit, MolVS, RDChiral (Coley, 2018), etc.
    Chemical Compound and Reaction Datasets: USPTO (Lowe, 2012), ChEMBL29 (Gaulton, 2017), ZINC15 (Sterling and Irwin, 2015), etc.
    Single-step Retrosynthesis Approaches: Neuralsym (Segler and Waller, 2018), GLN (Dai, 2020), etc.
    Evaluation Metrics: Top-N, Round-trip, Coverage, Diversity (Schwaller, 2019)
  10. Presentation Contents

    1. Project Introduction
    2. Chemistry Module
       ▪ Module Introduction
       ▪ Application Examples
    3. Dataset Module
    4. Implementation Module
    5. Evaluation Module
    6. Conclusion and Future Work
  11. Chemistry Module: RetroSynthWAVE: HANA

    HANA → Helpers ANd Accessories
    ▪ Goal: Unify frequent synthesis-related functionalities in a single software package.
    ▪ Motivation: The current synthesis research landscape is very fragmented. Even though a lot of quality, open-source libraries exist, there is no encompassing solution.
    ▪ Implementation: Lightweight, easy-to-use utility classes developed as wrappers for all synthesis-related functionalities from popular open-source libraries and repositories.
  12. Chemistry Module: Application Examples

        from rsw_hana.chemical_compounds import CompoundVisualizationUtils
        from rsw_hana.chemical_reactions import ReactionRepresentationUtils, ReactionVisualizationUtils
        from rsw_hana.chemical_reactions import ReactionCoreUtils, ReactionCoreAnalysisUtils

        reaction_smiles = "..."

        # Get the reaction cores and the compounds in each reaction role.
        reaction_cores = ReactionCoreUtils.get_reaction_core_utils(reaction_smiles)
        reactants, _, products = ReactionRepresentationUtils.parse_reaction_roles(reaction_smiles)

        # Draw the reaction with the reaction cores highlighted.
        ReactionVisualizationUtils.draw_reaction(reaction_smiles, highlight_atoms=[reaction_cores])

        # Extract the fragments of the reaction and draw the reactant fragments.
        reactant_fragments, product_fragments = ReactionCoreAnalysisUtils.extract_fragments_from_reaction(reaction_smiles)

        for reactant_fragment in reactant_fragments:
            CompoundVisualizationUtils.draw_compound(reactant_fragment)
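For readers without access to the module, the idea of splitting a reaction into its roles can be illustrated with plain Python. The sketch below is not the HANA implementation: `split_reaction_smiles` is a hypothetical helper that relies only on the standard reaction SMILES layout `reactants>agents>products`, with compounds separated by `.`, and it does not handle trailing toolkit-specific extensions. The example string is assembled from the three compound SMILES shown on slide 6, under the assumption that the first is the product and the other two are the reactants.

```python
from typing import List, Tuple


def split_reaction_smiles(reaction_smiles: str) -> Tuple[List[str], List[str], List[str]]:
    """Split a reaction SMILES string into reactant, agent and product SMILES.

    Hypothetical helper for illustration only. It performs no chemical
    validation and no canonicalization.
    """
    parts = reaction_smiles.strip().split(">")
    if len(parts) != 3:
        raise ValueError("expected the form 'reactants>agents>products'")
    # Each role is a dot-separated list of compounds; empty roles yield [].
    reactants, agents, products = [
        [compound for compound in part.split(".") if compound] for part in parts
    ]
    return reactants, agents, products


# Assumed role assignment for the compounds listed on slide 6.
example = "CC1=CC=C(N)N=C1.CC1=CC=C(C=C1)C(=O)CBr>>CC1=CC=C(C=C1)C1=CN2C=C(C)C=CC2=N1"
reactants, agents, products = split_reaction_smiles(example)
```

A toolkit-backed version would additionally parse each compound into a molecule object, which is where a wrapper library can hide the differences between underlying packages.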
  13. Presentation Contents

    1. Project Introduction
    2. Chemistry Module
    3. Dataset Module
       ▪ Module Introduction
       ▪ Available Datasets
    4. Implementation Module
    5. Evaluation Module
    6. Conclusion and Future Work
  14. Dataset Module: RetroSynthWAVE: COCORO

    COCORO → COllection of Chemical COmpound and ReactiOn Data
    ▪ Goal: Unify available, open-source chemical compound and chemical reaction datasets.
    ▪ Motivation: Chemical reaction data is very scarce in the current research landscape, so unifying and curating all available data sources is necessary for research progress.
    ▪ Implementation: Easy-to-use PyTorch dataset classes which make downloading, cleaning, curation and featurization trivial for users of any skill level.
  15. Dataset Module: Available Datasets

    ▪ Currently, the following datasets are available:

    Dataset Category     Dataset Name    Supported Versions                               Row Count
    Chemical Compounds   ChEMBL          [v_25, v_26, v_27, v_28, v_29]                   ~ 2M
                         ZINC15          [v_250k, v_1M, v_10M, v_270M, v_moses, v_raw]    ~ 1.5B
    Chemical Reactions   USPTO           [v_15k, v_50k, v_mit, v_schneider, v_raw]        ~ 3M
                         ORD             [v_202110, v_202110]                             ~ 3M
                         RheaDB          [v_118, v_119]                                   ~ 15K
                         (Kraut, 2013)   -                                                ~ 500
                         (Wei, 2016)     -                                                ~ 200
  16. Presentation Contents

    1. Project Introduction
    2. Chemistry Module
    3. Dataset Module
    4. Implementation Module
       ▪ Module Introduction
    5. Evaluation Module
    6. Conclusion and Future Work
  17. Implementation Module: RetroSynthWAVE: CO-OP

    CO-OP → COllection Of Popular Approaches
    ▪ Goal: Develop easy-to-use tools for the re-implementation of impactful approaches.
    ▪ Purpose: For this field to advance, it is necessary to share concepts and ideas. Thus, it is vital that all users, regardless of skill, can collaborate on new approaches.
    ▪ Approach: Easy-to-use collection of frequently used (PyTorch and PyTorch Geometric) elements for constructing retrosynthesis approaches. Over time, it is planned to expand it with a library of re-implemented approaches resulting from collaborative efforts.
  18. Presentation Contents

    1. Project Introduction
    2. Chemistry Module
    3. Dataset Module
    4. Implementation Module
    5. Evaluation Module
       ▪ Module Introduction
    6. Conclusion and Future Work
  19. Evaluation Module: RetroSynthWAVE: REFEREE

    REFEREE → Retroactive EFficiency MEtRics Evaluation FramEwork
    ▪ Goal: Develop a retroactively applicable evaluation framework for single-step retrosynthesis approaches, independent of their architecture.
    ▪ Purpose: No such framework currently exists and the Top-N accuracy metric is flawed.
    ▪ Approach: Evaluation metric framework utility classes that are applied directly to the suggestions made by an approach, rather than evaluating the approach itself, which is not generalizable in most cases.
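Evaluating the suggestions rather than the approach can be sketched as a function that receives only ranked outputs and a pluggable check. The code below is an assumption-laden illustration, not part of REFEREE: `coverage` and `is_acceptable` are hypothetical names, and coverage is taken loosely as the fraction of targets with at least one acceptable suggestion, where "acceptable" is whatever the supplied callable decides (for instance, confirmation by a forward reaction model).

```python
from typing import Callable, Mapping, Sequence


def coverage(
    suggestions_by_target: Mapping[str, Sequence[str]],
    is_acceptable: Callable[[str, str], bool],
) -> float:
    """Fraction of targets that have at least one acceptable suggestion.

    The function never touches the model that produced the suggestions,
    so it works for any architecture. `is_acceptable(target, suggestion)`
    is a user-supplied check and an assumption of this sketch.
    """
    if not suggestions_by_target:
        return 0.0
    covered = sum(
        1
        for target, suggestions in suggestions_by_target.items()
        if any(is_acceptable(target, suggestion) for suggestion in suggestions)
    )
    return covered / len(suggestions_by_target)


# Toy data: placeholder names, with an empty string standing for a failed suggestion.
toy_suggestions = {"T1": ["A.B", ""], "T2": [""], "T3": []}
toy_coverage = coverage(toy_suggestions, lambda target, suggestion: bool(suggestion))
```

Keeping the check as a parameter is a deliberate design choice in this sketch: the same function can be reused with a syntactic validity test, a purchasability lookup, or a forward-prediction model without changing the metric code.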
  20. Presentation Contents

    1. Project Introduction
    2. Chemistry Module
    3. Dataset Module
    4. Implementation Module
    5. Evaluation Module
    6. Conclusion and Future Work
       ▪ Conclusion
       ▪ Future Work and Release Schedule
  21. Conclusion and Future Work: Conclusion

    ▪ The goal of the RetroSynthWAVE project is to develop a systematic solution for a quick and efficient start in chemical synthesis research.
    ▪ The project consists of four different modules that can be used individually or together as a software stack.
    ▪ It is also a way for Elix Inc. to support and promote collaborative research topics that depend heavily on teamwork rather than individual effort.

    Module                    GitHub Repository Link
    RetroSynthWAVE            https://github.com/elix-tech/retro_synth_wave
    RetroSynthWAVE: HANA      https://github.com/elix-tech/rsw_hana
    RetroSynthWAVE: COCORO    https://github.com/elix-tech/rsw_cocoro
    RetroSynthWAVE: CO-OP     https://github.com/elix-tech/rsw_co-op
    RetroSynthWAVE: REFEREE   https://github.com/elix-tech/rsw_referee
  22. Conclusion and Future Work: Future Work and Release Schedule

    ▪ Currently, the development status for each individual module is as follows:

    Module                    Development Stage
    RetroSynthWAVE: HANA      Testing
    RetroSynthWAVE: COCORO    Testing
    RetroSynthWAVE: CO-OP     Development
    RetroSynthWAVE: REFEREE   Research and Development

    ▪ The currently projected launch date for the alpha version of the first software modules is the end of November.
    ▪ The currently projected launch date for the full stack is the start of January.