
Evaluating Computer-assisted Single-step Retrosynthesis, Elix, CBI 2022

Elix
October 25, 2022

Transcript

  1. Chem-Bio Informatics Society (CBI) Annual Meeting 2022, Tokyo, Japan | October 25th, 2022

    Evaluating Computer-assisted Single-step Retrosynthesis: How significant are recent improvements?
    Haris Hasić (Elix, Inc. & Tokyo Institute of Technology), Takahiro Inoue (Elix, Inc.), Tatsuya Okubo (Elix, Inc.), Takashi Ishida (Tokyo Institute of Technology)
  2. Presentation Contents

    1. Introduction
       ■ Chemical Synthesis
       ■ Computer-assisted Retrosynthesis
       ■ Single-step Retrosynthesis
       ■ Research Objective
    2. Experiments & Results
       ■ Experiment Design
       ■ Top-50 Accuracy
       ■ Top-10 Accuracy
       ■ Top-50 Uniqueness and Chemical Validity Rate
    3. Conclusion
       ■ Conclusion
       ■ Future Work
    4. Appendix
  3. Chemical Synthesis

    ■ The artificial execution of chemical reactions to obtain one or more target chemical compounds.
    [Figure: Synthesis route to the target Zolpidem [1] (PubChem ID [2]: 5732) from the commercially available compounds N,N-dimethylprop-2-ynamide (PubChem ID: 11240440), 4-Methylbenzaldehyde (PubChem ID: 7725) and 2-Amino-5-methylpyridine (PubChem ID: 15348) via intermediate chemical compounds; reported step yields [3]: 67.0% and 89.0%.]
    [1] (2017, Zhang, B. et al.): https://doi.org/10.1515/hc-2017-0152.
    [2] PubChem Chemical Compound Database: https://pubchem.ncbi.nlm.nih.gov/. Accessed on October 13th, 2022.
    [3] Reaxys Chemical Reaction Database: https://www.reaxys.com/. Accessed on October 13th, 2022.
  4. Computer-assisted Retrosynthesis

    ■ A strategy for planning chemical synthesis by analysing target chemical compounds and chemical reactions in reverse, relying on computing power rather than human experts.
    [Figure: Synthesis route planning for the target Zolpidem [1] (PubChem ID [2]: 5732) through repeated single-step retrosynthesis, from target through intermediate to commercially available chemical compounds.]
    [1] (2017, Zhang, B. et al.): https://doi.org/10.1515/hc-2017-0152.
    [2] PubChem Chemical Compound Database: https://pubchem.ncbi.nlm.nih.gov/. Accessed on October 13th, 2022.
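The route-planning idea on this slide (repeatedly applying single-step retrosynthesis until only commercially available compounds remain) can be sketched as a depth-first search. This is a minimal illustration under stated assumptions, not how any of the evaluated tools work: the single-step model is replaced by a hypothetical lookup table `toy_model`, the stock by a set of placeholder names, and `plan_route` is a name invented here.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical single-step model output: compound -> ranked candidate precursor sets.
SingleStepModel = Dict[str, List[Tuple[str, ...]]]
# A route is a list of (product, precursors) reaction steps.
Route = List[Tuple[str, Tuple[str, ...]]]


def plan_route(target: str, model: SingleStepModel, stock: Set[str],
               max_depth: int = 5) -> Optional[Route]:
    """Depth-first retrosynthetic search: expand `target` with single-step
    suggestions until every leaf compound is commercially available."""
    if target in stock:
        return []  # purchasable, nothing to synthesize
    if max_depth == 0:
        return None  # depth limit reached (also guards against cycles)
    for precursors in model.get(target, []):  # try suggestions in ranked order
        route: Route = [(target, precursors)]
        for precursor in precursors:
            sub_route = plan_route(precursor, model, stock, max_depth - 1)
            if sub_route is None:
                break  # this suggestion cannot be completed, try the next one
            route.extend(sub_route)
        else:
            return route  # every precursor was resolved
    return None  # no suggestion led to purchasable compounds


# Toy data with placeholder compound names (not real chemistry).
toy_model: SingleStepModel = {
    "T": [("X", "C"), ("I", "C")],  # the top-ranked suggestion is a dead end
    "I": [("A", "B")],
}
toy_stock = {"A", "B", "C"}
print(plan_route("T", toy_model, toy_stock))
# [('T', ('I', 'C')), ('I', ('A', 'B'))]
```

Practical planners commonly use best-first or Monte Carlo tree search guided by learned scores instead of plain depth-first search; here the depth limit is the only guard against cycles.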
  5. Single-step Retrosynthesis

    ■ A mechanism for suggesting precursor chemical compounds for a given target chemical compound.
    [Figure: Example single-step retrosynthesis of the target CN(C)C(=O)CC1=C(N=C2C=CC(C)=CN12)C1=CC=C(C)C=C1 into the precursors CC1=CNC(C=C1)N=CC1=CC=C(C)C=C1.CN(C)C(=O)C#C; families of approaches: Template-based Approaches [2], Template-free Approaches [1], Semi-template-based Approaches [3].]
    [1] (2017, Liu, B. et al.): https://doi.org/10.1021/acscentsci.7b00303.
    [2] (2017, Segler, M.H.S. and Waller, M.P.): https://doi.org/10.1002/chem.201605499.
    [3] (2020, Shi, C. et al.): https://doi.org/10.48550/arXiv.2003.12725.
  6. Research Objective

    ■ The main research objective is to answer the following questions:
      1. What is the current state-of-the-art single-step retrosynthesis approach?
      2. Are the recent frequent improvements actually significant?
      3. How easy or difficult is it to reproduce the reported results?
    ■ Claims in recent publications:
      "Experiments on the benchmark datasets show a significant 8.1% improvement over existing state-of-the-art methods in top-one accuracy." [1]
      "Our model achieves a top-1 accuracy of 53.7%, outperforming previous template-free and semi-template-based methods." [2]
      "The overall accuracy with singly and doubly mutated predictions was 61.6%, outperforming current state-of-the-art methods." [3]
      "We compare the proposed R-SMILES with various state-of-the-art baselines and show that it significantly outperforms them all, demonstrating the superiority of the proposed method." [4]
    [1] (2020, Dai, H. et al.): https://doi.org/10.48550/arXiv.2001.01408.
    [2] (2021, Somnath, V.R. et al.): https://doi.org/10.48550/arXiv.2006.07038.
    [3] (2022, Ucak, U.V. et al.): https://doi.org/10.1038/s41467-022-28857-w.
    [4] (2022, Zhong, Z. et al.): https://doi.org/10.1039/d2sc02763a.
  7. Experiment Design

    Step 1. Literature Review*
    Step 2. Dataset Preparation: USPTO-50k [7], USPTO-MIT [8], ORD (Non-USPTO) [9]
    Step 3. Evaluation
      ■ Top-N Accuracy: aggregated accuracy that reflects the probability of the ground truth being found within the first N suggestions.
      ■ Other single-step retrosynthesis metrics: 1. MaxFrag Accuracy, 2. Suggestion Duplication Rate, 3. Suggestion Chemical Validity, 4. Round-trip Accuracy, 5. Coverage Rate, 6. Diversity Rate, …

    Approaches from the literature review with their reported Top-1 accuracy (TB: template-based, Semi-TB: semi-template-based, TF: template-free):

    Approach         Year  Type     Top-1 (%)
    GLN [1]          2020  TB       52.5
    GraphRetro [3]   2021  Semi-TB  53.7
    LocalRetro [4]   2021  TB       53.4
    AT [2]           2020  TF       53.5
    RetroTRAE [5]    2022  TF       61.6**
    R-SMILES [6]     2022  TF       56.3

    * As of July, 2022. ** On the USPTO-MIT dataset.
    [1] (2020, Dai, H. et al.): https://doi.org/10.48550/arXiv.2001.01408.
    [2] (2020, Tetko, I.V. et al.): https://doi.org/10.1038/s41467-020-19266-y.
    [3] (2021, Somnath, V.R. et al.): https://doi.org/10.48550/arXiv.2006.07038.
    [4] (2021, Chen, S. and Jung, Y.): https://doi.org/10.1021/jacsau.1c00246.
    [5] (2022, Ucak, U.V. et al.): https://doi.org/10.1038/s41467-022-28857-w.
    [6] (2022, Zhong, Z. et al.): https://doi.org/10.1039/d2sc02763a.
    [7] (2017, Coley, C.W. et al.): https://doi.org/10.1021/acscentsci.7b00355.
    [8] (2020, Jin, W. et al.): https://doi.org/10.48550/arXiv.1709.04555.
    [9] (2021, Kearnes, S.M. et al.): https://doi.org/10.1021/jacs.1c09820.
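The Top-N accuracy definition above can be sketched in a few lines. This assumes every suggestion and ground truth is a dot-separated SMILES string that has already been canonicalized by a cheminformatics toolkit (that step is not shown), and that a combination matches when it contains the same fragments in any order; the function names are hypothetical and not taken from the evaluated codebases.

```python
from typing import Dict, Sequence, Tuple


def combination_key(smiles: str) -> Tuple[str, ...]:
    """Order-invariant key for a precursor combination such as 'A.B'.
    Assumes every fragment is already a canonical SMILES string."""
    return tuple(sorted(smiles.split(".")))


def top_n_accuracy(suggestions: Sequence[Sequence[str]], ground_truths: Sequence[str],
                   n_values: Sequence[int] = (1, 3, 5, 10, 50)) -> Dict[int, float]:
    """Percentage of targets whose ground truth appears among the first N suggestions."""
    if len(suggestions) != len(ground_truths) or not ground_truths:
        raise ValueError("expected one ranked suggestion list per ground truth")
    hits = {n: 0 for n in n_values}
    for ranked, truth in zip(suggestions, ground_truths):
        key = combination_key(truth)
        # 1-based rank of the first matching suggestion, or None if absent
        rank = next((i + 1 for i, s in enumerate(ranked) if combination_key(s) == key), None)
        for n in n_values:
            if rank is not None and rank <= n:
                hits[n] += 1
    return {n: 100.0 * hits[n] / len(ground_truths) for n in n_values}


# Ground truth found at rank 1, at rank 2 (fragment order differs), and not at all.
ranked_lists = [["A.B", "C"], ["C", "B.A"], ["C", "D"]]
truths = ["A.B", "A.B", "A.B"]
print(top_n_accuracy(ranked_lists, truths, n_values=(1, 2)))
```

Here Top-1 is one third and Top-2 is two thirds of the targets, because the second target only counts once N reaches its rank.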
  8. Top-50 Accuracy

    ■ All of the evaluated approaches fell short of the Top-N accuracy reported in their original publications.
    ■ Because AT and RetroTRAE performed faultily in this evaluation, their Top-N accuracy values are not considered representative.
    ■ The Top-N accuracy reported for LocalRetro treated isomers as identical chemical compounds.
  9. Top-10 Accuracy

    ■ Even though the improvements are statistically significant (0.01 < p < 0.05) according to the Wilcoxon signed-rank test, they are consistently smaller than reported in the original publications.
    ■ The final ranking of the evaluated single-step retrosynthesis approaches by Top-1 accuracy (%) is as follows, where the value after → is the difference from the Top-1 accuracy reported in the original publication:
      1. R-SMILES: 53.6 ± 0.558 → -2.7
      2. LocalRetro: 52.6 ± 0.650 → -0.8
      3. GLN: 51.7 ± 0.691 → -0.8
      4. GraphRetro: 51.2 ± 0.583 → -2.5
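The slide does not specify which paired samples entered the Wilcoxon signed-rank test (for example, per-run or per-split accuracies of two approaches). The sketch below is an exact two-sided version under that pairing assumption, written with the standard library only; the accuracy values are hypothetical, not results from this study, and `scipy.stats.wilcoxon` offers a maintained implementation.

```python
from itertools import product
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j map to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Exact two-sided Wilcoxon signed-rank test for paired samples x and y.
    Zero differences are discarded; enumerates 2**n sign patterns, so keep n small."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under H0 every rank is equally likely to carry a positive or negative sign.
    null = [sum(r for r, s in zip(ranks, signs) if s) for signs in product((0, 1), repeat=n)]
    p_low = sum(1 for w in null if w <= w_plus) / len(null)
    p_high = sum(1 for w in null if w >= w_plus) / len(null)
    return w_plus, min(1.0, 2 * min(p_low, p_high))


x = [53.1, 54.2, 53.0, 54.1, 53.9, 53.3]  # hypothetical Top-1 accuracies, approach A
y = [52.0, 52.9, 52.8, 52.4, 52.5, 52.7]  # hypothetical Top-1 accuracies, approach B
w, p = wilcoxon_signed_rank(x, y)
print(w, p)  # 21.0 0.03125
```

With six pairs the smallest achievable two-sided exact p-value is 2/64 ≈ 0.031, and with five pairs it is 2/32 = 0.0625, which shows how a small number of samples caps the evidence such a test can provide.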
  10. Top-50 Uniqueness and Chemical Validity Rate

    ■ The GraphRetro approach is limited, as it consistently generates only around 10 suggestions even for larger beam sizes.
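Uniqueness and chemical validity rates can be sketched as fractions over the top-k suggestions for one target. The exact definitions used in the study are not given on the slide, so the following is an assumption: duplicates are detected on order-invariant fragment sets of canonical SMILES, and validity is delegated to a caller-supplied `is_valid` function (a real evaluation would use a SMILES parser such as RDKit's; the lambda below is a toy stand-in). Both function names are hypothetical.

```python
from typing import Callable, Sequence


def uniqueness_rate(suggestions: Sequence[str], k: int = 50) -> float:
    """Fraction of the top-k suggestions that are distinct precursor combinations.
    Assumes canonical SMILES; fragment order within a combination is ignored."""
    top_k = [tuple(sorted(s.split("."))) for s in suggestions[:k]]
    return len(set(top_k)) / len(top_k) if top_k else 0.0


def validity_rate(suggestions: Sequence[str], is_valid: Callable[[str], bool],
                  k: int = 50) -> float:
    """Fraction of the top-k suggestions accepted by `is_valid` (e.g. a SMILES parser)."""
    top_k = list(suggestions[:k])
    return sum(1 for s in top_k if is_valid(s)) / len(top_k) if top_k else 0.0


suggestions = ["A.B", "B.A", "C", "C", "D"]
print(uniqueness_rate(suggestions))  # 0.6
print(validity_rate(suggestions, is_valid=lambda s: s != "D"))  # 0.8 with this toy check
```

Both functions divide by the number of suggestions actually returned. Dividing by k instead would penalize approaches that return fewer than k suggestions, such as GraphRetro at larger beam sizes; which convention the study used is not stated.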
  11. Conclusion

    1. What is the current state-of-the-art for single-step retrosynthesis?
       ■ Among template-free approaches, and overall, the state-of-the-art approach is R-SMILES.
       ■ Among template-based approaches, the state-of-the-art approach is LocalRetro.
    2. Are the recent improvements actually significant?
       ■ Based on a limited number of samples, the improvements are statistically significant (0.01 < p < 0.05) according to the Wilcoxon signed-rank test, but consistently smaller than reported in the original publications.
    3. How easy or difficult is it to reproduce the reported results?
       ■ Template-free single-step retrosynthesis approaches are significantly more complex to implement and train successfully than template-based approaches.
    Conclusion: The recently observed incremental improvements represent optimization of incomplete ideas rather than significant progress; the field of computer-assisted single-step retrosynthesis remains stagnant, with performance insufficient for a practical standalone product.
  12. Future Work
  13. Top-N Accuracy

    ■ The aggregated accuracy that reflects the probability of the ground-truth combination of precursor chemical compounds being found within the first N suggestions.
    [Figure: Three example cases for a ranked list Suggestion #1, Suggestion #2, …, Suggestion #N:
      ground truth at Suggestion #1: Top-1 100% (1/1), Top-2 100% (1/1), …, Top-N 100% (1/1);
      ground truth at Suggestion #N: Top-1 0% (0/1), Top-2 0% (0/1), …, Top-N 100% (1/1);
      ground truth at Suggestion #2: Top-1 0% (0/1), Top-2 100% (1/1), …, Top-N 100% (1/1).]
  14. Top-50 and Top-10 MaxFrag Accuracy Results

    ■ The final ranking of the evaluated single-step retrosynthesis approaches by Top-1 MaxFrag accuracy (%) is as follows, where the value after → is the difference from the originally reported value, where available:
      1. R-SMILES: 58.4 ± 0.555 → -2.6
      2. LocalRetro: 57.5 ± 0.650 → -0.4
      3. GLN: 56.6 ± 0.746 → N/A
      4. GraphRetro: 56.4 ± 0.478 → N/A
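MaxFrag accuracy, listed among the metrics on the Experiment Design slide, counts a suggestion as correct when its largest precursor fragment matches the largest fragment of the ground truth. Because it is a relaxed criterion, it is never lower than exact-match accuracy on the same suggestions. The sketch below assumes "largest" means most heavy atoms and counts them with a rough regular-expression tokenizer instead of a cheminformatics toolkit; all names are hypothetical.

```python
import re
from typing import Sequence

# SMILES atom tokens: bracket atoms first, then Br/Cl before single-letter symbols.
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# Bracket atoms that are hydrogen, e.g. [H], [2H], [H+].
HYDROGEN_ATOM = re.compile(r"\[\d*H[+-]?\d*\]")


def heavy_atom_count(smiles: str) -> int:
    """Rough heavy-atom count from a SMILES string without a cheminformatics toolkit."""
    return sum(1 for tok in ATOM_TOKEN.findall(smiles) if not HYDROGEN_ATOM.fullmatch(tok))


def max_fragment(combination: str) -> str:
    """Largest fragment of a dot-separated precursor combination (ties broken by string order)."""
    return max(combination.split("."), key=lambda frag: (heavy_atom_count(frag), frag))


def maxfrag_hit(ranked: Sequence[str], truth: str, n: int) -> bool:
    """True if any of the first n suggestions shares the ground truth's largest fragment."""
    target = max_fragment(truth)
    return any(max_fragment(s) == target for s in ranked[:n])


truth = "CC(=O)Cl.Nc1ccccc1"   # acetyl chloride + aniline
ranked = ["CC(=O)O.Nc1ccccc1"]  # acetic acid + aniline: exact match fails
print(maxfrag_hit(ranked, truth, n=1))  # True, both largest fragments are aniline
```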