Evaluating Computer-assisted Single-step Retrosynthesis, Elix, CBI 2022

Elix
October 28, 2022

Transcript

  1. Chem-Bio Informatics Society (CBI) Annual Meeting 2022, Tokyo, Japan | October 25th, 2022
    Evaluating Computer-assisted Single-step Retrosynthesis
    How significant are recent improvements?
    Haris Hasić (Elix, Inc. & Tokyo Institute of Technology)
    Takahiro Inoue (Elix, Inc.)
    Tatsuya Okubo (Elix, Inc.)
    Takashi Ishida (Tokyo Institute of Technology)

  2. Presentation Contents
    1. Introduction
    ■ Chemical Synthesis
    ■ Computer-assisted Retrosynthesis
    ■ Single-step Retrosynthesis
    ■ Research Objective
    2. Experiments & Results
    ■ Experiment Design
    ■ Top-50 Accuracy
    ■ Top-10 Accuracy
    ■ Top-50 Uniqueness and Chemical Validity Rate
    3. Conclusion
    ■ Conclusion
    ■ Future Work
    4. Appendix

  3. Introduction

  4. Chemical Synthesis
    ■ The artificial execution of chemical reactions to obtain one or more target chemical compounds.
    [Figure: synthesis of the target Zolpidem. Legend: commercially available chemical compounds, intermediate chemical compounds, target chemical compounds.]
    N,N-dimethylprop-2-ynamide (PubChem ID [2]: 11240440)
    4-Methylbenzaldehyde (PubChem ID [2]: 7725)
    2-Amino-5-methylpyridine (PubChem ID [2]: 15348)
    Target [1]: Zolpidem (PubChem ID [2]: 5732)
    Yields [3]: 67.0% and 89.0%
    [1] (2017, Zhang, B. et al.): https://doi.org/10.1515/hc-2017-0152.
    [2] PubChem Chemical Compound Database: https://pubchem.ncbi.nlm.nih.gov/. Accessed On: October 13th, 2022.
    [3] Reaxys Chemical Reaction Database: https://www.reaxys.com/. Accessed On: October 13th, 2022.

  5. Computer-assisted Retrosynthesis
    ■ A strategy for planning chemical synthesis by analysing target chemical compounds and chemical reactions in reverse, relying on computing power rather than human experts.
    [Figure: retrosynthetic analysis of the target [1] Zolpidem (PubChem ID [2]: 5732). Figure labels: Synthesis Route Planning; Single-step Retrosynthesis. Legend: commercially available chemical compounds, intermediate chemical compounds, target chemical compounds.]
    [1] (2017, Zhang, B. et al.): https://doi.org/10.1515/hc-2017-0152.
    [2] PubChem Chemical Compound Database: https://pubchem.ncbi.nlm.nih.gov/. Accessed On: October 13th, 2022.

  6. Single-step Retrosynthesis
    ■ A mechanism for suggesting precursor chemical compounds for a given target chemical compound.
    Approach types shown: Template-based Approaches [2], Template-free Approaches [1], Semi-template-based Approaches [3]
    SMILES strings shown in the figure:
    CN(C)C(=O)CC1=C(N=C2C=CC(C)=CN12)C1=CC=C(C)C=C1
    CC1=CNC(C=C1)N=CC1=CC=C(C)C=C1.CN(C)C(=O)C#C
    Legend: commercially available chemical compounds, intermediate chemical compounds, target chemical compounds.
    [1] (2017, Liu, B. et al.): https://doi.org/10.1021/acscentsci.7b00303.
    [2] (2017, Segler, M.H.S. and Waller, M.P.): https://doi.org/10.1002/chem.201605499.
    [3] (2020, Shi, C. et al.): https://doi.org/10.48550/arXiv.2003.12725.
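    As a rough illustration of the input and output of a single-step model, the sketch below reads the first SMILES string on this slide as the target and the second as one suggested precursor set (an assumption about the figure). The lookup table `TOY_MODEL` and the function `suggest_precursors` are hypothetical stand-ins for a trained model, not any of the cited approaches.

```python
from typing import Dict, List

# Hypothetical lookup table standing in for a trained single-step model.
# Key: target SMILES; value: ranked precursor suggestions, where "." separates
# the individual precursor compounds within one suggestion.
TOY_MODEL: Dict[str, List[str]] = {
    "CN(C)C(=O)CC1=C(N=C2C=CC(C)=CN12)C1=CC=C(C)C=C1": [
        "CC1=CNC(C=C1)N=CC1=CC=C(C)C=C1.CN(C)C(=O)C#C",
    ],
}


def suggest_precursors(target: str, top_n: int = 10) -> List[List[str]]:
    """Return up to top_n ranked suggestions, each as a list of precursor SMILES."""
    ranked = TOY_MODEL.get(target, [])[:top_n]
    return [suggestion.split(".") for suggestion in ranked]


target = "CN(C)C(=O)CC1=C(N=C2C=CC(C)=CN12)C1=CC=C(C)C=C1"
suggestions = suggest_precursors(target)
for rank, precursors in enumerate(suggestions, start=1):
    print(rank, precursors)
```

    Whatever the model family (template-based, template-free, or semi-template-based), the interface is the same: one target in, a ranked list of precursor sets out.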

  7. Research Objective
    ■ The main research objective is to answer the following questions:
    1. What is the current state-of-the-art single-step retrosynthesis approach?
    2. Are the recent frequent improvements actually significant?
    3. How easy or difficult is it to reproduce the reported results?
    Claims quoted from recent publications:
    "Experiments on the benchmark datasets show a significant 8.1% improvement over existing state-of-the-art methods in top-one accuracy." [1]
    "Our model achieves a top-1 accuracy of 53.7%, outperforming previous template-free and semi-template-based methods." [2]
    "The overall accuracy with singly and doubly mutated predictions was 61.6%, outperforming current state-of-the-art methods." [3]
    "We compare the proposed R-SMILES with various state-of-the-art baselines and show that it significantly outperforms them all, demonstrating the superiority of the proposed method." [4]
    [1] (2020, Dai, H. et al.): https://doi.org/10.48550/arXiv.2001.01408.
    [2] (2021, Somnath, V.R. et al.): https://doi.org/10.48550/arXiv.2006.07038.
    [3] (2022, Ucak, U.V. et al.): https://doi.org/10.1038/s41467-022-28857-w.
    [4] (2022, Zhong, Z. et al.): https://doi.org/10.1039/d2sc02763a.

  8. Experiments & Results

  9. Experiment Design
    Step 1. Literature Review *
    Approach        Year  Type     Top-1 (%)
    GLN [1]         2020  TB       52.5
    GraphRetro [3]  2021  Semi-TB  53.7
    LocalRetro [4]  2021  TB       53.4
    AT [2]          2020  TF       53.5
    RetroTRAE [5]   2022  TF       61.6 **
    R-SMILES [6]    2022  TF       56.3
    TB: template-based. Semi-TB: semi-template-based. TF: template-free.
    Step 2. Dataset Preparation
    USPTO-50k [7]
    USPTO-MIT [8]
    ORD (Non-USPTO) [9]
    Step 3. Evaluation
    Top-N Accuracy - Aggregated accuracy that reflects the probability of the ground truth being found within the first N suggestions.
    Other single-step retrosynthesis metrics:
    1. MaxFrag Accuracy
    2. Suggestion Duplication Rate
    3. Suggestion Chemical Validity
    4. Round-trip Accuracy
    5. Coverage Rate
    6. Diversity Rate…
    * As of July, 2022. ** On the USPTO-MIT dataset.
    [1] (2020, Dai, H. et al.): https://doi.org/10.48550/arXiv.2001.01408.
    [2] (2020, Tetko, I.V. et al.): https://doi.org/10.1038/s41467-020-19266-y.
    [3] (2021, Somnath, V.R. et al.): https://doi.org/10.48550/arXiv.2006.07038.
    [4] (2021, Chen, S. and Jung, Y.): https://doi.org/10.1021/jacsau.1c00246.
    [5] (2022, Ucak, U.V. et al.): https://doi.org/10.1038/s41467-022-28857-w.
    [6] (2022, Zhong, Z. et al.): https://doi.org/10.1039/d2sc02763a.
    [7] (2017, Coley, C.W. et al.): https://doi.org/10.1021/acscentsci.7b00355.
    [8] (2020, Jin, W. et al.): https://doi.org/10.48550/arXiv.1709.04555.
    [9] (2021, Kearnes, S.M. et al.): https://doi.org/10.1021/jacs.1c09820.

  10. Top-50 Accuracy
    ■ All of the evaluated approaches under-delivered on the Top-N Accuracy reported in the original publications.
    ■ Because of their faulty performance, the AT and RetroTRAE Top-N accuracy values are not considered representative.
    ■ The reported LocalRetro Top-N accuracy treated isomers as identical chemical compounds.

  11. Top-10 Accuracy
    ■ Even though the improvements are statistically significant (0.01 < p < 0.05) according to the Wilcoxon Signed-rank Test, they are consistently lower than reported in the original publications.
    ■ The final ranking of the evaluated single-step retrosynthesis approaches in terms of Top-1 Accuracy is as follows (the value after the arrow is the difference, in percentage points, from the Top-1 accuracy reported in the original publication):
    1. R-SMILES: 53.6 ± 0.558 → -2.7
    2. LocalRetro: 52.6 ± 0.650 → -0.8
    3. GLN: 51.7 ± 0.691 → -0.8
    4. GraphRetro: 51.2 ± 0.583 → -2.5
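    The slide does not say which samples were paired for the Wilcoxon Signed-rank Test, so the following is only a minimal sketch of how such a paired comparison could be run. It assumes per-fold Top-1 accuracies for two approaches; the numbers are made up for illustration and are not the presented results. The function `wilcoxon_signed_rank_exact` is a hypothetical helper that enumerates the exact null distribution and is only practical for small samples without ties; a library routine such as `scipy.stats.wilcoxon` would normally be used instead.

```python
from itertools import product
from typing import Sequence, Tuple


def wilcoxon_signed_rank_exact(x: Sequence[float], y: Sequence[float]) -> Tuple[int, float]:
    """Exact two-sided Wilcoxon signed-rank test for small paired samples.

    Assumes no zero differences and no tied absolute differences.
    Returns (W_plus, p_value), where W_plus is the sum of ranks of positive differences.
    """
    diffs = [a - b for a, b in zip(x, y)]
    if any(d == 0 for d in diffs):
        raise ValueError("zero differences are not handled in this sketch")
    magnitudes = sorted(abs(d) for d in diffs)
    if len(set(magnitudes)) != len(magnitudes):
        raise ValueError("tied absolute differences are not handled in this sketch")
    rank = {m: i + 1 for i, m in enumerate(magnitudes)}
    w_plus = sum(rank[abs(d)] for d in diffs if d > 0)

    # Under the null hypothesis every rank is positive or negative with probability 1/2,
    # so enumerate all 2^n sign assignments to obtain the exact distribution of W_plus.
    n = len(diffs)
    totals = [
        sum(r for r, positive in zip(range(1, n + 1), signs) if positive)
        for signs in product((False, True), repeat=n)
    ]
    lower = sum(t <= w_plus for t in totals) / len(totals)
    upper = sum(t >= w_plus for t in totals) / len(totals)
    return w_plus, min(1.0, 2 * min(lower, upper))


# Hypothetical per-fold Top-1 accuracies (%) for two approaches; illustrative only.
approach_a = [53.1, 54.0, 53.5, 52.9, 54.2, 53.8]
approach_b = [52.5, 52.9, 53.3, 52.0, 52.7, 53.1]
w, p = wilcoxon_signed_rank_exact(approach_a, approach_b)
print(f"W+ = {w}, p = {p:.5f}")  # W+ = 21, p = 0.03125
```

    With only six pairs, the smallest two-sided p-value this test can produce is 2/64 = 0.03125, which is one reason a small number of samples limits how strong a significance claim can be.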

  12. Top-50 Uniqueness and Chemical Validity Rate
    ■ The GraphRetro approach is limited, as it consistently generates around 10 suggestions even for higher beam sizes.
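    The slide does not define the two rates, so the sketch below assumes the simplest reading: uniqueness is the fraction of distinct strings among the returned suggestions, and validity is the fraction that pass a validity check. The check `balanced_parentheses` is a deliberately crude, hypothetical stand-in; a real evaluation would parse each SMILES with a cheminformatics toolkit such as RDKit.

```python
from typing import Callable, Sequence


def uniqueness_rate(suggestions: Sequence[str]) -> float:
    """Fraction of distinct suggestions among all returned suggestions."""
    if not suggestions:
        return 0.0
    return len(set(suggestions)) / len(suggestions)


def validity_rate(suggestions: Sequence[str], is_valid: Callable[[str], bool]) -> float:
    """Fraction of suggestions accepted by the supplied validity check."""
    if not suggestions:
        return 0.0
    return sum(1 for s in suggestions if is_valid(s)) / len(suggestions)


def balanced_parentheses(smiles: str) -> bool:
    """Toy validity check (assumption): only verifies that branches are closed."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


# Hypothetical beam output for one target; illustrative only.
beam = ["CCO", "CCO", "CCO", "CC(=O)O", "CC(=O"]
print(uniqueness_rate(beam))                      # 0.6
print(validity_rate(beam, balanced_parentheses))  # 0.8
```

    Both rates are computed over the suggestions actually returned, so an approach that returns far fewer suggestions than the beam size has a much smaller denominator than the others and the rates are not directly comparable.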

  13. Conclusion

  14. Conclusion
    1. What is the current state-of-the-art for single-step retrosynthesis?
    ■ In the case of template-free approaches and overall, the state-of-the-art approach is R-SMILES.
    ■ In the case of template-based approaches, the state-of-the-art approach is LocalRetro.
    2. Are the recent improvements actually significant?
    ■ Based on a limited number of samples, the improvements are statistically significant (0.01 < p < 0.05) according to the Wilcoxon Signed-rank Test, but consistently lower than reported in the original publications.
    3. How easy or difficult is it to reproduce the reported results?
    ■ Template-free single-step retrosynthesis approaches are significantly more complex to implement and train successfully than template-based approaches.
    Conclusion: The recently observed incremental improvements represent optimization of incomplete ideas rather than significant progress, and the field of computer-assisted single-step retrosynthesis remains stagnant, with performance insufficient for a practically standalone product.

  15. Future Work

  16. Elix, Inc. | https://www.elix-inc.com/

  17. Appendix

  18. Top-N Accuracy
    ■ The aggregated accuracy that reflects the probability of the ground truth precursor chemical compound combination being found within the first N suggestions.
    [Figure: three single-target examples showing where the ground truth appears among Suggestion #1, Suggestion #2, …, Suggestion #N.]
    Ground truth at Suggestion #1: Top-1: 100% (1/1), Top-2: 100% (1/1), …, Top-N: 100% (1/1)
    Ground truth at Suggestion #2: Top-1: 0% (0/1), Top-2: 100% (1/1), …, Top-N: 100% (1/1)
    Ground truth at Suggestion #N: Top-1: 0% (0/1), Top-2: 0% (0/1), …, Top-N: 100% (1/1)
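    The definition on this slide can be sketched in a few lines. This is a minimal illustration that assumes suggestions and ground truths are compared as exact strings (for example, canonical SMILES of the full precursor combination); the function name `top_n_accuracy` and the placeholder labels are hypothetical.

```python
from typing import Sequence


def top_n_accuracy(
    ranked_suggestions: Sequence[Sequence[str]],
    ground_truths: Sequence[str],
    n: int,
) -> float:
    """Fraction of targets whose ground truth is among the first n suggestions."""
    if len(ranked_suggestions) != len(ground_truths):
        raise ValueError("one ranked suggestion list is required per ground truth")
    if not ground_truths:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(ranked_suggestions, ground_truths)
        if truth in ranked[:n]
    )
    return hits / len(ground_truths)


# Three hypothetical targets with the ground truth at rank 1, rank 3, and rank 2.
# Single letters stand in for canonical precursor SMILES.
ranked = [["A", "B", "C"], ["X", "Y", "Z"], ["Q", "P", "R"]]
truths = ["A", "Z", "P"]
for n in (1, 2, 3):
    print(f"Top-{n}: {top_n_accuracy(ranked, truths, n):.1%}")
# Top-1: 33.3%
# Top-2: 66.7%
# Top-3: 100.0%
```

    Aggregating over targets is what turns the per-target 0% or 100% outcomes into the reported percentages, and Top-N can never decrease as N grows.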

  19. Top-50 and Top-10 MaxFrag Accuracy Results
    ■ The final ranking of the evaluated single-step retrosynthesis approaches in terms of Top-1 MaxFrag Accuracy is as follows:
    1. R-SMILES: 58.4 ± 0.555 → -2.6
    2. LocalRetro: 57.5 ± 0.650 → -0.4
    3. GLN: 56.6 ± 0.746 → N/A
    4. GraphRetro: 56.4 ± 0.478 → N/A
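    MaxFrag Accuracy is not defined on the slides. It is generally described as matching only the largest precursor fragment rather than the full precursor set, and the sketch below follows that reading. It assumes that "largest" can be approximated by the longest dot-separated SMILES component and that both strings were canonicalized the same way beforehand; both are simplifications, and the helper names are hypothetical.

```python
def largest_fragment(smiles: str) -> str:
    """Return the dot-separated component with the most characters (a crude size proxy)."""
    return max(smiles.split("."), key=len)


def exact_match(suggestion: str, truth: str) -> bool:
    """All precursor fragments must match, regardless of their order."""
    return sorted(suggestion.split(".")) == sorted(truth.split("."))


def maxfrag_match(suggestion: str, truth: str) -> bool:
    """Only the largest fragment must match."""
    return largest_fragment(suggestion) == largest_fragment(truth)


# Ground truth taken from the precursor SMILES shown on the Single-step Retrosynthesis slide.
truth = "CC1=CNC(C=C1)N=CC1=CC=C(C)C=C1.CN(C)C(=O)C#C"
# Hypothetical suggestion that gets the large fragment right and the small one wrong.
suggestion = "CC1=CNC(C=C1)N=CC1=CC=C(C)C=C1.CN(C)C(=O)C=C"

print(exact_match(suggestion, truth))    # False
print(maxfrag_match(suggestion, truth))  # True
```

    Because a suggestion that misses only a small reagent still counts as a hit, MaxFrag values are expected to be at least as high as the corresponding Top-N values, which matches the ranking above.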
