Dealing with organometallic molecules in RDKit

Jan Jensen
October 05, 2020

My talk at the 2020 RDKit user group meeting, 2020.10.06

Transcript

  1. Dealing with organometallic molecules in RDKit. Jan H. Jensen, Department
    of Chemistry, University of Copenhagen (@janhjensen). 2020 RDKit UGM, 2020.10.06.
  2. Most homogeneous catalysts are organometallic compounds. Large datasets are becoming
    available, but in xyz format. Most cheminformatics/ML relies on SMILES/graphs (e.g. substructure searching and graph convolution). chemrxiv.12894818.v1
  3. J Cheminform (2018). [Structure drawing: Fe with two OH and four H2O ligands.]
    SMILES: [OH2][Fe](O)(O)([OH2])([OH2])[OH2]. Not readable by RDKit. Charge from SMILES often incorrect.
  4. xyz2mol for organic compounds. [Structure drawings of an organic example containing N, Cl, O and Si.]
    xyz2mol converts an xyz file to an RDKit mol object (needs the molecular charge and hydrogens). github.com/jensengroup/xyz2mol
  5. Organic examples. [Structure drawings with per-atom valence lists: [4, 4, 1, 1], [2, 4, 1, …] and [1, 3, 1, …]; some atoms are marked with *.]
  6. Sometimes there is more than one solution. xyz2mol will generate
    one of them arbitrarily. Solution(?): generate all, then filter. Generate all using rdchem.ResonanceMolSupplier*. Create a filter that picks the “canonical” form. *HT Mads Koerstz
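
The "generate all, then filter" idea might look roughly like the sketch below. It enumerates resonance forms with RDKit's `ResonanceMolSupplier`, de-duplicates them by canonical SMILES, and applies a placeholder filter. The helper names and the "first in sorted order" rule are assumptions for illustration only; the talk leaves the actual filter open.

```python
from rdkit import Chem


def resonance_smiles(smiles):
    """Return sorted, de-duplicated canonical SMILES of the enumerated resonance forms."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse {smiles!r}")
    forms = {
        Chem.MolToSmiles(res_mol)
        for res_mol in Chem.ResonanceMolSupplier(mol)
        if res_mol is not None
    }
    return sorted(forms)


def pick_canonical(forms):
    """Hypothetical filter: take the first SMILES in sorted order."""
    return forms[0]


# Acetate: the two resonance forms are symmetry-equivalent,
# so they collapse to a single canonical SMILES.
acetate_forms = resonance_smiles("CC(=O)[O-]")
acetate_choice = pick_canonical(acetate_forms)
```

For systems whose resonance forms are not symmetry-equivalent, more than one SMILES survives de-duplication and the filter rule starts to matter.
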
  7. One approach for organometallics: distinguishing dative from covalent bonds.
    [Structure drawings: Fe with two OH and four H2O ligands, and Fe with three OH and three H2O ligands; labeled Fe+2 and Fe+3.] Formal charge on Fe = total charge [e.g. Fe(OH)2+]. SMILES: O->[Fe](O)(O)(<-O)(<-O)<-O and O->[Fe](O)(O)(O)(<-O)<-O
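
As a quick check of the two encodings, the sketch below parses the covalent-only SMILES from slide 3 and the first dative-bond SMILES from this slide with RDKit and counts the bond types. The variable names are illustrative, and the expected behaviour (parse failure for the covalent-only form) is what current RDKit valence rules for neutral oxygen imply.

```python
from rdkit import Chem
from rdkit import RDLogger

RDLogger.DisableLog("rdApp.*")  # silence the expected valence error message

covalent_only = "[OH2][Fe](O)(O)([OH2])([OH2])[OH2]"  # from slide 3
with_dative = "O->[Fe](O)(O)(<-O)(<-O)<-O"            # from slide 7

# Neutral O with two H and a covalent bond to Fe exceeds its allowed valence,
# so sanitization fails and MolFromSmiles returns None.
mol_covalent = Chem.MolFromSmiles(covalent_only)

# With dative bonds, the donor O atoms keep a normal valence.
mol_dative = Chem.MolFromSmiles(with_dative)

bond_types = [bond.GetBondType() for bond in mol_dative.GetBonds()]
n_dative = bond_types.count(Chem.BondType.DATIVE)
n_single = bond_types.count(Chem.BondType.SINGLE)
total_charge = Chem.GetFormalCharge(mol_dative)
```

Here the four water ligands are attached through dative bonds and the two hydroxides through ordinary single bonds, and no atom carries a formal charge.
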
  8. Main problem: distinguishing dative from covalent bonds before bond orders
    are assigned. [Structure drawings: two Fe complexes with N and PH2 donor atoms, differing in an OH versus O group; DFT distances 1.97 Å and 1.94 Å.]
  9. Another approach: only dative bonds. Formal charge on
    Fe = total charge − ∑ charge on ligands. Also not sensitive to presence of bond(s). [Structure drawings: Fe+2 with two OH- and four H2O ligands; Fe+3 with three OH- and three H2O ligands.] Alex Clark, J. Chem. Inf. Modeling 2011
  10. Must know charge on Fe. [Structure drawings: two Fe complexes with N and PH2 donor atoms, with formal charges (N-, O-) on the ligands.]
    total charge = charge on Fe + ∑ charge on ligands
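
The charge balance on this slide can be rearranged to give the metal charge once the ligand charges are fixed. The sketch below applies it to the hydroxide/water Fe complexes from the earlier slides; the function name is hypothetical, and the total charge of 0 for both complexes is an assumption inferred from the ligand counts and the Fe+2/Fe+3 labels.

```python
def metal_charge(total_charge, ligand_charges):
    """Rearranged balance: charge on metal = total charge - sum of ligand charges."""
    return total_charge - sum(ligand_charges)


# Two OH- and four H2O ligands, assumed neutral overall.
fe_a = metal_charge(0, [-1, -1, 0, 0, 0, 0])

# Three OH- and three H2O ligands, assumed neutral overall.
fe_b = metal_charge(0, [-1, -1, -1, 0, 0, 0])

# The cation [Fe(OH)2(H2O)4]+ has total charge +1 with the same ligands as fe_a.
fe_c = metal_charge(1, [-1, -1, 0, 0, 0, 0])
```
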
  11. Most TMs have many different oxidation states. https://byjus.com/chemistry/transition-elements-oxidation-states/

  12. Try all charges and save cases for which total charge
    = charge on TM + ∑ charge on ligands. [Structure drawings: W complexes labeled W+4, W+6 and W+2, with differently charged N, C and O ligand atoms.] Expect (from paper): W+4. Get: W+4, W+2, W+6. Issues: missing bond to W; “wrong” resonance form of ligand.
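
A brute-force version of "try all charges" could be sketched as follows. The function name, the candidate metal charges and the per-ligand charge options are made-up illustration values, not the W complex from the slide; in the prototype the ligand charges would come from the perceived ligand structures.

```python
from itertools import product


def balanced_assignments(total_charge, metal_charges, ligand_options):
    """Keep every (metal charge, ligand charges) combination that satisfies
    total charge = charge on TM + sum of ligand charges."""
    hits = []
    for q_metal in metal_charges:
        for ligand_charges in product(*ligand_options):
            if q_metal + sum(ligand_charges) == total_charge:
                hits.append((q_metal, ligand_charges))
    return hits


# Hypothetical neutral complex with three ligands; two of them have
# more than one plausible charge (e.g. different resonance forms).
hits = balanced_assignments(
    total_charge=0,
    metal_charges=(2, 4, 6),
    ligand_options=[(-1, -2), (-1, 0), (-1,)],
)
# hits == [(2, (-1, 0, -1)), (4, (-2, -1, -1))]
```

More than one assignment passes the balance check here, which is the same non-uniqueness the slide reports: the charge balance alone cannot decide between oxidation states.
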
  13. Some issues: not all bonds to TM are found (uses
    RDKit Hückel reduced overlap population); other resonance forms of ligands; hydrides.
  14. Dative bond limits ResonanceMolSupplier. Fragment -> resonance structures -> combine(?)

  15. Hydrides and SMILES (RDKit 2020.03.5)

  16. Hydrides and SMILES (RDKit 2020.03.5). SMILES does not give mol
    object with correct charge: [HH2-]->[Fe+2] becomes [FeH+2] or [H][Fe+2]
  17. Hydrides and SMILES (RDKit 2020.03.5). Greg found a workaround. Another
    option is to treat TM-hydride bonds as covalent and reduce TM charge: [FeH+1] instead of [HH2-]->[Fe+2]
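
The second option amounts to folding the hydride's charge into the metal atom. The sketch below does that bookkeeping and checks that RDKit reads the covalent form [FeH+1] with the intended total charge. Variable names are illustrative, and the dative hydride SMILES is deliberately not parsed here, since its behaviour is version-dependent according to the slides.

```python
from rdkit import Chem

fe_charge = 2        # charge on Fe in the dative picture
hydride_charge = -1  # charge on the hydride ligand

# Treating the Fe-H bond as covalent moves the hydride charge onto Fe.
covalent_fe_charge = fe_charge + hydride_charge

hydride_mol = Chem.MolFromSmiles("[FeH+1]")
fe_atom = hydride_mol.GetAtomWithIdx(0)

parsed_total_charge = Chem.GetFormalCharge(hydride_mol)
parsed_h_count = fe_atom.GetTotalNumHs()
```

The total charge of the fragment is unchanged by this re-encoding; only the place where the charge is written moves from the hydride to the metal.
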
  18. Summary: prototype generates RDKit-readable SMILES for organometallic compounds w/o
    human intervention. But … not all bonds to TM are found (uses RDKit Hückel reduced overlap population); non-unique oxidation states/resonance forms (filter/“canonicalization”?); hydrides charge bug for MolFromSmiles; how to automatically test code?
  19. Additional RDKit issues. [Structure drawing: Fe complex with N- and PH2 donor atoms.]
    Depiction of octahedral compounds not helpful. Embedding/UFF optimization not working.
  20. Experimental branch can be found here: https://github.com/jensengroup/xyz2mol/tree/tm_comb Feedback welcome.
    Additional issues continued: specifying stereochemistry. https://www.simolecule.com/cdkdepict/depict.html