CREDO: A comprehensive resource for Structural Interactomics and Drug Discovery

Presentation about CREDO I gave at Vernalis R&D Ltd, Cambridge, United Kingdom.

Adrian Schreyer

February 22, 2013

Transcript

  1. CREDO: A comprehensive resource for Structural Interactomics and Drug Discovery

    Adrian Schreyer Department of Biochemistry, University of Cambridge Adrian Schreyer (Department of Biochemistry, University of Cambridge) The CREDO Database 1 / 46
  2. Outline of the talk

    1 Introduction
  3. What is CREDO?

    (Very) brief summary
    - Contains the interactions between all molecules found in experimentally determined biological assemblies
    - Also contains intramolecular interactions of these molecules
    - Contacts are represented as Structural Interaction Fingerprints (SIFts)
    - Contains a sequence-to-structure mapping to integrate protein sequence data
    - External resources are integrated to annotate data in CREDO
    - Complete cheminformatics toolkits (OpenEye, RDKit)
    - Python Application Programming Interface (API)
  4. Database statistics

    From CREDO release 2013.1.2:
    - 86,903 PDB entries
    - 128,776 biological assemblies
    - 607,505 protein-ligand interactions (not the total number of small molecules)
    - 266,062 protein-protein interfaces, 17,793 protein-nucleic acid grooves
    - 20 carbohydrate chains!
    - 1,166,380,424 contacts
  5. Outline: 2 Structural interactions, Structural Interaction Fingerprints (SIFts)

    - Structural Interaction Fingerprints (SIFts)
    - Aromatic ring interactions
    - Ligand-ligand interactions
    - Data Validation
  6. Structural Interaction Fingerprints (SIFts)

    Atom and contact types
    - Atom types are identified using SMARTS patterns
    - Contact types are assigned based on a combination of atom types and geometrical constraints that have to be fulfilled
    - Charges (ionisation states) are not required to determine ionic contacts
    - Multiple contact types are possible, but at least one type must be present
    - 12 interatomic interaction types
    - 9 ring-ring interaction geometries
    - 4 ring-atom interaction types
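The typing scheme on this slide (atom types first, then contact types from atom types plus geometry) can be sketched in a few lines. This is only an illustration: the atom-type labels, contact names and distance cut-offs below are hypothetical placeholders and not CREDO's actual definitions, and atom types are assumed to have been assigned already, for example by SMARTS matching.

```python
from math import dist

# Hypothetical rules: (contact name, type on one atom, type on the other, max distance in Å).
RULES = [
    ("hbond", "hbond_donor", "hbond_acceptor", 3.5),
    ("ionic", "pos_ionisable", "neg_ionisable", 4.0),
    ("hydrophobic", "hydrophobe", "hydrophobe", 4.5),
]

def contact_types(types_a, xyz_a, types_b, xyz_b):
    """Return the set of contact types for one atom pair (may be empty)."""
    d = dist(xyz_a, xyz_b)
    found = set()
    for name, first, second, cutoff in RULES:
        if d > cutoff:
            continue
        # Rules are symmetric: either atom may play either role.
        if (first in types_a and second in types_b) or (second in types_a and first in types_b):
            found.add(name)
    return found

# Example atoms: typed by ionisability, so no explicit charges are needed.
amine = {"hbond_donor", "pos_ionisable"}
carboxylate_o = {"hbond_acceptor", "neg_ionisable"}
close_pair = contact_types(amine, (0.0, 0.0, 0.0), carboxylate_o, (2.9, 0.0, 0.0))
far_pair = contact_types(amine, (0.0, 0.0, 0.0), carboxylate_o, (3.8, 0.0, 0.0))
```

In this sketch `close_pair` carries two contact types at once, while `far_pair` keeps only the longer-ranged one, which mirrors the point that a single atom pair can have multiple contact types.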
  7. Outline: 2 Structural interactions, Aromatic ring interactions
  8. Aromatic ring interaction geometries
  9. Atom-aromatic ring interactions

    π-electrons as atom type
    - The delocalised π-electron cloud of aromatic ring systems creates negative charge on both faces
    - Can act as hydrogen bond acceptor and negatively ionisable group
    - Distance- and geometry-dependent

    Interaction types
    - π-donor: with hydrogen bond donors
    - π-cation: with positively ionisable groups
    - π-carbon: with weak hydrogen bond donors
    - π-halogen: weak hydrogen bonds with halogens in a head-on orientation
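Since these interactions are distance- and geometry-dependent, the two quantities that typically matter are the distance from the atom to the ring centroid and the angle between the ring normal and the centroid-to-atom vector. The sketch below computes both with the standard library only; the function name and the example ring are illustrative assumptions, and no CREDO cut-offs are implied.

```python
from math import acos, cos, degrees, pi, sin, sqrt

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def ring_atom_geometry(ring_xyz, atom_xyz):
    """Return (distance to ring centroid in Å, angle to the ring normal in degrees)."""
    n = len(ring_xyz)
    centroid = tuple(sum(p[i] for p in ring_xyz) / n for i in range(3))
    # Normal from two centroid-to-atom vectors; assumes a planar ring.
    normal = _cross(_sub(ring_xyz[0], centroid), _sub(ring_xyz[1], centroid))
    v = _sub(atom_xyz, centroid)
    length = sqrt(_dot(v, v))
    # abs() folds the angle into 0..90 degrees because both ring faces are equivalent.
    cos_theta = abs(_dot(normal, v)) / (sqrt(_dot(normal, normal)) * length)
    return length, degrees(acos(min(1.0, cos_theta)))

# Idealised benzene-like ring in the xy plane (1.39 Å from centre to each atom).
ring = [(1.39 * cos(k * pi / 3), 1.39 * sin(k * pi / 3), 0.0) for k in range(6)]
above = ring_atom_geometry(ring, (0.0, 0.0, 3.5))    # atom directly above the ring face
in_plane = ring_atom_geometry(ring, (3.5, 0.0, 0.0))  # atom in the plane of the ring
```

An atom sitting over the ring face gives an angle near 0 degrees, whereas an atom in the ring plane gives an angle near 90 degrees, so a face-on interaction can be separated from an edge-on one by thresholding the angle.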
  10. π-donor example from a drug-target interaction

    Human aldose reductase mutant V47I complexed with fidarestat (PDB entry: 2PD9)
  11. Outline: 2 Structural interactions, Ligand-ligand interactions
  12. Inhibition of Quinone Reductase by Imatinib

    The structure of the leukemia drug imatinib bound to human quinone reductase 2 (PDB entry: 3FW1)
  13. Small molecule dimer blocking the p53-MDM2 interaction

    Structure of hDM2 with Dimer-Inducing Indolyl Hydantoin RO-2443 (PDB entry: 3VBG)
  14. Outline: 2 Structural interactions, Data Validation
  15. Validation of structural properties

    Structural properties
    - All atomic data is retained (B-factors, occupancies)
    - Boolean flags to identify missing/disordered/clashing residues and atoms
    - Boolean flags to identify non-standard, modified and mutated amino acids
    - Additional properties from mmCIF: resolution, R-factor, R-free, pH
    - Ligand geometry (angles) can be problematic
  16. Precision of atomic coordinates

    Diffraction-component precision index (DPI)
    - Introduced by Cruickshank to estimate the uncertainty of atomic coordinates obtained by structural refinement of protein diffraction data
    - Introduced to the virtual screening community by Goto

    Goto’s formula to calculate DPI:
    $\sigma(r, B_{avg}) = 2.2 \, N_{atoms}^{1/2} \, V_a^{1/3} \, N_{obs}^{-5/6} \, R_{free}$

    Goto’s formula to calculate the theoretical DPI limit:
    $\sigma(r, B_{avg}) = 0.22 \, (1 + s)^{1/2} \, V_m^{-1/2} \, C^{-5/6} \, R_{free} \, d_{min}^{5/2}$
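As a worked illustration of the first DPI formula, the sketch below evaluates it for one made-up structure. It uses the cube root of the asymmetric-unit volume, as in the published form of the formula (which keeps the result in Å). The reading of the symbols is an assumption stated here rather than on the slide: N_atoms is the number of atoms in the asymmetric unit, V_a its volume in Å³, N_obs the number of unique reflections, and R_free is given as a fraction. The input numbers are invented for the example.

```python
def dpi(n_atoms, v_asu, n_obs, r_free):
    """DPI in Å: 2.2 * N_atoms^(1/2) * V_a^(1/3) * N_obs^(-5/6) * R_free."""
    if min(n_atoms, v_asu, n_obs) <= 0:
        raise ValueError("n_atoms, v_asu and n_obs must be positive")
    return 2.2 * n_atoms ** 0.5 * v_asu ** (1.0 / 3.0) * n_obs ** (-5.0 / 6.0) * r_free

# Invented example values, not taken from any PDB entry.
example_dpi = dpi(n_atoms=2500, v_asu=60000.0, n_obs=30000, r_free=0.22)
# More reflections for the same model should give a smaller coordinate uncertainty.
better_data_dpi = dpi(n_atoms=2500, v_asu=60000.0, n_obs=60000, r_free=0.22)
```

The dependence on N_obs is the useful part for filtering: doubling the number of reflections lowers the estimated coordinate error even when R_free is unchanged.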
  17. Missing regions of PDB residues

    Visualisation of missing regions and a secondary structure fragment (PDB entry: 2P33)
  18. Outline: 3 Protein-ligand interactions, Annotation of protein-ligand interactions

    - Annotation of protein-ligand interactions
    - SIFt clustering
  19. Annotating protein-ligand interactions

    Metabolic pathways
    - EC information is mapped onto protein chains
    - KEGG data is used to identify metabolites and to link them to enzymes
    - Ligands are labelled as substrate, product or cofactor (of the enzyme)

    Drug-target interactions
    - Approved drugs are identified, as well as all other compounds in the ChEMBL database
    - Biological target information (UniProt) is taken from ChEMBL and DrugBank
    - Drug-target interactions are identified
  20. Ligand affinities and efficiencies

    Potency of ligands
    - Obtained from the latest version of the ChEMBL database
    - Identified through a combination of document (PubMed), target (UniProt) and chemistry (UniChem) match
    - Binding activities and ligand efficiencies (pKd, BEI, SEI) are linked to ligands where possible
    - 6,848 unique activities for 6,505 unique ligands (28,943 pairs)
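The efficiency indices named on this slide are simple ratios, so a short worked example may help. The sketch assumes the commonly used definitions: BEI is the negative log activity divided by the molecular weight in kDa, and SEI is the negative log activity divided by the polar surface area in units of 100 Å². The function names and the example ligand are illustrative only.

```python
from math import log10

def p_activity(value_nm):
    """Convert an activity (Kd, Ki, IC50) in nM to its negative log10 molar value."""
    return -log10(value_nm * 1e-9)

def bei(p_act, mol_weight_da):
    """Binding efficiency index: p(activity) per kDa of molecular weight."""
    return p_act / (mol_weight_da / 1000.0)

def sei(p_act, tpsa):
    """Surface-binding efficiency index: p(activity) per 100 Å² of polar surface area."""
    return p_act / (tpsa / 100.0)

# Hypothetical ligand: Kd = 10 nM, molecular weight 400 Da, tPSA 80 Å².
pkd = p_activity(10.0)          # about 8
ligand_bei = bei(pkd, 400.0)    # about 20
ligand_sei = sei(pkd, 80.0)     # about 10
```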
  21. Outline: 3 Protein-ligand interactions, SIFt clustering
  22. Clustering interaction fingerprints

    Structural properties
    - SIFts can be aligned to a given sequence system such as UniProt (or structural alignments)
    - These alignments can be used for hierarchical clustering to compare interactions
    - In CREDO this is done for all ligands that interact with proteins
    - 2D and 3D similarities are calculated for terminal (leaf) nodes, which always contain two ligands
    - Integrated into the website and API; phylogenetic trees can be visualised and browsed dynamically
  23. The SIFt tree for CDK2
  24. Outline: 4 Protein sequences and variations, Sequence-to-structure mapping

    - Sequence-to-structure mapping
    - Structural variations affecting PDB residues and their interactions
    - Binding site similarity searching
  25. Mapping UniProt sequences to PDB chains

    Structure integration with function, taxonomy and sequence (SIFTS) initiative
    - Maps UniProt sequences onto PDB residue sequences
    - Provides further residue-level annotation from the IntEnz, GO, Pfam, InterPro, SCOP, CATH and PubMed databases
    - Used to identify modified or mutated amino acids in protein chains
    - Contains secondary structure information for each residue
    - Transformed into relational format and linked to all residues in CREDO
  26. Protein Domains

    Mapping protein domains onto protein chains
    - Protein domain classifications from Pfam, CATH and SCOP are integrated into CREDO
    - Mapped to protein chains, ligand binding sites, protein-protein interfaces etc.
    - Pfam has the largest coverage by far
    - 5,724 unique Pfam domains
  27. Secondary structure fragments

    Implementing secondary structure fragments
    - The secondary structure information is used to create continuous fragments of secondary structure elements (SSE) in protein chains
    - A new fragment is started after every change in secondary structure along the sequence of a polypeptide chain
    - Tightly integrated with other CREDO entities
    - It is easy to get all SSEs interacting with a ligand or across a protein-protein interface
    - Potential application in the context of peptidomimetic drugs and biologics
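The fragment rule on this slide (start a new fragment whenever the secondary structure changes along the chain) amounts to run-length grouping of the per-residue assignment. A minimal sketch, assuming the assignment is available as a string with one letter per residue; the one-letter codes and the function name are illustrative, not CREDO's schema.

```python
from itertools import groupby

def sse_fragments(sec_struct):
    """Split a per-residue secondary structure string into continuous fragments.

    Returns (state, start, end) tuples with 0-based, inclusive residue indices.
    """
    fragments = []
    start = 0
    for state, run in groupby(sec_struct):
        length = len(list(run))
        fragments.append((state, start, start + length - 1))
        start += length
    return fragments

# Example chain: coil, helix, coil, strand, coil.
fragments = sse_fragments("CCHHHHCCEEEC")
# fragments == [("C", 0, 1), ("H", 2, 5), ("C", 6, 7), ("E", 8, 10), ("C", 11, 11)]
```

With residue ranges stored per fragment, finding the SSEs that touch a ligand reduces to checking which ranges contain the contacting residues.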
  28. Outline: 4 Protein sequences and variations, Structural variations affecting PDB residues and their interactions
  29. Structural Variations in CREDO

    Identifying variations in protein structures
    - Mapped onto residues in CREDO through sequence-to-structure mapping
    - Can be easily queried and combined with other parameters
    - Linked to Ensembl disease phenotypes
    - 2,369 phenotypes can be linked to residues in CREDO

    Source databases included in Ensembl Variation
    - dbSNP
    - Catalogue Of Somatic Mutations In Cancer (COSMIC)
    - Online Mendelian Inheritance in Man (OMIM)
    - 1000 Genomes
  30. Relevance: drug resistance in cancer

    C-KIT tyrosine kinase in complex with imatinib (PDB entry: 1T46), showing the imatinib-resistant mutation T670I.
  31. Outline: 4 Protein sequences and variations, Binding site similarity searching
  32. FuzCav: Binding site similarity

    The FuzCav algorithm
    - Alignment-free and very easy to calculate
    - Based on pharmacophore triplet counts to describe a ligand binding site
    - Can detect local similarities between binding sites
    - Performed natively on the server side with PostgreSQL using a numerical extension (pgeigen)
    - Various similarity metrics can be used
    - Calculated for all binding sites in CREDO
    - Reference: Journal of Chemical Information and Modeling 2010, 50 (1), 123-135
  33. FuzCav: description of the algorithm
  34. Outline: 5 Chemistry and cheminformatics, Molecular descriptors

    - Molecular descriptors
    - RECAP fragmentation of chemical components
    - Cheminformatics
  35. Calculation of physicochemical properties

    Conformation-independent
    - Important to evaluate drug-likeness and filter molecules
    - Feature counts, tPSA, XLogP, QED, ...

    Conformation-dependent
    - Calculated for all bound ligands and for up to 200 modelled conformers of each
    - Solvent-excluded and polar/apolar/total solvent-accessible surface areas
    - Radius of gyration, number of internal contacts
    - Ultrafast Shape Recognition (USR) moments as well as USRCAT
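As an example of a conformation-dependent descriptor from this list, the radius of gyration can be computed directly from the atomic coordinates of one conformer. The sketch below uses the unweighted form (every atom counts equally); whether CREDO applies mass weighting is not stated on the slide, so that choice is an assumption.

```python
from math import sqrt

def radius_of_gyration(coords):
    """Unweighted radius of gyration: root mean square distance of atoms to their centroid."""
    n = len(coords)
    if n == 0:
        raise ValueError("at least one atom is required")
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    squared = sum(sum((p[i] - centroid[i]) ** 2 for i in range(3)) for p in coords)
    return sqrt(squared / n)

# Two atoms 2 Å apart: each lies 1 Å from the centroid.
rg_pair = radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
# An extended conformer has a larger value than a compact one with the same atom count.
rg_extended = radius_of_gyration([(float(x), 0.0, 0.0) for x in range(6)])
rg_compact = radius_of_gyration([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
                                 (1.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0)])
```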
  36. Outline: 5 Chemistry and cheminformatics, RECAP fragmentation of chemical components
  37. RECAP fragmentation of chemical components

    Implementation of the algorithm
    - The Retrosynthetic Combinatorial Analysis Procedure (RECAP) uses predefined bond types to cleave molecules into fragments
    - A hierarchical and exhaustive fragmentation implementation is used in CREDO
    - The hierarchy is stored in the database and linked to chemical components
    - New rules have been implemented to optimise fragmentation of natural products and endogenous compounds
    - Existing rules have been extended (thioethers, thioesters, ...)
  38. Standard RECAP rules
  39. RECAP fragments and ligands

    Analysing fragment interactions
    - RECAP fragments of the original chemical components are mapped back onto the ligands and their atoms
    - It is therefore possible to analyse interactions at the fragment level
    - Fragments can easily be filtered by their interactions, e.g. contact type or interactions with specific amino acids
    - CREDO currently contains two measures to assess the contribution of a fragment to the interaction as a whole
  40. Fragment Contact Density (FCD)

    New measure to calculate fragment contributions
    - Do all ligand fragments form an equal number of contacts, or does a single fragment dominate?
    - Ratio of the number of contacts per heavy atom of the fragment to the number of contacts per heavy atom of the whole ligand
    - The number of contacts is simply the number of protein atoms within 4.5 Å of the fragment

    Simple formula to calculate the Fragment Contact Density:
    $FCD = \frac{N_{\text{Fragment Contacts}} / N_{\text{Fragment Heavy atoms}}}{N_{\text{Ligand Contacts}} / N_{\text{Ligand Heavy atoms}}}$
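The FCD ratio is easy to check by hand, and the short sketch below does the arithmetic for one invented ligand. The function and argument names are illustrative; the contact counts are assumed to have been obtained already with the 4.5 Å criterion described on the slide.

```python
def fragment_contact_density(frag_contacts, frag_heavy_atoms, lig_contacts, lig_heavy_atoms):
    """FCD: contacts per heavy atom of the fragment relative to the whole ligand."""
    if frag_heavy_atoms <= 0 or lig_heavy_atoms <= 0 or lig_contacts <= 0:
        raise ValueError("heavy atom counts and ligand contacts must be positive")
    return (frag_contacts / frag_heavy_atoms) / (lig_contacts / lig_heavy_atoms)

# Invented example: a 6-atom fragment makes 30 of the 60 contacts of a 24-atom ligand.
fcd = fragment_contact_density(30, 6, 60, 24)
# (30 / 6) / (60 / 24) = 5.0 / 2.5 = 2.0
```

A value above 1 means the fragment makes more contacts per heavy atom than the ligand does on average, and a value below 1 means it contributes less than its size would suggest.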
  41. Visualisation of the FCD

    Cysteine aspartyl protease-3 (caspase-3) in complex with a non-peptidic inhibitor (PDB entry: 1NMQ)
  42. Outline: 5 Chemistry and cheminformatics, Cheminformatics
  43. pgopeneye: database cartridge for cheminformatics

    Cheminformatics extension based on the OpenEye toolkits
    - Implements commonly used cheminformatics routines
    - Substructure, topological similarity, SMARTS, Murcko scaffolds, etc.
    - Supports I/O of SMILES, SDF, OEB, IUPAC
    - Fingerprint similarity metrics use SSE instructions (POPCNT)
    - Fingerprints can be indexed (GiST): 1.2M fingerprints, ordered result in less than 100 ms
    - Very fast MCS search: 6500 structures < 100 ms (great with ChEMBL)
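Fingerprint similarity reduces to counting set bits, which is why a hardware population count helps. The sketch below shows the same idea in plain Python, with fingerprints held as integers; it illustrates the Tanimoto coefficient on bit vectors and is not the pgopeneye implementation. Returning 0.0 for two empty fingerprints is a convention chosen here.

```python
def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two bit fingerprints stored as integers."""
    common = popcount(fp_a & fp_b)
    total = popcount(fp_a) + popcount(fp_b) - common
    return common / total if total else 0.0

# Two toy 6-bit fingerprints sharing three of five distinct set bits.
similarity = tanimoto(0b101101, 0b100111)
# 3 common bits / (4 + 4 - 3) bits in the union = 0.6
```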
  44. USRCAT: real-time USR with pharmacophoric constraints

    USRCAT: an extension of USR
    - USRCAT is an extension of Ultrafast Shape Recognition (USR) that incorporates pharmacophoric information into the moments
    - Outperforms USR significantly in a virtual screening benchmark (using DUD-E)
    - Implemented natively in the database: can be used in any SQL query (limit to specific family | include chemical graph similarity)
    - Average screening performance of 5.3M conformers (moments) per second (including sorting)
    - Currently used with all PDB chemical components and the ZINC drug-like set (12M compounds, 200M+ conformers)
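For readers unfamiliar with USR, the sketch below computes the 12 shape moments of plain USR and the usual similarity score between two descriptors. It covers USR only, not the pharmacophoric subsets that USRCAT adds. The exact moment definitions vary between implementations; this version uses the mean, the standard deviation and the cube root of the third central moment, which is an assumption and may differ from the database code.

```python
from math import copysign, dist

def _moments(values):
    """Mean, standard deviation and signed cube root of the third central moment."""
    n = len(values)
    mean = sum(values) / n
    second = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return [mean, second ** 0.5, copysign(abs(third) ** (1.0 / 3.0), third)]

def usr_descriptor(coords):
    """12 USR moments from distances to four reference points."""
    n = len(coords)
    ctd = tuple(sum(p[i] for p in coords) / n for i in range(3))  # centroid
    cst = min(coords, key=lambda p: dist(p, ctd))  # atom closest to the centroid
    fct = max(coords, key=lambda p: dist(p, ctd))  # atom farthest from the centroid
    ftf = max(coords, key=lambda p: dist(p, fct))  # atom farthest from fct
    descriptor = []
    for ref in (ctd, cst, fct, ftf):
        descriptor.extend(_moments([dist(p, ref) for p in coords]))
    return descriptor

def usr_similarity(a, b):
    """Score in (0, 1]: inverse of one plus the mean absolute moment difference."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))

# Toy conformers: a bent chain, the same chain translated, and a straight chain.
bent = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.2, 1.3, 0.0), (3.7, 1.3, 0.0), (4.4, 2.6, 0.5)]
shifted = [(x + 10.0, y - 3.0, z + 2.0) for x, y, z in bent]
straight = [(1.5 * k, 0.0, 0.0) for k in range(5)]
same_shape = usr_similarity(usr_descriptor(bent), usr_descriptor(shifted))
other_shape = usr_similarity(usr_descriptor(bent), usr_descriptor(straight))
```

Because the descriptor is built only from interatomic distances, it needs no alignment: the translated copy scores essentially 1, while the straight chain scores lower. Comparing two conformers costs a handful of subtractions, which is consistent with the screening rates quoted on the slide.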
  45. CREDO in the public domain: CREDO Web interface

    Web interface
    - Can be used to browse and search data in CREDO
    - Biological assemblies can be visualised directly, including visualisation of contacts and highlighting of mutations (WebGL)
    - Downloads of selected data sets, e.g. kinases

    RESTful Web service
    - Most resources of the service can be queried programmatically through GET or POST requests
  46. CREDO in the public domain: CREDO on the web

    More information and updates
    - Web interface: http://www-cryst.bioc.cam.ac.uk/credo
    - Blog: http://blog.adrianschreyer.com
    - Twitter: http://twitter.com/credodb