CREDO: A comprehensive resource for Structural Interactomics and Drug Discovery

Adrian Schreyer
Department of Biochemistry, University of Cambridge

Outline of the talk
1 Introduction

What is CREDO?
(Very) brief summary:
- Contains the interactions between all molecules found in experimentally determined biological assemblies
- Also contains the intramolecular interactions of these molecules
- Contacts are represented as Structural Interaction Fingerprints (SIFts)
- Contains a sequence-to-structure mapping to integrate protein sequence data
- External resources are integrated to annotate data in CREDO
- Complete cheminformatics toolkits (OpenEye, RDKit)
- Python Application Programming Interface (API)

Database statistics
From CREDO release 2013.1.2:
- 86,903 PDB entries
- 128,776 biological assemblies
- 607,505 protein-ligand interactions (not the total number of small molecules)
- 266,062 protein-protein interfaces
- 17,793 protein-nucleic acid grooves
- 20 carbohydrate chains!
- 1,166,380,424 contacts

Outline
2 Structural interactions
- Structural Interaction Fingerprints (SIFts)
- Aromatic ring interactions
- Ligand-ligand interactions
- Data Validation

Structural Interaction Fingerprints (SIFts)
Atom and contact types:
- Atom types are identified using SMARTS patterns
- Contact types are assigned based on a combination of atom types and geometrical constraints that have to be fulfilled
- Charges (ionisation states) are not required to determine ionic contacts
- Multiple contact types are possible, but at least one type must be present
- 12 interatomic interaction types
- 9 ring-ring interaction geometries
- 4 ring-atom interaction types
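The assignment of contact types from atom types plus geometric constraints can be sketched as follows. This is only an illustration under stated assumptions, not CREDO's implementation: the atom-type labels, the contact types shown, the `proximal` fallback and all distance cut-offs are hypothetical values chosen for the example.

```python
from dataclasses import dataclass, field
from math import dist

# Hypothetical cut-offs in angstroms; the talk does not give CREDO's thresholds.
HBOND_MAX = 3.5
IONIC_MAX = 4.0
HYDROPHOBIC_MAX = 4.5


@dataclass
class Atom:
    coords: tuple
    types: set = field(default_factory=set)  # e.g. {"donor", "pos_ionisable"}


def contact_types(a: Atom, b: Atom) -> set:
    """Return every contact type whose atom-type and distance rules are met."""
    d = dist(a.coords, b.coords)
    found = set()

    def pair(t1, t2):
        # True if one atom carries t1 and the other carries t2 (either order).
        return (t1 in a.types and t2 in b.types) or (t2 in a.types and t1 in b.types)

    if d <= HBOND_MAX and pair("donor", "acceptor"):
        found.add("hbond")
    if d <= IONIC_MAX and pair("pos_ionisable", "neg_ionisable"):
        found.add("ionic")
    if d <= HYDROPHOBIC_MAX and pair("hydrophobe", "hydrophobe"):
        found.add("hydrophobic")
    if not found and d <= HYDROPHOBIC_MAX:
        # Assumed fallback so that every stored contact has at least one type.
        found.add("proximal")
    return found


lys_nz = Atom((0.0, 0.0, 0.0), {"donor", "pos_ionisable"})
asp_od1 = Atom((2.8, 0.0, 0.0), {"acceptor", "neg_ionisable"})
print(sorted(contact_types(lys_nz, asp_od1)))
```

Note that the ionic contact here follows from the atom types alone, so no explicit charges are needed, and that a single atom pair can carry several contact types at once.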

Aromatic ring interaction geometries
[Figure: aromatic ring interaction geometries]

Atom-aromatic ring interactions
π-electrons as atom type:
- The delocalised π-electron cloud of aromatic ring systems creates a negative charge on both faces of the ring
- Can act as a hydrogen bond acceptor and as a negatively ionisable group
- Distance- and geometry-dependent
Interaction types:
- π-donor: with hydrogen bond donors
- π-cation: with positively ionisable groups
- π-carbon: with weak hydrogen bond donors
- π-halogen: weak hydrogen bonds with halogens in a head-on orientation
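Since these ring-atom interactions are distance- and geometry-dependent, a test for an atom sitting over a ring face might look like the sketch below. It measures the distance from the atom to the ring centroid and the angle between the ring normal and the centroid-to-atom vector. The function names and both cut-offs (4.5 Å and 30°) are assumptions made for illustration and are not taken from CREDO.

```python
import math


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def ring_atom_geometry(ring, atom):
    """Return (distance to ring centroid, angle in degrees to the ring normal)."""
    n = len(ring)
    centroid = tuple(sum(c[i] for c in ring) / n for i in range(3))
    # Normal of the (assumed planar) ring from two centroid-to-ring-atom vectors.
    normal = cross(sub(ring[0], centroid), sub(ring[1], centroid))
    to_atom = sub(atom, centroid)
    distance = norm(to_atom)
    cos_theta = abs(dot(normal, to_atom)) / (norm(normal) * distance)
    theta = math.degrees(math.acos(min(1.0, cos_theta)))
    return distance, theta


def is_face_on(ring, atom, max_distance=4.5, max_theta=30.0):
    # Both cut-offs are illustrative assumptions, not CREDO's values.
    distance, theta = ring_atom_geometry(ring, atom)
    return distance <= max_distance and theta <= max_theta


# Idealised benzene ring in the xy-plane (C-C bond length 1.39 angstroms).
benzene = [(1.39 * math.cos(math.radians(60 * k)),
            1.39 * math.sin(math.radians(60 * k)), 0.0) for k in range(6)]
print(is_face_on(benzene, (0.0, 0.0, 3.5)))   # atom above the ring face
print(is_face_on(benzene, (3.5, 0.0, 0.0)))   # atom in the ring plane
```

A full classifier would combine such a geometric test with the type of the interacting atom (donor, cation, weak donor, halogen) to choose between the four interaction types.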

Pi-donor example from a drug-target interaction
Human aldose reductase mutant V47I complexed with fidarestat (PDB entry: 2PD9)

Inhibition of Quinone Reductase by Imatinib
The structure of the leukemia drug imatinib bound to human quinone reductase 2 (PDB entry: 3FW1)

Small molecule dimer blocking the p53-MDM2 interaction
Structure of hDM2 with the dimer-inducing indolyl hydantoin RO-2443 (PDB entry: 3VBG)

Validation of structural properties
Structural properties:
- All atomic data are retained (B-factors, occupancies)
- Boolean flags identify missing, disordered and clashing residues and atoms
- Boolean flags identify non-standard, modified and mutated amino acids
- Additional properties from mmCIF: resolution, R-factor, R-free, pH
- Ligand geometry (angles) can be problematic

Precision of atomic coordinates
Diffraction-component precision index (DPI):
- Introduced by Cruickshank to estimate the uncertainty of atomic coordinates obtained by structural refinement of protein diffraction data
- Introduced to the virtual screening community by Goto
Goto's formula to calculate the DPI:
$$\sigma(r, B_{avg}) = 2.2\, N_{atoms}^{1/2}\, V_a^{1/2}\, N_{obs}^{-5/6}\, R_{free}$$
Goto's formula to calculate the theoretical DPI limit:
$$\sigma(r, B_{avg}) = 0.22\, (1 + s)^{1/2}\, V_m^{-1/2}\, C^{-5/6}\, R_{free}\, d_{min}^{5/2}$$

Missing regions of PDB residues
Visualisation of missing regions and a secondary structure fragment (PDB entry: 2P33)

Outline
3 Protein-ligand interactions
- Annotation of protein-ligand interactions
- SIFt clustering

Annotating protein-ligand interactions
Metabolic pathways:
- EC information is mapped onto protein chains
- KEGG data is used to identify metabolites and to link them to enzymes
- Ligands are labelled as substrate, product or cofactor (of the enzyme)
Drug-target interactions:
- Approved drugs are identified, as well as all other compounds in the ChEMBL database
- Biological target information (UniProt) is taken from ChEMBL and DrugBank
- Drug-target interactions are identified

Ligand affinities and efficiencies
Potency of ligands:
- Obtained from the latest version of the ChEMBL database
- Identified through a combination of document (PubMed), target (UniProt) and chemistry (UniChem) matches
- Binding activities and ligand efficiencies (pKd, BEI, SEI) are linked to ligands where possible
- 6,848 unique activities for 6,505 unique ligands (28,943 pairs)
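For orientation, the efficiency indices named above can be computed as in this sketch. The talk does not spell out the definitions, so the code assumes the commonly used ones: BEI as potency per kDa of molecular weight and SEI as potency per 100 Å² of polar surface area. The example ligand is made up.

```python
import math


def pkd(kd_molar: float) -> float:
    """Negative decadic logarithm of a dissociation constant given in mol/L."""
    return -math.log10(kd_molar)


def bei(p_activity: float, mol_weight_da: float) -> float:
    """Binding efficiency index: potency per kDa of molecular weight."""
    return p_activity / (mol_weight_da / 1000.0)


def sei(p_activity: float, psa_a2: float) -> float:
    """Surface-binding efficiency index: potency per 100 square angstroms of PSA."""
    return p_activity / (psa_a2 / 100.0)


# Made-up ligand: Kd = 10 nM, MW = 400 Da, polar surface area = 80 square angstroms.
p = pkd(10e-9)
print(round(p, 2), round(bei(p, 400.0), 2), round(sei(p, 80.0), 2))
```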

Clustering interaction fingerprints
Structural properties:
- SIFts can be aligned to a given sequence system such as UniProt (or to structural alignments)
- These alignments can be used for hierarchical clustering to compare interactions
- In CREDO this is done for all ligands that interact with proteins
- 2D and 3D similarities are calculated for terminal (leaf) nodes (which always contain two ligands)
- Integrated into the website and API; phylogenetic trees can be visualised and browsed dynamically
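A toy version of clustering aligned interaction fingerprints is sketched below. It assumes each SIFt has already been mapped onto a common residue numbering and is stored as a set of (position, contact type) pairs. The Tanimoto coefficient and the naive average-linkage scheme are choices made for this example; the talk does not say which similarity measure or linkage CREDO uses, and the ligand names and residue positions are invented.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def leaves(node):
    """Flatten a nested-tuple tree into its leaf labels (labels must be strings)."""
    return [node] if isinstance(node, str) else leaves(node[0]) + leaves(node[1])


def cluster(sifts: dict):
    """Naive average-linkage agglomerative clustering; returns a nested-tuple tree."""
    nodes = list(sifts)
    while len(nodes) > 1:
        def linkage(pair):
            x, y = pair
            sims = [tanimoto(sifts[i], sifts[j]) for i in leaves(x) for j in leaves(y)]
            return sum(sims) / len(sims)

        # Merge the two most similar nodes into a new internal node.
        x, y = max(combinations(nodes, 2), key=linkage)
        nodes = [n for n in nodes if n != x and n != y] + [(x, y)]
    return nodes[0]


# Invented fingerprints on a shared (e.g. UniProt) residue numbering.
sifts = {
    "ligand_A": frozenset({(81, "hbond"), (83, "hbond"), (134, "hydrophobic")}),
    "ligand_B": frozenset({(81, "hbond"), (83, "hbond"), (31, "hydrophobic")}),
    "ligand_C": frozenset({(145, "ionic"), (33, "hbond")}),
}
print(cluster(sifts))
```

Ligands A and B share two of four distinct contacts and are merged first, which gives a terminal node with exactly two ligands, as described above.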

The SIFt tree for CDK2
[Figure: SIFt tree for CDK2]

Outline
4 Protein sequences and variations
- Sequence-to-structure mapping
- Structural variations affecting PDB residues and their interactions
- Binding site similarity searching

Mapping UniProt sequences to PDB chains
Structure integration with function, taxonomy and sequence (SIFTS) initiative:
- Maps UniProt sequences onto PDB residue sequences
- Provides further residue-level annotation from the IntEnz, GO, Pfam, InterPro, SCOP, CATH and PubMed databases
- Used to identify modified or mutated amino acids in protein chains
- Contains secondary structure information for each residue
- Transformed into relational format and linked to all residues in CREDO

Protein Domains
Mapping protein domains onto protein chains:
- Protein domain classifications from Pfam, CATH and SCOP are integrated into CREDO
- Mapped to protein chains, ligand binding sites, protein-protein interfaces etc.
- Pfam has the largest coverage by far
- 5,724 unique Pfam domains

Secondary structure fragments
Implementing secondary structure fragments:
- The secondary structure information is used to create continuous fragments of secondary structure elements (SSEs) in protein chains
- A new fragment is identified after every change in secondary structure along the sequence of a polypeptide chain
- Tightly integrated with other CREDO entities
- All SSEs interacting with a ligand or across a protein-protein interface can easily be retrieved
- Potential application in the context of peptidomimetic drugs and biologics
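Starting a new fragment at every change in secondary structure amounts to a run-length split of the per-residue assignment. A minimal sketch, assuming the assignment is available as a string of one-letter codes (the codes and the function name are illustrative, not CREDO's schema):

```python
from itertools import groupby


def sse_fragments(sec_struct: str, first_residue: int = 1):
    """Split a per-residue secondary structure string into contiguous fragments.

    Returns (sse_type, start_residue, end_residue) tuples.
    """
    fragments = []
    position = first_residue
    for sse_type, run in groupby(sec_struct):
        length = len(list(run))
        fragments.append((sse_type, position, position + length - 1))
        position += length
    return fragments


# Hypothetical one-letter codes: H = helix, E = strand, C = coil.
print(sse_fragments("CCHHHHHHCCEEEEC"))
```

With fragments stored as residue ranges like these, finding the SSEs that touch a ligand reduces to intersecting the ranges with the residues in contact with it.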

Structural Variations in CREDO
Identifying variations in protein structures:
- Mapped onto residues in CREDO through the sequence-to-structure mapping
- Can be easily queried and combined with other parameters
- Linked to EnsEMBL disease phenotypes
- 2,369 phenotypes can be linked to residues in CREDO
Source databases included in EnsEMBL Variation:
- dbSNP
- Catalogue Of Somatic Mutations In Cancer (COSMIC)
- Online Mendelian Inheritance in Man (OMIM)
- 1000 Genomes

Relevance: drug resistance in cancer
C-KIT tyrosine kinase in complex with imatinib (PDB entry: 1T46), with the imatinib-resistant mutation T670I.

FuzCav: Binding site similarity
The FuzCav algorithm:
- Alignment-free and very easy to calculate
- Based on pharmacophore triplet counts that describe a ligand binding site
- Can detect local similarities between binding sites
- Performed natively on the server side with PostgreSQL using a numerical extension (pgeigen)
- Various similarity metrics can be used
- Calculated for all binding sites in CREDO
Reference: Journal of Chemical Information and Modeling 2010, 50 (1), 123-135
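The flavour of an alignment-free comparison based on pharmacophore triplet counts can be conveyed with the sketch below. It is not the published FuzCav procedure: the feature labels, the 2 Å distance bins, the way a triplet is keyed and the `shared_fraction` metric are all simplifying assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import dist


def triplet_counts(points, bin_width=2.0):
    """Count pharmacophore triplets, keyed by feature types and binned side lengths.

    points: list of (feature_type, (x, y, z)) tuples.
    """
    counts = Counter()
    for trio in combinations(points, 3):
        # Describe each triangle edge by its two feature types and its binned length.
        edges = sorted(
            (tuple(sorted((p[0], q[0]))), int(dist(p[1], q[1]) // bin_width))
            for p, q in combinations(trio, 2)
        )
        counts[tuple(edges)] += 1
    return counts


def shared_fraction(a: Counter, b: Counter) -> float:
    """Fraction of the smaller fingerprint's triplet keys also present in the other."""
    smaller = min(len(a), len(b))
    return len(a.keys() & b.keys()) / smaller if smaller else 0.0


site_a = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.0, 0.0, 0.0)),
          ("aromatic", (0.0, 5.0, 0.0)), ("donor", (0.0, 0.0, 7.0))]
# The same site translated in space: no alignment is needed to recognise it.
site_b = [(t, (x + 10.0, y - 4.0, z + 2.0)) for t, (x, y, z) in site_a]
# A sub-site made of the first three points only.
site_c = site_a[:3]

print(shared_fraction(triplet_counts(site_a), triplet_counts(site_b)))
print(shared_fraction(triplet_counts(site_a), triplet_counts(site_c)))
```

Because the descriptor depends only on internal distances, it is unchanged by translation, and normalising by the smaller fingerprint lets a sub-site score highly against a larger site, which is one way local similarity can show up.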

FuzCav: description of the algorithm
[Figure: description of the FuzCav algorithm]

Outline
5 Chemistry and cheminformatics
- Molecular descriptors
- RECAP fragmentation of chemical components
- Cheminformatics

Calculation of physicochemical properties
Conformation-independent:
- Important to evaluate drug-likeness and filter molecules
- Feature counts, tPSA, XLogP, QED, ...
Conformation-dependent:
- Calculated for all bound ligands and up to 200 modelled conformers of each
- Solvent-excluded and polar/apolar/total solvent-accessible surface areas
- Radius of gyration, number of internal contacts
- Ultrafast Shape Recognition (USR) moments as well as USRCAT
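Two of the conformation-dependent descriptors are simple enough to sketch directly from coordinates. The radius of gyration below is the unweighted (not mass-weighted) variant, and the definition of an internal contact (a non-bonded atom pair within 4.0 Å) is an assumption made for this example rather than CREDO's definition.

```python
from itertools import combinations
from math import dist, sqrt


def radius_of_gyration(coords):
    """Unweighted radius of gyration: RMS distance of the atoms from their centroid."""
    n = len(coords)
    centroid = tuple(sum(c[i] for c in coords) / n for i in range(3))
    return sqrt(sum(dist(c, centroid) ** 2 for c in coords) / n)


def internal_contacts(coords, bonded=frozenset(), cutoff=4.0):
    """Count non-bonded atom pairs closer than `cutoff` (an assumed threshold).

    bonded: set of (i, j) index pairs with i < j that are excluded from the count.
    """
    return sum(
        1
        for i, j in combinations(range(len(coords)), 2)
        if (i, j) not in bonded and dist(coords[i], coords[j]) <= cutoff
    )


# Four atoms on a line, 1.5 angstroms apart, bonded in sequence.
chain = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
bonds = frozenset({(0, 1), (1, 2), (2, 3)})
print(round(radius_of_gyration(chain), 3), internal_contacts(chain, bonds))
```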

RECAP fragmentation of chemical components
Implementation of the algorithm:
- The Retrosynthetic Combinatorial Analysis Procedure (RECAP) uses predefined bond types to cleave molecules into fragments
- A hierarchical and exhaustive fragmentation implementation is used in CREDO
- The hierarchy is stored in the database and linked to chemical components
- New rules have been implemented to optimise fragmentation of natural products and endogenous compounds
- Existing rules have been extended (thioethers, thioesters, ...)

Standard RECAP rules
[Figure: standard RECAP rules]

RECAP fragments and ligands
Analysing fragment interactions:
- RECAP fragments are mapped back onto the ligands and onto the atoms of the original chemical components
- It is therefore possible to analyse interactions at the fragment level
- Fragments can easily be filtered by their interactions, e.g. contact type or interactions with specific amino acids
- CREDO currently contains two measures to assess the contribution of a fragment to the interaction as a whole

Fragment Contact Density (FCD)
New measure to calculate fragment contributions:
- Do all ligand fragments form an equal number of contacts, or does a single fragment dominate?
- Ratio of the contacts per heavy atom of the fragment to the contacts per heavy atom of the whole ligand
- The number of contacts is simply the number of protein atoms within 4.5 Å of the fragment
Simple formula to calculate the Fragment Contact Density:
$$\mathrm{FCD} = \frac{N_{\text{Fragment Contacts}} / N_{\text{Fragment Heavy atoms}}}{N_{\text{Ligand Contacts}} / N_{\text{Ligand Heavy atoms}}}$$
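The FCD definition translates almost directly into code. The sketch below assumes that only heavy-atom coordinates are passed in and that the ligand makes at least one contact; the function names and the toy coordinates are illustrative.

```python
from math import dist


def count_contacts(ligand_atoms, protein_atoms, cutoff=4.5):
    """Number of protein atoms within `cutoff` angstroms of any of the given atoms."""
    return sum(
        1 for p in protein_atoms
        if any(dist(p, a) <= cutoff for a in ligand_atoms)
    )


def fragment_contact_density(fragment_atoms, ligand_atoms, protein_atoms):
    """FCD: contacts per heavy atom of the fragment relative to the whole ligand."""
    fragment_density = count_contacts(fragment_atoms, protein_atoms) / len(fragment_atoms)
    ligand_density = count_contacts(ligand_atoms, protein_atoms) / len(ligand_atoms)
    return fragment_density / ligand_density


# Toy ligand of four heavy atoms; the fragment is its first two atoms.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
fragment = ligand[:2]
protein = [(0.0, 3.0, 0.0), (-1.0, 3.0, 0.0), (1.0, -3.0, 0.0), (20.0, 0.0, 0.0)]
print(fragment_contact_density(fragment, ligand, protein))
```

A value above 1 means the fragment makes more contacts per heavy atom than the ligand does on average, and a value below 1 means it makes fewer.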

Visualisation of the FCD
Cysteine aspartyl protease-3 (caspase-3) in complex with a non-peptidic inhibitor (PDB entry: 1NMQ)

pgopeneye: database cartridge for cheminformatics
Cheminformatics extension based on the OpenEye toolkits:
- Implements commonly used cheminformatics routines
- Substructure, topological similarity, SMARTS, Murcko scaffolds, etc.
- Supports I/O of SMILES, SDF, OEB, IUPAC
- Fingerprint similarity metrics use SSE (POPCNT)
- Fingerprints can be indexed (GiST): 1.2M fingerprints, ordered result in less than 100 ms
- Very fast MCS search: 6500 structures < 100 ms (great with ChEMBL)
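Fingerprint similarity metrics reduce to counting set bits, which is why a hardware population count helps. The pure-Python sketch below shows the arithmetic only; it is not the pgopeneye implementation, and the 8-bit fingerprints and molecule names are toy values.

```python
def popcount(x: int) -> int:
    # int.bit_count() is available from Python 3.10; bin() also works on older versions.
    return bin(x).count("1")


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two bit-vector fingerprints stored as integers."""
    common = popcount(fp_a & fp_b)
    total = popcount(fp_a) + popcount(fp_b) - common
    return common / total if total else 1.0


def ranked_hits(query: int, library: dict, threshold: float = 0.5):
    """Return (name, similarity) pairs at or above the threshold, best first."""
    hits = ((name, tanimoto(query, fp)) for name, fp in library.items())
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])


library = {"mol_1": 0b10110110, "mol_2": 0b10110000, "mol_3": 0b00001001}
print(ranked_hits(0b10110100, library))
```

In the database the same ordered, thresholded search is what an index over the fingerprints accelerates, so that not every row has to be scored.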

USRCAT: real-time USR with pharmacophoric constraints
USRCAT: an extension of USR:
- USRCAT is an extension of Ultrafast Shape Recognition (USR) that incorporates pharmacophoric information into the moments
- Outperforms USR significantly in a virtual screening benchmark (using DUD-E)
- Implemented natively in the database: can be used in any SQL query (e.g. limited to a specific family, or combined with chemical graph similarity)
- Average screening performance of 5.3M conformers (moments) per second (including sorting)
- Currently used with all PDB chemical components and the ZINC drug-like set (12M compounds, 200M+ conformers)

CREDO Web interface
Web interface:
- Can be used to browse and search data in CREDO
- Biological assemblies can be visualised directly, including visualisation of contacts and highlighting of mutations (WebGL)
- Downloads of selected data sets, e.g. kinases
RESTful Web service:
- Most resources of the service can be queried programmatically through GET or POST requests

CREDO on the web
More information and updates:
- Web interface: http://www-cryst.bioc.cam.ac.uk/credo
- Blog: http://blog.adrianschreyer.com
- Twitter: http://twitter.com/credodb
