Slide 18
A ChEMBL corpus sample can be divided into 5 parts:
1. Physicochemical properties: logP, molecular weight, QED, number of
aromatic rings, number of valence electrons, Labute approximate
surface area, etc.
2. Summary of substructures: number of epoxide rings, number of
esters, number of ether oxygens, etc.
3. Similar molecules: Tanimoto similarity > 0.8 (if available)
4. Activities and targets: activity type, pChEMBL value, activity value, target,
protein sequence, description of the target (if available)
5. Molecule: represented by multiple canonical and non-canonical
SMILES strings
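Item 3 relies on Tanimoto similarity between molecular fingerprints: the number of bits set in both fingerprints divided by the number set in either. A minimal sketch, assuming fingerprints are represented as Python sets of "on" bit indices. In practice they would come from a cheminformatics toolkit (for example RDKit fingerprints compared with `DataStructs.TanimotoSimilarity`); the slide does not say which fingerprint the corpus uses, and the bit sets and molecule IDs below are made up for illustration. The strict `> 0.8` cut-off mirrors the slide:

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # assumed convention: two empty fingerprints count as dissimilar
    return len(fp_a & fp_b) / union


def similar_molecules(query: set[int], library: dict[str, set[int]],
                      threshold: float = 0.8) -> list[tuple[str, float]]:
    """Return (molecule id, similarity) pairs strictly above the threshold, most similar first."""
    hits = [(mol_id, tanimoto(query, fp)) for mol_id, fp in library.items()]
    return sorted((h for h in hits if h[1] > threshold), key=lambda h: h[1], reverse=True)


# Toy fingerprints: hypothetical bit indices, not real Morgan/ECFP bits.
query = {1, 4, 7, 9, 12, 15, 20, 33, 41, 50}
library = {
    "mol_A": {1, 4, 7, 9, 12, 15, 20, 33, 41},          # 9 shared / 10 in union = 0.90
    "mol_B": {1, 4, 7, 9, 12, 60, 61, 62},              # 5 shared / 13 in union ≈ 0.38
    "mol_C": {1, 4, 7, 9, 12, 15, 20, 33, 41, 50, 77},  # 10 shared / 11 in union ≈ 0.91
}
print(similar_molecules(query, library))  # mol_C then mol_A; mol_B falls below 0.8
```

Representing fingerprints as sets keeps the formula readable; a toolkit would use packed bit vectors for speed, but the coefficient is the same.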
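The pChEMBL value in item 4 is the negative base-10 logarithm of the activity expressed in molar units. Assuming the activity is reported in nM, as in the sample on this slide (Potency = 12589.3 nM, pChEMBL 4.9), the conversion can be sketched as follows. The helper names are hypothetical, and ChEMBL itself only assigns pChEMBL values to certain standardized activity types and relations, which this sketch does not check:

```python
import math


def pchembl_from_nm(value_nm: float) -> float:
    """-log10 of the molar concentration for an activity given in nM, rounded to 2 decimals."""
    if value_nm <= 0:
        raise ValueError("activity value must be positive")
    molar = value_nm * 1e-9  # nM -> M
    return round(-math.log10(molar), 2)


def nm_from_pchembl(pchembl: float) -> float:
    """Inverse conversion: pChEMBL value back to a concentration in nM."""
    return 10 ** (9 - pchembl)


print(pchembl_from_nm(12589.3))        # → 4.9
print(round(nm_from_pchembl(4.9), 1))  # → 12589.3
```

Because the forward conversion rounds to two decimals, the inverse only recovers the original value approximately.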
ChEMBL Corpus
hydrogen bond donnors: 0, polar surface area: 61.83, num of radical electrons: 0, most Acid
dissociation constants (pKa): 13.58, num of aliphatic heterocycles: 0, num of N or O
(Nitrogens and Oxygens): 5, hydrogen bond acceptors: 5, num of aliphatic rings: 0, labute
approximate surface area (LabuteASA): 140.71, logD: 3.12, num saturated rings: 0, num of
aliphatic carbocycles: 0, molecule type: Small molecule, rule of 3: fail, num heavy atoms: 24,
num of rings: 2, num heteroatoms: 5, molecular formula: C19H20O5, natural product likeness
score: -0.54, full molecular weight: 328.36, molecular species: NEUTRAL, logP: 3.06, num
aromatic heterocycles: 0, molecular weight monoisotopic: 328.1311, standard international
chemical identifier (InChI):
InChI=1S/C19H20O5/c1-22-16-9-6-14(7-10-16)8-11-19(21)24-13-18(20)15-4-3-5-17(12-15)23-2/h3-7,9-10,12H,8,11,13H2,1-2H3, num aromatic rings: 2, quantitative estimate of drug-likeness (qed):
0.55, full molecular formula: C19H20O5, num lipinski rule of 5 (ro5) violations: 0, num
saturated carbocycles: 0, fraction of SP3 hybridized C atoms: 0.26, num aromatic carbocycles:
2, molecular weight freebase: 328.36, num rotatable bonds: 8, num of NHs or OH: 0, Balaban’s J
value (BalabanJ): 1.78, fragments: 2 benzene rings, 2 carbonyl O, excluding COOH, 1 ketones
excluding diaryl, a,b-unsat. dienones, heteroatom on Calpha, 2 carbonyl O, 1 ketones, 1
esters, 1 aryl methyl sites for hydroxylation, 3 ether oxygens (including phenoxy), 2 methoxy
groups -OCH3, num of valence electrons: 126, activities: activity type: Potency, pChEMBLvalue:
4.9, Potency=12589.3nM, target:
MSQEGDYGRWTISSSDESEEEKPKPDKPSTSSLLCARQGAANEPRYTCSEAQKAAHKRKISPVKFSNTDSVLPPKRQKSGSQED
LGWCLSSSDDELQPEMPQKQAEKVVIKKEKDISAPNDGTAQRTENHGAPACHRLKEEEDEYETSGEGQDIWDMLDKGNPFQFYLTRVSGVKPKY
NSGALHIKDILSPLFGTLVSSAQFNYCFDVDWLVKQYPPEFRKKPILLVHGDKREAKAHLHAQAKPYENISLCQAKLDIAFGTHHTKMMLLLYE
EGLRVVIHTSNLIHADWHQKTQGIWLSPLYPRIADGTHKSGESPTHFKADLISYLMAYNAPSLKEWIDVIHKHDLSETNVYLIGSTPGRFQGSQ
KDNWGHFRLKKLLKDHASSMPNAESWPVVGQFSSVGSLGADESKWLCSEFKESMLTLGKESKTPGKSSVPLYLIYPSVENVRTSLEGYPAGGSL
PYSIQTAEKQNWLHSYFHKWSAETSGRSNAMPHIKTYMRPSPDFSKIAWFLVTSANLSKAAWGALEKNGTQLMIRSYELGVLFLPSAFGLDSFK
VKQKFFAGSQEPMATFPVPYDLPPELYGSKDRPWIWNIPYVKAPDTHGNMWVPS, description: Tyrosyl-DNA
phosphodiesterase 1, SMILES: COC1=CC=C(CCC(=O)OCC(=O)C2=CC=CC(OC)=C2)C=C1,
COc1ccc(CCC(=O)OCC(=O)c2cccc(OC)c2)cc1,
COc1ccc(CCC(=O)OCC(=O)c2cccc(OC)c2)cc1
Actual Sample
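The sample above is a flat run of `key: value` pairs separated by commas, with a nested group (activities) and multi-valued fields (fragments, SMILES) written inline. A hypothetical sketch of rendering a nested record into that layout; the function name and the record's structure are assumptions, not the corpus authors' code, and the field values are copied from the sample:

```python
def render_sample(parts: dict) -> str:
    """Flatten a nested record into 'key: value, key: value' text.

    Nested dicts are rendered recursively; lists and tuples are joined with ', '.
    """
    pieces = []
    for key, value in parts.items():
        if isinstance(value, dict):
            value = render_sample(value)
        elif isinstance(value, (list, tuple)):
            value = ", ".join(str(v) for v in value)
        pieces.append(f"{key}: {value}")
    return ", ".join(pieces)


# A small subset of the sample's fields (hypothetical record layout).
record = {
    "logP": 3.06,
    "full molecular weight": 328.36,
    "fragments": ["2 benzene rings", "1 esters"],
    "activities": {"activity type": "Potency", "pChEMBLvalue": 4.9,
                   "description": "Tyrosyl-DNA phosphodiesterase 1"},
    "SMILES": ["COC1=CC=C(CCC(=O)OCC(=O)C2=CC=CC(OC)=C2)C=C1",
               "COc1ccc(CCC(=O)OCC(=O)c2cccc(OC)c2)cc1"],
}
print(render_sample(record))
```

Because commas serve both as separators and inside values (fragment descriptions, InChI strings), this layout reads naturally as text but cannot be reliably parsed back into fields.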
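Several fields in the sample follow directly from the molecular formula C19H20O5. The sketch below recomputes them from element counts as a consistency check. It assumes a plain formula containing only C, H, N and O and uses standard average and monoisotopic atomic masses; a toolkit such as RDKit may use slightly different mass tables, so other molecules could differ in the last digit:

```python
import re
from collections import Counter

# Element data for the small organic subset handled here (extend as needed).
AVERAGE_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
VALENCE_ELECTRONS = {"C": 4, "H": 1, "N": 5, "O": 6}


def parse_formula(formula: str) -> Counter:
    """Parse a simple formula such as 'C19H20O5' (no brackets, charges or isotopes)."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts


def formula_properties(formula: str) -> dict:
    """Recompute formula-derived fields, keyed by the names used in the sample."""
    counts = parse_formula(formula)
    unknown = set(counts) - set(AVERAGE_MASS)
    if unknown:
        raise ValueError(f"no element data for {sorted(unknown)}")
    return {
        "full molecular weight": round(sum(AVERAGE_MASS[e] * n for e, n in counts.items()), 2),
        "molecular weight monoisotopic": round(sum(MONOISOTOPIC_MASS[e] * n for e, n in counts.items()), 4),
        "num heavy atoms": sum(n for e, n in counts.items() if e != "H"),
        "num of valence electrons": sum(VALENCE_ELECTRONS[e] * n for e, n in counts.items()),
    }


print(formula_properties("C19H20O5"))
```

For C19H20O5 this gives 328.36, 328.1311, 24 heavy atoms and 126 valence electrons, the same values the sample reports.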
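The sample also reports `num lipinski rule of 5 (ro5) violations: 0` and `rule of 3: fail`. A sketch of both checks using the sample's own descriptor values. The thresholds are the commonly cited ones (Lipinski: MW ≤ 500, logP ≤ 5, HBD ≤ 5, HBA ≤ 10; rule of three: MW < 300, logP ≤ 3, HBD ≤ 3, HBA ≤ 3, rotatable bonds ≤ 3, PSA ≤ 60); ChEMBL's exact definitions, including which donor/acceptor counts and logP estimate it uses, may differ:

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float     # full molecular weight (Da)
    logp: float           # calculated logP
    hbd: int              # H-bond donors (here: count of NH or OH)
    hba: int              # H-bond acceptors (here: count of N or O)
    rotatable_bonds: int
    psa: float            # polar surface area (Å²)


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four rule-of-5 criteria are violated."""
    return sum([d.mol_weight > 500, d.logp > 5, d.hbd > 5, d.hba > 10])


def passes_rule_of_three(d: Descriptors) -> bool:
    """Fragment-likeness 'rule of 3' using the commonly cited (assumed) thresholds."""
    return (d.mol_weight < 300 and d.logp <= 3 and d.hbd <= 3 and d.hba <= 3
            and d.rotatable_bonds <= 3 and d.psa <= 60)


# Values copied from the sample above.
sample = Descriptors(mol_weight=328.36, logp=3.06, hbd=0, hba=5, rotatable_bonds=8, psa=61.83)
print(lipinski_violations(sample))   # → 0
print(passes_rule_of_three(sample))  # → False
```

The molecule fails the rule of three on several counts at once (MW 328.36, logP 3.06, 5 acceptors, 8 rotatable bonds, PSA 61.83), so the "fail" label does not hinge on the exact threshold variant.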