Slide 1

Slide 1 text

STRUCTURAL BIOINFORMATICS Barry Grant University of Michigan www.thegrantlab.org 26-Jan-2016 BIOINF 525 http://bioboot.github.io/bioinf525_w16/

Slide 2

Slide 2 text

MODULE OVERVIEW Objective: Provide an introduction to the practice of bioinformatics as well as a practical guide to using common bioinformatics databases and algorithms 1.1. ‣ Introduction to Bioinformatics 1.2. ‣ Sequence Alignment and Database Searching 1.3 ‣ Structural Bioinformatics 1.4 ‣ Genome Informatics: High Throughput Sequencing Applications and Analytical Methods

Slide 3

Slide 3 text

Answers to last week's homework (19/19): Answers week 2 Muddy Point Assessment (11/19): Responses - “More time to finish the assignment” - “I felt there was too much material to cover in one lab” - “The [NCBI] sites were so slow” - “More time with HMMER would be helpful” - “Very nice lab” WEEK TWO REVIEW

Slide 4

Slide 4 text

Q18: NW DYNAMIC PROGRAMMING
Match: +2 Mismatch: -1 Gap: -2

           A    G    T    T    C
      0   -2   -4   -6   -8  -10
A    -2   +2    0   -2   -4   -6
T    -4    0   +1   +2    0   -2
T    -6   -2   -1   +3   +4   +2
G    -8   -4    0   +1   +2   +3
C   -10   -6   -2   -1    0   +4

Alignments:

A - T T G C
|   | |   |
A G T T - C

A T T G C
|   |   |
A G T T C
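
The matrix fill for Q18 can be reproduced with a short script. This is a minimal sketch, not the course's reference solution: the function name and the tie-breaking rule in the traceback (prefer the diagonal move) are assumptions, so it reports only one of the equally scoring alignments.

```python
def needleman_wunsch(a, b, match=2, mismatch=-1, gap=-2):
    """Fill the global alignment score matrix and trace back one optimal alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and first row hold cumulative gap penalties.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Each cell is the best of: diagonal (match/mismatch), up (gap), left (gap).
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback from the bottom-right corner, preferring the diagonal on ties.
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score, "".join(reversed(top)), "".join(reversed(bottom))


# Rows of the slide's matrix are ATTGC, columns are AGTTC.
matrix, aligned_a, aligned_b = needleman_wunsch("ATTGC", "AGTTC")
for row in matrix:
    print(" ".join(f"{v:4d}" for v in row))
print(aligned_a)
print(aligned_b)
```

With a different tie-breaking rule the traceback would return the gapped alignment instead; both have the same total score.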

Slide 5

Slide 5 text

Check out the “Background Reading” material online: ‣ Achievements & Challenges in Structural Bioinformatics ‣ Protein Structure Prediction ‣ Biomolecular Simulation ‣ Computational Drug Discovery Complete the lecture 1.3 homework questions: http://tinyurl.com/bioinf525-quiz3 THIS WEEK’S HOMEWORK

Slide 6

Slide 6 text

“Bioinformatics is the application of computers to the collection, archiving, organization, and analysis of biological data.” … A hybrid of biology and computer science

Slide 7

Slide 7 text

“Bioinformatics is the application of computers to the collection, archiving, organization, and analysis of biological data.” Bioinformatics is computer aided biology!

Slide 8

Slide 8 text

“Bioinformatics is the application of computers to the collection, archiving, organization, and analysis of biological data.” Bioinformatics is computer aided biology! Goal: Data to Knowledge

Slide 9

Slide 9 text

So what is structural bioinformatics?

Slide 10

Slide 10 text

So what is structural bioinformatics? Aims to characterize and interpret biomolecules and their assemblies at the molecular & atomic level … computer aided structural biology!

Slide 11

Slide 11 text

Why should we care?

Slide 12

Slide 12 text

Why should we care? Because biomolecules are “nature’s robots” … and because it is only by coiling into specific 3D structures that they are able to perform their functions

Slide 13

Slide 13 text

BIOINFORMATICS DATA Genomes DNA & RNA sequence Protein sequence Protein families, motifs and domains Protein structure Protein interactions Chemical entities Pathways Systems Gene expression Literature and ontologies DNA & RNA structure

Slide 14

Slide 14 text

STRUCTURAL DATA IS CENTRAL Genomes DNA & RNA sequence DNA & RNA structure Protein sequence Protein families, motifs and domains Protein structure Protein interactions Chemical entities Pathways Systems Gene expression Literature and ontologies

Slide 15

Slide 15 text

STRUCTURAL DATA IS CENTRAL Genomes DNA & RNA sequence DNA & RNA structure Protein sequence Protein families, motifs and domains Protein structure Protein interactions Chemical entities Pathways Systems Gene expression Literature and ontologies Sequence > Structure > Function

Slide 16

Slide 16 text

STRUCTURAL DATA IS CENTRAL Genomes DNA & RNA sequence DNA & RNA structure Protein sequence Protein families, motifs and domains Protein structure Protein interactions Chemical entities Pathways Systems Gene expression Literature and ontologies Sequence > Structure > Function ENERGETICS DYNAMICS

Slide 17

Slide 17 text

Sequence: • Unfolded chain of amino acids • Highly mobile • Inactive Structure: • Ordered in a precise 3D arrangement • Stable but dynamic Function: • Active in specific “conformations” • Specific associations & precise reactions

Slide 18

Slide 18 text

In daily life, we use machines
 with functional structure and moving parts

Slide 19

Slide 19 text

Genomics is a great start …. ▪ But a parts list is not enough to understand how a bicycle works

Slide 20

Slide 20 text

… but not the end ▪ We want the full spatiotemporal picture, and an ability to control it ▪ Broad applications, including drug design, medical diagnostics, chemical manufacturing, and energy

Slide 21

Slide 21 text

Extracted from The Inner Life of a Cell by Cellular Visions and Harvard [YouTube link: https://www.youtube.com/watch?v=y-uuk4Pr2i8 ]

Slide 22

Slide 22 text

Sequence: • Unfolded chain of amino acids • Highly mobile • Inactive Structure: • Ordered in a precise 3D arrangement • Stable but dynamic Function: • Active in specific “conformations” • Specific associations & precise reactions

Slide 23

Slide 23 text

KEY CONCEPT: ENERGY LANDSCAPE Native Compact, Ordered Unfolded Expanded, Disordered

Slide 24

Slide 24 text

KEY CONCEPT: ENERGY LANDSCAPE Native: Compact, Ordered. Molten Globule: Compact, Disordered. Unfolded: Expanded, Disordered. Barrier Height. Barrier crossing time ~exp(Barrier Height). 0.1 microseconds, 1 millisecond

Slide 25

Slide 25 text

KEY CONCEPT: ENERGY LANDSCAPE Native State(s): Compact, Ordered. Molten Globule State: Compact, Disordered. Unfolded State: Expanded, Disordered. Barrier Height. Barrier crossing time ~exp(Barrier Height). 0.1 microseconds, 1 millisecond. Multiple Native Conformations (e.g. ligand bound and unbound)
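
The exponential dependence of crossing time on barrier height can be made concrete with a small calculation. This is a sketch under stated assumptions: an Arrhenius-like form tau = tau0 * exp(barrier / RT), a temperature of 300 K, and a shared prefactor tau0 for both processes. None of these come from the slide, which gives only the proportionality and the two example timescales.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def crossing_time(barrier_kcal, prefactor_s, temperature_k=300.0):
    """Arrhenius-like estimate: time grows exponentially with barrier height."""
    return prefactor_s * math.exp(barrier_kcal / (R_KCAL * temperature_k))


def barrier_difference(t_slow_s, t_fast_s, temperature_k=300.0):
    """Extra barrier height (kcal/mol) needed to slow a process from t_fast to t_slow."""
    return R_KCAL * temperature_k * math.log(t_slow_s / t_fast_s)


# How much higher must a barrier be to turn 0.1 microseconds into 1 millisecond?
delta = barrier_difference(1e-3, 1e-7)
print(f"extra barrier: {delta:.2f} kcal/mol")
```

The point of the sketch is that a four-orders-of-magnitude slowdown corresponds to only a few kcal/mol of additional barrier.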

Slide 26

Slide 26 text

OUTLINE: ‣ Overview of structural bioinformatics • Major motivations, goals and challenges ‣ Fundamentals of protein structure • Composition, form, forces and dynamics ‣ Representing and interpreting protein structure • Modeling energy as a function of structure ‣ Example application areas • Predicting functional dynamics & drug discovery

Slide 27

Slide 27 text

OUTLINE: ‣ Overview of structural bioinformatics • Major motivations, goals and challenges ‣ Fundamentals of protein structure • Composition, form, forces and dynamics ‣ Representing and interpreting protein structure • Modeling energy as a function of structure ‣ Example application areas • Predicting functional dynamics & drug discovery

Slide 28

Slide 28 text

TRADITIONAL FOCUS PROTEIN, DNA  AND SMALL MOLECULE DATA SETS  WITH MOLECULAR STRUCTURE Protein (PDB) DNA (NDB) Small Molecules (CCDB)

Slide 29

Slide 29 text

Motivation 1: Detailed understanding of molecular interactions Provides an invaluable structural context for conservation and mechanistic analysis leading to functional insight.

Slide 30

Slide 30 text

Motivation 1: Detailed understanding of molecular interactions Computational modeling can provide detailed insight into functional interactions, their regulation and potential consequences of perturbation. Grant et al. PLoS. Comp. Biol. (2010)

Slide 31

Slide 31 text

Motivation 2: Lots of structural data is becoming available 115,306 (1/20/2016) Data from: http://www.rcsb.org/pdb/statistics/ Structural Genomics has contributed to driving down the cost and time required for structural determination

Slide 32

Slide 32 text

Motivation 2: Lots of structural data is becoming available Structural Genomics has contributed to driving down the cost and time required for structural determination [Figure labels: target selection, cloning, expression, purification, crystallization, imaging, harvesting, xtal screening, bl xtal mounting, data collection, phasing, tracing, struc. refinement, struc. validation, annotation, publication, PDB] Image Credit: “Structure determination assembly line” Adam Godzik

Slide 33

Slide 33 text

Motivation 3: Theoretical and computational predictions have been, and continue to be, enormously valuable and influential!

Slide 34

Slide 34 text

SUMMARY OF KEY MOTIVATIONS Sequence > Structure > Function • Structure determines function, so understanding structure helps our understanding of function Structure is more conserved than sequence • Structure allows identification of more distant evolutionary relationships Structure is encoded in sequence • Understanding the determinants of structure allows design and manipulation of proteins for industrial and medical advantage

Slide 35

Slide 35 text

Goals: • Analysis • Visualization • Comparison • Prediction • Design Grant et al. JMB. (2007)

Slide 36

Slide 36 text

Goals: • Analysis • Visualization • Comparison • Prediction • Design Scarabelli and Grant. PLoS. Comp. Biol. (2013)

Slide 37

Slide 37 text

Goals: • Analysis • Visualization • Comparison • Prediction • Design Scarabelli and Grant. PLoS. Comp. Biol. (2013)

Slide 38

Slide 38 text

Goals: • Analysis • Visualization • Comparison • Prediction • Design kinesin G-protein myosin Grant et al. unpublished

Slide 39

Slide 39 text

Goals: • Analysis • Visualization • Comparison • Prediction • Design Grant et al. PLoS One (2011, 2012)

Slide 40

Slide 40 text

Goals: • Analysis • Visualization • Comparison • Prediction • Design Grant et al. PLoS Biology (2011)

Slide 41

Slide 41 text

MAJOR RESEARCH AREAS  AND CHALLENGES Include but are not limited to: • Protein classification • Structure prediction from sequence • Binding site detection • Binding prediction and drug design • Modeling molecular motions • Predicting physical properties (stability, binding affinities) • Design of structure and function • etc... With applications to Biology, Medicine, Agriculture and Industry

Slide 42

Slide 42 text

‣ Overview of structural bioinformatics • Major motivations, goals and challenges ‣ Fundamentals of protein structure • Composition, form, forces and dynamics ‣ Representing and interpreting protein structure • Modeling energy as a function of structure ‣ Example application areas • Predicting functional dynamics & drug discovery NEXT UP:

Slide 43

Slide 43 text

HIERARCHICAL STRUCTURE OF PROTEINS Primary (amino acid residues) > Secondary (alpha helix) > Tertiary (polypeptide chain) > Quaternary (assembled subunits) Image from: http://www.ncbi.nlm.nih.gov/books/NBK21581/

Slide 44

Slide 44 text

RECAP: AMINO ACID NOMENCLATURE main chain (backbone) side chain (R group) Image from: http://www.ncbi.nlm.nih.gov/books/NBK21581/

Slide 45

Slide 45 text

AMINO ACIDS CAN BE GROUPED BY THEIR PHYSICOCHEMICAL PROPERTIES Image from: http://www.ncbi.nlm.nih.gov/books/NBK21581/

Slide 46

Slide 46 text

AMINO ACIDS POLYMERIZE THROUGH PEPTIDE BOND FORMATION side chains backbone Image from: http://www.ncbi.nlm.nih.gov/books/NBK21581/

Slide 47

Slide 47 text

PEPTIDES CAN ADOPT DIFFERENT CONFORMATIONS BY VARYING THEIR PHI & PSI BACKBONE TORSIONS Peptide bond is planar (Cα, C, O, N, H, Cα all lie in the same plane) φ ψ Bond angles and lengths are largely invariant C-terminal N-terminal Image from: http://www.ncbi.nlm.nih.gov/books/NBK21581/

Slide 48

Slide 48 text

PHI VS PSI PLOTS ARE KNOWN AS RAMACHANDRAN DIAGRAMS • Steric hindrance dictates torsion angle preference • Ramachandran plots show preferred regions of φ and ψ dihedral angles, which correspond to major forms of secondary structure Alpha Helix Beta Sheet Image from: http://www.ncbi.nlm.nih.gov/books/NBK21581/
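
Each point on a Ramachandran diagram is a pair of dihedral angles, and a dihedral is computed from four atom positions (φ from C(i-1), N, Cα, C; ψ from N, Cα, C, N(i+1)). The following is a minimal sketch of that calculation. The function names are illustrative and the test coordinates are made-up points rather than atoms from a real structure.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up geometry: central bond along z, outer atoms placed cis, trans and at a right angle.
cis = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
quarter = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(cis, trans, quarter)
```

Applying this function to every residue of a structure and plotting φ against ψ would give the scatter that the diagram summarizes.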

Slide 49

Slide 49 text

MAJOR SECONDARY STRUCTURE TYPES ALPHA HELIX & BETA SHEET Hydrogen bond: i→i+4 α-helix • Most common form has 3.6 residues per turn (number of residues in one full rotation) • Hydrogen bonds (dashed lines) between residue i and i+4 stabilize the structure • The side chains (in green) protrude outward • 3₁₀-helix and π-helix forms are less common Image from: http://www.ncbi.nlm.nih.gov/books/NBK21581/

Slide 50

Slide 50 text

MAJOR SECONDARY STRUCTURE TYPES ALPHA HELIX & BETA SHEET In antiparallel β-sheets • Adjacent β-strands run in opposite directions • Hydrogen bonds (dashed lines) between NH and CO stabilize the structure • The side chains (in green) are above and below the sheet Image from: http://www.ncbi.nlm.nih.gov/books/NBK21581/

Slide 51

Slide 51 text

MAJOR SECONDARY STRUCTURE TYPES ALPHA HELIX & BETA SHEET In parallel β-sheets • Adjacent β-strands run in the same direction • Hydrogen bonds (dashed lines) between NH and CO stabilize the structure • The side chains (in green) are above and below the sheet Image from: http://www.ncbi.nlm.nih.gov/books/NBK21581/

Slide 52

Slide 52 text

What Does a Protein Look like?

Slide 53

Slide 53 text

• Proteins are stable (and hidden) in water

Slide 54

Slide 54 text

• Proteins closely interact with water

Slide 55

Slide 55 text

• Proteins are close packed solid but flexible objects (globular)

Slide 56

Slide 56 text

• Due to their large size and complexity it is often hard to see what's important in the structure

Slide 57

Slide 57 text

• Backbone or main-chain representation can help trace chain topology

Slide 58

Slide 58 text

• Backbone or main-chain representation can help trace chain topology & reveal secondary structure

Slide 59

Slide 59 text

• Simplified secondary structure representations are commonly used to communicate structural details • Now we can clearly see 2°, 3° and 4° structure • Coiled chain of connected secondary structures

Slide 60

Slide 60 text

DISPLACEMENTS REFLECT INTRINSIC FLEXIBILITY Superposition of all 482 structures in RCSB PDB (23/09/2015)

Slide 61

Slide 61 text

DISPLACEMENTS REFLECT INTRINSIC FLEXIBILITY Principal component analysis (PCA) of experimental structures

Slide 62

Slide 62 text

KEY CONCEPT: ENERGY LANDSCAPE Native State(s): Compact, Ordered. Molten Globule State: Compact, Disordered. Unfolded State: Expanded, Disordered. Barrier Height. Barrier crossing time ~exp(Barrier Height). 0.1 microseconds, 1 millisecond. Multiple Native Conformations (e.g. ligand bound and unbound)

Slide 63

Slide 63 text

Key forces affecting structure: • H-bonding • Van der Waals • Electrostatics • Hydrophobicity • Disulfide Bridges H-bond geometry: 150° < θ < 180°, 2.6 Å < d < 3.1 Å
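
The geometric ranges quoted for hydrogen bonds can be turned into a simple test. This is a sketch under assumptions the slide does not state explicitly: d is taken as the donor-acceptor heavy-atom distance and θ as the donor-hydrogen-acceptor angle. The function names and coordinates are hypothetical.

```python
import math


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, measured at point b."""
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    cos_t = sum(x * y for x, y in zip(ba, bc)) / (math.hypot(*ba) * math.hypot(*bc))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def is_hbond(donor, hydrogen, acceptor, d_min=2.6, d_max=3.1, theta_min=150.0):
    """True if the distance and angle fall inside the quoted ranges.

    A perfectly linear arrangement (180 degrees) is accepted here, which is a
    small deviation from the strict upper bound on the slide.
    """
    d = math.dist(donor, acceptor)
    theta = angle_deg(donor, hydrogen, acceptor)
    return d_min < d < d_max and theta_min < theta <= 180.0


# Hypothetical coordinates in Angstroms.
linear = is_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0))
too_far = is_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.5, 0.0, 0.0))
bent = is_hbond((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (2.9, 0.0, 0.0))
print(linear, too_far, bent)
```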

Slide 64

Slide 64 text

Key forces affecting structure: • H-bonding • Van der Waals • Electrostatics • Hydrophobicity • Disulfide Bridges Van der Waals: Repulsion, Attraction, 3 Å < d < 4 Å

Slide 65

Slide 65 text

Key forces affecting structure: • H-bonding • Van der Waals • Electrostatics (sometimes called IONIC BONDs or SALT BRIDGEs) • Hydrophobicity • Disulfide Bridges Coulomb's law: E = k q1 q2 / (D r) E = Energy k = constant D = Dielectric constant (vacuum = 1; H2O = 80) q1 & q2 = electronic charges (Coulombs) r = distance (Å) Example: d = 2.8 Å
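
Coulomb's law with a dielectric constant shows why a salt bridge is much weaker in water than in vacuum. This is a minimal sketch: it assumes charges expressed in units of the elementary charge and a conversion constant of about 332 kcal·Å/(mol·e²), a common molecular-modeling convention that is not taken from the slide (which quotes charges in Coulombs).

```python
COULOMB_K = 332.06  # approximate constant in kcal*Angstrom/(mol*e^2); an assumed convention


def coulomb_energy(q1, q2, r, dielectric=1.0):
    """Electrostatic energy (kcal/mol) of two point charges separated by r Angstroms."""
    return COULOMB_K * q1 * q2 / (dielectric * r)


# A +1/-1 pair at the 2.8 Angstrom separation shown on the slide.
e_vacuum = coulomb_energy(+1.0, -1.0, 2.8, dielectric=1.0)
e_water = coulomb_energy(+1.0, -1.0, 2.8, dielectric=80.0)
print(f"vacuum: {e_vacuum:.1f} kcal/mol, water: {e_water:.2f} kcal/mol")
```

Treating water as a uniform dielectric of 80 is itself a crude approximation, so the numbers are order-of-magnitude guides only.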

Slide 66

Slide 66 text

Key forces affecting structure: • H-bonding • Van der Waals • Electrostatics • Hydrophobicity • Disulfide Bridges The force that causes hydrophobic molecules or nonpolar portions of molecules to aggregate together rather than to dissolve in water is called Hydrophobicity (Greek, “water fearing”). This is not a separate bonding force; rather, it is the result of the energy required to insert a nonpolar molecule into water.

Slide 67

Slide 67 text

Forces affecting structure: • H-bonding • Van der Waals • Electrostatics • Hydrophobicity • Disulfide Bridges Other names: cystine bridge, disulfide bridge Hair contains lots of disulfide bonds which are broken and reformed by heat

Slide 68

Slide 68 text

‣ Overview of structural bioinformatics • Major motivations, goals and challenges ‣ Fundamentals of protein structure • Composition, form, forces and dynamics ‣ Representing and interpreting protein structure • Modeling energy as a function of structure ‣ Example application areas • Predicting functional dynamics & drug discovery NEXT UP:

Slide 69

Slide 69 text

PDB Growing but not as rapidly as Sequence repositories It is highly biased towards crystallography of enzymes

Slide 70

Slide 70 text

Search: HIV

Slide 71

Slide 71 text

Search: 1HSG (PDB ID)

Slide 72

Slide 72 text

Slide Credit: RCSB PDB

Slide 73

Slide 73 text

PDB FILE FORMAT • PDB files contain atomic coordinates and associated information.
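
Coordinates in a PDB file are stored in fixed-width ATOM records, so they are read by column position rather than by splitting on whitespace. The sketch below parses one record. The example line is assembled field by field for illustration, and the function and key names are hypothetical.

```python
# An illustrative ATOM record, built field by field so the column layout is visible.
EXAMPLE_LINE = (
    "ATOM  "      # cols 1-6   record name
    "    1"       # cols 7-11  atom serial number
    " "           # col  12    (blank)
    " N  "        # cols 13-16 atom name
    " "           # col  17    alternate location indicator
    "PRO"         # cols 18-20 residue name
    " "           # col  21    (blank)
    "A"           # col  22    chain identifier
    "   1"        # cols 23-26 residue sequence number
    "    "        # cols 27-30 insertion code and padding
    "  29.361"    # cols 31-38 x coordinate (Angstroms)
    "  39.686"    # cols 39-46 y coordinate
    "   5.862"    # cols 47-54 z coordinate
    "  1.00"      # cols 55-60 occupancy
    " 38.10"      # cols 61-66 temperature factor (B-factor)
    "          "  # cols 67-76 (blank)
    " N"          # cols 77-78 element symbol
)


def parse_atom_line(line):
    """Slice one ATOM/HETATM record into a dictionary (0-based Python slices)."""
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "resname": line[17:20].strip(),
        "chain": line[21].strip(),
        "resseq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "bfactor": float(line[60:66]),
        "element": line[76:78].strip(),
    }


atom = parse_atom_line(EXAMPLE_LINE)
print(atom)
```

In practice a dedicated parser from an established structural biology package is a safer choice, since real files contain many other record types and edge cases.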

Slide 74

Slide 74 text

KEY CONCEPT: POTENTIAL FUNCTIONS DESCRIBE A SYSTEM'S ENERGY AS A FUNCTION OF ITS STRUCTURE Two main approaches: (1). Physics-Based (2). Knowledge-Based

Slide 75

Slide 75 text

KEY CONCEPT: POTENTIAL FUNCTIONS DESCRIBE A SYSTEM'S ENERGY AS A FUNCTION OF ITS STRUCTURE Two main approaches: (1). Physics-Based (2). Knowledge-Based

Slide 76

Slide 76 text

PHYSICS-BASED POTENTIALS
 ENERGY TERMS FROM PHYSICAL THEORY The Potential Energy Function Ubond = oscillations about the equilibrium bond length Uangle = oscillations of 3 atoms about an equilibrium bond angle Udihedral = torsional rotation of 4 atoms about a central bond Unonbond = non-bonded energy terms (electrostatics and Lennard-Jones) CHARMM P.E. function, see: http://www.charmm.org/
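
The individual terms can be sketched as small functions. The forms below are the harmonic, cosine and 12-6 expressions commonly used in CHARMM-style force fields. The function names are hypothetical and every parameter value is illustrative, not taken from any released parameter set. The electrostatic part of Unonbond is a Coulomb term and is omitted here.

```python
import math


def bond_energy(b, b0, k_b):
    """Harmonic stretch about the equilibrium bond length b0."""
    return k_b * (b - b0) ** 2


def angle_energy(theta, theta0, k_theta):
    """Harmonic bend about the equilibrium angle theta0 (radians)."""
    return k_theta * (theta - theta0) ** 2


def dihedral_energy(phi, k_phi, n, delta):
    """Periodic torsion term with multiplicity n and phase delta (radians)."""
    return k_phi * (1.0 + math.cos(n * phi - delta))


def lennard_jones(r, epsilon, r_min):
    """12-6 Lennard-Jones term: minimum of -epsilon at r = r_min."""
    ratio6 = (r_min / r) ** 6
    return epsilon * (ratio6 ** 2 - 2.0 * ratio6)


# Illustrative values only: a van der Waals minimum placed at 3.5 Angstroms.
for r in (3.0, 3.5, 4.0, 6.0):
    print(r, round(lennard_jones(r, epsilon=0.1, r_min=3.5), 4))
```

Summing these terms over all bonds, angles, dihedrals and atom pairs gives the total potential energy for one conformation.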

Slide 77

Slide 77 text

Slide Credit: Michael Levitt

Slide 78

Slide 78 text

Slide Credit: Michael Levitt

Slide 79

Slide 79 text

PHYSICS-ORIENTED APPROACHES
Weaknesses
  Fully physical detail becomes computationally intractable
  Approximations are unavoidable (Quantum effects approximated classically, water may be treated crudely)
  Parameterization still required
Strengths
  Interpretable, provides guides to design
  Broadly applicable, in principle at least
  Clear pathways to improving accuracy
Status
  Useful, widely adopted but far from perfect
  Multiple groups working on fewer, better approxs (Force fields, quantum entropy, water effects)
  Moore's law: hardware improving

Slide 80

Slide 80 text


Slide 81

Slide 81 text

SIDE-NOTE: GPUS AND ANTON SUPERCOMPUTER

Slide 82

Slide 82 text

SIDE-NOTE: GPUS AND ANTON SUPERCOMPUTER

Slide 83

Slide 83 text

KEY CONCEPT: POTENTIAL FUNCTIONS DESCRIBE A SYSTEM'S ENERGY AS A FUNCTION OF ITS STRUCTURE Two main approaches: (1). Physics-Based (2). Knowledge-Based

Slide 84

Slide 84 text

KNOWLEDGE-BASED DOCKING POTENTIALS Histidine, Ligand carboxylate, Aromatic stacking

Slide 85

Slide 85 text

ENERGY DETERMINES PROBABILITY (STABILITY)
Basic idea: Use probability as a proxy for energy
Boltzmann: p(r) ∝ exp(-E(r)/RT)
Inverse Boltzmann: E(r) = -RT ln p(r)
Example: ligand carboxylate O to protein histidine N
Find all protein-ligand structures in the PDB with a ligand carboxylate O
1. For each structure, histogram the distances from O to every histidine N
2. Sum the histograms over all structures to obtain p(r_O-N)
3. Compute E(r_O-N) from p(r_O-N)
[Figure axes: Energy, Probability, x]
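
The three steps can be sketched in a few lines. This is a simplified illustration, not a published potential: the distances are made up, the bin width and range are arbitrary, and no reference-state correction is applied (real knowledge-based potentials normalize against an expected distribution).

```python
import math

RT = 0.593  # kcal/mol at roughly 298 K


def histogram(distances, r_max=6.0, bin_width=0.5):
    """Step 1: count distances (Angstroms) in fixed-width bins for one structure."""
    n_bins = int(round(r_max / bin_width))
    counts = [0] * n_bins
    for r in distances:
        k = int(r / bin_width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


def inverse_boltzmann(counts):
    """Step 3: convert pooled counts to energies via E = -RT ln p."""
    total = sum(counts)
    return [-RT * math.log(c / total) if c > 0 else math.inf for c in counts]


# Hypothetical carboxylate O to histidine N distances from two structures.
structure_a = [2.7, 2.8, 3.1, 4.6]
structure_b = [2.9, 2.8, 5.2]

# Step 2: sum the per-structure histograms.
pooled = [a + b for a, b in zip(histogram(structure_a), histogram(structure_b))]
energies = inverse_boltzmann(pooled)
best_bin = min(range(len(energies)), key=lambda k: energies[k])
print(pooled)
print(best_bin, round(energies[best_bin], 3))
```

Bins that were never observed get an infinite energy here; a pseudocount is one common way to soften that.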

Slide 86

Slide 86 text

KNOWLEDGE-BASED DOCKING POTENTIALS A few types of atom pairs, out of several hundred total Atom-atom distance (Angstroms) Nitrogen+/Oxygen- Aromatic carbons Aliphatic carbons “PMF”, Muegge & Martin, J. Med. Chem. (1999) 42:791

Slide 87

Slide 87 text

KNOWLEDGE-BASED POTENTIALS
Weaknesses
  Accuracy limited by availability of data
Strengths
  Relatively easy to implement
  Computationally fast
Status
  Useful, far from perfect
  May be at point of diminishing returns (not always clear how to make improvements)

Slide 88

Slide 88 text

‣ Overview of structural bioinformatics • Major motivations, goals and challenges ‣ Fundamentals of protein structure • Composition, form, forces and dynamics ‣ Representing and interpreting protein structure • Modeling energy as a function of structure ‣ Example application areas • Predicting functional dynamics & drug discovery NEXT UP:

Slide 89

Slide 89 text

PREDICTING FUNCTIONAL DYNAMICS • Proteins are intrinsically flexible molecules with internal motions that are often intimately coupled to their biochemical function – E.g. ligand and substrate binding, conformational activation, allosteric regulation, etc. • Thus knowledge of dynamics can provide a deeper understanding of the mapping of structure to function – Molecular dynamics (MD) and normal mode analysis (NMA) are two major methods for predicting and characterizing molecular motions and their properties

Slide 90

Slide 90 text

McCammon, Gelin & Karplus, Nature (1977) [ See: https://www.youtube.com/watch?v=ui1ZysMFcKk ] • Use force-field to find Potential energy between all atom pairs • Move atoms to next state • Repeat to generate trajectory MOLECULAR DYNAMICS SIMULATION

Slide 91

Slide 91 text

Divide time into discrete (~1 fs) time steps (∆t) (for integrating equations of motion, see below)

Slide 92

Slide 92 text

Divide time into discrete (~1 fs) time steps (∆t) (for integrating equations of motion, see below) At each time step calculate pair-wise atomic forces (F(t)) (by evaluating force-field gradient) Nuclear motion described classically Empirical force field

Slide 93

Slide 93 text

Divide time into discrete (~1 fs) time steps (∆t) (for integrating equations of motion, see below) At each time step calculate pair-wise atomic forces (F(t)) (by evaluating force-field gradient) Nuclear motion described classically Empirical force field Use the forces to calculate velocities and move atoms to new positions (by integrating numerically via the “leapfrog” scheme)

Slide 94

Slide 94 text

BASIC ANATOMY OF A MD SIMULATION Divide time into discrete (~1 fs) time steps (∆t) (for integrating equations of motion, see below) At each time step calculate pair-wise atomic forces (F(t)) (by evaluating force-field gradient) Nuclear motion described classically Empirical force field Use the forces to calculate velocities and move atoms to new positions (by integrating numerically via the “leapfrog” scheme) REPEAT (iterate many, many times… 1 ms = 10^12 time steps)
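
The force, velocity and position update loop can be illustrated on the simplest possible system. The sketch below applies the leapfrog scheme to a single particle on a one-dimensional harmonic spring. The toy force function, reduced units and step size are all assumptions chosen for illustration; a real simulation would evaluate a full force field for every atom at each step.

```python
import math


def spring_force(x, k=1.0):
    """Toy stand-in for the force-field gradient: F = -dU/dx for U = k x^2 / 2."""
    return -k * x


def leapfrog(x0, v0, mass, dt, n_steps, force_fn):
    """Integrate with leapfrog: velocities live at half steps, positions at full steps."""
    x = x0
    # Shift the initial velocity back by half a step to start the staggered scheme.
    v_half = v0 - 0.5 * dt * force_fn(x0) / mass
    trajectory = [x]
    for _ in range(n_steps):
        v_half += dt * force_fn(x) / mass   # v(t + dt/2) from F(t)
        x += dt * v_half                    # x(t + dt) from v(t + dt/2)
        trajectory.append(x)
    return trajectory


# One oscillation period in reduced units (k = m = 1, so the period is 2*pi).
dt = 0.01
n_steps = round(2 * math.pi / dt)
traj = leapfrog(x0=1.0, v0=0.0, mass=1.0, dt=dt, n_steps=n_steps, force_fn=spring_force)
print(len(traj), traj[-1])
```

After one full period the particle returns very close to its starting position, which is a quick check that the integrator is stable at this step size.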

Slide 95

Slide 95 text

MD Prediction of Functional Motions “close” “open” Yao and Grant, Biophys J. (2013)

Slide 96

Slide 96 text

Simulations Identify Key Residues Mediating Dynamic Activation Yao … Grant, Journal of Biological Chemistry (2016)

Slide 97

Slide 97 text

EXAMPLE APPLICATION OF MOLECULAR SIMULATIONS TO GPCRS Structure determines function • Example: G protein-coupled receptors (GPCRs) • Largest class of human drug targets • Function: allow the cell to sense and respond to molecules outside it [Figure labels: Cell membrane, Binding, GPCR, Activation, G-protein coupling, G protein]

Slide 98

Slide 98 text

PROTEINS JUMP BETWEEN MANY, HIERARCHICALLY ORDERED “CONFORMATIONAL SUBSTATES” H. Frauenfelder et al., Science 229 (1985) 337

Slide 99

Slide 99 text

MOLECULAR DYNAMICS IS VERY EXPENSIVE
Example: F1-ATPase in water (183,674 atoms) for 1 nanosecond:
  => 10^6 integration steps
  => 8.4 * 10^11 floating point operations/step [n(n-1)/2 interactions]
  Total: 8.4 * 10^17 flop (on a 1 Gflop/s cpu: ca. 25 years!)
… but performance has been improved by use of:
  multiple time stepping    ca. 2.5 years
  fast multipole methods    ca. 1 year
  parallel computers        ca. 5 days
  modern GPUs               ca. 1 day
  (Anton supercomputer      ca. minutes)
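
The cost estimate is a short piece of arithmetic that is easy to rerun for other system sizes. Two inputs in this sketch are assumptions chosen because they reproduce the quoted figures: about 50 floating point operations per pair interaction, and a sustained rate of 1 Gflop/s.

```python
N_ATOMS = 183_674
FLOP_PER_PAIR = 50          # assumption: reproduces ~8.4e11 flop per step
N_STEPS = 10 ** 6           # 1 ns simulated with a 1 fs time step
SUSTAINED_FLOPS = 1e9       # assumption: 1 Gflop/s sustained rate
SECONDS_PER_YEAR = 365.25 * 24 * 3600

pairs = N_ATOMS * (N_ATOMS - 1) // 2        # all-pairs interactions, n(n-1)/2
flop_per_step = pairs * FLOP_PER_PAIR
total_flop = flop_per_step * N_STEPS
years = total_flop / SUSTAINED_FLOPS / SECONDS_PER_YEAR

print(f"pairs per step: {pairs:.3e}")
print(f"flop per step:  {flop_per_step:.3e}")
print(f"total flop:     {total_flop:.3e}")
print(f"wall time:      {years:.1f} years")
```

Because the pair count grows as n squared, doubling the number of atoms roughly quadruples the cost of a naive all-pairs calculation, which is why cutoffs and fast multipole methods matter.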

Slide 100

Slide 100 text

COARSE GRAINING: NORMAL MODE ANALYSIS (NMA) • MD is still time-consuming for large systems • Elastic network model NMA (ENM-NMA) is an example of a lower resolution approach that finishes in seconds even for large systems. Atomistic vs. Coarse Grained (C.G.): • 1 bead / 1 amino acid • Connected by springs (distance r_ij between beads i and j)

Slide 101

Slide 101 text

NMA models the protein as a network of elastic springs Proteinase K
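
The "beads connected by springs" picture starts from a connectivity matrix. The sketch below builds one for a toy chain using a distance cutoff. The 7 Å cutoff, the function name and the straight-line bead coordinates are assumptions for illustration. Obtaining the normal modes would additionally require diagonalizing a matrix built from these contacts, for example with a linear algebra library, which is omitted here.

```python
import math


def kirchhoff_matrix(coords, cutoff=7.0):
    """Connectivity matrix: -1 for bead pairs within the cutoff, contact count on the diagonal."""
    n = len(coords)
    k = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                k[i][j] = k[j][i] = -1   # a spring connects beads i and j
                k[i][i] += 1
                k[j][j] += 1
    return k


# Toy chain: four beads on a line, 3.8 Angstroms apart (a typical C-alpha spacing).
beads = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
K = kirchhoff_matrix(beads)
for row in K:
    print(row)
```

With this cutoff only sequential neighbors are connected; a folded protein would also have springs between residues that are distant in sequence but close in space.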

Slide 102

Slide 102 text

‣ Overview of structural bioinformatics • Major motivations, goals and challenges ‣ Fundamentals of protein structure • Composition, form, forces and dynamics ‣ Representing and interpreting protein structure • Modeling energy as a function of structure ‣ Example application areas • Predicting functional dynamics & drug discovery NEXT UP:

Slide 103

Slide 103 text

THE TRADITIONAL EMPIRICAL PATH TO DRUG DISCOVERY
Compound library (commercial, in-house, synthetic, natural)
High throughput screening (HTS)
Hit confirmation
Lead compounds (e.g., µM Kd)
Lead optimization (Medicinal chemistry)
Potent drug candidates (nM Kd)
Animal and clinical evaluation

Slide 104

Slide 104 text

COMPUTER-AIDED LIGAND DESIGN Aims to reduce number of compounds synthesized and assayed Lower costs Reduce chemical waste Facilitate faster progress

Slide 105

Slide 105 text

Two main approaches: (1). Receptor/Target-Based (2). Ligand/Drug-Based

Slide 106

Slide 106 text

Two main approaches: (1). Receptor/Target-Based (2). Ligand/Drug-Based

Slide 107

Slide 107 text

SCENARIO 1: RECEPTOR-BASED DRUG DISCOVERY HIV Protease/KNI-272 complex Structure of Targeted Protein Known: Structure-Based Drug Discovery

Slide 108

Slide 108 text

PROTEIN-LIGAND DOCKING Structure-Based Ligand Design
Potential function: Energy as function of structure (VDW, Dihedral, Screened Coulombic)
Docking software: Search for structure of lowest energy

Slide 109

Slide 109 text

STRUCTURE-BASED VIRTUAL SCREENING
Compound database
3D structure of target (crystallography, NMR, modeling)
Virtual screening (e.g., computational docking)
Candidate ligands
Experimental assay
Ligands
Ligand optimization (Med chem, crystallography, modeling)
Drug candidates

Slide 110

Slide 110 text

COMPOUND LIBRARIES Commercial (in-house pharma) Government (NIH) Academia

Slide 111

Slide 111 text

FRAGMENTAL STRUCTURE-BASED SCREENING
“Fragment” library
3D structure of target
Fragment docking
Compound design
Experimental assay and ligand optimization (Med chem, crystallography, modeling)
Drug candidates
http://www.beilstein-institut.de/bozen2002/proceedings/Jhoti/jhoti.html

Slide 112

Slide 112 text

Small organic probe fragment affinities map multiple potential binding sites across the structural ensemble. Multiple non active-site pockets identified. [Figure: Probe Occupancy vs. Residue No.; panels labeled GDP and GTP; probes: ethanol, isopropanol, acetone, cyclohexane, phenol, methylamine, benzene, acetamide]

Slide 113

Slide 113 text

Ensemble docking & candidate inhibitor testing
Ensemble computational docking: NCI ligands that target the C1 pocket of K-ras (13616, 23895, 36818, 99660, 117028, 121182, 99660)
Top hits from ensemble docking against distal pockets were tested for inhibitory effects on basal ERK activity in glioblastoma cell lines.
Compound testing in cancer cell lines:
  Ras activity in different cell lines (1321N1, U138, U251, U373, U343; Ras-GTP, Total Ras)
  Compound effect on U251 cell line (DMSO, 662796, 36818, 643000, 117028 at 10 µM; P-ERK1/2, Total ERK1/2)
PLoS One (2011, 2012)

Slide 114

Slide 114 text

Proteins and Ligands are Flexible Protein + Ligand → Complex, ΔG°

Slide 115

Slide 115 text

COMMON SIMPLIFICATIONS USED IN PHYSICS-BASED DOCKING
Quantum effects approximated classically
Protein often held rigid
Configurational entropy neglected
Influence of water treated crudely

Slide 116

Slide 116 text

Two main approaches: (1). Receptor/Target-Based (2). Ligand/Drug-Based

Slide 117

Slide 117 text

Scenario 2 Structure of Targeted Protein Unknown: Ligand-Based Drug Discovery e.g. MAP Kinase Inhibitors Using knowledge of existing inhibitors to discover more

Slide 118

Slide 118 text

Why Look for Another Ligand if You Already Have Some? Experimental screening generated some ligands, but they don’t bind tightly A company wants to work around another company’s chemical patents A high-affinity ligand is toxic, is not well-absorbed, etc.

Slide 119

Slide 119 text

LIGAND-BASED VIRTUAL SCREENING
Compound Library
Known Ligands
Molecular similarity, Machine-learning, Etc.
Candidate ligands
Assay
Actives
Optimization (Med chem, crystallography, modeling)
Potent drug candidates

Slide 120

Slide 120 text

CHEMICAL SIMILARITY LIGAND-BASED DRUG-DISCOVERY
Compounds (available/synthesizable)
Compare with known ligands
Similar: Test experimentally
Different: Don’t bother

Slide 121

Slide 121 text

CHEMICAL FINGERPRINTS BINARY STRUCTURE KEYS Molecule 1 Molecule 2 phenyl methyl ketone carboxylate amide aldehyde chlorine fluorine ethyl naphthyl S-S bond alcohol …

Slide 122

Slide 122 text

CHEMICAL SIMILARITY FROM FINGERPRINTS Molecule 1 Molecule 2 Intersection: NI = 2 Union: NU = 8 Tanimoto Similarity or Jaccard Index, T = NI / NU = 2/8 = 0.25
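
The Tanimoto (Jaccard) index is the number of structure keys two molecules share divided by the number present in either. The sketch below computes it on sets of key names. The two key sets are hypothetical, chosen only so that the intersection has 2 members and the union has 8, matching the counts on the slide.

```python
def tanimoto(keys_a, keys_b):
    """Tanimoto similarity of two fingerprints given as sets of 'on' structure keys."""
    union = keys_a | keys_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(keys_a & keys_b) / len(union)


# Hypothetical fingerprints: 2 shared keys, 8 keys in total.
molecule_1 = {"phenyl", "methyl", "ketone", "carboxylate", "amide"}
molecule_2 = {"phenyl", "methyl", "chlorine", "fluorine", "alcohol"}

similarity = tanimoto(molecule_1, molecule_2)
print(similarity)
```

The index ranges from 0 (no keys in common) to 1 (identical fingerprints), so a threshold on it is one simple way to decide which compounds count as similar.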

Slide 123

Slide 123 text

POTENTIAL DRAWBACKS OF PLAIN CHEMICAL SIMILARITY
May miss good ligands by being overly conservative
May put too much weight on irrelevant details
  - Examine ligand shape and common substructures
  - Build pharmacophore models
  - Statistics and machine learning on chemical descriptors

Slide 124

Slide 124 text

Maximum Common Substructure Ncommon = 34

Slide 125

Slide 125 text

Pharmacophore Models Φάρμακο (drug) + Φορά (carry) A 3-point pharmacophore: +1, Bulky hydrophobe, Aromatic; distances 5.0 ± 0.3 Å, 3.2 ± 0.4 Å, 2.8 ± 0.3 Å

Slide 126

Slide 126 text

Molecular Descriptors
More abstract than chemical fingerprints
Physical descriptors
  molecular weight
  charge
  dipole moment
  number of H-bond donors/acceptors
  number of rotatable bonds
  hydrophobicity (log P and clogP)
Topological
  branching index
  measures of linearity vs interconnectedness
Etc. etc.

Slide 127

Slide 127 text

A High-Dimensional “Chemical Space” Each compound is at a point in an n-dimensional space Compounds with similar properties are near each other Descriptor 1 Descriptor 2 Descriptor 3 Point representing a compound in descriptor space Apply multivariate statistics and machine learning for descriptor-selection. (e.g. partial least squares, support vector machines, random forest, etc.)
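
"Near each other" needs a distance, and descriptors on very different scales must be put on a common footing first. The sketch below standardizes each descriptor to zero mean and unit variance and then uses Euclidean distance. The compound names and descriptor values are hypothetical, and z-score scaling is one assumed choice among several.

```python
import math
from statistics import mean, pstdev

# Hypothetical descriptor table: (molecular weight, H-bond donors, rotatable bonds).
compounds = {
    "cpd_A": (180.0, 1, 3),
    "cpd_B": (190.0, 1, 4),
    "cpd_C": (450.0, 5, 10),
}


def standardize(table):
    """Rescale every descriptor column to zero mean and unit standard deviation."""
    names = list(table)
    scaled_columns = []
    for column in zip(*table.values()):
        mu = mean(column)
        sigma = pstdev(column) or 1.0   # guard against a constant descriptor
        scaled_columns.append([(value - mu) / sigma for value in column])
    return {name: tuple(col[i] for col in scaled_columns)
            for i, name in enumerate(names)}


scaled = standardize(compounds)
d_ab = math.dist(scaled["cpd_A"], scaled["cpd_B"])
d_ac = math.dist(scaled["cpd_A"], scaled["cpd_C"])
print(round(d_ab, 3), round(d_ac, 3))
```

Without the scaling step, molecular weight would dominate the distance simply because its numerical range is largest.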

Slide 128

Slide 128 text

CAUTIONARY NOTES
• “Everything should be made as simple as it can be but not simpler”
  A model is never perfect. A model that is not quantitatively accurate in every respect does not preclude one from establishing results relevant to our understanding of biomolecules as long as the biophysics of the model are properly understood and explored.
• Calibration of the parameters is an ongoing and imperfect process
  Questions and hypotheses should always be designed such that they do not depend crucially on the precise numbers used for the various parameters.
• A computational model is rarely universally right or wrong
  A model may be accurate in some regards, inaccurate in others. These subtleties can only be uncovered by comparing to all available experimental data.

Slide 129

Slide 129 text

• Structural bioinformatics is computer aided structural biology • Described major motivations, goals and challenges of structural bioinformatics • Reviewed the fundamentals of protein structure • Introduced both physics and knowledge based modeling approaches for describing the structure, energetics and dynamics of proteins computationally SUMMARY

Slide 130

Slide 130 text

Ilan Samish et al. Bioinformatics 2015;31:146-150

Slide 131

Slide 131 text

INFORMING SYSTEMS BIOLOGY? Genomes DNA & RNA sequence DNA & RNA structure Protein sequence Protein families, motifs and domains Protein structure Protein interactions Chemical entities Pathways Systems Gene expression Literature and ontologies