Quantifying the influence of conformational uncertainty in biomolecular solvation

Quantifying the influence of conformational uncertainty in biomolecular solvation Nathan
Baker1 Huan Lei2 Xiu Yang2 Bin Zheng2 Guang Lin3 1Computational and Statistical Analytics Division, Pacific Northwest National Laboratory 2Fundamental and Computational Sciences Directorate, Pacific Northwest National Laboratory 3Department of Mathematics, Purdue University October 6, 2015 October 6, 2015

Outline Background on biomolecular solvation Relevance and objectives Continuum solvation
models Scalable numerical methods Caveats in current approaches Quantifying uncertainty in solvation properties October 6, 2015 | 2

Biomolecular solution environment matters Water and ions solvation charged objects
Solution conditions can aﬀect stability Biomolecular charge state can be highly variable with pH and other conditions October 6, 2015 | 3

Free energy and potential of mean force Free energy is
a state variable of the system; a measure of its ability to do work The configurational space of the system is defined by the configuration integral Z = V e−βH(x)dx; V ⊆ R3N and free energy is G = −β−1 log Z. Potential of mean force is a useful way to decompose free energy into degrees of freedom e−βW (y) = V e−βH(x;y)dx; V ⊆ R3N−M ; y ∈ U ⊆ RM W (y) = −β−1 log V e−βH(x;y)dx October 6, 2015 | 4

The role of solvent on the behavior of the solute
Integrate over solvent degrees of freedom; e.g., x ∈ V W (y) = −β−1 log V e−βH(x;y)dx System is described by remaining solute coordinate space; e.g., y ∈ U Free energy can be obtained by integration G = −β−1 log U e−βW (y)dy Continuum savings arise when |U| |V | October 6, 2015 | 5

Decomposition of the solvation thermodynamic cycle State function allows various
“routes” for calculation Arbitrarily decompose into Polar (charge-charge interactions) Non-polar (other interactions) October 6, 2015 | 6

Some models of solvation October 6, 2015 | 8

Continuum methods Use average quantities to describe Solvent density Solvent
polarization Can be derived from more complicated models Major assumptions Linear and local response Mean-ﬁeld Advantages Fast, deterministic convergence Reasonably accurate Disadvantages Lacks molecular detail Restrictive assumptions October 6, 2015 | 9

Solvation free energy cycle ∆Gsolv = ∆Gnp + ∆Gp ∆Gnp
= ∆Gcav + GvdW,s − GvdW,v ∆Gp = Gelec,s − Gelec,v October 6, 2015 | 10

Nonpolar solvation free energy Modeled by a perturbation from creation
of a hard-wall potential cavity in Lennard-Jones solvent ∆Gnp = ∆Gcav + GvdW,s − GvdW,v All energies deﬁned in terms of solvent density χ(x) Approximate cavity energy using scaled particle theory ∆Gcav ≈ G0 [χ] + pV [χ] + γA[χ] + . . . Approximate perturbation by convolution of solvent density with dispersive Lennard-Jones interaction with solute GvdW,s − GvdW,v ≈ ρ Na i χ(x)uvdW (x − xi )dx October 6, 2015 | 11

Polar solvation energy Modeled by Poisson-Boltzmann theory GPB,s = Gpol
+ Gq + Gion Continuum solvent assumptions Linear and local dielectric response: functional of solvent density χ and electrostatic potential φ Gpol + Gq ≈ Ω − 1 2 (∇φ)2 + φ dx Could subtract vacuum-state energy for “regularized” form Mean-ﬁeld ionic response: functional of electrostatic potential φ; dependent on ion-solute interaction potential V (χ) Gion ≈ −β−1 Ω ci e−βVi e−βqiφ − 1 dx October 6, 2015 | 12

Total solvation energy G = γA + pV + ρ
Γ UvdW dx − Ω 1 2 (∇φ)2 + φ −β−1 i ci e−βVi e−βqiφ − 1 dx Free energy is a functional G [χs , {ϑi } , φ] dependent on two other types of functions Position-dependent solvent and ion densities: χs : Ω → R and ϑi : Ω → R Electrostatic potential: φ : Ω → R Most current models assume ﬁxed densities: ϑi = ϑi,0 and χs = χs,0 October 6, 2015 | 13

Poisson-Boltzmann models Assuming G[φ] and densities are ﬁxed: δG[φ] δφ
= 0 0 = ∇ · ∇φ + + i ci qi e−βVi−βqiφ Poisson-Boltzmann equation strong form −∇ · ∇φ − i ci qi e−βVi−βqiφ = Often assume that Vi is “hard wall” potential such that e−βVi = H, a step function October 6, 2015 | 14

Poisson-Boltzmann equation October 6, 2015 | 15

Solvers Algebraic multigrid and adaptive finite elements Tetrahedral meshes A
priori refinement based on molecular geometries A posteriori refinement driven by residual-based error estimation Best for large systems with significant multiscale character Finite difference multigrid Locally Cartesian meshes Best for small- to medium-scale systems October 6, 2015 | 17

Software Holst group FEtk libraries (fetk.org): multigrid and FEM APBS
Freely available PB and nonpolar model solver Works with most popular visualization software (VMD, PMV, PyMOL) Links with CHARMM, AMBER, TINKER, NAMD 25,600+ registered users PDB2PQR Structure preparation service pKa calculations and titration state assignment Supported by NIH R01 GM069702 www.poissonboltzmann.org October 6, 2015 | 18

Applications October 6, 2015 | 19

Traditional approaches Assume a fixed solvent density profile χ0 Nonpolar
energy determined by χ0 ∆Gnp [χ0 ] = γA[χ0 ] + pV [χ0 ] + ρ Ω χ0 (x)UvdW dx Polar free energy determined by χ0 and φ G[ [χ0 , φ] = − Ω 1 2 (χ0 ) (∇φ)2 + φ −β−1 i ci e−βVi(χ0) e−βqiφ − 1 dx Mean-field descriptions of ion behavior October 6, 2015 | 21

Caveats Arbitrary surface definitions for , e−βVi , A, and
V Difficult to relate force field radii to coefficient definitions Different parameters often needed for polar and apolar solvation Inability to couple polar and nonpolar interactions Unable to describe strong density changes near polar/nonpolar interfaces Different models for polar and nonpolar interactions Mean-field point-like descriptions of ion behavior fail at high charge densities Difficult to incorporate conformational fluctuations and uncertainties October 6, 2015 | 22

Acknowledgments Team members Huan Lei Guang Lin Xiu Yang Bin
Zheng This work was supported by the DOE ASCR as part of the Collaboratory on Mathematics for Mesoscopic Modeling of Materials (CM4). Collaborators: Paul Constantine George Karniadakis Greg Schenter Alex Tartakovsky Xiaoliang Wan Dongbin Xiu Zhongqiang Zhang Wen Zhou October 6, 2015 | 23

Outline Background on biomolecular solvation Quantifying uncertainty in solvation properties
Background and motivation Surrogate models for conformational uncertainty Numerical results and the impact of error October 6, 2015 | 24

Biomolecules are not rigid BPTI, of course! Biomolecular conformation ﬂuctuates.
The conformational space has very high dimensions. Many solvation calculations are performed based on a small number of conformations. How can we quantify the inﬂuence of conformational uncertainty on solvation properties? October 6, 2015 | 25

Overview of approach Our approach: construct a quasiharmonic surrogate model
for the full conformation space such that we can Approximate the target properties in terms of instantaneous conformational states Quantify the statistics of target properties with respect to conformational ﬂuctuations Numerical challenges: accurately constructing the surrogate model using limited sampling points in the presence of High-dimensionality of the random space Numerical errors in the sampled data October 6, 2015 | 26

Constructing the mesoscale model I Approximate the potential energy of
the molecule by an elastic network model: V (R, R) = γ 2 i<j (rij − rij )2h(rc − rij ) where R are the equilibrium positions; R are the instantaneous positions; γ is the elastic coefficient; h is a distance cutoff function; and rc is the cutoff distance. Other quasiharmonic approximations (e.g., PCA) could also be used. October 6, 2015 | 28

Constructing the mesoscale model II The quadratic approximation for the
potential function leads to a Gaussian distribution for conformational ﬂuctuations, with a correlation matrix Cij ≡ ∆Ri ∆RT j C = kB T γ WΛ−1W = UUT , where W and Λ are determined from an eigenvalue decomposition of the harmonic system and U are Cholesky factors. October 6, 2015 | 29

Constructing the mesoscale model III Conformational states can be generated
by the stochastic process R(ξ) = R + ∆R(ξ) ∆R(ξ) = Uξ where ξ is a (3N − 6)-dimensional standard normal random vector. The target property is denoted as X(ξ) := X(R(ξ)). A. R. Atilgan, I. Bahar, etc., Biophysical Journal 2001. October 6, 2015 | 30

Representing uncertainty with gPC I Represent target property uncertainty with
a generalized Polynomial Chaos (gPC) surrogate model: X(ξ) ≈ X(ξ) = P |α|=0 cα ψα (ξ) The gPC surrogate model allows us to Extract full approximate distribution function Evaluate properties at points which were not sampled October 6, 2015 | 31

Representing uncertainty with gPC II Construct the gPC expansion with
probabilistic collocation methods cα = X(ξ)ψα (ξ)dP(ξ) ≈ Q i=1 Xi ψα (ξi )wi , where Xi is the collocation point, α is a multi-index with dimension (P + d)!/(Pd)!, and wi is the weight. Major challenges for high-dimensional systems: large number of collocation points (e.g., d = 27, P = 2, Q = 327 = 7.6 × 1012 tensor product points) sensitive to numerical errors in the calculation of X October 6, 2015 | 32

Determining coefﬁcients for the gPC expansion I Consider the gPC
expansion X(ξ) = α cα ψα (ξ). By sampling ξ, we obtain: X(ξ1 ) ≈ N α=0 cα ψα (ξ1 ), X(ξ2 ) ≈ N α=0 cα ψα (ξ2 ), · · · which can be cast into the linear system:      ψ0 (ξ1 ) ψ1 (ξ1 ) · · · ψN (ξ1 ) ψ0 (ξ2 ) ψ1 (ξ2 ) · · · ψN (ξ2 ) . . . . . . ... . . . ψ0 (ξM ) ψ1 (ξM ) · · · ψN (ξM )           c0 c1 . . . cN      ≈      X(ξ1 ) X(ξ2 ) . . . X(ξM )      , October 6, 2015 | 33

Determining coefﬁcients for the gPC expansion II This system can
be rewritten as Ψc + e = X, where e is related to the truncation error. The dimension of c is very large (P + d)!/(Pd)!. However, we may be able to eﬃciently determine the values for c using compressive sensing – subject to certain restrictions. Consider a linear system October 6, 2015 | 34

Determining coefﬁcients for the gPC expansion III When M <
N, the system is underdetermined, it is possible to obtain c if c is a sparse vector. The sparse vector c can be obtained by solving the optimization problem Ψc + e = u, where e 2 ≤ , e.g., minc c 1 subject to Ψc − u 2 ≤ . Critical problem: Rigorous proof of this approach requires restricted isometry condition for Ψ and sparsity of c – properties that are unknown a priori. E.J. Cand` es, J. Romberg, T. Tao, IEEE Trans. Inform. Theory, 2006. D.L. Donoho, M. Elad, V.N. Temlyakov, IEEE Trans. Inform. Theory, 2006. October 6, 2015 | 35

Increasing the gPC expansion sparsity I For an i.i.d. Gaussian
random vector ξ, we can deﬁne the unitary transformation A χ = AT ξ, AAT = I where χ is also i.i.d. Gaussian and ψ(χ) corresponds to Hermite polynomial function. Sparsity can be increased with a renormalized active random space method; however, this requires knowledge of G = E ∇xi X(ξ)∇xi X(ξ)T October 6, 2015 | 36

Increasing the gPC expansion sparsity II However, this transformation can
be approximated by Gij ≈ E ∇ξ XgPC(ξ)∇ξ XgPC(ξ)T = E N n=1 cn ∂ψn (ξ) ∂ξi · N n =1 cn ∂ψn (ξ) ∂ξj = N n=1 N n =1 cn cn E ∂ψn (ξ) ∂ξi · ∂ψn (ξ) ∂ξj = cT Kij c, where Kij is a “stiﬀness” matrix describing the structure of the basis set. October 6, 2015 | 37

Increasing the gPC expansion sparsity III Given the input sample
ξ1, ξ2, · · · , ξM and the output samples X1, X2, · · · , XM , the approximate rotation matrix can be obtained iteratively to (possibly) improve its ability to increase the system sparseness: ξ → c(1) → G(1) → χ(1) = (G(1))T ξ χ1 → c(2) → G(2) → χ(2) = (G(2))T χ(1) · · · In other words, χ(L) = (G(L))T (G(L−1))T · · · (G(1))T ξ, where L is the number of iterations. P. Constantine, E. Dow, Q. Wang, arXiv:1304.2070, 2013. October 6, 2015 | 38

Summary of the numerical algorithm I 1. Construct the full
and reduced stochastic conformational spaces. Denote the stochastic space as a d-dimensional i.i.d. standard normal random vector ξ. 2. Generate M sampling points ξ1, ξ2, · · · , ξM based on the distribution of ξ. Numerically compute X on ξ1, ξ2, · · · , ξM to obtain M outputs X1, X2, · · · , XM . Set counter l = 0, χ(0) = ξ. 3. Construct the measurement matrix Ψ(l) ij = ψj((χ(l))i), and compute XgPC(χ(l)) by solving the weighted 1 minimization problem c = arg min c 1 subject to Ψc − x 2 ≤ δ. to obtain the gPC coeﬃcients c(l). 4. Evaluate the gradient matrix G(l+1) = E ∇XgPC(χ(l))∇XgPC(χ(l))T . 5. Set l = l + 1. If G(l) 1 − d < θ, where the threshold θ is a positive real number, then stop. Otherwise, go to Step 3 and utilize the M sampling data (X1, X2, · · · , XM ) that have been determined in Step 2. October 6, 2015 | 39

Stochastic PDE iterative rotation test I Application to a stochastic
elliptic PDE: − d dx a(x; ξ) du(x; ξ) dx = 1, x ∈ (0, 1) u(0) = u(1) = 0, a(x; ξ) = a0 (x) + exp σ 1 i=1 5c(x)i ξi ; e.g., Karhunen-Lo` eve (KL) expansion of an exponentially decaying covariance kernel, where {ξi } are i.i.d. standard Gaussian random variables. The quantity of interest is u(x; ξ)|x=0.35 . October 6, 2015 | 41

Stochastic PDE iterative rotation test II Stochastic PDE results shown
for d = 15, P = 3, and N = 816. Comparison of L2 error in iterative rotation approach with 1 methods. “◦”: standard 1 , “ ”: re-weighted 1 , “ ”: rotated 1 , “ ”: re-weighted and iteratively rotated 1 . 0.1 0.2 0.3 0.4 0.5 0.005 0.01 0.05 M/N Relative L 2 error October 6, 2015 | 42

KdV equation iterative rotation test I Application to the KdV
equation: ut (x, t; ξ) − 6u(x, t; ξ)ux (x, t; ξ) + uxxx (x, t; ξ) = f(t; ξ), x ∈ (−∞, ∞), u(x, 0; ξ) = −2 sech2(x). where f(t; ξ) = σ d i=1 λi φi (t)ξi , (2.1) σ is a constant, and {λi , φi (t)}d i=1 are eigenpairs of the exponential covariance kernel C(x, x ) = exp |x − x | lc . (2.2) We set lc = 0.25 and d = 10 ( d i=1 λi > 0.96 ∞ i=1 λi ). The quantity of interest is u(x, t; ξ)|x=6,t=1 . October 6, 2015 | 43

KdV equation iterative rotation test II Comparison of L2 error
between iterative reweighting and 1 methods for the KdV equation with d = 10, P = 4, and N = 1001. “◦”: standard 1 , “ ”: re-weighted 1 , “ ”: rotated 1 , “ ”: re-weighted and iteratively rotated 1 . 0.1 0.15 0.2 0.25 0.3 0.35 0.4 10−3 10−2 M/N Relative L 2 error October 6, 2015 | 44

gPC reproduces area distributions The new method faithfully reproduces SASA
distributions with respect to conformational ﬂuctuations. Distribution of solvent accessible surface area (˚ A2) for residue 14 of bovine pancreatic trypsin inhibitor (PDB ID 5PTI). Comparisons with sparse grid and Monte Carlo sampling methods are shown. SASA probability density distribution 60 70 80 90 100 110 120 0 0.025 0.05 0.075 0.1 MC (106 samples) SP - level 1 (55 samples) SP - level 2 (1513 samples) CS (300 samples) MC (300 samples) (a) number of sample K-L divergence 200 300 400 500 600 10-5 10-4 10-3 10-2 SP - level 1 (55 samples) MC (300 samples) MC (1200 samples) CS (χ) (b) October 6, 2015 | 45

Rotation improves surrogate models I Accuracy of the gPC representation.
gPC expansions X(χ) and X(ξ) by the compressive sensing method. Dashed lines: Sparse grid points on levels 1 and 2. number of sample Relative L 2 error 200 300 400 500 600 10-2 10-1 CS (ξ) p = 2 p = 3 CS (χ) p = 2 p = 3 Sp - level 1 (55 samples) Sp - level 2 (1513 samples) October 6, 2015 | 46

Rotation improves surrogate models II Rotation signiﬁcantly improves representation of
the response surface. (c) (d) Figure: Reduced response surface with respect to (χ1 , χ2 ) and (ξ21 , ξ22 ). October 6, 2015 | 47

Summary and remarks We developed a method to increase the
sparsity of the gPC expansion, yielding a more accurate surrogate model. Our method addresses a major problem (instability to numerical error) that the sparse grid method has for high-dimensional random spaces. Next steps include addressing Target properties: how do we handle discontinuities over the stochastic space? Target properties: how do we eﬃciently compute uncertainty in more complicated properties (e.g., electrostatic potential and potentials of mean force)? Visualization: how do we display this uncertainty to users? October 6, 2015 | 48

Quantifying the influence of conformational un...

Quantifying the influence of conformational uncertainty in biomolecular solvation

More Decks by Nathan Baker

Other Decks in Science

Featured

Transcript