Nathan Baker
October 12, 2015
100

# Quantifying the influence of conformational uncertainty in biomolecular solvation

Talk given at MBI "Multiple Faces of Biomolecular Electrostatics" workshop.

October 12, 2015

## Transcript

1. ### Quantifying the inﬂuence of conformational uncertainty in biomolecular solvation Nathan

Baker1 Huan Lei2 Xiu Yang2 Bin Zheng2 Guang Lin3 1Computational and Statistical Analytics Division, Paciﬁc Northwest National Laboratory 2Fundamental and Computational Sciences Directorate, Paciﬁc Northwest National Laboratory 3Department of Mathematics, Purdue University October 6, 2015 October 6, 2015
2. ### Outline Background on biomolecular solvation Relevance and objectives Continuum solvation

models Scalable numerical methods Caveats in current approaches Quantifying uncertainty in solvation properties October 6, 2015 | 2
3. ### Biomolecular solution environment matters Water and ions solvation charged objects

Solution conditions can aﬀect stability Biomolecular charge state can be highly variable with pH and other conditions October 6, 2015 | 3
4. ### Free energy and potential of mean force Free energy is

a state variable of the system; a measure of its ability to do work The conﬁgurational space of the system is deﬁned by the conﬁguration integral Z = V e−βH(x)dx; V ⊆ R3N and free energy is G = −β−1 log Z. Potential of mean force is a useful way to decompose free energy into degrees of freedom e−βW (y) = V e−βH(x;y)dx; V ⊆ R3N−M ; y ∈ U ⊆ RM W (y) = −β−1 log V e−βH(x;y)dx October 6, 2015 | 4
5. ### The role of solvent on the behavior of the solute

Integrate over solvent degrees of freedom; e.g., x ∈ V W (y) = −β−1 log V e−βH(x;y)dx System is described by remaining solute coordinate space; e.g., y ∈ U Free energy can be obtained by integration G = −β−1 log U e−βW (y)dy Continuum savings arise when |U| |V | October 6, 2015 | 5
6. ### Decomposition of the solvation thermodynamic cycle State function allows various

“routes” for calculation Arbitrarily decompose into Polar (charge-charge interactions) Non-polar (other interactions) October 6, 2015 | 6
7. ### Outline Background on biomolecular solvation Relevance and objectives Continuum solvation

models Scalable numerical methods Caveats in current approaches Quantifying uncertainty in solvation properties October 6, 2015 | 7

9. ### Continuum methods Use average quantities to describe Solvent density Solvent

polarization Can be derived from more complicated models Major assumptions Linear and local response Mean-ﬁeld Advantages Fast, deterministic convergence Reasonably accurate Disadvantages Lacks molecular detail Restrictive assumptions October 6, 2015 | 9
10. ### Solvation free energy cycle ∆Gsolv = ∆Gnp + ∆Gp ∆Gnp

= ∆Gcav + GvdW,s − GvdW,v ∆Gp = Gelec,s − Gelec,v October 6, 2015 | 10
11. ### Nonpolar solvation free energy Modeled by a perturbation from creation

of a hard-wall potential cavity in Lennard-Jones solvent ∆Gnp = ∆Gcav + GvdW,s − GvdW,v All energies deﬁned in terms of solvent density χ(x) Approximate cavity energy using scaled particle theory ∆Gcav ≈ G0 [χ] + pV [χ] + γA[χ] + . . . Approximate perturbation by convolution of solvent density with dispersive Lennard-Jones interaction with solute GvdW,s − GvdW,v ≈ ρ Na i χ(x)uvdW (x − xi )dx October 6, 2015 | 11
12. ### Polar solvation energy Modeled by Poisson-Boltzmann theory GPB,s = Gpol

+ Gq + Gion Continuum solvent assumptions Linear and local dielectric response: functional of solvent density χ and electrostatic potential φ Gpol + Gq ≈ Ω − 1 2 (∇φ)2 + φ dx Could subtract vacuum-state energy for “regularized” form Mean-ﬁeld ionic response: functional of electrostatic potential φ; dependent on ion-solute interaction potential V (χ) Gion ≈ −β−1 Ω ci e−βVi e−βqiφ − 1 dx October 6, 2015 | 12
13. ### Total solvation energy G = γA + pV + ρ

Γ UvdW dx − Ω 1 2 (∇φ)2 + φ −β−1 i ci e−βVi e−βqiφ − 1 dx Free energy is a functional G [χs , {ϑi } , φ] dependent on two other types of functions Position-dependent solvent and ion densities: χs : Ω → R and ϑi : Ω → R Electrostatic potential: φ : Ω → R Most current models assume ﬁxed densities: ϑi = ϑi,0 and χs = χs,0 October 6, 2015 | 13
14. ### Poisson-Boltzmann models Assuming G[φ] and densities are ﬁxed: δG[φ] δφ

= 0 0 = ∇ · ∇φ + + i ci qi e−βVi−βqiφ Poisson-Boltzmann equation strong form −∇ · ∇φ − i ci qi e−βVi−βqiφ = Often assume that Vi is “hard wall” potential such that e−βVi = H, a step function October 6, 2015 | 14

16. ### Outline Background on biomolecular solvation Relevance and objectives Continuum solvation

models Scalable numerical methods Caveats in current approaches Quantifying uncertainty in solvation properties October 6, 2015 | 16
17. ### Solvers Algebraic multigrid and adaptive ﬁnite elements Tetrahedral meshes A

priori reﬁnement based on molecular geometries A posteriori reﬁnement driven by residual-based error estimation Best for large systems with signiﬁcant multiscale character Finite diﬀerence multigrid Locally Cartesian meshes Best for small- to medium-scale systems October 6, 2015 | 17
18. ### Software Holst group FEtk libraries (fetk.org): multigrid and FEM APBS

Freely available PB and nonpolar model solver Works with most popular visualization software (VMD, PMV, PyMOL) Links with CHARMM, AMBER, TINKER, NAMD 25,600+ registered users PDB2PQR Structure preparation service pKa calculations and titration state assignment Supported by NIH R01 GM069702 www.poissonboltzmann.org October 6, 2015 | 18

20. ### Outline Background on biomolecular solvation Relevance and objectives Continuum solvation

models Scalable numerical methods Caveats in current approaches Quantifying uncertainty in solvation properties October 6, 2015 | 20
21. ### Traditional approaches Assume a ﬁxed solvent density proﬁle χ0 Nonpolar

energy determined by χ0 ∆Gnp [χ0 ] = γA[χ0 ] + pV [χ0 ] + ρ Ω χ0 (x)UvdW dx Polar free energy determined by χ0 and φ G[ [χ0 , φ] = − Ω 1 2 (χ0 ) (∇φ)2 + φ −β−1 i ci e−βVi(χ0) e−βqiφ − 1 dx Mean-ﬁeld descriptions of ion behavior October 6, 2015 | 21
22. ### Caveats Arbitrary surface deﬁnitions for , e−βVi , A, and

V Diﬃcult to relate force ﬁeld radii to coeﬃcient deﬁnitions Diﬀerent parameters often needed for polar and apolar solvation Inability to couple polar and nonpolar interactions Unable to describe strong density changes near polar/nonpolar interfaces Diﬀerent models for polar and nonpolar interactions Mean-ﬁeld point-like descriptions of ion behavior fail at high charge densities Diﬃcult to incorporate conformational ﬂuctuations and uncertainties October 6, 2015 | 22
23. ### Acknowledgments Team members Huan Lei Guang Lin Xiu Yang Bin

Zheng This work was supported by the DOE ASCR as part of the Collaboratory on Mathematics for Mesoscopic Modeling of Materials (CM4). Collaborators: Paul Constantine George Karniadakis Greg Schenter Alex Tartakovsky Xiaoliang Wan Dongbin Xiu Zhongqiang Zhang Wen Zhou October 6, 2015 | 23
24. ### Outline Background on biomolecular solvation Quantifying uncertainty in solvation properties

Background and motivation Surrogate models for conformational uncertainty Numerical results and the impact of error October 6, 2015 | 24
25. ### Biomolecules are not rigid BPTI, of course! Biomolecular conformation ﬂuctuates.

The conformational space has very high dimensions. Many solvation calculations are performed based on a small number of conformations. How can we quantify the inﬂuence of conformational uncertainty on solvation properties? October 6, 2015 | 25
26. ### Overview of approach Our approach: construct a quasiharmonic surrogate model

for the full conformation space such that we can Approximate the target properties in terms of instantaneous conformational states Quantify the statistics of target properties with respect to conformational ﬂuctuations Numerical challenges: accurately constructing the surrogate model using limited sampling points in the presence of High-dimensionality of the random space Numerical errors in the sampled data October 6, 2015 | 26
27. ### Outline Background on biomolecular solvation Quantifying uncertainty in solvation properties

Background and motivation Surrogate models for conformational uncertainty Numerical results and the impact of error October 6, 2015 | 27
28. ### Constructing the mesoscale model I Approximate the potential energy of

the molecule by an elastic network model: V (R, R) = γ 2 i<j (rij − rij )2h(rc − rij ) where R are the equilibrium positions; R are the instantaneous positions; γ is the elastic coeﬃcient; h is a distance cutoﬀ function; and rc is the cutoﬀ distance. Other quasiharmonic approximations (e.g., PCA) could also be used. October 6, 2015 | 28
29. ### Constructing the mesoscale model II The quadratic approximation for the

potential function leads to a Gaussian distribution for conformational ﬂuctuations, with a correlation matrix Cij ≡ ∆Ri ∆RT j C = kB T γ WΛ−1W = UUT , where W and Λ are determined from an eigenvalue decomposition of the harmonic system and U are Cholesky factors. October 6, 2015 | 29
30. ### Constructing the mesoscale model III Conformational states can be generated

by the stochastic process R(ξ) = R + ∆R(ξ) ∆R(ξ) = Uξ where ξ is a (3N − 6)-dimensional standard normal random vector. The target property is denoted as X(ξ) := X(R(ξ)). A. R. Atilgan, I. Bahar, etc., Biophysical Journal 2001. October 6, 2015 | 30
31. ### Representing uncertainty with gPC I Represent target property uncertainty with

a generalized Polynomial Chaos (gPC) surrogate model: X(ξ) ≈ X(ξ) = P |α|=0 cα ψα (ξ) The gPC surrogate model allows us to Extract full approximate distribution function Evaluate properties at points which were not sampled October 6, 2015 | 31
32. ### Representing uncertainty with gPC II Construct the gPC expansion with

probabilistic collocation methods cα = X(ξ)ψα (ξ)dP(ξ) ≈ Q i=1 Xi ψα (ξi )wi , where Xi is the collocation point, α is a multi-index with dimension (P + d)!/(Pd)!, and wi is the weight. Major challenges for high-dimensional systems: large number of collocation points (e.g., d = 27, P = 2, Q = 327 = 7.6 × 1012 tensor product points) sensitive to numerical errors in the calculation of X October 6, 2015 | 32
33. ### Determining coefﬁcients for the gPC expansion I Consider the gPC

expansion X(ξ) = α cα ψα (ξ). By sampling ξ, we obtain: X(ξ1 ) ≈ N α=0 cα ψα (ξ1 ), X(ξ2 ) ≈ N α=0 cα ψα (ξ2 ), · · · which can be cast into the linear system:      ψ0 (ξ1 ) ψ1 (ξ1 ) · · · ψN (ξ1 ) ψ0 (ξ2 ) ψ1 (ξ2 ) · · · ψN (ξ2 ) . . . . . . ... . . . ψ0 (ξM ) ψ1 (ξM ) · · · ψN (ξM )           c0 c1 . . . cN      ≈      X(ξ1 ) X(ξ2 ) . . . X(ξM )      , October 6, 2015 | 33
34. ### Determining coefﬁcients for the gPC expansion II This system can

be rewritten as Ψc + e = X, where e is related to the truncation error. The dimension of c is very large (P + d)!/(Pd)!. However, we may be able to eﬃciently determine the values for c using compressive sensing – subject to certain restrictions. Consider a linear system October 6, 2015 | 34
35. ### Determining coefﬁcients for the gPC expansion III When M <

N, the system is underdetermined, it is possible to obtain c if c is a sparse vector. The sparse vector c can be obtained by solving the optimization problem Ψc + e = u, where e 2 ≤ , e.g., minc c 1 subject to Ψc − u 2 ≤ . Critical problem: Rigorous proof of this approach requires restricted isometry condition for Ψ and sparsity of c – properties that are unknown a priori. E.J. Cand` es, J. Romberg, T. Tao, IEEE Trans. Inform. Theory, 2006. D.L. Donoho, M. Elad, V.N. Temlyakov, IEEE Trans. Inform. Theory, 2006. October 6, 2015 | 35
36. ### Increasing the gPC expansion sparsity I For an i.i.d. Gaussian

random vector ξ, we can deﬁne the unitary transformation A χ = AT ξ, AAT = I where χ is also i.i.d. Gaussian and ψ(χ) corresponds to Hermite polynomial function. Sparsity can be increased with a renormalized active random space method; however, this requires knowledge of G = E ∇xi X(ξ)∇xi X(ξ)T October 6, 2015 | 36
37. ### Increasing the gPC expansion sparsity II However, this transformation can

be approximated by Gij ≈ E ∇ξ XgPC(ξ)∇ξ XgPC(ξ)T = E N n=1 cn ∂ψn (ξ) ∂ξi · N n =1 cn ∂ψn (ξ) ∂ξj = N n=1 N n =1 cn cn E ∂ψn (ξ) ∂ξi · ∂ψn (ξ) ∂ξj = cT Kij c, where Kij is a “stiﬀness” matrix describing the structure of the basis set. October 6, 2015 | 37
38. ### Increasing the gPC expansion sparsity III Given the input sample

ξ1, ξ2, · · · , ξM and the output samples X1, X2, · · · , XM , the approximate rotation matrix can be obtained iteratively to (possibly) improve its ability to increase the system sparseness: ξ → c(1) → G(1) → χ(1) = (G(1))T ξ χ1 → c(2) → G(2) → χ(2) = (G(2))T χ(1) · · · In other words, χ(L) = (G(L))T (G(L−1))T · · · (G(1))T ξ, where L is the number of iterations. P. Constantine, E. Dow, Q. Wang, arXiv:1304.2070, 2013. October 6, 2015 | 38
39. ### Summary of the numerical algorithm I 1. Construct the full

and reduced stochastic conformational spaces. Denote the stochastic space as a d-dimensional i.i.d. standard normal random vector ξ. 2. Generate M sampling points ξ1, ξ2, · · · , ξM based on the distribution of ξ. Numerically compute X on ξ1, ξ2, · · · , ξM to obtain M outputs X1, X2, · · · , XM . Set counter l = 0, χ(0) = ξ. 3. Construct the measurement matrix Ψ(l) ij = ψj((χ(l))i), and compute XgPC(χ(l)) by solving the weighted 1 minimization problem c = arg min c 1 subject to Ψc − x 2 ≤ δ. to obtain the gPC coeﬃcients c(l). 4. Evaluate the gradient matrix G(l+1) = E ∇XgPC(χ(l))∇XgPC(χ(l))T . 5. Set l = l + 1. If G(l) 1 − d < θ, where the threshold θ is a positive real number, then stop. Otherwise, go to Step 3 and utilize the M sampling data (X1, X2, · · · , XM ) that have been determined in Step 2. October 6, 2015 | 39
40. ### Outline Background on biomolecular solvation Quantifying uncertainty in solvation properties

Background and motivation Surrogate models for conformational uncertainty Numerical results and the impact of error October 6, 2015 | 40
41. ### Stochastic PDE iterative rotation test I Application to a stochastic

elliptic PDE: − d dx a(x; ξ) du(x; ξ) dx = 1, x ∈ (0, 1) u(0) = u(1) = 0, a(x; ξ) = a0 (x) + exp σ 1 i=1 5c(x)i ξi ; e.g., Karhunen-Lo` eve (KL) expansion of an exponentially decaying covariance kernel, where {ξi } are i.i.d. standard Gaussian random variables. The quantity of interest is u(x; ξ)|x=0.35 . October 6, 2015 | 41
42. ### Stochastic PDE iterative rotation test II Stochastic PDE results shown

for d = 15, P = 3, and N = 816. Comparison of L2 error in iterative rotation approach with 1 methods. “◦”: standard 1 , “ ”: re-weighted 1 , “ ”: rotated 1 , “ ”: re-weighted and iteratively rotated 1 . 0.1 0.2 0.3 0.4 0.5 0.005 0.01 0.05 M/N Relative L 2 error October 6, 2015 | 42
43. ### KdV equation iterative rotation test I Application to the KdV

equation: ut (x, t; ξ) − 6u(x, t; ξ)ux (x, t; ξ) + uxxx (x, t; ξ) = f(t; ξ), x ∈ (−∞, ∞), u(x, 0; ξ) = −2 sech2(x). where f(t; ξ) = σ d i=1 λi φi (t)ξi , (2.1) σ is a constant, and {λi , φi (t)}d i=1 are eigenpairs of the exponential covariance kernel C(x, x ) = exp |x − x | lc . (2.2) We set lc = 0.25 and d = 10 ( d i=1 λi > 0.96 ∞ i=1 λi ). The quantity of interest is u(x, t; ξ)|x=6,t=1 . October 6, 2015 | 43
44. ### KdV equation iterative rotation test II Comparison of L2 error

between iterative reweighting and 1 methods for the KdV equation with d = 10, P = 4, and N = 1001. “◦”: standard 1 , “ ”: re-weighted 1 , “ ”: rotated 1 , “ ”: re-weighted and iteratively rotated 1 . 0.1 0.15 0.2 0.25 0.3 0.35 0.4 10−3 10−2 M/N Relative L 2 error October 6, 2015 | 44
45. ### gPC reproduces area distributions The new method faithfully reproduces SASA

distributions with respect to conformational ﬂuctuations. Distribution of solvent accessible surface area (˚ A2) for residue 14 of bovine pancreatic trypsin inhibitor (PDB ID 5PTI). Comparisons with sparse grid and Monte Carlo sampling methods are shown. SASA probability density distribution 60 70 80 90 100 110 120 0 0.025 0.05 0.075 0.1 MC (106 samples) SP - level 1 (55 samples) SP - level 2 (1513 samples) CS (300 samples) MC (300 samples) (a) number of sample K-L divergence 200 300 400 500 600 10-5 10-4 10-3 10-2 SP - level 1 (55 samples) MC (300 samples) MC (1200 samples) CS (χ) (b) October 6, 2015 | 45
46. ### Rotation improves surrogate models I Accuracy of the gPC representation.

gPC expansions X(χ) and X(ξ) by the compressive sensing method. Dashed lines: Sparse grid points on levels 1 and 2. number of sample Relative L 2 error 200 300 400 500 600 10-2 10-1 CS (ξ) p = 2 p = 3 CS (χ) p = 2 p = 3 Sp - level 1 (55 samples) Sp - level 2 (1513 samples) October 6, 2015 | 46
47. ### Rotation improves surrogate models II Rotation signiﬁcantly improves representation of

the response surface. (c) (d) Figure: Reduced response surface with respect to (χ1 , χ2 ) and (ξ21 , ξ22 ). October 6, 2015 | 47
48. ### Summary and remarks We developed a method to increase the

sparsity of the gPC expansion, yielding a more accurate surrogate model. Our method addresses a major problem (instability to numerical error) that the sparse grid method has for high-dimensional random spaces. Next steps include addressing Target properties: how do we handle discontinuities over the stochastic space? Target properties: how do we eﬃciently compute uncertainty in more complicated properties (e.g., electrostatic potential and potentials of mean force)? Visualization: how do we display this uncertainty to users? October 6, 2015 | 48