Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Quantifying the influence of conformational un...

Nathan Baker
October 12, 2015

Quantifying the influence of conformational uncertainty in biomolecular solvation

Talk given at MBI "Multiple Faces of Biomolecular Electrostatics" workshop.

Nathan Baker

October 12, 2015
Tweet

More Decks by Nathan Baker

Other Decks in Science

Transcript

  1. Quantifying the influence of conformational uncertainty in biomolecular solvation Nathan

    Baker1 Huan Lei2 Xiu Yang2 Bin Zheng2 Guang Lin3 1Computational and Statistical Analytics Division, Pacific Northwest National Laboratory 2Fundamental and Computational Sciences Directorate, Pacific Northwest National Laboratory 3Department of Mathematics, Purdue University October 6, 2015 October 6, 2015
  2. Outline Background on biomolecular solvation Relevance and objectives Continuum solvation

    models Scalable numerical methods Caveats in current approaches Quantifying uncertainty in solvation properties October 6, 2015 | 2
  3. Biomolecular solution environment matters Water and ions solvation charged objects

    Solution conditions can affect stability Biomolecular charge state can be highly variable with pH and other conditions October 6, 2015 | 3
  4. Free energy and potential of mean force Free energy is

    a state variable of the system; a measure of its ability to do work The configurational space of the system is defined by the configuration integral Z = V e−βH(x)dx; V ⊆ R3N and free energy is G = −β−1 log Z. Potential of mean force is a useful way to decompose free energy into degrees of freedom e−βW (y) = V e−βH(x;y)dx; V ⊆ R3N−M ; y ∈ U ⊆ RM W (y) = −β−1 log V e−βH(x;y)dx October 6, 2015 | 4
  5. The role of solvent on the behavior of the solute

    Integrate over solvent degrees of freedom; e.g., x ∈ V W (y) = −β−1 log V e−βH(x;y)dx System is described by remaining solute coordinate space; e.g., y ∈ U Free energy can be obtained by integration G = −β−1 log U e−βW (y)dy Continuum savings arise when |U| |V | October 6, 2015 | 5
  6. Decomposition of the solvation thermodynamic cycle State function allows various

    “routes” for calculation Arbitrarily decompose into Polar (charge-charge interactions) Non-polar (other interactions) October 6, 2015 | 6
  7. Outline Background on biomolecular solvation Relevance and objectives Continuum solvation

    models Scalable numerical methods Caveats in current approaches Quantifying uncertainty in solvation properties October 6, 2015 | 7
  8. Continuum methods Use average quantities to describe Solvent density Solvent

    polarization Can be derived from more complicated models Major assumptions Linear and local response Mean-field Advantages Fast, deterministic convergence Reasonably accurate Disadvantages Lacks molecular detail Restrictive assumptions October 6, 2015 | 9
  9. Solvation free energy cycle ∆Gsolv = ∆Gnp + ∆Gp ∆Gnp

    = ∆Gcav + GvdW,s − GvdW,v ∆Gp = Gelec,s − Gelec,v October 6, 2015 | 10
  10. Nonpolar solvation free energy Modeled by a perturbation from creation

    of a hard-wall potential cavity in Lennard-Jones solvent ∆Gnp = ∆Gcav + GvdW,s − GvdW,v All energies defined in terms of solvent density χ(x) Approximate cavity energy using scaled particle theory ∆Gcav ≈ G0 [χ] + pV [χ] + γA[χ] + . . . Approximate perturbation by convolution of solvent density with dispersive Lennard-Jones interaction with solute GvdW,s − GvdW,v ≈ ρ Na i χ(x)uvdW (x − xi )dx October 6, 2015 | 11
  11. Polar solvation energy Modeled by Poisson-Boltzmann theory GPB,s = Gpol

    + Gq + Gion Continuum solvent assumptions Linear and local dielectric response: functional of solvent density χ and electrostatic potential φ Gpol + Gq ≈ Ω − 1 2 (∇φ)2 + φ dx Could subtract vacuum-state energy for “regularized” form Mean-field ionic response: functional of electrostatic potential φ; dependent on ion-solute interaction potential V (χ) Gion ≈ −β−1 Ω ci e−βVi e−βqiφ − 1 dx October 6, 2015 | 12
  12. Total solvation energy G = γA + pV + ρ

    Γ UvdW dx − Ω 1 2 (∇φ)2 + φ −β−1 i ci e−βVi e−βqiφ − 1 dx Free energy is a functional G [χs , {ϑi } , φ] dependent on two other types of functions Position-dependent solvent and ion densities: χs : Ω → R and ϑi : Ω → R Electrostatic potential: φ : Ω → R Most current models assume fixed densities: ϑi = ϑi,0 and χs = χs,0 October 6, 2015 | 13
  13. Poisson-Boltzmann models Assuming G[φ] and densities are fixed: δG[φ] δφ

    = 0 0 = ∇ · ∇φ + + i ci qi e−βVi−βqiφ Poisson-Boltzmann equation strong form −∇ · ∇φ − i ci qi e−βVi−βqiφ = Often assume that Vi is “hard wall” potential such that e−βVi = H, a step function October 6, 2015 | 14
  14. Outline Background on biomolecular solvation Relevance and objectives Continuum solvation

    models Scalable numerical methods Caveats in current approaches Quantifying uncertainty in solvation properties October 6, 2015 | 16
  15. Solvers Algebraic multigrid and adaptive finite elements Tetrahedral meshes A

    priori refinement based on molecular geometries A posteriori refinement driven by residual-based error estimation Best for large systems with significant multiscale character Finite difference multigrid Locally Cartesian meshes Best for small- to medium-scale systems October 6, 2015 | 17
  16. Software Holst group FEtk libraries (fetk.org): multigrid and FEM APBS

    Freely available PB and nonpolar model solver Works with most popular visualization software (VMD, PMV, PyMOL) Links with CHARMM, AMBER, TINKER, NAMD 25,600+ registered users PDB2PQR Structure preparation service pKa calculations and titration state assignment Supported by NIH R01 GM069702 www.poissonboltzmann.org October 6, 2015 | 18
  17. Outline Background on biomolecular solvation Relevance and objectives Continuum solvation

    models Scalable numerical methods Caveats in current approaches Quantifying uncertainty in solvation properties October 6, 2015 | 20
  18. Traditional approaches Assume a fixed solvent density profile χ0 Nonpolar

    energy determined by χ0 ∆Gnp [χ0 ] = γA[χ0 ] + pV [χ0 ] + ρ Ω χ0 (x)UvdW dx Polar free energy determined by χ0 and φ G[ [χ0 , φ] = − Ω 1 2 (χ0 ) (∇φ)2 + φ −β−1 i ci e−βVi(χ0) e−βqiφ − 1 dx Mean-field descriptions of ion behavior October 6, 2015 | 21
  19. Caveats Arbitrary surface definitions for , e−βVi , A, and

    V Difficult to relate force field radii to coefficient definitions Different parameters often needed for polar and apolar solvation Inability to couple polar and nonpolar interactions Unable to describe strong density changes near polar/nonpolar interfaces Different models for polar and nonpolar interactions Mean-field point-like descriptions of ion behavior fail at high charge densities Difficult to incorporate conformational fluctuations and uncertainties October 6, 2015 | 22
  20. Acknowledgments Team members Huan Lei Guang Lin Xiu Yang Bin

    Zheng This work was supported by the DOE ASCR as part of the Collaboratory on Mathematics for Mesoscopic Modeling of Materials (CM4). Collaborators: Paul Constantine George Karniadakis Greg Schenter Alex Tartakovsky Xiaoliang Wan Dongbin Xiu Zhongqiang Zhang Wen Zhou October 6, 2015 | 23
  21. Outline Background on biomolecular solvation Quantifying uncertainty in solvation properties

    Background and motivation Surrogate models for conformational uncertainty Numerical results and the impact of error October 6, 2015 | 24
  22. Biomolecules are not rigid BPTI, of course! Biomolecular conformation fluctuates.

    The conformational space has very high dimensions. Many solvation calculations are performed based on a small number of conformations. How can we quantify the influence of conformational uncertainty on solvation properties? October 6, 2015 | 25
  23. Overview of approach Our approach: construct a quasiharmonic surrogate model

    for the full conformation space such that we can Approximate the target properties in terms of instantaneous conformational states Quantify the statistics of target properties with respect to conformational fluctuations Numerical challenges: accurately constructing the surrogate model using limited sampling points in the presence of High-dimensionality of the random space Numerical errors in the sampled data October 6, 2015 | 26
  24. Outline Background on biomolecular solvation Quantifying uncertainty in solvation properties

    Background and motivation Surrogate models for conformational uncertainty Numerical results and the impact of error October 6, 2015 | 27
  25. Constructing the mesoscale model I Approximate the potential energy of

    the molecule by an elastic network model: V (R, R) = γ 2 i<j (rij − rij )2h(rc − rij ) where R are the equilibrium positions; R are the instantaneous positions; γ is the elastic coefficient; h is a distance cutoff function; and rc is the cutoff distance. Other quasiharmonic approximations (e.g., PCA) could also be used. October 6, 2015 | 28
  26. Constructing the mesoscale model II The quadratic approximation for the

    potential function leads to a Gaussian distribution for conformational fluctuations, with a correlation matrix Cij ≡ ∆Ri ∆RT j C = kB T γ WΛ−1W = UUT , where W and Λ are determined from an eigenvalue decomposition of the harmonic system and U are Cholesky factors. October 6, 2015 | 29
  27. Constructing the mesoscale model III Conformational states can be generated

    by the stochastic process R(ξ) = R + ∆R(ξ) ∆R(ξ) = Uξ where ξ is a (3N − 6)-dimensional standard normal random vector. The target property is denoted as X(ξ) := X(R(ξ)). A. R. Atilgan, I. Bahar, etc., Biophysical Journal 2001. October 6, 2015 | 30
  28. Representing uncertainty with gPC I Represent target property uncertainty with

    a generalized Polynomial Chaos (gPC) surrogate model: X(ξ) ≈ X(ξ) = P |α|=0 cα ψα (ξ) The gPC surrogate model allows us to Extract full approximate distribution function Evaluate properties at points which were not sampled October 6, 2015 | 31
  29. Representing uncertainty with gPC II Construct the gPC expansion with

    probabilistic collocation methods cα = X(ξ)ψα (ξ)dP(ξ) ≈ Q i=1 Xi ψα (ξi )wi , where Xi is the collocation point, α is a multi-index with dimension (P + d)!/(Pd)!, and wi is the weight. Major challenges for high-dimensional systems: large number of collocation points (e.g., d = 27, P = 2, Q = 327 = 7.6 × 1012 tensor product points) sensitive to numerical errors in the calculation of X October 6, 2015 | 32
  30. Determining coefficients for the gPC expansion I Consider the gPC

    expansion X(ξ) = α cα ψα (ξ). By sampling ξ, we obtain: X(ξ1 ) ≈ N α=0 cα ψα (ξ1 ), X(ξ2 ) ≈ N α=0 cα ψα (ξ2 ), · · · which can be cast into the linear system:      ψ0 (ξ1 ) ψ1 (ξ1 ) · · · ψN (ξ1 ) ψ0 (ξ2 ) ψ1 (ξ2 ) · · · ψN (ξ2 ) . . . . . . ... . . . ψ0 (ξM ) ψ1 (ξM ) · · · ψN (ξM )           c0 c1 . . . cN      ≈      X(ξ1 ) X(ξ2 ) . . . X(ξM )      , October 6, 2015 | 33
  31. Determining coefficients for the gPC expansion II This system can

    be rewritten as Ψc + e = X, where e is related to the truncation error. The dimension of c is very large (P + d)!/(Pd)!. However, we may be able to efficiently determine the values for c using compressive sensing – subject to certain restrictions. Consider a linear system October 6, 2015 | 34
  32. Determining coefficients for the gPC expansion III When M <

    N, the system is underdetermined, it is possible to obtain c if c is a sparse vector. The sparse vector c can be obtained by solving the optimization problem Ψc + e = u, where e 2 ≤ , e.g., minc c 1 subject to Ψc − u 2 ≤ . Critical problem: Rigorous proof of this approach requires restricted isometry condition for Ψ and sparsity of c – properties that are unknown a priori. E.J. Cand` es, J. Romberg, T. Tao, IEEE Trans. Inform. Theory, 2006. D.L. Donoho, M. Elad, V.N. Temlyakov, IEEE Trans. Inform. Theory, 2006. October 6, 2015 | 35
  33. Increasing the gPC expansion sparsity I For an i.i.d. Gaussian

    random vector ξ, we can define the unitary transformation A χ = AT ξ, AAT = I where χ is also i.i.d. Gaussian and ψ(χ) corresponds to Hermite polynomial function. Sparsity can be increased with a renormalized active random space method; however, this requires knowledge of G = E ∇xi X(ξ)∇xi X(ξ)T October 6, 2015 | 36
  34. Increasing the gPC expansion sparsity II However, this transformation can

    be approximated by Gij ≈ E ∇ξ XgPC(ξ)∇ξ XgPC(ξ)T = E N n=1 cn ∂ψn (ξ) ∂ξi · N n =1 cn ∂ψn (ξ) ∂ξj = N n=1 N n =1 cn cn E ∂ψn (ξ) ∂ξi · ∂ψn (ξ) ∂ξj = cT Kij c, where Kij is a “stiffness” matrix describing the structure of the basis set. October 6, 2015 | 37
  35. Increasing the gPC expansion sparsity III Given the input sample

    ξ1, ξ2, · · · , ξM and the output samples X1, X2, · · · , XM , the approximate rotation matrix can be obtained iteratively to (possibly) improve its ability to increase the system sparseness: ξ → c(1) → G(1) → χ(1) = (G(1))T ξ χ1 → c(2) → G(2) → χ(2) = (G(2))T χ(1) · · · In other words, χ(L) = (G(L))T (G(L−1))T · · · (G(1))T ξ, where L is the number of iterations. P. Constantine, E. Dow, Q. Wang, arXiv:1304.2070, 2013. October 6, 2015 | 38
  36. Summary of the numerical algorithm I 1. Construct the full

    and reduced stochastic conformational spaces. Denote the stochastic space as a d-dimensional i.i.d. standard normal random vector ξ. 2. Generate M sampling points ξ1, ξ2, · · · , ξM based on the distribution of ξ. Numerically compute X on ξ1, ξ2, · · · , ξM to obtain M outputs X1, X2, · · · , XM . Set counter l = 0, χ(0) = ξ. 3. Construct the measurement matrix Ψ(l) ij = ψj((χ(l))i), and compute XgPC(χ(l)) by solving the weighted 1 minimization problem c = arg min c 1 subject to Ψc − x 2 ≤ δ. to obtain the gPC coefficients c(l). 4. Evaluate the gradient matrix G(l+1) = E ∇XgPC(χ(l))∇XgPC(χ(l))T . 5. Set l = l + 1. If G(l) 1 − d < θ, where the threshold θ is a positive real number, then stop. Otherwise, go to Step 3 and utilize the M sampling data (X1, X2, · · · , XM ) that have been determined in Step 2. October 6, 2015 | 39
  37. Outline Background on biomolecular solvation Quantifying uncertainty in solvation properties

    Background and motivation Surrogate models for conformational uncertainty Numerical results and the impact of error October 6, 2015 | 40
  38. Stochastic PDE iterative rotation test I Application to a stochastic

    elliptic PDE: − d dx a(x; ξ) du(x; ξ) dx = 1, x ∈ (0, 1) u(0) = u(1) = 0, a(x; ξ) = a0 (x) + exp σ 1 i=1 5c(x)i ξi ; e.g., Karhunen-Lo` eve (KL) expansion of an exponentially decaying covariance kernel, where {ξi } are i.i.d. standard Gaussian random variables. The quantity of interest is u(x; ξ)|x=0.35 . October 6, 2015 | 41
  39. Stochastic PDE iterative rotation test II Stochastic PDE results shown

    for d = 15, P = 3, and N = 816. Comparison of L2 error in iterative rotation approach with 1 methods. “◦”: standard 1 , “ ”: re-weighted 1 , “ ”: rotated 1 , “ ”: re-weighted and iteratively rotated 1 . 0.1 0.2 0.3 0.4 0.5 0.005 0.01 0.05 M/N Relative L 2 error October 6, 2015 | 42
  40. KdV equation iterative rotation test I Application to the KdV

    equation: ut (x, t; ξ) − 6u(x, t; ξ)ux (x, t; ξ) + uxxx (x, t; ξ) = f(t; ξ), x ∈ (−∞, ∞), u(x, 0; ξ) = −2 sech2(x). where f(t; ξ) = σ d i=1 λi φi (t)ξi , (2.1) σ is a constant, and {λi , φi (t)}d i=1 are eigenpairs of the exponential covariance kernel C(x, x ) = exp |x − x | lc . (2.2) We set lc = 0.25 and d = 10 ( d i=1 λi > 0.96 ∞ i=1 λi ). The quantity of interest is u(x, t; ξ)|x=6,t=1 . October 6, 2015 | 43
  41. KdV equation iterative rotation test II Comparison of L2 error

    between iterative reweighting and 1 methods for the KdV equation with d = 10, P = 4, and N = 1001. “◦”: standard 1 , “ ”: re-weighted 1 , “ ”: rotated 1 , “ ”: re-weighted and iteratively rotated 1 . 0.1 0.15 0.2 0.25 0.3 0.35 0.4 10−3 10−2 M/N Relative L 2 error October 6, 2015 | 44
  42. gPC reproduces area distributions The new method faithfully reproduces SASA

    distributions with respect to conformational fluctuations. Distribution of solvent accessible surface area (˚ A2) for residue 14 of bovine pancreatic trypsin inhibitor (PDB ID 5PTI). Comparisons with sparse grid and Monte Carlo sampling methods are shown. SASA probability density distribution 60 70 80 90 100 110 120 0 0.025 0.05 0.075 0.1 MC (106 samples) SP - level 1 (55 samples) SP - level 2 (1513 samples) CS (300 samples) MC (300 samples) (a) number of sample K-L divergence 200 300 400 500 600 10-5 10-4 10-3 10-2 SP - level 1 (55 samples) MC (300 samples) MC (1200 samples) CS (χ) (b) October 6, 2015 | 45
  43. Rotation improves surrogate models I Accuracy of the gPC representation.

    gPC expansions X(χ) and X(ξ) by the compressive sensing method. Dashed lines: Sparse grid points on levels 1 and 2. number of sample Relative L 2 error 200 300 400 500 600 10-2 10-1 CS (ξ) p = 2 p = 3 CS (χ) p = 2 p = 3 Sp - level 1 (55 samples) Sp - level 2 (1513 samples) October 6, 2015 | 46
  44. Rotation improves surrogate models II Rotation significantly improves representation of

    the response surface. (c) (d) Figure: Reduced response surface with respect to (χ1 , χ2 ) and (ξ21 , ξ22 ). October 6, 2015 | 47
  45. Summary and remarks We developed a method to increase the

    sparsity of the gPC expansion, yielding a more accurate surrogate model. Our method addresses a major problem (instability to numerical error) that the sparse grid method has for high-dimensional random spaces. Next steps include addressing Target properties: how do we handle discontinuities over the stochastic space? Target properties: how do we efficiently compute uncertainty in more complicated properties (e.g., electrostatic potential and potentials of mean force)? Visualization: how do we display this uncertainty to users? October 6, 2015 | 48