Baker1 Huan Lei2 Xiu Yang2 Bin Zheng2 Guang Lin3 1Computational and Statistical Analytics Division, Pacific Northwest National Laboratory 2Fundamental and Computational Sciences Directorate, Pacific Northwest National Laboratory 3Department of Mathematics, Purdue University October 6, 2015 October 6, 2015
a state variable of the system; a measure of its ability to do work The configurational space of the system is defined by the configuration integral Z = V e−βH(x)dx; V ⊆ R3N and free energy is G = −β−1 log Z. Potential of mean force is a useful way to decompose free energy into degrees of freedom e−βW (y) = V e−βH(x;y)dx; V ⊆ R3N−M ; y ∈ U ⊆ RM W (y) = −β−1 log V e−βH(x;y)dx October 6, 2015 | 4
Integrate over solvent degrees of freedom; e.g., x ∈ V W (y) = −β−1 log V e−βH(x;y)dx System is described by remaining solute coordinate space; e.g., y ∈ U Free energy can be obtained by integration G = −β−1 log U e−βW (y)dy Continuum savings arise when |U| |V | October 6, 2015 | 5
polarization Can be derived from more complicated models Major assumptions Linear and local response Mean-field Advantages Fast, deterministic convergence Reasonably accurate Disadvantages Lacks molecular detail Restrictive assumptions October 6, 2015 | 9
of a hard-wall potential cavity in Lennard-Jones solvent ∆Gnp = ∆Gcav + GvdW,s − GvdW,v All energies defined in terms of solvent density χ(x) Approximate cavity energy using scaled particle theory ∆Gcav ≈ G0 [χ] + pV [χ] + γA[χ] + . . . Approximate perturbation by convolution of solvent density with dispersive Lennard-Jones interaction with solute GvdW,s − GvdW,v ≈ ρ Na i χ(x)uvdW (x − xi )dx October 6, 2015 | 11
+ Gq + Gion Continuum solvent assumptions Linear and local dielectric response: functional of solvent density χ and electrostatic potential φ Gpol + Gq ≈ Ω − 1 2 (∇φ)2 + φ dx Could subtract vacuum-state energy for “regularized” form Mean-field ionic response: functional of electrostatic potential φ; dependent on ion-solute interaction potential V (χ) Gion ≈ −β−1 Ω ci e−βVi e−βqiφ − 1 dx October 6, 2015 | 12
Γ UvdW dx − Ω 1 2 (∇φ)2 + φ −β−1 i ci e−βVi e−βqiφ − 1 dx Free energy is a functional G [χs , {ϑi } , φ] dependent on two other types of functions Position-dependent solvent and ion densities: χs : Ω → R and ϑi : Ω → R Electrostatic potential: φ : Ω → R Most current models assume fixed densities: ϑi = ϑi,0 and χs = χs,0 October 6, 2015 | 13
= 0 0 = ∇ · ∇φ + + i ci qi e−βVi−βqiφ Poisson-Boltzmann equation strong form −∇ · ∇φ − i ci qi e−βVi−βqiφ = Often assume that Vi is “hard wall” potential such that e−βVi = H, a step function October 6, 2015 | 14
priori refinement based on molecular geometries A posteriori refinement driven by residual-based error estimation Best for large systems with significant multiscale character Finite difference multigrid Locally Cartesian meshes Best for small- to medium-scale systems October 6, 2015 | 17
Freely available PB and nonpolar model solver Works with most popular visualization software (VMD, PMV, PyMOL) Links with CHARMM, AMBER, TINKER, NAMD 25,600+ registered users PDB2PQR Structure preparation service pKa calculations and titration state assignment Supported by NIH R01 GM069702 www.poissonboltzmann.org October 6, 2015 | 18
V Difficult to relate force field radii to coefficient definitions Different parameters often needed for polar and apolar solvation Inability to couple polar and nonpolar interactions Unable to describe strong density changes near polar/nonpolar interfaces Different models for polar and nonpolar interactions Mean-field point-like descriptions of ion behavior fail at high charge densities Difficult to incorporate conformational fluctuations and uncertainties October 6, 2015 | 22
Zheng This work was supported by the DOE ASCR as part of the Collaboratory on Mathematics for Mesoscopic Modeling of Materials (CM4). Collaborators: Paul Constantine George Karniadakis Greg Schenter Alex Tartakovsky Xiaoliang Wan Dongbin Xiu Zhongqiang Zhang Wen Zhou October 6, 2015 | 23
The conformational space has very high dimensions. Many solvation calculations are performed based on a small number of conformations. How can we quantify the influence of conformational uncertainty on solvation properties? October 6, 2015 | 25
for the full conformation space such that we can Approximate the target properties in terms of instantaneous conformational states Quantify the statistics of target properties with respect to conformational fluctuations Numerical challenges: accurately constructing the surrogate model using limited sampling points in the presence of High-dimensionality of the random space Numerical errors in the sampled data October 6, 2015 | 26
the molecule by an elastic network model: V (R, R) = γ 2 i<j (rij − rij )2h(rc − rij ) where R are the equilibrium positions; R are the instantaneous positions; γ is the elastic coefficient; h is a distance cutoff function; and rc is the cutoff distance. Other quasiharmonic approximations (e.g., PCA) could also be used. October 6, 2015 | 28
potential function leads to a Gaussian distribution for conformational fluctuations, with a correlation matrix Cij ≡ ∆Ri ∆RT j C = kB T γ WΛ−1W = UUT , where W and Λ are determined from an eigenvalue decomposition of the harmonic system and U are Cholesky factors. October 6, 2015 | 29
by the stochastic process R(ξ) = R + ∆R(ξ) ∆R(ξ) = Uξ where ξ is a (3N − 6)-dimensional standard normal random vector. The target property is denoted as X(ξ) := X(R(ξ)). A. R. Atilgan, I. Bahar, etc., Biophysical Journal 2001. October 6, 2015 | 30
a generalized Polynomial Chaos (gPC) surrogate model: X(ξ) ≈ X(ξ) = P |α|=0 cα ψα (ξ) The gPC surrogate model allows us to Extract full approximate distribution function Evaluate properties at points which were not sampled October 6, 2015 | 31
probabilistic collocation methods cα = X(ξ)ψα (ξ)dP(ξ) ≈ Q i=1 Xi ψα (ξi )wi , where Xi is the collocation point, α is a multi-index with dimension (P + d)!/(Pd)!, and wi is the weight. Major challenges for high-dimensional systems: large number of collocation points (e.g., d = 27, P = 2, Q = 327 = 7.6 × 1012 tensor product points) sensitive to numerical errors in the calculation of X October 6, 2015 | 32
be rewritten as Ψc + e = X, where e is related to the truncation error. The dimension of c is very large (P + d)!/(Pd)!. However, we may be able to efficiently determine the values for c using compressive sensing – subject to certain restrictions. Consider a linear system October 6, 2015 | 34
N, the system is underdetermined, it is possible to obtain c if c is a sparse vector. The sparse vector c can be obtained by solving the optimization problem Ψc + e = u, where e 2 ≤ , e.g., minc c 1 subject to Ψc − u 2 ≤ . Critical problem: Rigorous proof of this approach requires restricted isometry condition for Ψ and sparsity of c – properties that are unknown a priori. E.J. Cand` es, J. Romberg, T. Tao, IEEE Trans. Inform. Theory, 2006. D.L. Donoho, M. Elad, V.N. Temlyakov, IEEE Trans. Inform. Theory, 2006. October 6, 2015 | 35
random vector ξ, we can define the unitary transformation A χ = AT ξ, AAT = I where χ is also i.i.d. Gaussian and ψ(χ) corresponds to Hermite polynomial function. Sparsity can be increased with a renormalized active random space method; however, this requires knowledge of G = E ∇xi X(ξ)∇xi X(ξ)T October 6, 2015 | 36
be approximated by Gij ≈ E ∇ξ XgPC(ξ)∇ξ XgPC(ξ)T = E N n=1 cn ∂ψn (ξ) ∂ξi · N n =1 cn ∂ψn (ξ) ∂ξj = N n=1 N n =1 cn cn E ∂ψn (ξ) ∂ξi · ∂ψn (ξ) ∂ξj = cT Kij c, where Kij is a “stiffness” matrix describing the structure of the basis set. October 6, 2015 | 37
ξ1, ξ2, · · · , ξM and the output samples X1, X2, · · · , XM , the approximate rotation matrix can be obtained iteratively to (possibly) improve its ability to increase the system sparseness: ξ → c(1) → G(1) → χ(1) = (G(1))T ξ χ1 → c(2) → G(2) → χ(2) = (G(2))T χ(1) · · · In other words, χ(L) = (G(L))T (G(L−1))T · · · (G(1))T ξ, where L is the number of iterations. P. Constantine, E. Dow, Q. Wang, arXiv:1304.2070, 2013. October 6, 2015 | 38
and reduced stochastic conformational spaces. Denote the stochastic space as a d-dimensional i.i.d. standard normal random vector ξ. 2. Generate M sampling points ξ1, ξ2, · · · , ξM based on the distribution of ξ. Numerically compute X on ξ1, ξ2, · · · , ξM to obtain M outputs X1, X2, · · · , XM . Set counter l = 0, χ(0) = ξ. 3. Construct the measurement matrix Ψ(l) ij = ψj((χ(l))i), and compute XgPC(χ(l)) by solving the weighted 1 minimization problem c = arg min c 1 subject to Ψc − x 2 ≤ δ. to obtain the gPC coefficients c(l). 4. Evaluate the gradient matrix G(l+1) = E ∇XgPC(χ(l))∇XgPC(χ(l))T . 5. Set l = l + 1. If G(l) 1 − d < θ, where the threshold θ is a positive real number, then stop. Otherwise, go to Step 3 and utilize the M sampling data (X1, X2, · · · , XM ) that have been determined in Step 2. October 6, 2015 | 39
equation: ut (x, t; ξ) − 6u(x, t; ξ)ux (x, t; ξ) + uxxx (x, t; ξ) = f(t; ξ), x ∈ (−∞, ∞), u(x, 0; ξ) = −2 sech2(x). where f(t; ξ) = σ d i=1 λi φi (t)ξi , (2.1) σ is a constant, and {λi , φi (t)}d i=1 are eigenpairs of the exponential covariance kernel C(x, x ) = exp |x − x | lc . (2.2) We set lc = 0.25 and d = 10 ( d i=1 λi > 0.96 ∞ i=1 λi ). The quantity of interest is u(x, t; ξ)|x=6,t=1 . October 6, 2015 | 43
between iterative reweighting and 1 methods for the KdV equation with d = 10, P = 4, and N = 1001. “◦”: standard 1 , “ ”: re-weighted 1 , “ ”: rotated 1 , “ ”: re-weighted and iteratively rotated 1 . 0.1 0.15 0.2 0.25 0.3 0.35 0.4 10−3 10−2 M/N Relative L 2 error October 6, 2015 | 44
distributions with respect to conformational fluctuations. Distribution of solvent accessible surface area (˚ A2) for residue 14 of bovine pancreatic trypsin inhibitor (PDB ID 5PTI). Comparisons with sparse grid and Monte Carlo sampling methods are shown. SASA probability density distribution 60 70 80 90 100 110 120 0 0.025 0.05 0.075 0.1 MC (106 samples) SP - level 1 (55 samples) SP - level 2 (1513 samples) CS (300 samples) MC (300 samples) (a) number of sample K-L divergence 200 300 400 500 600 10-5 10-4 10-3 10-2 SP - level 1 (55 samples) MC (300 samples) MC (1200 samples) CS (χ) (b) October 6, 2015 | 45
sparsity of the gPC expansion, yielding a more accurate surrogate model. Our method addresses a major problem (instability to numerical error) that the sparse grid method has for high-dimensional random spaces. Next steps include addressing Target properties: how do we handle discontinuities over the stochastic space? Target properties: how do we efficiently compute uncertainty in more complicated properties (e.g., electrostatic potential and potentials of mean force)? Visualization: how do we display this uncertainty to users? October 6, 2015 | 48