Baker1 Huan Lei2 Xiu Yang2 Bin Zheng2 Guang Lin3 1Computational and Statistical Analytics Division, Paciﬁc Northwest National Laboratory 2Fundamental and Computational Sciences Directorate, Paciﬁc Northwest National Laboratory 3Department of Mathematics, Purdue University October 6, 2015 October 6, 2015

a state variable of the system; a measure of its ability to do work The conﬁgurational space of the system is deﬁned by the conﬁguration integral Z = V e−βH(x)dx; V ⊆ R3N and free energy is G = −β−1 log Z. Potential of mean force is a useful way to decompose free energy into degrees of freedom e−βW (y) = V e−βH(x;y)dx; V ⊆ R3N−M ; y ∈ U ⊆ RM W (y) = −β−1 log V e−βH(x;y)dx October 6, 2015 | 4

Integrate over solvent degrees of freedom; e.g., x ∈ V W (y) = −β−1 log V e−βH(x;y)dx System is described by remaining solute coordinate space; e.g., y ∈ U Free energy can be obtained by integration G = −β−1 log U e−βW (y)dy Continuum savings arise when |U| |V | October 6, 2015 | 5

polarization Can be derived from more complicated models Major assumptions Linear and local response Mean-ﬁeld Advantages Fast, deterministic convergence Reasonably accurate Disadvantages Lacks molecular detail Restrictive assumptions October 6, 2015 | 9

of a hard-wall potential cavity in Lennard-Jones solvent ∆Gnp = ∆Gcav + GvdW,s − GvdW,v All energies deﬁned in terms of solvent density χ(x) Approximate cavity energy using scaled particle theory ∆Gcav ≈ G0 [χ] + pV [χ] + γA[χ] + . . . Approximate perturbation by convolution of solvent density with dispersive Lennard-Jones interaction with solute GvdW,s − GvdW,v ≈ ρ Na i χ(x)uvdW (x − xi )dx October 6, 2015 | 11

+ Gq + Gion Continuum solvent assumptions Linear and local dielectric response: functional of solvent density χ and electrostatic potential φ Gpol + Gq ≈ Ω − 1 2 (∇φ)2 + φ dx Could subtract vacuum-state energy for “regularized” form Mean-ﬁeld ionic response: functional of electrostatic potential φ; dependent on ion-solute interaction potential V (χ) Gion ≈ −β−1 Ω ci e−βVi e−βqiφ − 1 dx October 6, 2015 | 12

Γ UvdW dx − Ω 1 2 (∇φ)2 + φ −β−1 i ci e−βVi e−βqiφ − 1 dx Free energy is a functional G [χs , {ϑi } , φ] dependent on two other types of functions Position-dependent solvent and ion densities: χs : Ω → R and ϑi : Ω → R Electrostatic potential: φ : Ω → R Most current models assume ﬁxed densities: ϑi = ϑi,0 and χs = χs,0 October 6, 2015 | 13

= 0 0 = ∇ · ∇φ + + i ci qi e−βVi−βqiφ Poisson-Boltzmann equation strong form −∇ · ∇φ − i ci qi e−βVi−βqiφ = Often assume that Vi is “hard wall” potential such that e−βVi = H, a step function October 6, 2015 | 14

priori reﬁnement based on molecular geometries A posteriori reﬁnement driven by residual-based error estimation Best for large systems with signiﬁcant multiscale character Finite diﬀerence multigrid Locally Cartesian meshes Best for small- to medium-scale systems October 6, 2015 | 17

Freely available PB and nonpolar model solver Works with most popular visualization software (VMD, PMV, PyMOL) Links with CHARMM, AMBER, TINKER, NAMD 25,600+ registered users PDB2PQR Structure preparation service pKa calculations and titration state assignment Supported by NIH R01 GM069702 www.poissonboltzmann.org October 6, 2015 | 18

V Diﬃcult to relate force ﬁeld radii to coeﬃcient deﬁnitions Diﬀerent parameters often needed for polar and apolar solvation Inability to couple polar and nonpolar interactions Unable to describe strong density changes near polar/nonpolar interfaces Diﬀerent models for polar and nonpolar interactions Mean-ﬁeld point-like descriptions of ion behavior fail at high charge densities Diﬃcult to incorporate conformational ﬂuctuations and uncertainties October 6, 2015 | 22

Zheng This work was supported by the DOE ASCR as part of the Collaboratory on Mathematics for Mesoscopic Modeling of Materials (CM4). Collaborators: Paul Constantine George Karniadakis Greg Schenter Alex Tartakovsky Xiaoliang Wan Dongbin Xiu Zhongqiang Zhang Wen Zhou October 6, 2015 | 23

The conformational space has very high dimensions. Many solvation calculations are performed based on a small number of conformations. How can we quantify the inﬂuence of conformational uncertainty on solvation properties? October 6, 2015 | 25

for the full conformation space such that we can Approximate the target properties in terms of instantaneous conformational states Quantify the statistics of target properties with respect to conformational ﬂuctuations Numerical challenges: accurately constructing the surrogate model using limited sampling points in the presence of High-dimensionality of the random space Numerical errors in the sampled data October 6, 2015 | 26

the molecule by an elastic network model: V (R, R) = γ 2 i<j (rij − rij )2h(rc − rij ) where R are the equilibrium positions; R are the instantaneous positions; γ is the elastic coeﬃcient; h is a distance cutoﬀ function; and rc is the cutoﬀ distance. Other quasiharmonic approximations (e.g., PCA) could also be used. October 6, 2015 | 28

potential function leads to a Gaussian distribution for conformational ﬂuctuations, with a correlation matrix Cij ≡ ∆Ri ∆RT j C = kB T γ WΛ−1W = UUT , where W and Λ are determined from an eigenvalue decomposition of the harmonic system and U are Cholesky factors. October 6, 2015 | 29

by the stochastic process R(ξ) = R + ∆R(ξ) ∆R(ξ) = Uξ where ξ is a (3N − 6)-dimensional standard normal random vector. The target property is denoted as X(ξ) := X(R(ξ)). A. R. Atilgan, I. Bahar, etc., Biophysical Journal 2001. October 6, 2015 | 30

a generalized Polynomial Chaos (gPC) surrogate model: X(ξ) ≈ X(ξ) = P |α|=0 cα ψα (ξ) The gPC surrogate model allows us to Extract full approximate distribution function Evaluate properties at points which were not sampled October 6, 2015 | 31

probabilistic collocation methods cα = X(ξ)ψα (ξ)dP(ξ) ≈ Q i=1 Xi ψα (ξi )wi , where Xi is the collocation point, α is a multi-index with dimension (P + d)!/(Pd)!, and wi is the weight. Major challenges for high-dimensional systems: large number of collocation points (e.g., d = 27, P = 2, Q = 327 = 7.6 × 1012 tensor product points) sensitive to numerical errors in the calculation of X October 6, 2015 | 32

be rewritten as Ψc + e = X, where e is related to the truncation error. The dimension of c is very large (P + d)!/(Pd)!. However, we may be able to eﬃciently determine the values for c using compressive sensing – subject to certain restrictions. Consider a linear system October 6, 2015 | 34

N, the system is underdetermined, it is possible to obtain c if c is a sparse vector. The sparse vector c can be obtained by solving the optimization problem Ψc + e = u, where e 2 ≤ , e.g., minc c 1 subject to Ψc − u 2 ≤ . Critical problem: Rigorous proof of this approach requires restricted isometry condition for Ψ and sparsity of c – properties that are unknown a priori. E.J. Cand` es, J. Romberg, T. Tao, IEEE Trans. Inform. Theory, 2006. D.L. Donoho, M. Elad, V.N. Temlyakov, IEEE Trans. Inform. Theory, 2006. October 6, 2015 | 35

random vector ξ, we can deﬁne the unitary transformation A χ = AT ξ, AAT = I where χ is also i.i.d. Gaussian and ψ(χ) corresponds to Hermite polynomial function. Sparsity can be increased with a renormalized active random space method; however, this requires knowledge of G = E ∇xi X(ξ)∇xi X(ξ)T October 6, 2015 | 36

be approximated by Gij ≈ E ∇ξ XgPC(ξ)∇ξ XgPC(ξ)T = E N n=1 cn ∂ψn (ξ) ∂ξi · N n =1 cn ∂ψn (ξ) ∂ξj = N n=1 N n =1 cn cn E ∂ψn (ξ) ∂ξi · ∂ψn (ξ) ∂ξj = cT Kij c, where Kij is a “stiﬀness” matrix describing the structure of the basis set. October 6, 2015 | 37

ξ1, ξ2, · · · , ξM and the output samples X1, X2, · · · , XM , the approximate rotation matrix can be obtained iteratively to (possibly) improve its ability to increase the system sparseness: ξ → c(1) → G(1) → χ(1) = (G(1))T ξ χ1 → c(2) → G(2) → χ(2) = (G(2))T χ(1) · · · In other words, χ(L) = (G(L))T (G(L−1))T · · · (G(1))T ξ, where L is the number of iterations. P. Constantine, E. Dow, Q. Wang, arXiv:1304.2070, 2013. October 6, 2015 | 38

and reduced stochastic conformational spaces. Denote the stochastic space as a d-dimensional i.i.d. standard normal random vector ξ. 2. Generate M sampling points ξ1, ξ2, · · · , ξM based on the distribution of ξ. Numerically compute X on ξ1, ξ2, · · · , ξM to obtain M outputs X1, X2, · · · , XM . Set counter l = 0, χ(0) = ξ. 3. Construct the measurement matrix Ψ(l) ij = ψj((χ(l))i), and compute XgPC(χ(l)) by solving the weighted 1 minimization problem c = arg min c 1 subject to Ψc − x 2 ≤ δ. to obtain the gPC coeﬃcients c(l). 4. Evaluate the gradient matrix G(l+1) = E ∇XgPC(χ(l))∇XgPC(χ(l))T . 5. Set l = l + 1. If G(l) 1 − d < θ, where the threshold θ is a positive real number, then stop. Otherwise, go to Step 3 and utilize the M sampling data (X1, X2, · · · , XM ) that have been determined in Step 2. October 6, 2015 | 39

equation: ut (x, t; ξ) − 6u(x, t; ξ)ux (x, t; ξ) + uxxx (x, t; ξ) = f(t; ξ), x ∈ (−∞, ∞), u(x, 0; ξ) = −2 sech2(x). where f(t; ξ) = σ d i=1 λi φi (t)ξi , (2.1) σ is a constant, and {λi , φi (t)}d i=1 are eigenpairs of the exponential covariance kernel C(x, x ) = exp |x − x | lc . (2.2) We set lc = 0.25 and d = 10 ( d i=1 λi > 0.96 ∞ i=1 λi ). The quantity of interest is u(x, t; ξ)|x=6,t=1 . October 6, 2015 | 43

between iterative reweighting and 1 methods for the KdV equation with d = 10, P = 4, and N = 1001. “◦”: standard 1 , “ ”: re-weighted 1 , “ ”: rotated 1 , “ ”: re-weighted and iteratively rotated 1 . 0.1 0.15 0.2 0.25 0.3 0.35 0.4 10−3 10−2 M/N Relative L 2 error October 6, 2015 | 44

distributions with respect to conformational ﬂuctuations. Distribution of solvent accessible surface area (˚ A2) for residue 14 of bovine pancreatic trypsin inhibitor (PDB ID 5PTI). Comparisons with sparse grid and Monte Carlo sampling methods are shown. SASA probability density distribution 60 70 80 90 100 110 120 0 0.025 0.05 0.075 0.1 MC (106 samples) SP - level 1 (55 samples) SP - level 2 (1513 samples) CS (300 samples) MC (300 samples) (a) number of sample K-L divergence 200 300 400 500 600 10-5 10-4 10-3 10-2 SP - level 1 (55 samples) MC (300 samples) MC (1200 samples) CS (χ) (b) October 6, 2015 | 45

sparsity of the gPC expansion, yielding a more accurate surrogate model. Our method addresses a major problem (instability to numerical error) that the sparse grid method has for high-dimensional random spaces. Next steps include addressing Target properties: how do we handle discontinuities over the stochastic space? Target properties: how do we eﬃciently compute uncertainty in more complicated properties (e.g., electrostatic potential and potentials of mean force)? Visualization: how do we display this uncertainty to users? October 6, 2015 | 48