Clément Walter

Rare event simulation a Point Process interpretation with application in
probability and quantile estimation and metamodel based algorithms Séminaire S3 | Clément WALTER March 13th 2015

Introduction Problem setting: X random vector with know distribution µX
g a "black-box" function representing a computer code: g : Rd → R Y = g(X) the real-valued random variable which describes the state of the system; its distribution µY is unknown Uncertainty Quantication: F = {x ∈ Rd | g(x) > q} nd p = P [X ∈ F] = µX(F) for a given q nd q for a given p Issues p = µX(F) = µY ([q; +∞[) 1 needs to use g to get F or µY which is time costly Monte Carlo estimator has a CV δ2 ≈ 1/Np ⇒ N 1/p Séminaire S3 | March 13th 2015 | PAGE 1/26

Introduction Two main directions to overcome this issue: learn a
metamodel on g use variance-reduction techniques to estimate p Séminaire S3 | March 13th 2015 | PAGE 2/26

metamodel on g use variance-reduction techniques to estimate p Importance sampling Séminaire S3 | March 13th 2015 | PAGE 2/26

metamodel on g use variance-reduction techniques to estimate p Importance sampling Multilevel splitting Séminaire S3 | March 13th 2015 | PAGE 2/26

metamodel on g use variance-reduction techniques to estimate p Importance sampling Multilevel splitting Multilevel Splitting (Subset Simulations) Write the sought probability p as a product of less small probabilities and estimate them with MCMC: let (qi)i=0..m be an increasing sequence with q0 = −∞ and qm = q: p = P [g(X) > q] = P [g(X) > qm | g(X) > qm−1] × P [g(X) > qm−1] = P [g(X) > q] = P [g(X) > qm | g(X) > qm−1] × · · · × P [g(X) > q1] Séminaire S3 | March 13th 2015 | PAGE 2/26

metamodel on g use variance-reduction techniques to estimate p Importance sampling Multilevel splitting Multilevel Splitting (Subset Simulations) Write the sought probability p as a product of less small probabilities and estimate them with MCMC: let (qi)i=0..m be an increasing sequence with q0 = −∞ and qm = q: p = P [g(X) > q] = P [g(X) > qm | g(X) > qm−1] × P [g(X) > qm−1] = P [g(X) > q] = P [g(X) > qm | g(X) > qm−1] × · · · × P [g(X) > q1] ⇒ how to choose (qi)i : beforehand or on-the-go? Optimality? Bias? Séminaire S3 | March 13th 2015 | PAGE 2/26

Introduction A typical MS algorithm works as follows: 1 Sample
a Monte-Carlo population (Xi)i of size N; y = (g(X1), · · · , g(XN )); j = 0 2 Estimate the conditional probability P [g(X) > qj+1 | g(X) > qj] 3 Resample the (Xi)i such that g(Xi) ≤ qj+1 conditionally to be greater than qj+1 (the other ones don't change) 4 j ← j + 1 and repeat until j = m Séminaire S3 | March 13th 2015 | PAGE 3/26

a Monte-Carlo population (Xi)i of size N; y = (g(X1), · · · , g(XN )); j = 0 2 Estimate the conditional probability P [g(X) > qj+1 | g(X) > qj] 3 Resample the (Xi)i such that g(Xi) ≤ qj+1 conditionally to be greater than qj+1 (the other ones don't change) 4 j ← j + 1 and repeat until j = m ⇒ Parallel computation at each iteration in the resampling step Séminaire S3 | March 13th 2015 | PAGE 3/26

a Monte-Carlo population (Xi)i of size N; y = (g(X1), · · · , g(XN )); j = 0 2 Estimate the conditional probability P [g(X) > qj+1 | g(X) > qj] 3 Resample the (Xi)i such that g(Xi) ≤ qj+1 conditionally to be greater than qj+1 (the other ones don't change) 4 j ← j + 1 and repeat until j = m ⇒ Parallel computation at each iteration in the resampling step Minimal variance when all conditional probabilities are equal [4] Adaptive Multilevel Splitting: qj+1 ← y(p0N) or qj+1 ← y(k) empirical quantiles of order p0 ∈ (0, 1) ⇒ bias [4, 1]; the number of subsets converges toward a constant log p/ log p0 empirical quantiles of order k/N ⇒ no bias and CLT [3, 2] minimal variance with k = 1 (Last Particle Algorithm [5, 6]); the number of subsets follows a Poisson law with parameter −N log p ⇒ disables parallel computation Séminaire S3 | March 13th 2015 | PAGE 3/26

Outline 1 Increasing random walk 2 Probability estimation 3 Quantile
estimation 4 Design points 5 Conclusion Séminaire S3 | March 13th 2015 | PAGE 4/26

Outline 1 Increasing random walk 2 Probability estimation 3 Quantile

Increasing random walk Denition Denition Let Y be a real-valued
random variable with distribution µY and cdf F (assumed to be continuous). One considers the Markov chain (with Y0 = −∞) such that: ∀n ∈ N : P [Yn+1 ∈ A | Y0, · · · , Yn] = µY (A ∩ (Yn, +∞)) µY ((Yn, +∞)) i.e. Yn+1 is randomly greater than Yn : Yn+1 ∼ µY (· | Y > Yn) Séminaire S3 | March 13th 2015 | PAGE 6/26

random variable with distribution µY and cdf F (assumed to be continuous). One considers the Markov chain (with Y0 = −∞) such that: ∀n ∈ N : P [Yn+1 ∈ A | Y0, · · · , Yn] = µY (A ∩ (Yn, +∞)) µY ((Yn, +∞)) i.e. Yn+1 is randomly greater than Yn : Yn+1 ∼ µY (· | Y > Yn) the Tn = − log(P(Y > Yn)) are distributed as the arrival times of a Poisson process with parameter 1 [5, 7] Séminaire S3 | March 13th 2015 | PAGE 6/26

random variable with distribution µY and cdf F (assumed to be continuous). One considers the Markov chain (with Y0 = −∞) such that: ∀n ∈ N : P [Yn+1 ∈ A | Y0, · · · , Yn] = µY (A ∩ (Yn, +∞)) µY ((Yn, +∞)) i.e. Yn+1 is randomly greater than Yn : Yn+1 ∼ µY (· | Y > Yn) the Tn = − log(P(Y > Yn)) are distributed as the arrival times of a Poisson process with parameter 1 [5, 7] Time in the Poisson process is linked with rarity in probability Séminaire S3 | March 13th 2015 | PAGE 6/26

random variable with distribution µY and cdf F (assumed to be continuous). One considers the Markov chain (with Y0 = −∞) such that: ∀n ∈ N : P [Yn+1 ∈ A | Y0, · · · , Yn] = µY (A ∩ (Yn, +∞)) µY ((Yn, +∞)) i.e. Yn+1 is randomly greater than Yn : Yn+1 ∼ µY (· | Y > Yn) the Tn = − log(P(Y > Yn)) are distributed as the arrival times of a Poisson process with parameter 1 [5, 7] Time in the Poisson process is linked with rarity in probability The number of events My before y ∈ R is related to P [Y > y] = py My = Mt=− log py L ∼ P(− log py) Séminaire S3 | March 13th 2015 | PAGE 6/26

Increasing random walk Denition First consequence: number of simulations to
get the realisation of a random variable above a given threshold (event with probability p) follows a Poisson law P(log 1/p) instead of a Geometric law G(p). 0 20 40 60 80 100 0.00 0.05 0.10 0.15 0.20 N Density Figure: Comparison of Poisson and Geometric densities with p = 0.0228 Séminaire S3 | March 13th 2015 | PAGE 7/26

Increasing random walk Example Y ∼ N(0, 1) ; p
= P [Y > 2] ≈ 2, 28.10−2 ; 1/p ≈ 43, 96 ; − log p ≈ 3, 78 Séminaire S3 | March 13th 2015 | PAGE 8/26 -4 -2 0 2 4 0.0 0.1 0.2 0.3 0.4 x Density

Plan 1 Increasing random walk 2 Probability estimation 3 Quantile

Probability estimation Denition of the estimator Concept Number of events
before time t = − log(P(Y > q)) follows a Poisson law with parameter t estimate the parameter of a Poisson law Séminaire S3 | March 13th 2015 | PAGE 10/26

before time t = − log(P(Y > q)) follows a Poisson law with parameter t estimate the parameter of a Poisson law Let (Mi)i=1..N be N iid. RV of the number of events at time t = − log p: Mi ∼ P(− log p); Mq = N i=1 Mi ∼ P(−N log p) Séminaire S3 | March 13th 2015 | PAGE 10/26

before time t = − log(P(Y > q)) follows a Poisson law with parameter t estimate the parameter of a Poisson law Let (Mi)i=1..N be N iid. RV of the number of events at time t = − log p: Mi ∼ P(− log p); Mq = N i=1 Mi ∼ P(−N log p) − log p ≈ 1 N N i=1 Mi = Mq N −→ p = 1 − 1 N Mq Séminaire S3 | March 13th 2015 | PAGE 10/26

before time t = − log(P(Y > q)) follows a Poisson law with parameter t estimate the parameter of a Poisson law Let (Mi)i=1..N be N iid. RV of the number of events at time t = − log p: Mi ∼ P(− log p); Mq = N i=1 Mi ∼ P(−N log p) − log p ≈ 1 N N i=1 Mi = Mq N −→ p = 1 − 1 N Mq ⇒ Last Particle Estimator, but with parallel implementation Séminaire S3 | March 13th 2015 | PAGE 10/26

before time t = − log(P(Y > q)) follows a Poisson law with parameter t estimate the parameter of a Poisson law Let (Mi)i=1..N be N iid. RV of the number of events at time t = − log p: Mi ∼ P(− log p); Mq = N i=1 Mi ∼ P(−N log p) − log p ≈ 1 N N i=1 Mi = Mq N −→ p = 1 − 1 N Mq ⇒ Last Particle Estimator, but with parallel implementation ⇒ One has indeed an estimator of P [Y > q0] , ∀q0 ≤ q: FN (y) = 1 − 1 − 1 N My a.s. − − − − → N→∞ F(y) Séminaire S3 | March 13th 2015 | PAGE 10/26

Probability estimation Example Y ∼ N(0, 1) ; p =
P [Y > 2] ≈ 2, 28.10−2 ; 1/p ≈ 43, 96 ; − log p ≈ 3, 78 Séminaire S3 | March 13th 2015 | PAGE 11/26 -4 -2 0 2 4 0.0 0.4 0.8 y Empirical cdf N = 2 ; p = 0.0078

Probability estimation Practical implementation Ideally, one knows how to generate
the Markov chain In practice, Y = g(X) and one can use Markov chain sampling (e.g. Metropolis-Hastings or Gibbs algorithms) requires to work with a population to get starting points Séminaire S3 | March 13th 2015 | PAGE 12/26

the Markov chain In practice, Y = g(X) and one can use Markov chain sampling (e.g. Metropolis-Hastings or Gibbs algorithms) requires to work with a population to get starting points ⇒ batches of k random walks are generated together Séminaire S3 | March 13th 2015 | PAGE 12/26

the Markov chain In practice, Y = g(X) and one can use Markov chain sampling (e.g. Metropolis-Hastings or Gibbs algorithms) requires to work with a population to get starting points ⇒ batches of k random walks are generated together Generating k random walks Require: k, q Generate k copies (Xi)i=1..k according to µX ; Y ← (g(X1), · · · , g(Xk)); M = (0, · · · , 0) while min Y < q do 3: ind ← which Y < q for i in ind do Mi = Mi + 1 6: Generate X∗ ∼ µX (· | X > g(Xi)) Xi ← X∗; Yi = g(X∗) end for 9: end while Return M, (Xi)i=1..N , (Yi)i=1..N Séminaire S3 | March 13th 2015 | PAGE 12/26

the Markov chain In practice, Y = g(X) and one can use Markov chain sampling (e.g. Metropolis-Hastings or Gibbs algorithms) requires to work with a population to get starting points ⇒ batches of k random walks are generated together Generating k random walks Require: k, q Generate k copies (Xi)i=1..k according to µX ; Y ← (g(X1), · · · , g(Xk)); M = (0, · · · , 0) while min Y < q do 3: ind ← which Y < q for i in ind do Mi = Mi + 1 6: Generate X∗ ∼ µX (· | X > g(Xi)) Xi ← X∗; Yi = g(X∗) end for 9: end while Return M, (Xi)i=1..N , (Yi)i=1..N ⇒ each sample is resampled according to its own level Séminaire S3 | March 13th 2015 | PAGE 12/26

Probability estimation Practical implementation Convergence of the Markov chain (for
conditional sampling) is increased when the starting point already follows the targeted distribution; 2 possibilities: store each state (Xi)i and its corresponding level re-draw only the smallest Xi (⇒ Last Particle Algorithm) ⇒ LPA is only one possible implementation of this estimator Séminaire S3 | March 13th 2015 | PAGE 13/26

Probability estimation Practical implementation Convergence of the Markov chain (for
conditional sampling) is increased when the starting point already follows the targeted distribution; 2 possibilities: store each state (Xi)i and its corresponding level re-draw only the smallest Xi (⇒ Last Particle Algorithm) ⇒ LPA is only one possible implementation of this estimator Computing time with LPA implementation Let tpar be the random time of generating N random walks by batches of size k = N/nc (nc standing for a number of cores) with burn-in T tpar = max of nc RV ∼ P(−k log p) E [tpar] = T(log p)2 ncδ2  1 + ncδ2 (log p)2 2 log nc + 1 T log 1/p   Séminaire S3 | March 13th 2015 | PAGE 13/26

Probability estimation Comparison Mean computer time against coecient of variation:
the cost of an algorithm is the number of generated samples in a row by a core. We assume nc ≥ 1 cores and burn-in = T for Metropolis-Hastings Algorithm Time Coef. of var. δ2 Times VS δ Monte Carlo N/nc 1/Np 1/pδ2 AMS T log p log p0 N(1−p0) nc log p log p0 1−p0 Np0 (1−p0)2 p0(log p0)2 T(log p)2 ncδ2 LPA −TN log p −log p N T(log p)2 δ2 Random walk −T N nc log p −log p N T(log p)2 ncδ2 best AMS when p0 → 1 LPA brings the theoretically best AMS but is not parallel Random walk allows for taking p0 → 1 while keeping the parallel implementation Séminaire S3 | March 13th 2015 | PAGE 14/26

Quantile estimation Denition Concept Approximate a time t = −
log p with times of a Poisson process; the higher the rate, the denser the discretisation of [0; +∞[ Séminaire S3 | March 13th 2015 | PAGE 16/26

log p with times of a Poisson process; the higher the rate, the denser the discretisation of [0; +∞[ The center of the interval [TMt ; TMt+1] converges toward a random variable centred in t with symmetric pdf Séminaire S3 | March 13th 2015 | PAGE 16/26

log p with times of a Poisson process; the higher the rate, the denser the discretisation of [0; +∞[ The center of the interval [TMt ; TMt+1] converges toward a random variable centred in t with symmetric pdf q = 1 2 (Ym + Ym+1) with m = E[Mq] = −N log p Séminaire S3 | March 13th 2015 | PAGE 16/26

log p with times of a Poisson process; the higher the rate, the denser the discretisation of [0; +∞[ The center of the interval [TMt ; TMt+1] converges toward a random variable centred in t with symmetric pdf q = 1 2 (Ym + Ym+1) with m = E[Mq] = −N log p CLT: √ N (q − q) L −→ m→∞ N 0, −p2 log p f(q)2 Bounds on bias on O(1/N) Séminaire S3 | March 13th 2015 | PAGE 16/26

Quantile estimation Example Y ∼ N(0, 1) ; p =
P [Y > 2] ≈ 2, 28.10−2 ; 1/p ≈ 43, 96 ; − log p ≈ 3, 78 Séminaire S3 | March 13th 2015 | PAGE 17/26 -4 -2 0 2 4 -0.10 0.00 0.10 y N = 2 ; q = 1.348

Design points Algorithm No need for exact sampling if the
goal is only to get failing samples ⇒ use of a metamodel for conditional sampling Getting Nfail failing samples Sample a minimal-sized DoE Learn a rst metamodel with trend = failure 3: for Nfail times do Simulate the random walks one after the other Sample X1 ∼ µX ; y1 = g(X1); m = 1; train the metamodel while ym < q do 6: Xm+1 = Xm ; ym+1 = ym for T times do Pseudo burn-in X∗ ∼ K(Xm+1, ·); g(X∗) = y∗ K is a kernel for Markov chain sampling 9: If y∗ > ym+1 , ym+1 = y∗ and Xm+1 = X∗ end for ym+1 = g(Xm+1); train the metamodel 12: If ym+1 < ym , Xm+1 = Xm ; ym+1 = ym ; m = m + 1 end while end for Séminaire S3 | March 13th 2015 | PAGE 19/26

Design points Example Parabolic limit-state function: g : x ∈
R2 −→ 5 − x2 − 0.5(x1 − 0.1)2 Séminaire S3 | March 13th 2015 | PAGE 20/26 −5 0 5 −5 0 5 x y LSF LSF

Design points Example A two-dimensional four branches serial system: g
: x ∈ R2 −→ min 3 + (x1 − x2)2 10 − | x1 + x2 | √ 2 , 7 √ 2 − | x1 − x2 | Séminaire S3 | March 13th 2015 | PAGE 21/26 −5 0 5 −5 0 5 x y LSF

Conclusion Conclusion One considers Markov chains instead of samples ⇒
N is a number of processes Lets dene parallel estimators for probabilities and quantiles (and moments [8]) Twins of Monte Carlo estimators with a "log attribute": similar statistical properties but adding a log to the 1/p factor: var [pMC] ≈ p2 Np → var [p] ≈ p2 log 1/p N var [qMC] ≈ p2 Nf(q)2p → var [q] ≈ p2 log 1/p Nf(q)2 Séminaire S3 | March 13th 2015 | PAGE 23/26

Conclusion Perspectives Adaptation of quantile estimator for optimisation problem (min
or max) Problem of conditional simulations (Metropolis-Hastings) Best use of a metamodel Adaptation for discontinuous RV Séminaire S3 | March 13th 2015 | PAGE 24/26

Bibliography I S-K Au and J L Beck. Estimation of
small failure probabilities in high dimensions by subset simulation. Probabilistic Engineering Mechanics, 16(4):263277, 2001. Charles-Edouard Bréhier, Ludovic Goudenege, and Loic Tudela. Central limit theorem for adaptative multilevel splitting estimators in an idealized setting. arXiv preprint arXiv:1501.01399, 2015. Charles-Edouard Bréhier, Tony Lelievre, and Mathias Rousset. Analysis of adaptive multilevel splitting algorithms in an idealized case. arXiv preprint arXiv:1405.1352, 2014. F Cérou, P Del Moral, T Furon, and A Guyader. Sequential Monte Carlo for rare event estimation. Statistics and Computing, 22(3):795808, 2012. Séminaire S3 | March 13th 2015 | PAGE 25/26

Bibliography II A Guyader, N Hengartner, and E Matzner-Løber. Simulation
and estimation of extreme quantiles and extreme probabilities. Applied Mathematics & Optimization, 64(2):171196, 2011. Eric Simonnet. Combinatorial analysis of the adaptive last particle method. Statistics and Computing, pages 120, 2014. Clement Walter. Moving Particles: a parallel optimal Multilevel Splitting method with application in quantiles estimation and meta-model based algorithms. To appear in Structural Safety, 2014. Clement Walter. Point process-based estimation of kth-order moment. arXiv preprint arXiv:1412.6368, 2014. Séminaire S3 | March 13th 2015 | PAGE 26/26

Merci ! Commissariat à l'énergie atomique et aux énergies alternatives
CEA, DAM, DIF, F-91297 Arpajon, France Établissement public à caractère industriel et commercial | RCS Paris B 775 685 019 CEA DAM DIF

Clément Walter

Clément Walter

More Decks by S³ Seminar

Other Decks in Research

Featured

Transcript