# Clément Walter

(CEA, DAM, DIF and Université Paris Diderot, Paris, FR)

https://s3-seminar.github.io/seminars/clement-walter

Title — Rare event simulation: a Point Process interpretation with application in probability and quantile estimation

Biography — Clément Walter, 25, graduated from Mines ParisTech in 2013. Before that he attended preparatory classes at Lycée Sainte-Geneviève (Maths and Physics track). For his master's degree he specialised in Geostatistics and began working with CEA as an intern on emulation of complex computer codes (especially kriging) for rare event simulation and estimation. He has since pursued this work in a PhD under the direction of Pr. Josselin Garnier, focusing on multilevel splitting methods.

March 13, 2015

## Transcript

1. ### Rare event simulation: a Point Process interpretation with application in

probability and quantile estimation and metamodel-based algorithms. Séminaire S3 | Clément Walter | March 13th 2015
2. ### Introduction Problem setting: X random vector with known distribution µX

g a "black-box" function representing a computer code: g : Rd → R; Y = g(X) the real-valued random variable which describes the state of the system; its distribution µY is unknown. Uncertainty Quantification: F = {x ∈ Rd | g(x) > q}; find p = P [X ∈ F] = µX(F) for a given q, or find q for a given p. Issues: p = µX(F) = µY ([q, +∞)) ≪ 1; one needs to use g to get F or µY, which is time-costly; the Monte Carlo estimator has a CV δ² ≈ 1/(Np) ⇒ N ≫ 1/p. Séminaire S3 | March 13th 2015 | PAGE 1/26
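The coefficient-of-variation argument can be checked numerically. A minimal sketch (my own illustration, not code from the talk), assuming the deck's running example Y = g(X) ∼ N(0, 1) and q = 2, so p ≈ 2.28 × 10−2:

```python
import math
import random

def crude_mc(n, q=2.0, seed=0):
    """Crude Monte Carlo estimate of p = P[Y > q] with Y ~ N(0,1)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > q)
    return hits / n

p_hat = crude_mc(100_000)
# coefficient of variation: delta^2 ~ 1/(N p), so N must be >> 1/p
delta = math.sqrt((1.0 - p_hat) / (100_000 * p_hat))
```

With N = 10⁵ samples the relative error is already ~2%; for p = 10⁻⁶ the same accuracy would need N of order 10⁹, which is the motivation for the methods below.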
8. ### Introduction Two main directions to overcome this issue: learn a

metamodel on g, or use variance-reduction techniques to estimate p (Importance Sampling, Multilevel Splitting). Multilevel Splitting (Subset Simulation) writes the sought probability p as a product of larger probabilities and estimates them with MCMC: let (qi)i=0..m be an increasing sequence with q0 = −∞ and qm = q; then p = P [g(X) > q] = P [g(X) > qm | g(X) > qm−1] × P [g(X) > qm−1] = P [g(X) > qm | g(X) > qm−1] × · · · × P [g(X) > q2 | g(X) > q1] × P [g(X) > q1] ⇒ how to choose (qi)i: beforehand or on the go? Optimality? Bias?
11. ### Introduction A typical MS algorithm works as follows: 1 Sample

a Monte Carlo population (Xi)i of size N; y = (g(X1), · · · , g(XN)); j = 0. 2 Estimate the conditional probability P [g(X) > qj+1 | g(X) > qj]. 3 Resample the (Xi)i such that g(Xi) ≤ qj+1 conditionally on being greater than qj+1 (the others don't change). 4 j ← j + 1 and repeat until j = m. ⇒ Parallel computation at each iteration in the resampling step. Minimal variance when all conditional probabilities are equal ⇒ Adaptive Multilevel Splitting: qj+1 ← y(p0N) or qj+1 ← y(k). With empirical quantiles of order p0 ∈ (0, 1): bias [4, 1]; the number of subsets converges toward the constant log p / log p0. With empirical quantiles of order k/N: no bias and a CLT [3, 2]; minimal variance with k = 1 (Last Particle Algorithm [5, 6]); the number of subsets then follows a Poisson law with parameter −N log p, but this disables parallel computation.
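The adaptive MS loop above can be sketched in a few lines. This is my own illustration (function names and parameters are mine, not from the talk): Y ∼ N(0, 1) with g the identity, so the conditional resampling step can be done exactly through the inverse cdf; with a true black-box g it would be an MCMC step.

```python
import random
from statistics import NormalDist

def adaptive_ms(q, n=1000, p0=0.1, seed=0):
    """Adaptive Multilevel Splitting estimate of p = P[Y > q], Y ~ N(0,1).

    Exact inverse-cdf conditional resampling stands in for the MCMC step
    used when g is a black box."""
    nd, rng = NormalDist(), random.Random(seed)
    y = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
    p_hat = 1.0
    while True:
        level = y[int((1 - p0) * n)]           # empirical quantile of order 1 - p0
        if level >= q:                          # last stage: count samples above q
            return p_hat * sum(1 for v in y if v > q) / n
        p_hat *= sum(1 for v in y if v > level) / n
        s = 1.0 - nd.cdf(level)                 # survival probability at the level
        # resample every point below the level from mu_Y(. | Y > level):
        # its survival probability is uniform on (0, s)
        y = sorted(v if v > level
                   else nd.inv_cdf(1.0 - (1.0 - rng.random()) * s) for v in y)

p_hat = adaptive_ms(q=2.0)   # true p = P[Y > 2] ~ 2.28e-2
```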
12. ### Outline 1 Increasing random walk 2 Probability estimation 3 Quantile

estimation 4 Design points 5 Conclusion
14. ### Increasing random walk Denition Denition Let Y be a real-valued

random variable with distribution µY and cdf F (assumed to be continuous). One considers the Markov chain (with Y0 = −∞) such that: ∀n ∈ N : P [Yn+1 ∈ A | Y0, · · · , Yn] = µY (A ∩ (Yn, +∞)) µY ((Yn, +∞)) i.e. Yn+1 is randomly greater than Yn : Yn+1 ∼ µY (· | Y > Yn) Séminaire S3 | March 13th 2015 | PAGE 6/26
15. ### Increasing random walk Denition Denition Let Y be a real-valued

random variable with distribution µY and cdf F (assumed to be continuous). One considers the Markov chain (with Y0 = −∞) such that: ∀n ∈ N : P [Yn+1 ∈ A | Y0, · · · , Yn] = µY (A ∩ (Yn, +∞)) µY ((Yn, +∞)) i.e. Yn+1 is randomly greater than Yn : Yn+1 ∼ µY (· | Y > Yn) the Tn = − log(P(Y > Yn)) are distributed as the arrival times of a Poisson process with parameter 1 [5, 7] Séminaire S3 | March 13th 2015 | PAGE 6/26
16. ### Increasing random walk Denition Denition Let Y be a real-valued

random variable with distribution µY and cdf F (assumed to be continuous). One considers the Markov chain (with Y0 = −∞) such that: ∀n ∈ N : P [Yn+1 ∈ A | Y0, · · · , Yn] = µY (A ∩ (Yn, +∞)) µY ((Yn, +∞)) i.e. Yn+1 is randomly greater than Yn : Yn+1 ∼ µY (· | Y > Yn) the Tn = − log(P(Y > Yn)) are distributed as the arrival times of a Poisson process with parameter 1 [5, 7] Time in the Poisson process is linked with rarity in probability Séminaire S3 | March 13th 2015 | PAGE 6/26
17. ### Increasing random walk Denition Denition Let Y be a real-valued

random variable with distribution µY and cdf F (assumed to be continuous). One considers the Markov chain (with Y0 = −∞) such that: ∀n ∈ N : P [Yn+1 ∈ A | Y0, · · · , Yn] = µY (A ∩ (Yn, +∞)) µY ((Yn, +∞)) i.e. Yn+1 is randomly greater than Yn : Yn+1 ∼ µY (· | Y > Yn) the Tn = − log(P(Y > Yn)) are distributed as the arrival times of a Poisson process with parameter 1 [5, 7] Time in the Poisson process is linked with rarity in probability The number of events My before y ∈ R is related to P [Y > y] = py My = Mt=− log py L ∼ P(− log py) Séminaire S3 | March 13th 2015 | PAGE 6/26
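The Poisson property can be checked numerically. The sketch below (my own illustration) simulates the increasing random walk for Y ∼ N(0, 1) by tracking only the survival probability P[Y > Yn], which at each step is uniform on (0, previous value), and counts events below q = 2; the count should average −log p ≈ 3.78:

```python
import math
import random
from statistics import NormalDist

def count_events(rng, p_q):
    """One realisation of M_q: events of the increasing random walk below q.

    Only survival probabilities are needed: Y_{n+1} ~ mu_Y(. | Y > Y_n)
    means its survival probability is uniform on (0, P[Y > Y_n])."""
    p_y, m = 1.0, 0                   # Y_0 = -inf, so P[Y > Y_0] = 1
    while True:
        p_y *= 1.0 - rng.random()     # conditional sampling above Y_n
        if p_y < p_q:                 # equivalent to Y_{n+1} > q (F continuous)
            return m
        m += 1

rng = random.Random(1)
p_q = 1.0 - NormalDist().cdf(2.0)     # p = P[Y > 2] ~ 2.28e-2
t = -math.log(p_q)                    # ~ 3.78
mean_m = sum(count_events(rng, p_q) for _ in range(5000)) / 5000
```

The empirical mean of M over the 5000 walks matches the Poisson parameter t, as the slide states.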
18. ### Increasing random walk Denition First consequence: number of simulations to

get the realisation of a random variable above a given threshold (event with probability p) follows a Poisson law P(log 1/p) instead of a Geometric law G(p). 0 20 40 60 80 100 0.00 0.05 0.10 0.15 0.20 N Density Figure: Comparison of Poisson and Geometric densities with p = 0.0228 Séminaire S3 | March 13th 2015 | PAGE 7/26
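The gap between the two laws is easy to quantify (a quick check of my own, using the slide's p = 0.0228):

```python
import math

p = 0.0228
# Geometric G(p): number of iid draws until one exceeds the threshold
geo_mean = 1.0 / p                       # ~ 43.9 draws on average
geo_std = math.sqrt(1.0 - p) / p         # ~ 43.4, same order as the mean
# Poisson P(log 1/p): number of steps of the increasing random walk
poi_mean = math.log(1.0 / p)             # ~ 3.78 conditional draws on average
poi_std = math.sqrt(poi_mean)            # ~ 1.94
```

The walk needs roughly log(1/p) conditional simulations instead of 1/p independent ones, with a much smaller spread.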
19. ### Increasing random walk Example Y ∼ N(0, 1) ; p

= P [Y > 2] ≈ 2.28 × 10−2 ; 1/p ≈ 43.96 ; − log p ≈ 3.78. Figure: density of Y.
21. ### Probability estimation Denition of the estimator Concept Number of events

before time t = − log(P(Y > q)) follows a Poisson law with parameter t estimate the parameter of a Poisson law Séminaire S3 | March 13th 2015 | PAGE 10/26
22. ### Probability estimation Denition of the estimator Concept Number of events

before time t = − log(P(Y > q)) follows a Poisson law with parameter t estimate the parameter of a Poisson law Let (Mi)i=1..N be N iid. RV of the number of events at time t = − log p: Mi ∼ P(− log p); Mq = N i=1 Mi ∼ P(−N log p) Séminaire S3 | March 13th 2015 | PAGE 10/26
23. ### Probability estimation Denition of the estimator Concept Number of events

before time t = − log(P(Y > q)) follows a Poisson law with parameter t estimate the parameter of a Poisson law Let (Mi)i=1..N be N iid. RV of the number of events at time t = − log p: Mi ∼ P(− log p); Mq = N i=1 Mi ∼ P(−N log p) − log p ≈ 1 N N i=1 Mi = Mq N −→ p = 1 − 1 N Mq Séminaire S3 | March 13th 2015 | PAGE 10/26
24. ### Probability estimation Denition of the estimator Concept Number of events

before time t = − log(P(Y > q)) follows a Poisson law with parameter t estimate the parameter of a Poisson law Let (Mi)i=1..N be N iid. RV of the number of events at time t = − log p: Mi ∼ P(− log p); Mq = N i=1 Mi ∼ P(−N log p) − log p ≈ 1 N N i=1 Mi = Mq N −→ p = 1 − 1 N Mq ⇒ Last Particle Estimator, but with parallel implementation Séminaire S3 | March 13th 2015 | PAGE 10/26
25. ### Probability estimation Denition of the estimator Concept Number of events

before time t = − log(P(Y > q)) follows a Poisson law with parameter t estimate the parameter of a Poisson law Let (Mi)i=1..N be N iid. RV of the number of events at time t = − log p: Mi ∼ P(− log p); Mq = N i=1 Mi ∼ P(−N log p) − log p ≈ 1 N N i=1 Mi = Mq N −→ p = 1 − 1 N Mq ⇒ Last Particle Estimator, but with parallel implementation ⇒ One has indeed an estimator of P [Y > q0] , ∀q0 ≤ q: FN (y) = 1 − 1 − 1 N My a.s. − − − − → N→∞ F(y) Séminaire S3 | March 13th 2015 | PAGE 10/26
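A minimal sketch of this estimator (names and parameters are mine): run N independent increasing random walks, sum their event counts Mq, and return (1 − 1/N)^Mq. As before, Y ∼ N(0, 1) is simulated exactly through survival probabilities; in practice Markov chain sampling replaces this step.

```python
import random
from statistics import NormalDist

def walk_count(rng, p_q):
    """One increasing random walk: number of events before the threshold."""
    p_y, m = 1.0, 0
    while True:
        p_y *= 1.0 - rng.random()   # survival prob of next state: uniform on (0, p_y)
        if p_y < p_q:               # next state exceeds the threshold q
            return m
        m += 1

def estimate_p(n_walks, q=2.0, seed=0):
    """Point-process estimator p_hat = (1 - 1/N)^{M_q} of p = P[Y > q], Y ~ N(0,1)."""
    rng = random.Random(seed)
    p_q = 1.0 - NormalDist().cdf(q)
    m_q = sum(walk_count(rng, p_q) for _ in range(n_walks))
    return (1.0 - 1.0 / n_walks) ** m_q

p_hat = estimate_p(2000)            # true p = P[Y > 2] ~ 2.28e-2
```

Each walk is independent, so the N walks can be distributed over cores, which is the parallel advantage over the Last Particle Algorithm.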
26. ### Probability estimation Example Y ∼ N(0, 1) ; p =

P [Y > 2] ≈ 2.28 × 10−2 ; 1/p ≈ 43.96 ; − log p ≈ 3.78. Figure: empirical cdf FN with N = 2, giving p = 0.0078.
30. ### Probability estimation Practical implementation Ideally, one knows how to generate

the Markov chain. In practice, Y = g(X) and one can use Markov chain sampling (e.g. Metropolis-Hastings or Gibbs algorithms), which requires working with a population to get starting points ⇒ batches of k random walks are generated together. Generating k random walks — Require: k, q. Generate k copies (Xi)i=1..k according to µX; Y ← (g(X1), · · · , g(Xk)); M = (0, · · · , 0). While min Y < q: ind ← which Y < q; for each i in ind: Mi = Mi + 1; generate X∗ ∼ µX (· | g(X) > g(Xi)); Xi ← X∗; Yi = g(X∗). Return M, (Xi)i=1..k, (Yi)i=1..k. ⇒ Each sample is resampled according to its own level.
32. ### Probability estimation Practical implementation Convergence of the Markov chain (for

conditional sampling) is improved when the starting point already follows the targeted distribution; two possibilities: store each state (Xi)i and its corresponding level, or re-draw only the smallest Xi (⇒ Last Particle Algorithm). ⇒ LPA is only one possible implementation of this estimator. Computing time with the LPA implementation: let tpar be the random time of generating N random walks by batches of size k = N/nc (nc standing for the number of cores) with burn-in T; tpar = max of nc RV ∼ P(−k log p), and E [tpar] ≈ (T (log p)²/(nc δ²)) · (1 + √(2 nc δ² log nc/(log p)²) + 1/(T log 1/p)).
33. ### Probability estimation Comparison Mean computer time against coefficient of variation:

the cost of an algorithm is the number of samples generated in a row by one core. We assume nc ≥ 1 cores and a burn-in T for Metropolis-Hastings.

| Algorithm | Time | Coef. of var. δ² | Time vs δ |
|---|---|---|---|
| Monte Carlo | N/nc | 1/(Np) | 1/(nc p δ²) |
| AMS | T (log p/log p0) N(1−p0)/nc | (log p/log p0)(1−p0)/(Np0) | [(1−p0)²/(p0 (log p0)²)] · T(log p)²/(nc δ²) |
| LPA | −TN log p | −log p / N | T(log p)²/δ² |
| Random walk | −T (N/nc) log p | −log p / N | T(log p)²/(nc δ²) |

AMS is best when p0 → 1; LPA attains the theoretically best AMS cost but is not parallel; the random walk allows taking p0 → 1 while keeping the parallel implementation.
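To get a feel for the (log p)² versus 1/p factors, here is a quick back-of-the-envelope computation with illustrative numbers of my own (not from the talk):

```python
import math

# Hypothetical setting: target p = 1e-6 with coefficient of variation 10%,
# nc = 100 cores, burn-in T = 20
p, delta2, nc, T = 1e-6, 0.1 ** 2, 100, 20

mc_cost = 1.0 / (nc * p * delta2)                  # crude Monte Carlo: ~1/p factor
walk_cost = T * math.log(p) ** 2 / (nc * delta2)   # random walk: ~(log p)^2 factor
```

Even after paying the burn-in T on every step, the random walk is more than two orders of magnitude cheaper per core in this regime.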
35. ### Quantile estimation Denition Concept Approximate a time t = −

log p with times of a Poisson process; the higher the rate, the denser the discretisation of [0; +∞[ Séminaire S3 | March 13th 2015 | PAGE 16/26
36. ### Quantile estimation Denition Concept Approximate a time t = −

log p with times of a Poisson process; the higher the rate, the denser the discretisation of [0; +∞[ The center of the interval [TMt ; TMt+1] converges toward a random variable centred in t with symmetric pdf Séminaire S3 | March 13th 2015 | PAGE 16/26
37. ### Quantile estimation Denition Concept Approximate a time t = −

log p with times of a Poisson process; the higher the rate, the denser the discretisation of [0; +∞[ The center of the interval [TMt ; TMt+1] converges toward a random variable centred in t with symmetric pdf q = 1 2 (Ym + Ym+1) with m = E[Mq] = −N log p Séminaire S3 | March 13th 2015 | PAGE 16/26
38. ### Quantile estimation Denition Concept Approximate a time t = −

log p with times of a Poisson process; the higher the rate, the denser the discretisation of [0; +∞[ The center of the interval [TMt ; TMt+1] converges toward a random variable centred in t with symmetric pdf q = 1 2 (Ym + Ym+1) with m = E[Mq] = −N log p CLT: √ N (q − q) L −→ m→∞ N 0, −p2 log p f(q)2 Bounds on bias on O(1/N) Séminaire S3 | March 13th 2015 | PAGE 16/26
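A sketch of the quantile estimator (my own illustration, exact conditional sampling standing in for MCMC): superpose the states of N increasing random walks, sort them, and take the midpoint of the m-th and (m+1)-th order statistics with m = ⌊−N log p⌋. For Y ∼ N(0, 1) and p = P[Y > 2], the true quantile is q = 2:

```python
import math
import random
from statistics import NormalDist

def quantile_estimate(n_walks, p=0.02275, seed=0):
    """Point-process quantile estimator: q_hat = (Y_(m) + Y_(m+1))/2, m = floor(-N log p)."""
    nd, rng = NormalDist(), random.Random(seed)
    states = []
    for _ in range(n_walks):
        p_y = 1.0
        while p_y >= p:                    # simulate the walk past time t = -log p
            p_y *= 1.0 - rng.random()      # survival prob of next state
            # numerical guard: inv_cdf(1.0) is undefined, clamp tiny survival probs
            states.append(nd.inv_cdf(1.0 - max(p_y, 1e-12)))
    states.sort()
    m = int(-n_walks * math.log(p))
    return 0.5 * (states[m - 1] + states[m])

q_hat = quantile_estimate(1000)            # true quantile q = 2 for p ~ 2.28e-2
```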
39. ### Quantile estimation Example Y ∼ N(0, 1) ; p =

P [Y > 2] ≈ 2.28 × 10−2 ; 1/p ≈ 43.96 ; − log p ≈ 3.78. Figure: quantile estimation with N = 2, giving q = 1.348.
41. ### Design points Algorithm No need for exact sampling if the

goal is only to get failing samples ⇒ use of a metamodel for conditional sampling. Getting Nfail failing samples: sample a minimal-sized DoE; learn a first metamodel with trend = failure. For Nfail times (the random walks are simulated one after the other): sample X1 ∼ µX; y1 = g(X1); m = 1; train the metamodel. While ym < q: set Xm+1 = Xm; ym+1 = ym; for T times (pseudo burn-in): X∗ ∼ K(Xm+1, ·), where K is a kernel for Markov chain sampling, and y∗ = g(X∗); if y∗ > ym+1, then ym+1 = y∗ and Xm+1 = X∗. Then ym+1 = g(Xm+1); train the metamodel; if ym+1 < ym, then Xm+1 = Xm and ym+1 = ym; m = m + 1.
42. ### Design points Example Parabolic limit-state function: g : x ∈

R2 −→ 5 − x2 − 0.5(x1 − 0.1)². Figure: limit-state function over [−5, 5]².
43. ### Design points Example A two-dimensional four-branch serial system: g

: x ∈ R2 −→ min( 3 + (x1 − x2)²/10 − |x1 + x2|/√2 , 7/√2 − |x1 − x2| ). Figure: limit-state function over [−5, 5]².
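The two example limit-state functions above are straightforward to code (function names are mine):

```python
import math

def g_parabolic(x):
    """Parabolic limit-state function on R^2 (slide 42)."""
    x1, x2 = x
    return 5.0 - x2 - 0.5 * (x1 - 0.1) ** 2

def g_four_branch(x):
    """Two-dimensional four-branch serial system (slide 43)."""
    x1, x2 = x
    return min(3.0 + (x1 - x2) ** 2 / 10.0 - abs(x1 + x2) / math.sqrt(2.0),
               7.0 / math.sqrt(2.0) - abs(x1 - x2))
```

For instance g_parabolic vanishes on the parabola x2 = 5 − 0.5(x1 − 0.1)², and g_four_branch equals 3 at the origin.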
45. ### Conclusion One considers Markov chains instead of samples ⇒

N is a number of processes. This lets one define parallel estimators for probabilities and quantiles (and moments). They are twins of the Monte Carlo estimators with a "log attribute": similar statistical properties, but a log replacing the 1/p factor: var [pMC] ≈ p²/(Np) → var [p] ≈ p² log(1/p)/N ; var [qMC] ≈ p²/(Nf(q)²p) → var [q] ≈ p² log(1/p)/(Nf(q)²).
46. ### Conclusion Perspectives Adaptation of the quantile estimator for optimisation problems (min

or max); the problem of conditional simulations (Metropolis-Hastings); best use of a metamodel; adaptation for discontinuous random variables.
47. ### Bibliography I S-K Au and J L Beck. Estimation of

small failure probabilities in high dimensions by subset simulation. Probabilistic Engineering Mechanics, 16(4):263–277, 2001. Charles-Edouard Bréhier, Ludovic Goudenège, and Loïc Tudela. Central limit theorem for adaptative multilevel splitting estimators in an idealized setting. arXiv preprint arXiv:1501.01399, 2015. Charles-Edouard Bréhier, Tony Lelièvre, and Mathias Rousset. Analysis of adaptive multilevel splitting algorithms in an idealized case. arXiv preprint arXiv:1405.1352, 2014. F. Cérou, P. Del Moral, T. Furon, and A. Guyader. Sequential Monte Carlo for rare event estimation. Statistics and Computing, 22(3):795–808, 2012.
48. ### Bibliography II A. Guyader, N. Hengartner, and E. Matzner-Løber. Simulation

and estimation of extreme quantiles and extreme probabilities. Applied Mathematics & Optimization, 64(2):171–196, 2011. Eric Simonnet. Combinatorial analysis of the adaptive last particle method. Statistics and Computing, pages 1–20, 2014. Clément Walter. Moving Particles: a parallel optimal Multilevel Splitting method with application in quantiles estimation and meta-model based algorithms. To appear in Structural Safety, 2014. Clément Walter. Point process-based estimation of kth-order moment. arXiv preprint arXiv:1412.6368, 2014.
49. ### Merci ! (Thank you!) Commissariat à l'énergie atomique et aux énergies alternatives

CEA, DAM, DIF, F-91297 Arpajon, France