Slide 1

Slide 1 text

Rare event simulation: a Point Process interpretation, with application in probability and quantile estimation and metamodel-based algorithms. Séminaire S3 | Clément WALTER | March 13th 2015

Slide 2

Slide 2 text

Introduction
Problem setting:
- X: a random vector with known distribution µX
- g: a "black-box" function representing a computer code, g : R^d → R
- Y = g(X): the real-valued random variable describing the state of the system; its distribution µY is unknown

Uncertainty Quantification: with F = {x ∈ R^d | g(x) > q},
- find p = P[X ∈ F] = µX(F) for a given q
- find q for a given p

Issues: p = µX(F) = µY([q, +∞))
1. one needs to use g to get F or µY, which is time-costly
2. the Monte Carlo estimator has a squared coefficient of variation δ² ≈ 1/(Np) ⇒ N ∝ 1/p
Séminaire S3 | March 13th 2015 | PAGE 1/26

Slide 3

Slide 3 text

Introduction
Two main directions to overcome this issue:
- learn a metamodel of g
- use variance-reduction techniques to estimate p

Slide 8

Slide 8 text

Introduction
Two main directions to overcome this issue:
- learn a metamodel of g
- use variance-reduction techniques to estimate p: Importance sampling, Multilevel splitting

Multilevel Splitting (Subset Simulation)
Write the sought probability p as a product of less small probabilities and estimate them with MCMC: let (qi)i=0..m be an increasing sequence with q0 = −∞ and qm = q:
p = P[g(X) > q]
  = P[g(X) > qm | g(X) > qm−1] × P[g(X) > qm−1]
  = P[g(X) > qm | g(X) > qm−1] × · · · × P[g(X) > q2 | g(X) > q1] × P[g(X) > q1]
⇒ how to choose the (qi)i: beforehand or on the fly? Optimality? Bias?

Slide 11

Slide 11 text

Introduction
A typical MS algorithm works as follows:
1. Sample a Monte Carlo population (Xi)i of size N; y = (g(X1), · · · , g(XN)); j = 0
2. Estimate the conditional probability P[g(X) > qj+1 | g(X) > qj]
3. Resample those Xi with g(Xi) ≤ qj+1 conditionally on being greater than qj+1 (the others do not change)
4. j ← j + 1; repeat until j = m
⇒ parallel computation at each iteration in the resampling step

Minimal variance when all conditional probabilities are equal [4].
Adaptive Multilevel Splitting: qj+1 ← y(⌊p0N⌋) or qj+1 ← y(k), empirical quantiles:
- of order p0 ∈ (0, 1) ⇒ bias [4, 1]; the number of subsets converges toward the constant log p / log p0
- of order k/N ⇒ no bias and a CLT [3, 2]; minimal variance with k = 1 (Last Particle Algorithm [5, 6]); the number of subsets follows a Poisson law with parameter −N log p ⇒ disables parallel computation

Slide 12

Slide 12 text

Outline
1. Increasing random walk
2. Probability estimation
3. Quantile estimation
4. Design points
5. Conclusion

Slide 17

Slide 17 text

Increasing random walk: Definition
Let Y be a real-valued random variable with distribution µY and cdf F (assumed to be continuous). Consider the Markov chain (with Y0 = −∞) such that:
∀n ∈ N : P[Yn+1 ∈ A | Y0, · · · , Yn] = µY(A ∩ (Yn, +∞)) / µY((Yn, +∞))
i.e. Yn+1 is drawn conditionally greater than Yn: Yn+1 ∼ µY(· | Y > Yn).
- The Tn = −log P(Y > Yn) are distributed as the arrival times of a Poisson process with rate 1 [5, 7]
- Time in the Poisson process is linked with rarity in probability
- The number of events My before y ∈ R is related to P[Y > y] = py: My = M(t = −log py) ∼ P(−log py)
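The Poisson-process property can be checked numerically. Below is a minimal Python sketch (not from the slides) in the idealized case Y ∼ Exp(1): there the conditional draw is exact by memorylessness, Tn = −log P(Y > Yn) = Yn, and My should be Poisson with mean −log py = y.

```python
import math
import random

def walk_count(y, rng):
    """Number of events of one increasing random walk strictly below level y.

    With Y ~ Exp(1), Y_{n+1} | Y_{n+1} > Y_n is Y_n + Exp(1) (memorylessness),
    so the chain values are the arrival times of a unit-rate Poisson process.
    """
    level, m = -math.log(rng.random()), 0   # Y_1 ~ Exp(1)  (Y_0 = -inf)
    while level < y:
        m += 1
        level += -math.log(rng.random())    # exact conditional resampling
    return m

rng = random.Random(0)
y = 3.78                                    # p_y = exp(-y), so -log p_y = y
counts = [walk_count(y, rng) for _ in range(20000)]
mean_m = sum(counts) / len(counts)
print(round(mean_m, 2))                     # should be close to -log p_y = 3.78
```

Both the empirical mean and variance of the counts match the Poisson parameter −log py, as the definition predicts.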

Slide 18

Slide 18 text

Increasing random walk: Definition
First consequence: the number of simulations needed to get a realisation above a given threshold (an event of probability p) follows a Poisson law P(log 1/p) instead of a Geometric law G(p).
[Figure: comparison of the Poisson and Geometric densities with p = 0.0228]
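The gain the figure illustrates can already be read off the means alone; a two-line check with the figure's value p = 0.0228:

```python
import math

p = 0.0228
geometric_mean = 1 / p        # E[G(p)]: direct simulation until the event occurs
poisson_mean = -math.log(p)   # E[P(log 1/p)]: events of one increasing random walk
print(round(geometric_mean, 2), round(poisson_mean, 2))
```

About 44 direct simulations on average versus fewer than 4 events of the random walk.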

Slide 19

Slide 19 text

Increasing random walk: Example
Y ∼ N(0, 1); p = P[Y > 2] ≈ 2.28 × 10⁻²; 1/p ≈ 43.96; −log p ≈ 3.78
[Figure: standard normal density with the threshold q = 2]

Slide 20

Slide 20 text

Outline
1. Increasing random walk
2. Probability estimation
3. Quantile estimation
4. Design points
5. Conclusion

Slide 25

Slide 25 text

Probability estimation: Definition of the estimator
Concept: the number of events before time t = −log P(Y > q) follows a Poisson law with parameter t ⇒ estimate the parameter of a Poisson law.
Let (Mi)i=1..N be N iid copies of the number of events at time t = −log p:
Mi ∼ P(−log p); Mq = Σ_{i=1..N} Mi ∼ P(−N log p)
−log p ≈ (1/N) Σ_{i=1..N} Mi = Mq/N −→ p̂ = (1 − 1/N)^Mq
⇒ the Last Particle estimator, but with a parallel implementation.
⇒ One gets in fact an estimator of P[Y > q0] for all q0 ≤ q:
F̂N(y) = 1 − (1 − 1/N)^My −→ F(y) a.s. as N → ∞
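A runnable Python sketch of this estimator (illustrative, not from the slides), again in the idealized Y ∼ Exp(1) case where conditional sampling is exact; with a real g, MCMC would replace the direct draws but the counting logic is unchanged:

```python
import math
import random

def estimate_p(q, N, rng):
    """Point-process estimator p_hat = (1 - 1/N)^{M_q}.

    M_q pools the events, strictly below level q, of N independent
    increasing random walks for Y ~ Exp(1).
    """
    M_q = 0
    for _ in range(N):
        level = -math.log(rng.random())       # Y_1 ~ Exp(1)
        while level < q:
            M_q += 1
            level += -math.log(rng.random())  # exact conditional resampling
    return (1 - 1 / N) ** M_q

rng = random.Random(1)
q = 3.78                                      # true p = exp(-q), about 0.0228
p_hat = estimate_p(q, N=1000, rng=rng)
print(p_hat)                                  # relative sd is about sqrt(-log p / N)
```

With N = 1000 the coefficient of variation is roughly √(3.78/1000) ≈ 6%.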

Slide 26

Slide 26 text

Probability estimation: Example
Y ∼ N(0, 1); p = P[Y > 2] ≈ 2.28 × 10⁻²; 1/p ≈ 43.96; −log p ≈ 3.78
[Figure: empirical cdf F̂N with N = 2; p̂ = 0.0078]

Slide 30

Slide 30 text

Probability estimation: Practical implementation
Ideally, one knows how to generate the Markov chain. In practice Y = g(X), and one can use Markov chain sampling (e.g. Metropolis-Hastings or Gibbs algorithms); this requires working with a population to get starting points ⇒ batches of k random walks are generated together.

Generating k random walks
Require: k, q
  generate k copies (Xi)i=1..k according to µX; Y ← (g(X1), · · · , g(Xk)); M ← (0, · · · , 0)
  while min Y < q do
    ind ← which(Y < q)
    for i in ind do
      Mi ← Mi + 1
      generate X* ∼ µX(· | g(X) > Yi)
      Xi ← X*; Yi ← g(X*)
    end for
  end while
  return M, (Xi)i=1..k, (Yi)i=1..k

⇒ each sample is resampled according to its own level

Slide 32

Slide 32 text

Probability estimation: Practical implementation
Convergence of the Markov chain (for conditional sampling) is improved when the starting point already follows the target distribution; two possibilities:
- store each state (Xi)i and its corresponding level
- redraw only the smallest Xi (⇒ Last Particle Algorithm)
⇒ LPA is only one possible implementation of this estimator.

Computing time with the LPA implementation
Let tpar be the random time needed to generate N random walks in batches of size k = N/nc (nc standing for the number of cores) with burn-in T:
tpar = max of nc RV ∼ P(−k log p)
E[tpar] = (T (log p)² / (nc δ²)) · (1 + √(2 nc δ² log nc) / |log p| + 1/(T log 1/p))

Slide 33

Slide 33 text

Probability estimation: Comparison
Mean computing time against coefficient of variation: the cost of an algorithm is the number of samples generated in a row by one core. We assume nc ≥ 1 cores and a burn-in T for Metropolis-Hastings.

Algorithm    | Time                            | Coef. of var. δ²                 | Time vs δ
Monte Carlo  | N/nc                            | 1/(Np)                           | 1/(p δ²)
AMS          | T (log p/log p0) N(1−p0)/nc     | (log p/log p0) (1−p0)/(N p0)     | ((1−p0)²/(p0 (log p0)²)) · T(log p)²/(nc δ²)
LPA          | −T N log p                      | −log p / N                       | T(log p)²/δ²
Random walk  | −T (N/nc) log p                 | −log p / N                       | T(log p)²/(nc δ²)

- AMS is best when p0 → 1
- LPA attains the theoretical AMS optimum but is not parallel
- the random walk allows taking p0 → 1 while keeping the parallel implementation
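Plugging illustrative numbers into the "Time vs δ" column shows the scale of the gain; the values below (p = 10⁻⁶, δ = 10%, T = 20, nc = 100) are assumptions for illustration, not from the slides:

```python
import math

# Illustrative values, not from the slides
p, delta, T, nc = 1e-6, 0.1, 20, 100

mc_cost = 1 / (p * delta**2)                      # Monte Carlo: 1/(p delta^2)
walk_cost = T * math.log(p)**2 / (nc * delta**2)  # parallel random walks
print(f"MC: {mc_cost:.2e}  walks: {walk_cost:.2e}")
```

Here the parallel random walks need on the order of 4 × 10³ sequential samples per core against 10⁸ for crude Monte Carlo.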

Slide 34

Slide 34 text

Outline
1. Increasing random walk
2. Probability estimation
3. Quantile estimation
4. Design points
5. Conclusion

Slide 38

Slide 38 text

Quantile estimation: Definition
Concept: approximate the time t = −log p with the times of a Poisson process; the higher the rate, the denser the discretisation of [0, +∞).
- The center of the interval [T_Mt, T_Mt+1] converges toward a random variable centred at t with a symmetric pdf
- q̂ = (Ym + Ym+1)/2 with m = ⌊E[Mq]⌋ = ⌊−N log p⌋
- CLT: √N (q̂ − q) converges in law, as N → ∞, to N(0, −p² log p / f(q)²)
- Bias bounded in O(1/N)
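A Python sketch of this estimator in the same idealized Y ∼ Exp(1) setting (illustrative only): the pooled events of the N unit-rate walks form a rate-N Poisson process of levels, so the order statistics Y(k) can be generated directly from Exp(N) increments.

```python
import math
import random

def estimate_q(p, N, rng):
    """q_hat = (Y_(m) + Y_(m+1)) / 2 with m = floor(-N log p), Y ~ Exp(1).

    The superposition of N unit-rate increasing walks is a rate-N Poisson
    process, so consecutive pooled order statistics differ by Exp(N).
    """
    m = int(-N * math.log(p))
    level = prev = 0.0
    for _ in range(m + 1):
        prev = level                          # Y_(k) before the next jump
        level += -math.log(rng.random()) / N  # Exp(N) increment to Y_(k+1)
    return 0.5 * (prev + level)               # (Y_(m) + Y_(m+1)) / 2

rng = random.Random(2)
p, N = 0.0228, 10000
q_hat = estimate_q(p, N, rng)
print(q_hat)                                  # true quantile is -log p, about 3.78
```

The empirical spread matches the CLT: for Exp(1), f(q) = p, so the standard deviation is √(log(1/p)/N) ≈ 0.02 here.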

Slide 39

Slide 39 text

Quantile estimation: Example
Y ∼ N(0, 1); p = P[Y > 2] ≈ 2.28 × 10⁻²; 1/p ≈ 43.96; −log p ≈ 3.78
[Figure: quantile estimate with N = 2; q̂ = 1.348]

Slide 40

Slide 40 text

Outline
1. Increasing random walk
2. Probability estimation
3. Quantile estimation
4. Design points
5. Conclusion

Slide 41

Slide 41 text

Design points: Algorithm
No need for exact sampling if the goal is only to get failing samples ⇒ use a metamodel for conditional sampling.

Getting Nfail failing samples
  Sample a minimal-sized DoE
  Learn a first metamodel with trend = failure
  for Nfail times do                        (simulate the random walks one after the other)
    sample X1 ∼ µX; y1 = g(X1); m = 1; train the metamodel
    while ym < q do
      Xm+1 ← Xm; ym+1 ← ym
      for T times do                        (pseudo burn-in; K is a kernel for Markov chain sampling)
        X* ∼ K(Xm+1, ·); y* = g(X*)
        if y* > ym+1 then ym+1 ← y*; Xm+1 ← X*
      end for
      ym+1 = g(Xm+1); train the metamodel
      if ym+1 < ym then Xm+1 ← Xm; ym+1 ← ym
      m ← m + 1
    end while
  end for
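The kernel step above can be sketched as follows (Python; a Metropolis-style variant of the inner burn-in loop for a standard normal µX, with the parabolic g of the example slide standing in for the metamodel; the Gaussian kernel, its step size sigma and T are illustrative assumptions):

```python
import math
import random

def g(x):
    """Parabolic limit-state function from the example slide."""
    return 5.0 - x[1] - 0.5 * (x[0] - 0.1) ** 2

def burn_in_move(x, y, T, rng, sigma=1.0):
    """T kernel proposals targeting N(0, I) restricted to {g > y}.

    Proposals falling below the level y are rejected, so the returned state
    never decreases the walk's level; sigma and T are illustrative choices.
    """
    for _ in range(T):
        x_star = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        if g(x_star) <= y:
            continue                          # below the current level: reject
        log_ratio = 0.5 * (sum(v * v for v in x) - sum(v * v for v in x_star))
        if math.log(rng.random()) < log_ratio:
            x = x_star                        # accept w.r.t. the N(0, I) target
    return x

rng = random.Random(3)
x0 = [0.0, 0.0]
y0 = g(x0)                                    # current level of the walk
x1 = burn_in_move(x0, y0, T=20, rng=rng)
print(g(x1) >= y0)                            # the move never goes below the level
```

In the slide's algorithm the expensive calls to g inside this loop would be replaced by metamodel predictions, with g evaluated only once per accepted state.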

Slide 42

Slide 42 text

Design points: Example
Parabolic limit-state function: g : x ∈ R² → 5 − x2 − 0.5 (x1 − 0.1)²
[Figure: limit-state function and generated samples]

Slide 43

Slide 43 text

Design points: Example
A two-dimensional four-branch series system:
g : x ∈ R² → min( 3 + (x1 − x2)²/10 − |x1 + x2|/√2 , 7/√2 − |x1 − x2| )
[Figure: limit-state function and generated samples]

Slide 44

Slide 44 text

Outline
1. Increasing random walk
2. Probability estimation
3. Quantile estimation
4. Design points
5. Conclusion

Slide 45

Slide 45 text

Conclusion
- One considers Markov chains instead of samples ⇒ N is a number of processes
- This defines parallel estimators for probabilities and quantiles (and moments [8])
- They are twins of the Monte Carlo estimators with a "log attribute": similar statistical properties, but with a log added to the 1/p factor:
var[p̂MC] ≈ p²/(Np)        →  var[p̂] ≈ p² log(1/p) / N
var[q̂MC] ≈ p²/(N f(q)² p)  →  var[q̂] ≈ p² log(1/p) / (N f(q)²)
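The "log attribute" of the probability estimator can be checked empirically; a Python sketch in the idealized Y ∼ Exp(1) case (sizes kept deliberately small, values illustrative):

```python
import math
import random

def p_hat(q, N, rng):
    """Point-process estimator (1 - 1/N)^{M_q} in the idealized Exp(1) case."""
    M = 0
    for _ in range(N):
        level = -math.log(rng.random())
        while level < q:
            M += 1
            level += -math.log(rng.random())
    return (1 - 1 / N) ** M

rng = random.Random(4)
q, N, runs = 2.0, 200, 400            # p = exp(-2); modest sizes for speed
p = math.exp(-q)
est = [p_hat(q, N, rng) for _ in range(runs)]
emp_var = sum((e - p) ** 2 for e in est) / runs
theory = p**2 * q / N                 # p^2 log(1/p) / N
print(emp_var / theory)               # ratio should be close to 1
```

The empirical variance over independent runs matches p² log(1/p)/N, against p²(1/p)/N for crude Monte Carlo.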

Slide 46

Slide 46 text

Conclusion: Perspectives
- Adaptation of the quantile estimator to optimisation problems (min or max)
- The problem of conditional simulations (Metropolis-Hastings)
- Best use of a metamodel
- Adaptation to discontinuous random variables

Slide 47

Slide 47 text

Bibliography I
[1] S.-K. Au and J. L. Beck. Estimation of small failure probabilities in high dimensions by subset simulation. Probabilistic Engineering Mechanics, 16(4):263–277, 2001.
[2] Charles-Edouard Bréhier, Ludovic Goudenège, and Loïc Tudela. Central limit theorem for adaptative multilevel splitting estimators in an idealized setting. arXiv preprint arXiv:1501.01399, 2015.
[3] Charles-Edouard Bréhier, Tony Lelièvre, and Mathias Rousset. Analysis of adaptive multilevel splitting algorithms in an idealized case. arXiv preprint arXiv:1405.1352, 2014.
[4] F. Cérou, P. Del Moral, T. Furon, and A. Guyader. Sequential Monte Carlo for rare event estimation. Statistics and Computing, 22(3):795–808, 2012.

Slide 48

Slide 48 text

Bibliography II
[5] A. Guyader, N. Hengartner, and E. Matzner-Løber. Simulation and estimation of extreme quantiles and extreme probabilities. Applied Mathematics & Optimization, 64(2):171–196, 2011.
[6] Eric Simonnet. Combinatorial analysis of the adaptive last particle method. Statistics and Computing, pages 1–20, 2014.
[7] Clément Walter. Moving particles: a parallel optimal multilevel splitting method with application in quantiles estimation and meta-model based algorithms. To appear in Structural Safety, 2014.
[8] Clément Walter. Point process-based estimation of kth-order moment. arXiv preprint arXiv:1412.6368, 2014.

Slide 49

Slide 49 text

Thank you!
Commissariat à l'énergie atomique et aux énergies alternatives
CEA, DAM, DIF, F-91297 Arpajon, France