
# Clément Walter

(CEA, DAM, DIF and Université Paris Diderot, Paris, FR)

https://s3-seminar.github.io/seminars/clement-walter

Title — Rare event simulation: a Point Process interpretation with application in probability and quantile estimation

Biography — Clément Walter, 25, graduated from Mines ParisTech in 2013. Beforehand he attended preparatory classes at Lycée Sainte-Geneviève (Maths and Physics branch). For his master's degree he specialised in Geostatistics and started working with CEA as an intern on the emulation of complex computer codes (especially kriging) for rare event simulation and estimation. He then pursued this work in a PhD under the direction of Prof. Josselin Garnier, focusing on multilevel splitting methods. March 13, 2015

## Transcript

1. Rare event simulation: a Point Process interpretation with application in probability and quantile estimation and metamodel based algorithms
Séminaire S3 | Clément WALTER
March 13th, 2015

2. Introduction
Problem setting:
- X random vector with known distribution µX
- g a "black-box" function representing a computer code: g : R^d → R
- Y = g(X) the real-valued random variable which describes the state of the system; its distribution µY is unknown
Uncertainty quantification: F = {x ∈ R^d | g(x) > q}
- find p = P[X ∈ F] = µX(F) for a given q
- find q for a given p
Issues:
- p = µX(F) = µY([q, +∞)) ≪ 1
- getting F or µY requires calls to g, which is time-costly
- the Monte Carlo estimator has a coefficient of variation δ² ≈ 1/(Np) ⇒ N ∝ 1/p
Séminaire S3 | March 13th 2015 | PAGE 1/26
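These orders of magnitude are easy to check numerically. A minimal crude Monte Carlo sketch, assuming the toy case Y = g(X) ∼ N(0, 1) and q = 2 (so p ≈ 2.28 × 10⁻², the running example of the talk); the sample size N is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
q, N = 2.0, 100_000           # N must be >> 1/p ~ 44 for a usable estimate
y = rng.standard_normal(N)    # stand-in for expensive g(X) evaluations
p_hat = float(np.mean(y > q))
cv = float(np.sqrt((1 - p_hat) / (N * p_hat)))  # coefficient of variation ~ 1/sqrt(Np)
print(p_hat, cv)
```

For p = 10⁻⁶ the same 10% accuracy would require N ≈ 10⁸ calls to g, which is intractable when each call is a heavy computer code.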


8. Introduction
Two main directions to overcome this issue:
- learn a metamodel on g
- use variance-reduction techniques to estimate p: importance sampling, multilevel splitting

Multilevel Splitting (Subset Simulation)
Write the sought probability p as a product of larger probabilities and estimate them with MCMC: let (q_i)_{i=0..m} be an increasing sequence with q_0 = −∞ and q_m = q:

p = P[g(X) > q] = P[g(X) > q_m | g(X) > q_{m−1}] × P[g(X) > q_{m−1}]
  = P[g(X) > q_m | g(X) > q_{m−1}] × ··· × P[g(X) > q_2 | g(X) > q_1] × P[g(X) > q_1]

⇒ how to choose (q_i)_i: beforehand or on the fly? Optimality? Bias?
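The telescoping identity behind subset simulation can be checked directly when µY is known in closed form. A sketch for a standard Gaussian Y with hypothetical levels (q_1, q_2, q) = (0.8, 1.5, 2):

```python
import math

def tail(q):
    # P[Y > q] for Y ~ N(0, 1)
    return 0.5 * math.erfc(q / math.sqrt(2))

levels = [-math.inf, 0.8, 1.5, 2.0]   # q0 = -inf < q1 < q2 < qm = q (hypothetical)
p_prod = 1.0
for q_prev, q_next in zip(levels[:-1], levels[1:]):
    p_prod *= tail(q_next) / tail(q_prev)   # P[g(X) > q_next | g(X) > q_prev]
print(p_prod, tail(2.0))                    # identical by telescoping
```

Each conditional factor here is of order 0.2–0.35, far easier to estimate by sampling than p ≈ 0.023 directly; that is the whole point of the decomposition.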


11. Introduction
A typical MS algorithm works as follows:
1. Sample a Monte Carlo population (X_i)_i of size N; y = (g(X_1), ..., g(X_N)); j = 0
2. Estimate the conditional probability P[g(X) > q_{j+1} | g(X) > q_j]
3. Resample the (X_i)_i such that g(X_i) ≤ q_{j+1} conditionally on being greater than q_{j+1} (the others don't change)
4. j ← j + 1 and repeat until j = m

⇒ Parallel computation at each iteration in the resampling step
Minimal variance when all conditional probabilities are equal
Adaptive Multilevel Splitting: q_{j+1} ← y_(p0·N) or q_{j+1} ← y_(k):
- empirical quantiles of order p0 ∈ (0, 1) ⇒ bias [4, 1]; the number of subsets converges toward a constant log p / log p0
- empirical quantiles of order k/N ⇒ no bias and CLT [3, 2]
- minimal variance with k = 1 (Last Particle Algorithm [5, 6]); the number of subsets follows a Poisson law with parameter −N log p ⇒ disables parallel computation
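An idealized version of this scheme is easy to sketch for a standard Gaussian Y, with levels placed adaptively at the empirical p0-quantile. Brute-force rejection stands in for the MCMC resampling step (only viable in this toy case where exact conditional sampling is possible); this p0-quantile variant is exactly the one carrying the bias mentioned above:

```python
import numpy as np

rng = np.random.default_rng(1)

def cond_sample(level):
    # exact draw of Y ~ N(0,1) given Y > level, by rejection
    # (stands in for the MCMC resampling of the real algorithm)
    y = rng.standard_normal()
    while y <= level:
        y = rng.standard_normal()
    return y

def ams(q=2.0, N=1000, p0=0.5):
    y = rng.standard_normal(N)
    p_hat = 1.0
    while True:
        level = np.quantile(y, 1 - p0)        # adaptive level: p0 of the particles survive
        if level >= q:
            return p_hat * np.mean(y > q)     # level reached q: finish with plain MC
        p_hat *= p0
        for i in np.flatnonzero(y <= level):  # resample only the fallen particles
            y[i] = cond_sample(level)

p_hat_ams = ams()
print(p_hat_ams)   # close to 2.28e-2
```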

12. Outline
1 Increasing random walk
2 Probability estimation
3 Quantile estimation
4 Design points
5 Conclusion



17. Increasing random walk
Definition
Let Y be a real-valued random variable with distribution µY and cdf F (assumed to be continuous). One considers the Markov chain (with Y_0 = −∞) such that:

∀n ∈ N: P[Y_{n+1} ∈ A | Y_0, ..., Y_n] = µY(A ∩ (Y_n, +∞)) / µY((Y_n, +∞))

i.e. Y_{n+1} is randomly greater than Y_n: Y_{n+1} ∼ µY(· | Y > Y_n)
- the T_n = −log(P[Y > Y_n]) are distributed as the arrival times of a Poisson process with parameter 1 [5, 7]
- Time in the Poisson process is linked with rarity in probability
- The number of events M_y before y ∈ R is related to P[Y > y] = p_y: M_y = M_{t=−log p_y} ∼ P(−log p_y)
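For a concrete check, one can simulate the walk for Y ∼ N(0, 1) (conditional sampling done by brute-force rejection, which makes the walk visit the record values of an iid Gaussian sequence) and verify the Poisson signature of M_q: mean ≈ variance ≈ −log p_q ≈ 3.78 for q = 2:

```python
import math, random

random.seed(0)
q = 2.0
p = 0.5 * math.erfc(q / math.sqrt(2))   # P[Y > 2] ~ 2.28e-2

def count_events(q):
    # one increasing random walk; returns M_q = number of states below q
    y, m = -math.inf, 0
    while True:
        z = random.gauss(0.0, 1.0)
        if z > y:               # rejection: accept only draws above the current state
            if z > q:
                return m
            y, m = z, m + 1

runs = [count_events(q) for _ in range(2000)]
mean_m = sum(runs) / len(runs)
var_m = sum((r - mean_m) ** 2 for r in runs) / (len(runs) - 1)
print(mean_m, var_m, -math.log(p))   # mean and variance both ~ 3.78
```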

18. Increasing random walk
First consequence: the number of simulations needed to get a realisation above a given threshold (an event of probability p) follows a Poisson law P(log 1/p) instead of a Geometric law G(p).
Figure: comparison of the Poisson and Geometric densities with p = 0.0228

19. Increasing random walk
Example: Y ∼ N(0, 1); p = P[Y > 2] ≈ 2.28 × 10⁻²; 1/p ≈ 43.96; −log p ≈ 3.78
Figure: standard Gaussian density with the successive states of a walk

20. Outline
1 Increasing random walk
2 Probability estimation
3 Quantile estimation
4 Design points
5 Conclusion


25. Probability estimation
Definition of the estimator
Concept: the number of events before time t = −log(P[Y > q]) follows a Poisson law with parameter t ⇒ estimate the parameter of a Poisson law.
Let (M_i)_{i=1..N} be N iid copies of the number of events at time t = −log p: M_i ∼ P(−log p); M_q = Σ_{i=1}^N M_i ∼ P(−N log p)

−log p ≈ (1/N) Σ_{i=1}^N M_i = M_q / N  −→  p̂ = (1 − 1/N)^{M_q}

⇒ Last Particle Estimator, but with parallel implementation
⇒ One has indeed an estimator of P[Y > q0] for all q0 ≤ q:

F̂_N(y) = 1 − (1 − 1/N)^{M_y} −→ F(y) a.s. as N → ∞
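Putting the pieces together, a sketch of the estimator for the toy Gaussian case, with N independent walks and rejection again replacing MCMC (N is an illustrative choice). Note that with M_q ∼ P(−N log p), E[(1 − 1/N)^{M_q}] = exp(−N log p · 1/N) = p, so the estimator is unbiased:

```python
import math, random

random.seed(0)
q = 2.0
p_true = 0.5 * math.erfc(q / math.sqrt(2))   # ~ 2.28e-2

def count_events(q):
    # M_i ~ P(-log p): number of walk states below q
    y, m = -math.inf, 0
    while True:
        z = random.gauss(0.0, 1.0)
        if z > y:
            if z > q:
                return m
            y, m = z, m + 1

N = 500
M_q = sum(count_events(q) for _ in range(N))  # ~ P(-N log p)
p_hat = (1 - 1 / N) ** M_q                    # same form as the Last Particle estimator
print(p_hat, p_true)
```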

26. Probability estimation
Example: Y ∼ N(0, 1); p = P[Y > 2] ≈ 2.28 × 10⁻²; 1/p ≈ 43.96; −log p ≈ 3.78
Figure: empirical cdf F̂_N with N = 2; p̂ = 0.0078


30. Probability estimation
Practical implementation
- Ideally, one knows how to generate the Markov chain
- In practice, Y = g(X) and one can use Markov chain sampling (e.g. Metropolis-Hastings or Gibbs algorithms)
- this requires working with a population to get starting points
⇒ batches of k random walks are generated together

Generating k random walks
```
Require: k, q
Generate k copies (X_i)_{i=1..k} according to µX; Y ← (g(X_1), ..., g(X_k)); M ← (0, ..., 0)
while min Y < q do
    ind ← which(Y < q)
    for i in ind do
        M_i ← M_i + 1
        Generate X* ∼ µX(· | g(X) > Y_i)
        X_i ← X*; Y_i ← g(X*)
    end for
end while
Return M, (X_i)_{i=1..k}, (Y_i)_{i=1..k}
```
⇒ each sample is resampled according to its own level
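The batch pseudocode above can be transcribed for the toy Gaussian case; brute-force rejection stands in for the Metropolis-Hastings conditional sampler (an illustrative sketch, not the actual implementation):

```python
import random

random.seed(0)

def batch_random_walks(k, q):
    # advance k walks together; M[i] counts the states of walk i below q
    Y = [random.gauss(0.0, 1.0) for _ in range(k)]
    M = [0] * k
    while min(Y) < q:
        for i in range(k):
            if Y[i] < q:                  # each walk is resampled above its OWN level
                M[i] += 1
                z = random.gauss(0.0, 1.0)
                while z <= Y[i]:          # rejection stand-in for conditional sampling
                    z = random.gauss(0.0, 1.0)
                Y[i] = z
    return M, Y

M, Y = batch_random_walks(200, 2.0)
print(sum(M) / len(M))   # ~ -log p ~ 3.78
```

All inner-loop resamplings at a given sweep are independent, which is what makes the batch version parallelisable, unlike the Last Particle Algorithm.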


32. Probability estimation
Practical implementation
Convergence of the Markov chain (for conditional sampling) is improved when the starting point already follows the targeted distribution; two possibilities:
- store each state (X_i)_i and its corresponding level
- re-draw only the smallest X_i (⇒ Last Particle Algorithm)
⇒ LPA is only one possible implementation of this estimator

Computing time with the LPA implementation
Let t_par be the random time of generating N random walks by batches of size k = N/n_c (n_c standing for the number of cores) with burn-in T:

t_par = max of n_c RV ∼ P(−k log p)

E[t_par] ≈ (T(log p)²/(n_c δ²)) [1 + √(n_c δ²/(log p)²) (√(2 log n_c) + 1/(T log 1/p))]

33. Probability estimation
Comparison
Mean computing time against coefficient of variation: the cost of an algorithm is the number of samples generated in a row by one core. We assume n_c ≥ 1 cores and burn-in T for Metropolis-Hastings.

| Algorithm | Time | Coef. of var. δ² | Time vs δ |
|---|---|---|---|
| Monte Carlo | N/n_c | 1/(Np) | 1/(n_c p δ²) |
| AMS | T (log p / log p0) N(1−p0)/n_c | (log p / log p0)(1−p0)/(N p0) | (T(log p)²/(n_c δ²)) · (1−p0)²/(p0 (log p0)²) |
| LPA | −TN log p | −log p / N | T(log p)²/δ² |
| Random walk | −T(N/n_c) log p | −log p / N | T(log p)²/(n_c δ²) |

- AMS is best when p0 → 1 (the factor (1−p0)²/(p0 (log p0)²) → 1)
- LPA attains the theoretically best AMS limit but is not parallel
- the random walk estimator allows taking p0 → 1 while keeping the parallel implementation
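The practical takeaway of the table is the 1/p → (log p)² swap in the "Time vs δ" column. Plugging in hypothetical numbers (p = 10⁻⁶, δ = 10%, burn-in T = 20, n_c = 100 cores, all purely illustrative):

```python
import math

p, delta2, T, nc = 1e-6, 0.01, 20, 100             # illustrative values only
cost_mc   = 1 / (nc * p * delta2)                  # crude Monte Carlo
cost_walk = T * math.log(p) ** 2 / (nc * delta2)   # parallel random walks
print(f"{cost_mc:.3g} {cost_walk:.3g}")            # the walk wins by orders of magnitude
```

Even with the burn-in penalty T, the (log p)² dependence beats 1/p by a factor that grows without bound as the event gets rarer.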

34. Outline
1 Increasing random walk
2 Probability estimation
3 Quantile estimation
4 Design points
5 Conclusion


38. Quantile estimation
Definition
Concept: approximate a time t = −log p with the times of a Poisson process; the higher the rate, the denser the discretisation of [0, +∞).
The centre of the interval [T_{M_t}, T_{M_t+1}] converges toward a random variable centred in t with symmetric pdf:

q̂ = (Y_m + Y_{m+1})/2 with m = E[M_q] = −N log p

CLT: √N (q̂ − q) −→_L N(0, p² log(1/p) / f(q)²) as N → ∞
Bounds on the bias in O(1/N)
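A Last-Particle-style sketch of this quantile estimator for the Gaussian toy case (rejection for conditional sampling; the successive minima of the particle cloud are the ordered event values Y_1 < Y_2 < ... of the superposed process; N is an illustrative choice):

```python
import math, random

random.seed(0)
N, p = 100, 0.0228                 # aim at the p-quantile: true value q = 2.0
m = int(-N * math.log(p))          # m = E[M_q], here 378

particles = [random.gauss(0.0, 1.0) for _ in range(N)]
events = []                        # Y_1 < Y_2 < ... : ordered event values
while len(events) < m + 1:
    i = min(range(N), key=lambda j: particles[j])
    events.append(particles[i])
    z = random.gauss(0.0, 1.0)     # rejection: conditional draw above the minimum
    while z <= particles[i]:
        z = random.gauss(0.0, 1.0)
    particles[i] = z

q_hat = 0.5 * (events[m - 1] + events[m])
print(q_hat)   # ~ 2.0
```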

39. Quantile estimation
Example: Y ∼ N(0, 1); p = P[Y > 2] ≈ 2.28 × 10⁻²; 1/p ≈ 43.96; −log p ≈ 3.78
Figure: N = 2; q̂ = 1.348

40. Outline
1 Increasing random walk
2 Probability estimation
3 Quantile estimation
4 Design points
5 Conclusion

41. Design points
Algorithm
No need for exact sampling if the goal is only to get failing samples
⇒ use of a metamodel for conditional sampling

Getting N_fail failing samples
```
Sample a minimal-sized DoE
Learn a first metamodel with trend = failure
for N_fail times do                          # simulate the random walks one after the other
    Sample X_1 ∼ µX; y_1 = g(X_1); m = 1; train the metamodel
    while y_m < q do
        X_{m+1} = X_m; y_{m+1} = y_m
        for T times do                       # pseudo burn-in
            X* ∼ K(X_{m+1}, ·); y* = g~(X*)  # K: kernel for Markov chain sampling; g~: metamodel prediction
            if y* > y_{m+1} then y_{m+1} = y*; X_{m+1} = X*
        end for
        y_{m+1} = g(X_{m+1}); train the metamodel
        if y_{m+1} < y_m then X_{m+1} = X_m; y_{m+1} = y_m
        m = m + 1
    end while
end for
```

42. Design points
Example: parabolic limit-state function g : x ∈ R² → 5 − x₂ − 0.5(x₁ − 0.1)²
Figure: limit-state function (LSF) and design points on [−5, 5]²

43. Design points
Example: a two-dimensional four-branch serial system:

g : x ∈ R² → min( 3 + (x₁ − x₂)²/10 − |x₁ + x₂|/√2, 7/√2 − |x₁ − x₂| )

Figure: limit-state function (LSF) and design points on [−5, 5]²

44. Outline
1 Increasing random walk
2 Probability estimation
3 Quantile estimation
4 Design points
5 Conclusion

45. Conclusion
- One considers Markov chains instead of samples ⇒ N is a number of processes
- This lets one define parallel estimators for probabilities and quantiles (and moments)
- They are twins of the Monte Carlo estimators with a "log attribute": similar statistical properties, with the 1/p factor replaced by log 1/p:

var[p̂_MC] ≈ p²/(Np) → var[p̂] ≈ p² log(1/p)/N

var[q̂_MC] ≈ p²/(N f(q)² p) → var[q̂] ≈ p² log(1/p)/(N f(q)²)

46. Conclusion
Perspectives
- Adaptation of the quantile estimator to optimisation problems (min or max)
- Problem of conditional simulations (Metropolis-Hastings)
- Best use of a metamodel

47. Bibliography I
[1] S-K Au and J L Beck. Estimation of small failure probabilities in high dimensions by subset simulation. Probabilistic Engineering Mechanics, 16(4):263–277, 2001.
[2] Charles-Edouard Bréhier, Ludovic Goudenège, and Loïc Tudela. Central limit theorem for adaptative multilevel splitting estimators in an idealized setting. arXiv preprint arXiv:1501.01399, 2015.
[3] Charles-Edouard Bréhier, Tony Lelièvre, and Mathias Rousset. Analysis of adaptive multilevel splitting algorithms in an idealized case. arXiv preprint arXiv:1405.1352, 2014.
[4] F Cérou, P Del Moral, T Furon, and A Guyader. Sequential Monte Carlo for rare event estimation. Statistics and Computing, 22(3):795–808, 2012.

48. Bibliography II
[5] A Guyader, N Hengartner, and E Matzner-Løber. Simulation and estimation of extreme quantiles and extreme probabilities. Applied Mathematics & Optimization, 64(2):171–196, 2011.
[6] Eric Simonnet. Combinatorial analysis of the adaptive last particle method. Statistics and Computing, pages 1–20, 2014.
[7] Clément Walter. Moving Particles: a parallel optimal Multilevel Splitting method with application in quantiles estimation and meta-model based algorithms. To appear in Structural Safety, 2014.
[8] Clément Walter. Point process-based estimation of kth-order moment. arXiv preprint arXiv:1412.6368, 2014.

49. Thank you!
Commissariat à l'énergie atomique et aux énergies alternatives
CEA, DAM, DIF, F-91297 Arpajon, France