The Mass Volume curve, a performance metric for unsupervised anomaly detection

Rare Events, Extremes and Machine Learning Workshop, May 24th, 2018

Albert Thomas
Huawei Technologies - Télécom ParisTech - Airbus Group Innovations

Joint work with Stephan Clémençon, Alexandre Gramfort, Vincent Feuillard and Anne Sabourin.
(Polonik, 1997) A density level set is a Minimum Volume set, i.e. a solution of

$$\min_{\Omega \in \mathcal{B}(\mathbb{R}^d)} \lambda(\Omega) \quad \text{such that} \quad P(\Omega) \geq \alpha$$

[Figure: two-dimensional sample with an estimated Minimum Volume set]
density level set

1. A density level set is always a minimum volume set.
2. If $f$ has no flat parts, a minimum volume set is a density level set.

(Einmahl and Mason, 1992), (Polonik, 1997), (Nunez-Garcia et al., 2003)
algorithms

Common approach for most algorithms:

1. Learn a scoring function $\hat{s} : x \in \mathbb{R}^d \mapsto \mathbb{R}$ such that the smaller $\hat{s}(x)$, the more abnormal $x$ is.
2. Threshold $\hat{s}$ at an offset $q$ such that $\hat{\Omega}_\alpha = \{x, \hat{s}(x) \geq q\}$ is an estimate of the Minimum Volume set with mass $\alpha$.

→ density estimation (Cadre et al., 2013), One-Class SVM / Support Vector Data Description (Schölkopf et al., 2001), (Vert and Vert, 2006), (Tax et al., 2004), k-NN (Sricharan and Hero, 2011)
→ Isolation Forest (Liu et al., 2008) and Local Outlier Factor (Breunig et al., 2000)
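The thresholding step above can be sketched with an empirical quantile of the scores; a minimal sketch, where the Gaussian scores stand in for the learned $\hat{s}(X_i)$ (any actual algorithm would supply them):

```python
import numpy as np

def offset_at_mass(scores, alpha):
    """Offset q such that the level set {x : s(x) >= q} has
    empirical mass alpha on the training scores."""
    # the (1 - alpha)-quantile of the scores leaves mass alpha above it
    return np.quantile(scores, 1.0 - alpha)

rng = np.random.default_rng(0)
scores = rng.normal(size=1000)        # stand-in for the learned scores s_hat(X_i)
q = offset_at_mass(scores, alpha=0.95)
mass = np.mean(scores >= q)           # close to 0.95 by construction
```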
Ideal scoring functions

$s$ preserves the order induced by the density $f$:

$$s(x_1) \leq s(x_2) \iff f(x_1) \leq f(x_2)$$

i.e. $s$ is a strictly increasing transform of $f$.

[Figure: density $f$, the shift $f(x - 0.05)$ and the transform $f(x) + 2$]

$s$ does not need to be close to $f$ in the sense of an $L_p$ norm.
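To illustrate, any strictly increasing transform of $f$ ranks observations exactly as $f$ does; a toy check with the standard normal density (the transform $\log f + 2$ is one hypothetical choice, far from $f$ in any $L_p$ sense):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)

f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # standard normal density
s = lambda x: np.log(f(x)) + 2.0                      # strictly increasing transform of f

# s induces the same ordering as f, hence flags the same
# points as abnormal, even though |s - f| is large
same_order = np.array_equal(np.argsort(f(x)), np.argsort(s(x)))
```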
the One-Class SVM

[Figure: Gaussian mixture density $f$ and One-Class SVM scoring function $s$]

Asymptotically constant near the modes and proportional to the density in the low-density regions (Vert and Vert, 2006).
data set $S_n = (X_1, \dots, X_n)$, a hyperparameter space $\Theta$ and an unsupervised anomaly detection algorithm

$$A : \mathcal{S}_n \times \Theta \to \mathbb{R}^{\mathbb{R}^d}, \quad (S_n, \theta) \mapsto \hat{s}_\theta$$

How to assess the performance of $\hat{s}_\theta$ without a labeled data set?

(Anomaly Detection Workshop, Thomas, Clémençon, Feuillard, Gramfort, ICML 2016)
$X \sim P$, scoring function $s : \mathbb{R}^d \to \mathbb{R}$, $t$-level set of $s$: $\{x, s(x) \geq t\}$

$\alpha_s(t) = P(s(X) \geq t)$: mass of the $t$-level set
$\lambda_s(t) = \lambda(\{x, s(x) \geq t\})$: volume of the $t$-level set
Mass Volume curve $MV_s$ of a scoring function $s$ (Clémençon and Jakubowicz, 2013), (Clémençon and Thomas, 2017):

$$t \in \mathbb{R} \mapsto (\alpha_s(t), \lambda_s(t))$$

[Figure: scoring functions $f$ and $s$ (left) and their Mass Volume curves $MV_f$ and $MV_s$ (right)]
Easier to work with the following definition: $MV_s$ is defined as the plot of the function

$$MV_s : \alpha \in (0, 1) \mapsto \lambda_s(\alpha_s^{-1}(\alpha)) = \lambda(\{x, s(x) \geq \alpha_s^{-1}(\alpha)\})$$

where $\alpha_s^{-1}$ is the generalized inverse of $\alpha_s$.

Property (Clémençon and Jakubowicz, 2013), (Clémençon and Thomas, 2017): assume that the underlying density $f$ has no flat parts; then for all scoring functions $s$,

$$\forall \alpha \in (0, 1), \quad MV^*(\alpha) \stackrel{\text{def}}{=} MV_f(\alpha) \leq MV_s(\alpha)$$

The lower $MV_s$, the better $s$.
$\hat{s}_\theta = A(S_n, \theta)$

Choose $\theta^\star$ minimizing the area under $MV_{\hat{s}_\theta}$.

As $MV_{\hat{s}_\theta}$ depends on $P$, use the empirical MV curve estimated on a test set:

$$\widehat{MV}_s : \alpha \in [0, 1) \mapsto \lambda_s(\hat{\alpha}_s^{-1}(\alpha)) \quad \text{where} \quad \hat{\alpha}_s(t) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}_{\{x, s(x) \geq t\}}(X_i)$$
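A minimal sketch of the empirical MV curve, with the volume of each level set estimated by Monte Carlo over a bounding interval (the one-dimensional data and the toy scoring function are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=500)          # test sample from P
s = lambda x: -np.abs(x)          # toy scoring function: higher = more normal

def empirical_mv(s, X, alphas, n_mc=100_000, low=-6.0, high=6.0):
    """For each mass alpha, return the volume of {x : s(x) >= q_alpha},
    with q_alpha the empirical (1 - alpha)-quantile of the scores
    (generalized inverse of hat{alpha}_s).
    The volume is estimated by Monte Carlo over [low, high]."""
    scores = s(X)
    U = rng.uniform(low, high, size=n_mc)   # uniform points for the volume estimate
    sU = s(U)
    volumes = []
    for a in alphas:
        q = np.quantile(scores, 1.0 - a)    # hat{alpha}_s^{-1}(a)
        volumes.append((high - low) * np.mean(sU >= q))
    return np.array(volumes)

alphas = np.linspace(0.1, 0.9, 9)
mv = empirical_mv(s, X, alphas)   # non-decreasing in alpha
```

The area under this curve over a mass interval is then the quantity minimized over hyperparameters.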
random split we may obtain a different $\theta^\star$

→ to reduce the variance of the estimator, consider $B$ random splits of the data set. For each random split $b$ we get $\theta^\star_b$ and $\hat{s}_{\theta^\star_b}$. Final scoring function:

$$\hat{S} = \frac{1}{B} \sum_{b=1}^{B} \hat{s}_{\theta^\star_b}$$
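The aggregation step can be sketched as follows; the kernel-style base scorer and the fixed $\theta^\star_b$ are hypothetical placeholders for $A(S_n, \theta)$ and for the per-split selection by the empirical MV curve:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
B = 5

def fit_scorer(X_train, theta):
    """Stand-in for A(S_n, theta): a kernel-density-style scoring
    function with bandwidth theta (hypothetical base algorithm)."""
    def score(x):
        d2 = ((x[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * theta ** 2)).mean(axis=1)
    return score

scorers = []
for b in range(B):
    idx = rng.permutation(len(X))
    train = idx[:200]            # the remaining half would serve as test split
    # theta_star_b would be chosen by minimizing the area under the
    # empirical MV curve on the test split; fixed here for brevity
    theta_star_b = 0.5
    scorers.append(fit_scorer(X[train], theta_star_b))

# final scoring function: average of the B per-split scoring functions
s_final = lambda x: np.mean([s(x) for s in scorers], axis=0)
vals = s_final(X[:10])
```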
anomaly detection algorithm $A$, compare

our approach → $s_{\text{tuned}}$
a priori fixed hyperparameters → $s_{\text{fixed}}$

Performance criterion: relative gain

$$G_A(s_{\text{tuned}}, s_{\text{fixed}}) = \frac{AMV_I(s_{\text{fixed}}) - AMV_I(s_{\text{tuned}})}{AMV_I(s_{\text{fixed}})}$$

where $AMV_I$ is the area under the MV curve over the interval $I = [0.9, 0.99]$, computed on left-out data.

If $G_A(s_{\text{tuned}}, s_{\text{fixed}}) > 0$ then $s_{\text{tuned}}$ is better than $s_{\text{fixed}}$.
$MV_s$: we used $\widehat{MV}_s$ as an estimate of $MV_s$, where

$$\forall \alpha \in [0, 1), \quad \widehat{MV}_s(\alpha) = \lambda(\{x, s(x) \geq \hat{\alpha}_s^{-1}(\alpha)\})$$

with $\hat{\alpha}_s^{-1}$ the generalized inverse of $\hat{\alpha}_s$.

Two questions:
- Consistency of $\widehat{MV}_s$ as $n \to \infty$?
- How to build confidence regions?
extreme regions

Goal of multivariate Extreme Value Theory: model the tail of the distribution $F$ of a multivariate random variable $X = (X^{(1)}, \dots, X^{(d)})$.

Motivation for unsupervised anomaly detection:
- Anomalies are likely to be located in extreme regions, i.e. regions far from the mean $\mathbb{E}[X]$.
- The lack of data in these regions makes it difficult to distinguish between large normal instances and anomalies.

Relying on multivariate Extreme Value Theory, we suggest an algorithm to detect anomalies in extreme regions.
Theory

Common approach: standardization to unit Pareto margins, $T(X) = V \in \mathbb{R}^d$ where

$$V^{(j)} = \frac{1}{1 - F_j(X^{(j)})}, \quad \forall j \in \{1, \dots, d\}$$

with $F_j$, $1 \leq j \leq d$, the margins. Note that $X$ and $V$ share the same dependence structure/copula.
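In practice the margins $F_j$ are unknown and replaced by their empirical counterparts; a minimal rank-based sketch (the $n + 1$ denominator, which keeps the empirical cdf away from 1, is one common convention, not necessarily the paper's):

```python
import numpy as np

def to_unit_pareto(X):
    """Rank-transform each margin to unit Pareto:
    V^(j) = 1 / (1 - F_j_hat(X^(j))), with F_j_hat the empirical cdf
    computed from ranks divided by n + 1."""
    n, d = X.shape
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1  # ranks 1..n per column
    F = ranks / (n + 1.0)
    return 1.0 / (1.0 - F)

rng = np.random.default_rng(0)
X = rng.standard_t(df=3, size=(1000, 3))   # heavy-tailed toy sample
V = to_unit_pareto(X)                      # margins approximately unit Pareto
```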
Theory

$(r(v), \varphi(v)) = (\|v\|_\infty, v / \|v\|_\infty)$: polar coordinates
$S_\infty^{d-1}$: positive orthant of the unit hypercube

Theorem (Resnick, 1987). Under mild assumptions on the distribution $F$, there exists a finite (angular) measure $\Phi$ on $S_\infty^{d-1}$ such that for all $\Omega \subset S_\infty^{d-1}$,

$$\Phi_t(\Omega) \stackrel{\text{def}}{=} t \cdot P(r(V) > t, \varphi(V) \in \Omega) \xrightarrow[t \to \infty]{} \Phi(\Omega)$$
in extreme regions: observations that deviate from the dependence structure of the tail

[Figure: extreme observations under tail dependence vs. independence]

To find the most likely directions of the extreme observations, we estimate a Minimum Volume set of the angular measure $\Phi$.
estimation on the sphere

Solve the empirical optimization problem (Scott and Nowak, 2006):

$$\min_{\Omega \in \mathcal{G}} \lambda_d(\Omega) \quad \text{subject to} \quad \hat{\Phi}_{n,k}(\Omega) \geq \alpha - \psi_k(\delta)$$

where $\hat{\Phi}_{n,k}$ is estimated from $t \cdot P(r(V) > t, \varphi(V) \in \Omega)$ with $t = n/k$:

$$\hat{\Phi}_{n,k}(\Omega) = \frac{1}{k} \sum_{i=1}^{n} \mathbb{1}_{\{r(V_i) \geq n/k,\ \varphi(V_i) \in \Omega\}}$$

[Figure: extreme region beyond radius $n/k$]
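The empirical angular measure $\hat{\Phi}_{n,k}$ can be sketched directly from its definition; the fully dependent bivariate Pareto sample and the diagonal region $\Omega$ are toy assumptions chosen so the angular mass concentrates near $(1, 1)$:

```python
import numpy as np

def empirical_angular_mass(V, k, region):
    """hat{Phi}_{n,k}(Omega) = (1/k) * sum_i 1{r(V_i) >= n/k, phi(V_i) in Omega},
    with r(v) = ||v||_inf and phi(v) = v / ||v||_inf (V has positive entries).
    `region` is a boolean function of the angles, encoding Omega."""
    n = len(V)
    r = np.max(V, axis=1)          # sup-norm radius
    phi = V / r[:, None]           # angular component on the sup-norm sphere
    extreme = r >= n / k           # keep only the k most extreme (in expectation)
    return np.sum(extreme & region(phi)) / k

rng = np.random.default_rng(0)
U = rng.uniform(size=2000)
V = np.column_stack([1.0 / U, 1.0 / U])   # fully dependent unit-Pareto pair

# Omega: angles near the diagonal direction (1, 1)
mass = empirical_angular_mass(V, k=100, region=lambda p: np.all(p > 0.9, axis=1))
```

Under full dependence all extremes point along the diagonal, so this region carries (empirical) angular mass close to 1.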
scoring function $\hat{s}(r(v), \varphi(v)) = \frac{1}{r(v)^2} \cdot \hat{s}_\varphi(\varphi(v))$

Data set       OCSVM   Isolation Forest   Score $\hat{s}$
shuttle        0.981   0.963              0.987
SF             0.478   0.251              0.660
http           0.997   0.662              0.964
ann            0.372   0.610              0.518
forestcover    0.540   0.516              0.646

ROC-AUC computed on test sets made of normal and abnormal instances and restricted to the extreme region ($k \ll n$).
Mass Volume curves and anomaly ranking. S. Clémençon, A. Thomas. Submitted, 2017.

Learning hyperparameters for unsupervised anomaly detection. A. Thomas, S. Clémençon, V. Feuillard, A. Gramfort. Anomaly Detection Workshop, ICML 2016.

Anomaly detection in extreme regions via empirical MV sets on the sphere. A. Thomas, S. Clémençon, A. Gramfort, A. Sabourin. AISTATS 2017.

Code for hyperparameter tuning available on GitHub: https://github.com/albertcthomas/anomaly_tuning