
Probabilistic outlier detection and visualization of smart meter data

Rob J Hyndman
December 11, 2017


Talk given at the meeting of the New Zealand Statistical Association and the International Association for Statistical Computing (11–14 December 2017), Auckland, New Zealand.


Transcript

  1. Irish smart meter data. [Figure: http://solutions.3m.com] 500 households from a smart metering trial. Electricity consumption at 30-minute intervals between 14 July 2009 and 31 December 2010. Heating/cooling energy usage excluded.
  2. Irish smart meter data. [Figure: half-hourly demand (kWh) plotted against days for household ID 1550.]
  3. Irish smart meter data. [Figure: half-hourly demand (kWh) plotted against days for household ID 1539.]
  4–7. Quantiles conditional on time of week. Compute sample quantiles at p = 0.01, 0.02, ..., 0.99 for each household and each half-hour of the week, giving 336 probability distributions per household. [Figures: raw demand series and weekly quantile surfaces (quantile levels 0.1, 0.3, 0.5, 0.7, 0.9, by day of week and time of day) for household IDs 1550 and 1539.]
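A sketch of this per-household quantile computation (illustrative only, not the talk's code; it assumes half-hourly demand stored as a pandas Series indexed by timestamp, and all names are hypothetical):

```python
import numpy as np
import pandas as pd

def weekly_quantile_surface(demand: pd.Series,
                            probs=np.linspace(0.01, 0.99, 99)) -> pd.DataFrame:
    """Sample quantiles conditional on time of week: a 336 x 99 table with
    one row per half-hour of the week and one column per probability level."""
    df = demand.to_frame("kwh")
    # Half-hour of the week: 0..335, with Monday 00:00-00:30 mapped to 0.
    df["how"] = (df.index.dayofweek * 48
                 + df.index.hour * 2
                 + df.index.minute // 30)
    return (df.groupby("how")["kwh"]
              .quantile(probs)   # sample quantiles within each half-hour
              .unstack())        # rows: half-hours of week; columns: probabilities
```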
  8–13. Quantiles conditional on time of week. Sample quantiles are better than a kernel density estimate because of the presence of zeros, the non-negative support, and the high skewness of demand. The approach also avoids missing-data issues and variation in series length, avoids the timing of household events, holidays, etc., and allows clustering of households based on probabilistic behaviour rather than coincident behaviour.
  14–15. Pairwise distances. The time series of 535 × 48 observations per household is mapped to a set of 7 × 48 × 99 quantiles, giving a bivariate surface for each household. Can we compute pairwise distances between all households?
  16–18. Jensen–Shannon distances. Kullback–Leibler divergence between two densities: D(p, q) = ∫_{−∞}^{∞} p(x) log[p(x)/q(x)] dx. It is not symmetric: D(p, q) ≠ D(q, p). Jensen–Shannon distance between two densities: JS(p, q) = [D(p, r) + D(q, r)]/2, where r = (p + q)/2. Distance between two households: ∆_ij = Σ_{t=1}^{7×48} JS(p_t, q_t).
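A minimal sketch of these distance calculations, assuming each household's 336 conditional distributions have been discretized into probability vectors on a common demand grid (the discretization step and all names below are assumptions, not from the talk):

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p, q) for probability vectors
    (a small epsilon guards against zero bins in this sketch)."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    return np.sum(p * np.log(p / q))

def js(p, q):
    """Jensen-Shannon distance JS(p, q) = [D(p, r) + D(q, r)] / 2, r = (p + q)/2."""
    r = (np.asarray(p, float) + np.asarray(q, float)) / 2
    return (kl(p, r) + kl(q, r)) / 2

def household_distance(P, Q):
    """Delta_ij: sum of JS distances over the 7 x 48 half-hours of the week.
    P and Q have shape (336, n_grid), with each row summing to one."""
    return sum(js(p_t, q_t) for p_t, q_t in zip(P, Q))
```

Unlike the raw Kullback–Leibler divergence, the Jensen–Shannon quantity is symmetric in p and q, which is what makes it usable as a pairwise distance between households.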
  19. Kernel matrix and density ranking. Similarity between two households: w_ij = exp(−∆_ij² / h²), where h is the bandwidth of the Gaussian kernel. Row sums of the kernel matrix give a scaled kernel density estimate over households: f̂_i = Σ_{j=1}^{n} w_ij. Households can then be ranked by their density values.
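A sketch of this step, assuming `delta` is the n × n matrix of pairwise household distances ∆_ij and a bandwidth h has already been chosen (bandwidth selection is flagged as an open issue at the end of the deck):

```python
import numpy as np

def density_ranking(delta: np.ndarray, h: float):
    """Kernel matrix, scaled kernel density estimate, and density ranking."""
    W = np.exp(-(delta ** 2) / h ** 2)   # similarities w_ij = exp(-Delta_ij^2 / h^2)
    f_hat = W.sum(axis=1)                # row sums: scaled density estimate per household
    order = np.argsort(f_hat)            # ascending: most anomalous households first
    return W, f_hat, order
```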
  20. Most typical household. [Figure: demand (kWh) and weekly quantile surface for household ID 1672, by day of week and time of day.]
  21. Most anomalous household. [Figure: demand (kWh) and weekly quantile surface for household ID 1881, by day of week and time of day.]
  22–26. Laplacian eigenmaps. Idea: embed the conditional densities in a 2d space where the distances are preserved “as far as possible”. Let W = [w_ij] where w_ij = exp(−∆_ij² / h²), let D = diag(f̂_i) where f̂_i = Σ_{j=1}^{n} w_ij, and let L = D − W (the Laplacian matrix). Solve the generalized eigenvector problem Le = λDe, and let e_k be the eigenvector corresponding to the kth smallest eigenvalue. Then e_2 and e_3 give an embedding of the households in 2d space.
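A sketch of the embedding step using a generic generalized symmetric eigensolver (scipy.linalg.eigh); this only illustrates the construction described above and is not the author's implementation:

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_embedding(W: np.ndarray) -> np.ndarray:
    """2d Laplacian eigenmap from a symmetric kernel (similarity) matrix W."""
    f_hat = W.sum(axis=1)
    D = np.diag(f_hat)                # D = diag(f_hat_i)
    L = D - W                         # Laplacian matrix
    # Generalized eigenproblem L e = lambda D e; eigh returns eigenvalues in
    # ascending order, so column 0 corresponds to the zero eigenvalue.
    eigvals, eigvecs = eigh(L, D)
    # e_2 and e_3 (second and third smallest eigenvalues) give the 2d embedding.
    return eigvecs[:, 1:3]
```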
  27–30. Key property of the Laplacian embedding. Let y_i = (e_{2,i}, e_{3,i}) be the embedded point corresponding to household i. The Laplacian eigenmap minimizes Σ_{ij} w_ij (y_i − y_j)² = yᵀLy subject to yᵀDy = 1, so the most similar points are placed as close together as possible. The first eigenvalue is 0 due to translation invariance. This is equivalent to the optimal embedding using the Laplace–Beltrami operator on manifolds.
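A small numerical check (not from the talk) of the quadratic form behind this property; note that when the double sum runs over all ordered pairs (i, j) each pair is counted twice, so a factor of 1/2 appears:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 6))
W = (A + A.T) / 2                    # symmetric nonnegative similarity matrix
D = np.diag(W.sum(axis=1))
L = D - W                            # Laplacian matrix

y = rng.standard_normal(6)
lhs = y @ L @ y
rhs = 0.5 * np.sum(W * (y[:, None] - y[None, :]) ** 2)
assert np.isclose(lhs, rhs)          # y'Ly = (1/2) sum_ij w_ij (y_i - y_j)^2
```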
  31. Outliers shown in embedded space. [Figure: scatterplot of the Laplacian embedding (Comp1 vs Comp2) with highest density regions (HDR levels 1, 50, 99, >99) computed on the original space; household 1881 stands out as the most anomalous point.]
  32–34. Features and limitations. Features of the approach: converting time series to quantile surfaces conditional on time of week; using pairwise distances between households; using kernel matrices for density ranking and embedding. Unresolved issues: the bandwidth h must be selected when constructing the similarity matrix; the bandwidth is used in two different ways (density ranking and embedding), so a different bandwidth may be needed in each case; and the use of pairwise distances makes it hard to scale this algorithm.

robjhyndman.com