Slide 1

Probabilistic outlier detection and visualization of smart metre data
Rob J Hyndman
NZSA/IASC, December 2017

Slide 2

Irish smart metre data
- 500 households from a smart metering trial
- Electricity consumption at 30-minute intervals between 14 July 2009 and 31 December 2010
- Heating/cooling energy usage excluded
Figure: http://solutions.3m.com

Slide 3

Irish smart metre data
[Figure: Demand for ID: 1550; demand (kWh) plotted against days]

Slide 4

Irish smart metre data
[Figure: Demand for ID: 1539; demand (kWh) plotted against days]

Slide 5

Quantiles conditional on time of week
Compute sample quantiles at p = 0.01, 0.02, ..., 0.99 for each household and each half-hour of the week.
This gives 7 × 48 = 336 probability distributions per household.
[Figure: Demand for ID: 1550; demand (kWh) plotted against days]
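A minimal NumPy sketch of the quantile step, assuming one household's readings are held in a 1-d array at half-hourly resolution, with NaN marking a missing reading and slot 0 meaning Monday 00:00. The function name `time_of_week_quantiles` and the `start_slot` convention are hypothetical, not taken from the talk, and the series is assumed to cover at least one full week so that every slot has data.

```python
import numpy as np

def time_of_week_quantiles(demand, start_slot=0):
    """Return a (336, 99) array of sample quantiles, one row per half-hour of the week.

    demand: 1-d array of half-hourly readings (NaN marks a missing reading).
    start_slot: half-hour-of-week index of the first reading (0 = Monday 00:00);
    this indexing convention is an assumption for illustration.
    """
    demand = np.asarray(demand, dtype=float)
    probs = np.arange(1, 100) / 100                       # p = 0.01, 0.02, ..., 0.99
    slots = (start_slot + np.arange(demand.size)) % 336   # 7 days x 48 half-hours
    out = np.empty((336, probs.size))
    for s in range(336):
        # Pool every week's reading for this half-hour, ignoring missing values.
        out[s] = np.nanquantile(demand[slots == s], probs)
    return out
```

Because each slot pools all weeks, gaps and differing series lengths only change how many values feed each quantile, which is one reason the quantile representation copes well with missing data.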

Slide 6

Quantiles conditional on time of week
Compute sample quantiles at p = 0.01, 0.02, ..., 0.99 for each household and each half-hour of the week.
This gives 7 × 48 = 336 probability distributions per household.
[Figure: Demand for ID: 1550 by day of week (Monday to Sunday) and time of day (0 to 24 h). Top: demand (kWh); bottom: quantiles, shaded by probability from 0.1 to 0.9.]

Slide 7

Quantiles conditional on time of week
Compute sample quantiles at p = 0.01, 0.02, ..., 0.99 for each household and each half-hour of the week.
This gives 7 × 48 = 336 probability distributions per household.
[Figure: Demand for ID: 1539; demand (kWh) plotted against days]

Slide 8

Quantiles conditional on time of week
Compute sample quantiles at p = 0.01, 0.02, ..., 0.99 for each household and each half-hour of the week.
This gives 7 × 48 = 336 probability distributions per household.
[Figure: Demand for ID: 1539 by day of week (Monday to Sunday) and time of day (0 to 24 h). Top: demand (kWh); bottom: quantiles, shaded by probability from 0.1 to 0.9.]

Slide 15

Quantiles conditional on time of week
- Sample quantiles are better than a kernel density estimate, given the:
  - presence of zeros
  - non-negative support
  - high skewness
- Avoids missing-data issues and variation in series length
- Avoids the timing of household events, holidays, etc.
- Allows clustering of households based on probabilistic behaviour rather than coincident behaviour

Slide 17

Pairwise distances
The time series of 535 × 48 observations per household is mapped to a set of 7 × 48 × 99 quantiles, giving a bivariate surface for each household.
Can we compute pairwise distances between all households?

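Slide 21

Jensen-Shannon distances
Kullback-Leibler divergence between two densities:
$$D(p, q) = \int_{-\infty}^{\infty} p(x) \log\frac{p(x)}{q(x)}\,dx$$
Not symmetric: $D(p, q) \neq D(q, p)$.
Jensen-Shannon distance between two densities:
$$JS(p, q) = \tfrac{1}{2}\left[D(p, r) + D(q, r)\right], \quad \text{where } r = (p + q)/2$$
Distance between households i and j:
$$\Delta_{ij} = \sum_{t=1}^{7 \times 48} JS(p_t, q_t)$$
where $p_t$ and $q_t$ are the distributions of households i and j at half-hour t of the week.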
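The talk does not say how densities are obtained from the 99 quantiles, so the sketch below makes an explicit assumption: each set of quantiles is treated as an equally weighted sample and binned on a shared grid (`edges`, a hypothetical parameter), after which the discrete KL and JS formulas apply. All helper names are illustrative, not the author's code.

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence D(p, q) = sum_x p(x) log(p(x) / q(x)).
    Terms with p(x) == 0 contribute 0; q must be positive wherever p is."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js_divergence(p, q):
    """JS(p, q) = [D(p, r) + D(q, r)] / 2 with r = (p + q) / 2.
    Always finite, because r > 0 wherever p or q is positive."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    r = (p + q) / 2
    return (kl_divergence(p, r) + kl_divergence(q, r)) / 2

def quantiles_to_pmf(quantiles, edges):
    """Assumed approximation: bin the 99 quantiles (equal weight each) on shared bins.
    `edges` must span every quantile value."""
    counts, _ = np.histogram(quantiles, bins=edges)
    return counts / counts.sum()

def household_distance(Qi, Qj, edges):
    """Delta_ij: sum of JS over the half-hours of the week; Qi, Qj have shape (336, 99)."""
    return sum(
        js_divergence(quantiles_to_pmf(Qi[t], edges), quantiles_to_pmf(Qj[t], edges))
        for t in range(Qi.shape[0])
    )
```

The JS value for each half-hour lies between 0 (identical distributions) and log 2 (distributions with no overlapping bins), so $\Delta_{ij}$ is bounded by 336 log 2.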

Slide 23

Kernel matrix and density ranking
Similarity between two households: $w_{ij} = \exp(-\Delta_{ij}^2 / h^2)$, where h is the bandwidth of the Gaussian kernel.
Row sums of the kernel matrix give a scaled kernel density estimate for each household:
$$\hat f_i = \sum_{j=1}^{n} w_{ij}$$
Households can be ranked by their density values: the highest values are the most typical households and the lowest are the most anomalous.
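Given a symmetric matrix `Delta` of household distances and a bandwidth `h`, the kernel matrix and density ranking take only a few lines of NumPy. This is a sketch with hypothetical function names; choosing `h` is left open, as in the talk. The diagonal terms $w_{ii} = 1$ are kept because the sum runs over all $j = 1, \dots, n$.

```python
import numpy as np

def kernel_matrix(Delta, h):
    """Gaussian similarities w_ij = exp(-Delta_ij^2 / h^2) from a symmetric distance matrix."""
    Delta = np.asarray(Delta, dtype=float)
    return np.exp(-(Delta ** 2) / h ** 2)

def density_ranking(Delta, h):
    """Return (f_hat, order): row sums of W, and household indices sorted
    from most typical (highest density) to most anomalous (lowest density)."""
    W = kernel_matrix(Delta, h)
    f_hat = W.sum(axis=1)        # scaled kernel density estimate for each household
    order = np.argsort(-f_hat)   # descending density
    return f_hat, order
```

As a toy check, four one-dimensional "households" at positions 0, 1, 2 and 10 with h = 2 give the middle point the largest row sum, and the isolated point at 10 the smallest, so it comes last in `order`.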

Slide 24

Most typical household
[Figure: Demand for ID: 1672 by day of week (Monday to Sunday) and time of day (0 to 24 h). Top: demand (kWh); bottom: quantiles, shaded by probability from 0.1 to 0.9.]

Slide 25

Most anomalous household
[Figure: Demand for ID: 1881 by day of week (Monday to Sunday) and time of day (0 to 24 h). Top: demand (kWh); bottom: quantiles, shaded by probability from 0.1 to 0.9.]

Slide 30

Laplacian eigenmaps
Idea: embed the conditional densities in a 2d space where the distances are preserved "as far as possible".
- Let $W = [w_{ij}]$ where $w_{ij} = \exp(-\Delta_{ij}^2 / h^2)$.
- Let $D = \mathrm{diag}(\hat f_i)$ where $\hat f_i = \sum_{j=1}^{n} w_{ij}$.
- $L = D - W$ (the Laplacian matrix).
- Solve the generalized eigenvector problem $L\mathbf{e} = \lambda D\mathbf{e}$.
- Let $\mathbf{e}_k$ be the eigenvector corresponding to the kth smallest eigenvalue.
- Then $\mathbf{e}_2$ and $\mathbf{e}_3$ create an embedding of the households in 2d space.
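A sketch of the embedding step using SciPy's `scipy.linalg.eigh`, which solves the symmetric generalized problem $L\mathbf{e} = \lambda D\mathbf{e}$ when D is positive definite (here every diagonal entry is at least 1), returns eigenvalues in ascending order, and normalizes the eigenvectors so that $\mathbf{e}^\top D \mathbf{e} = 1$. The function name is hypothetical and the bandwidth is taken as given.

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmap(Delta, h):
    """Return (eigenvalues, embedding) where embedding columns are e2 and e3."""
    Delta = np.asarray(Delta, dtype=float)
    W = np.exp(-(Delta ** 2) / h ** 2)   # kernel matrix
    D = np.diag(W.sum(axis=1))           # diag(f_hat)
    L = D - W                            # graph Laplacian
    eigvals, eigvecs = eigh(L, D)        # solves L e = lambda D e, eigenvalues ascending
    # Column 0 is e1 (eigenvalue 0); columns 1 and 2 are e2 and e3.
    return eigvals, eigvecs[:, 1:3]
```

Row i of the returned embedding is the 2d point for household i.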

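Slide 34

Key property of Laplacian embedding
Let $y_i = (e_{2,i}, e_{3,i})$ be the embedded point corresponding to household i. Then the Laplacian eigenmap minimizes
$$\tfrac{1}{2}\sum_{i,j} w_{ij}(y_i - y_j)^2 = y^\top L y \quad \text{such that} \quad y^\top D y = 1,$$
- so the most similar points are placed as close together as possible.
- The first eigenvalue is 0 due to translation invariance: the constant vector satisfies $L\mathbf{1} = D\mathbf{1} - W\mathbf{1} = 0$, so $\mathbf{e}_1$ carries no information and is discarded.
- Equivalent to the optimal embedding using the Laplace-Beltrami operator on manifolds.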
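The factor 1/2 in the identity arises because summing over all ordered pairs (i, j) counts every pair twice. A short numerical check with a random symmetric W (purely illustrative data, not from the smart metre study) confirms both the identity and the translation invariance behind the zero eigenvalue.

```python
import numpy as np

def laplacian_identity(W, y):
    """Return (0.5 * sum_ij w_ij (y_i - y_j)^2, y^T L y) with L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    diffs = y[:, None] - y[None, :]      # matrix of y_i - y_j
    lhs = 0.5 * np.sum(W * diffs ** 2)
    return lhs, y @ L @ y

rng = np.random.default_rng(1)
A = rng.random((6, 6))
W = (A + A.T) / 2                        # symmetric, non-negative similarities
y = rng.normal(size=6)

lhs, rhs = laplacian_identity(W, y)
print(np.isclose(lhs, rhs))              # the two sides agree
# Shifting every y_i by the same constant leaves the objective unchanged,
# which is why the constant vector has eigenvalue 0.
print(np.isclose(laplacian_identity(W, y + 5.0)[1], rhs))
```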

Slide 35

Outliers shown in embedded space
[Figure: Laplacian embedding (HDRs on original space). Households plotted on axes Comp1 and Comp2, shaded by highest density region (HDR) level: 1, 50, 99, >99. Household 1881 is labelled.]

Slide 38

Features and limitations
Features of approach
- Converting time series to quantile surfaces conditional on time of week.
- Using pairwise distances between households.
- Using kernel matrices for density ranking and embedding.
Unresolved issues
- Need to select the bandwidth h in constructing the similarity matrix.
- Two different uses of the bandwidth: density ranking and embedding. Should a different bandwidth be used in each case?
- The use of pairwise distances makes it hard to scale this algorithm: n households require n(n - 1)/2 distances (124,750 for n = 500).

robjhyndman.com