Probabilistic outlier detection and visualization of smart meter data
Talk given at meeting of New Zealand Statistical Association and International Association for Statistical Computing (11-14 December 2017), Auckland, New Zealand.
$p = 0.01, 0.02, \dots, 0.99$ for each household and each half-hour of the week.
336 probability distributions per household.
[Figure: Demand (kWh) against Days, household ID 1550]
[Figure: Demand (kWh) against Days, household ID 1539]
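The quantile step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code behind the talk: the function name `weekly_quantiles`, the integer slot index 0 to 335 for the half-hour of the week, and the synthetic gamma-distributed demand are all hypothetical.

```python
import numpy as np

def weekly_quantiles(demand, half_hour_of_week, probs=None):
    """Sample quantiles per half-hour of the week for one household.

    Returns an array of shape (336, 99): one row per half-hour slot,
    one column per probability p = 0.01, ..., 0.99.
    """
    if probs is None:
        probs = np.arange(1, 100) / 100
    demand = np.asarray(demand, dtype=float)
    half_hour_of_week = np.asarray(half_hour_of_week)
    out = np.full((336, len(probs)), np.nan)
    for slot in range(336):
        x = demand[half_hour_of_week == slot]
        x = x[~np.isnan(x)]          # missing readings are simply dropped
        if x.size:
            out[slot] = np.quantile(x, probs)
    return out

# Synthetic example (assumed data): 60 weeks of half-hourly readings.
rng = np.random.default_rng(0)
slots = np.tile(np.arange(336), 60)
demand = rng.gamma(shape=2.0, scale=0.3, size=slots.size)
q = weekly_quantiles(demand, slots)
```

Because each slot is summarised separately, series of different lengths or with gaps still yield an array of the same shape.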
kernel density estimate: presence of zeros; non-negative support; high skewness.
Avoids missing data issues and variation in series length.
Avoids timing of household events, holidays, etc.
Allows clustering of households based on probabilistic behaviour rather than coincident behaviour.
per household is mapped to a set of 7 × 48 × 99 quantiles, giving a bivariate surface for each household.
Can we compute pairwise distances between all households?
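To make the question concrete, here is a minimal sketch of an all-pairs distance matrix between quantile surfaces. The distance measure is not specified in this excerpt, so the sketch substitutes a plain Euclidean distance between flattened surfaces purely for illustration. The function name `pairwise_distances` and the synthetic surfaces are hypothetical.

```python
import numpy as np

def pairwise_distances(surfaces):
    """Euclidean distance between every pair of quantile surfaces.

    surfaces: array of shape (n_households, 336, 99).
    Returns a symmetric (n_households, n_households) matrix.
    NOTE: Euclidean distance is an assumption made for this sketch.
    """
    n = surfaces.shape[0]
    flat = surfaces.reshape(n, -1)
    # |a - b|^2 = |a|^2 + |b|^2 - 2 a.b, computed for all pairs at once
    sq = np.sum(flat**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * flat @ flat.T
    np.maximum(d2, 0, out=d2)        # clip tiny negative rounding errors
    np.fill_diagonal(d2, 0)
    return np.sqrt(d2)

# Synthetic surfaces (assumed data): 5 households, quantiles sorted in p.
rng = np.random.default_rng(1)
surfaces = np.sort(rng.gamma(2.0, 0.3, size=(5, 336, 99)), axis=2)
delta = pairwise_distances(surfaces)
```

The result has n² entries, so memory and time grow quadratically with the number of households.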
$w_{ij} = \exp(-\Delta_{ij}^2 / h^2)$.
Row sums of the kernel matrix give a scaled kernel density estimate of households: $\hat{f}_i = \sum_{j=1}^{n} w_{ij}$.
$h$ is the bandwidth of the Gaussian kernel.
Households can be ranked by density values.
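The density ranking can be sketched in a few lines. This is an illustration only: the function name `density_scores`, the toy one-dimensional points and the bandwidth `h=1.0` are assumptions, not values from the talk.

```python
import numpy as np

def density_scores(delta, h):
    """Row sums of the Gaussian kernel matrix w_ij = exp(-delta_ij^2 / h^2)."""
    w = np.exp(-(delta**2) / h**2)
    return w.sum(axis=1)

# Toy example (assumed data): four nearby points and one far away.
x = np.array([0.0, 0.1, 0.2, 0.3, 5.0])
delta = np.abs(x[:, None] - x[None, :])
f_hat = density_scores(delta, h=1.0)   # h = 1.0 is an arbitrary choice
ranking = np.argsort(f_hat)            # lowest density first
```

Here the isolated point gets a score close to 1, which is only its own contribution $w_{ii} = 1$, so it is ranked first as the most outlying.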
where the distances are preserved “as far as possible”.
Let $W = [w_{ij}]$ where $w_{ij} = \exp(-\Delta_{ij}^2 / h^2)$.
$D = \mathrm{diag}(\hat{f}_i)$ where $\hat{f}_i = \sum_{j=1}^{n} w_{ij}$.
$L = D - W$ (the Laplacian matrix).
Solve generalized eigenvector problem: $L e = \lambda D e$.
Let $e_k$ be eigenvector corresponding to $k$th smallest eigenvalue.
Then $e_2$ and $e_3$ create an embedding of households in 2d space.
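The embedding steps above can be sketched with SciPy's generalized symmetric eigensolver. This is a minimal illustration under stated assumptions: the function name `laplacian_eigenmap`, the two synthetic clusters and the bandwidth `h=2.0` are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmap(delta, h, n_components=2):
    """Embed points from a distance matrix via L e = lambda D e."""
    w = np.exp(-(delta**2) / h**2)       # kernel (similarity) matrix W
    d = np.diag(w.sum(axis=1))           # D = diag of row sums
    lap = d - w                          # Laplacian L = D - W
    # eigh(a, b) solves a v = lambda b v; eigenvalues come back ascending
    eigvals, eigvecs = eigh(lap, d)
    # skip the first (constant) eigenvector, keep e_2, e_3, ...
    return eigvals, eigvecs[:, 1:1 + n_components]

# Synthetic example (assumed data): two well-separated clusters of 10 points.
rng = np.random.default_rng(2)
points = np.vstack([rng.normal(0, 0.3, (10, 3)), rng.normal(3, 0.3, (10, 3))])
delta = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
eigvals, embedding = laplacian_eigenmap(delta, h=2.0)
```

In this toy case the first embedding coordinate takes opposite signs on the two clusters. A dense solver like this costs on the order of n³ operations, so it only suits moderate numbers of households.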
Let $y_i = (e_{2,i}, e_{3,i})$ be the embedded point corresponding to household $i$.
Then the Laplacian eigenmap minimizes $\sum_{ij} w_{ij}(y_i - y_j)^2 = y^\top L y$ such that $y^\top D y = 1$.
The most similar points are as close as possible.
First eigenvalue is 0 due to translation invariance.
Equivalent to optimal embedding using Laplace-Beltrami operator on manifolds.
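The link between the weighted pairwise sum and the quadratic form can be checked numerically for one coordinate of the embedding. The sketch below uses an arbitrary random symmetric weight matrix, which is an assumption for illustration only. When the sum runs over all ordered pairs $(i, j)$, it equals the quadratic form up to a constant factor of 2, which does not change the minimizer.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
a = rng.random((n, n))
w = (a + a.T) / 2                 # any symmetric non-negative weights
d = np.diag(w.sum(axis=1))
lap = d - w                       # L = D - W
y = rng.normal(size=n)            # one coordinate of a candidate embedding

# Weighted sum of squared differences over all ordered pairs (i, j)
pair_sum = sum(w[i, j] * (y[i] - y[j]) ** 2
               for i in range(n) for j in range(n))
quad = y @ lap @ y                # quadratic form y' L y

# Translation invariance: L annihilates the constant vector,
# so shifting every y_i by the same amount leaves y' L y unchanged.
residual = lap @ np.ones(n)
shifted_quad = (y + 5.0) @ lap @ (y + 5.0)
```

The zero residual is why the smallest eigenvalue is 0 with a constant eigenvector, and why the embedding starts from the second eigenvector.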
quantile surfaces conditional on time of week.
Using pairwise distances between households.
Using kernel matrices for density ranking and embedding.

Unresolved issues
Need to select the bandwidth $h$ in constructing the similarity matrix.
Two different uses of bandwidth: density-ranking, embedding. Different bandwidth in each case?
The use of pairwise distances makes it hard to scale this algorithm.

robjhyndman.com