
Probabilistic outlier detection and visualization of smart meter data

Talk given at a meeting of the New Zealand Statistical Association and the International Association for Statistical Computing (11–14 December 2017), Auckland, New Zealand.

Rob J Hyndman

December 11, 2017

Transcript

  1. Irish smart meter data

    [Figure credit: http://solutions.3m.com]
    500 households from a smart metering trial. Electricity consumption at 30-minute intervals between 14 July 2009 and 31 December 2010. Heating/cooling energy usage excluded.
  2. Irish smart meter data

    [Figure: half-hourly demand (kWh) over days for household ID 1550]
  3. Irish smart meter data

    [Figure: half-hourly demand (kWh) over days for household ID 1539]
  4–7. Quantiles conditional on time of week

    Compute sample quantiles at p = 0.01, 0.02, ..., 0.99 for each household and each half-hour of the week: 336 probability distributions per household.
    [Figures: half-hourly demand (kWh) over days for IDs 1550 and 1539, together with the corresponding weekly quantile plots (panels Monday–Sunday, time of day 0–24, quantiles at probabilities 0.1, 0.3, 0.5, 0.7, 0.9)]
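The talk does not show code, but under the assumption that the raw readings sit in a long-format table with columns `id`, `timestamp` and `kwh` (hypothetical names), a minimal pandas/NumPy sketch of this step could look as follows:

```python
import numpy as np
import pandas as pd

def quantile_surface(demand: pd.DataFrame) -> pd.DataFrame:
    """Sample quantiles at p = 0.01, ..., 0.99 for each household and each
    half-hour of the week: 7 x 48 = 336 conditional distributions per household."""
    probs = np.linspace(0.01, 0.99, 99)          # 99 probability levels
    df = demand.copy()                           # 'timestamp' assumed to be a datetime column
    # Half-hour-of-week index 0..335 (Monday 00:00-00:30 -> 0)
    df["how"] = (df["timestamp"].dt.dayofweek * 48
                 + df["timestamp"].dt.hour * 2
                 + df["timestamp"].dt.minute // 30)
    # One row per (household, half-hour-of-week), one column per probability level
    return (df.groupby(["id", "how"])["kwh"]
              .quantile(probs)
              .unstack(level=-1))

# e.g. quantile_surface(demand).loc[1550] would return the 336 x 99 surface for ID 1550
```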
  8–13. Quantiles conditional on time of week

    Sample quantiles are better than a kernel density estimate because of:
      - presence of zeros
      - non-negative support
      - high skewness
    Avoids missing data issues and variation in series length.
    Avoids the timing of household events, holidays, etc.
    Allows clustering of households based on probabilistic behaviour rather than coincident behaviour.
  14–15. Pairwise distances

    The time series of 535 × 48 observations per household is mapped to a set of 7 × 48 × 99 quantiles, giving a bivariate surface for each household.
    Can we compute pairwise distances between all households?
    [Diagram: time series mapped to a quantile surface for each household, with a pairwise distance between two surfaces]
  16–18. Jensen-Shannon distances

    Kullback-Leibler divergence between two densities:
      D(p, q) = \int_{-\infty}^{\infty} p(x) \log \frac{p(x)}{q(x)} \, dx
    Not symmetric: D(p, q) \ne D(q, p).
    Jensen-Shannon distance between two densities:
      JS(p, q) = [D(p, r) + D(q, r)] / 2, where r = (p + q)/2.
    Distance between two households:
      \Delta_{ij} = \sum_{t=1}^{7 \times 48} JS(p_t, q_t)
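The slides do not say how these integrals are evaluated; one plausible implementation discretizes each of the 336 conditional distributions onto a common demand grid and works with the resulting probability vectors. A hedged Python sketch under that assumption:

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """Discrete Kullback-Leibler divergence D(p, q) for probability vectors."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def js(p, q):
    """Jensen-Shannon distance as defined on the slide: [D(p,r) + D(q,r)] / 2."""
    r = (p + q) / 2
    return 0.5 * (kl(p, r) + kl(q, r))

def household_distance(P, Q):
    """Delta_ij: sum of JS distances over the 7 x 48 half-hours of the week.
    P and Q have shape (336, n_bins): one discretized density per half-hour."""
    return sum(js(P[t], Q[t]) for t in range(P.shape[0]))
```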
  19. Kernel matrix and density ranking

    Similarity between two households: w_{ij} = \exp(-\Delta_{ij}^2 / h^2), where h is the bandwidth of the Gaussian kernel.
    Row sums of the kernel matrix give a scaled kernel density estimate for each household: \hat{f}_i = \sum_{j=1}^{n} w_{ij}.
    Households can be ranked by their density values.
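A small sketch of this step, assuming a precomputed symmetric matrix `delta` of pairwise distances \Delta_{ij} and a user-chosen bandwidth h:

```python
import numpy as np

def density_rank(delta: np.ndarray, h: float):
    """Kernel matrix w_ij = exp(-Delta_ij^2 / h^2); its row sums give the
    scaled kernel density estimates f_i used to rank households."""
    W = np.exp(-(delta / h) ** 2)
    f = W.sum(axis=1)              # f_i = sum_j w_ij
    order = np.argsort(-f)         # indices from most typical to most anomalous
    return W, f, order
```

The household with the largest \hat{f}_i would then be the "most typical" one (slide 20) and the household with the smallest \hat{f}_i the "most anomalous" (slide 21).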
  20. Most typical household

    [Figure: weekly demand (kWh) and quantile plot (probabilities 0.1–0.9) for ID 1672]
  21. Most anomalous household

    [Figure: weekly demand (kWh) and quantile plot (probabilities 0.1–0.9) for ID 1881]
  22–26. Laplacian eigenmaps

    Idea: embed the conditional densities in a 2d space where the distances are preserved "as far as possible".
    Let W = [w_{ij}], where w_{ij} = \exp(-\Delta_{ij}^2 / h^2).
    D = \mathrm{diag}(\hat{f}_i), where \hat{f}_i = \sum_{j=1}^{n} w_{ij}.
    L = D - W (the Laplacian matrix).
    Solve the generalized eigenvector problem L e = \lambda D e.
    Let e_k be the eigenvector corresponding to the kth smallest eigenvalue.
    Then e_2 and e_3 create an embedding of the households in 2d space.
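With W from the previous step, the embedding can be sketched with SciPy's generalized symmetric eigensolver (again an illustrative sketch, not the code behind the talk):

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmap(W: np.ndarray) -> np.ndarray:
    """Solve L e = lambda D e and return the 2d embedding (e_2, e_3)."""
    D = np.diag(W.sum(axis=1))   # D = diag(f_i)
    L = D - W                    # Laplacian matrix
    # Generalized symmetric eigenproblem; eigenvalues returned in ascending order
    eigvals, eigvecs = eigh(L, D)
    # e_1 (constant, eigenvalue 0) is discarded; e_2 and e_3 give the 2d coordinates
    return eigvecs[:, 1:3]
```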
  27–30. Key property of Laplacian embedding

    Let y_i = (e_{2,i}, e_{3,i}) be the embedded point corresponding to household i. Then the Laplacian eigenmap minimizes
      \sum_{ij} w_{ij} (y_i - y_j)^2 = y^\top L y   subject to   y^\top D y = 1,
    so the most similar points are placed as close as possible.
    The first eigenvalue is 0 due to translation invariance.
    Equivalent to the optimal embedding using the Laplace–Beltrami operator on manifolds.
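For reference, the identity behind the quadratic form, written for a single embedding coordinate y and using the symmetry w_{ij} = w_{ji}; summing over ordered pairs gives a factor of 2, which disappears if the sum runs over unordered pairs as on the slide:

```latex
\sum_{i,j} w_{ij} (y_i - y_j)^2
  = \sum_{i,j} w_{ij}\, (y_i^2 - 2 y_i y_j + y_j^2)
  = 2 \sum_i y_i^2 \sum_j w_{ij} - 2 \sum_{i,j} w_{ij}\, y_i y_j
  = 2\, y^\top D y - 2\, y^\top W y
  = 2\, y^\top L y .
```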
  31. Outliers shown in embedded space

    [Figure: Laplacian embedding of households (Comp1 vs Comp2), coloured by highest density regions computed on the original space (HDR coverage 1, 50, 99, >99); household 1881 lies well outside the main cloud]
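The slides do not spell out how the HDR coverage labels (1, 50, 99, >99) are assigned. One plausible construction, following the usual highest-density-region convention, compares each household's density estimate \hat{f}_i with empirical quantiles of all the density values, so that the lowest-density 1% of households fall outside the 99% HDR and are flagged as outliers:

```python
import numpy as np

def hdr_labels(f: np.ndarray, coverages=(0.01, 0.50, 0.99)):
    """Assign each household to the smallest HDR that contains it.
    A point lies inside the 100*alpha% HDR if its density exceeds the
    (1 - alpha) quantile of all density values; points outside the 99% HDR
    are labelled '>99' and treated as outliers."""
    thresholds = {c: np.quantile(f, 1 - c) for c in coverages}
    labels = np.full(f.shape, ">99", dtype=object)
    for c in sorted(coverages, reverse=True):   # 0.99 first, then 0.50, then 0.01
        labels[f >= thresholds[c]] = f"{int(c * 100)}"
    return labels
```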
  32–34. Features and limitations

    Features of approach:
      - Converting time series to quantile surfaces conditional on time of week.
      - Using pairwise distances between households.
      - Using kernel matrices for density ranking and embedding.
    Unresolved issues:
      - Need to select the bandwidth h when constructing the similarity matrix.
      - Two different uses of the bandwidth (density ranking and embedding): should a different bandwidth be used in each case?
      - The use of pairwise distances makes it hard to scale this algorithm.

    robjhyndman.com