A Deep Learning use case for water end use detection by Roberto Díaz and José Antonio Sánchez at Big Data Spain 2017

Deep Learning (DL) is a major breakthrough in artificial intelligence with a high potential for predictive applications.

https://www.bigdataspain.org/2017/talk/a-deep-learning-use-case-for-water-end-use-detection

Big Data Spain 2017
November 16th - 17th Kinépolis Madrid

Big Data Spain

December 01, 2017

Transcript

  1. Motivation: Urban water supply
    • We need good demand management policies to achieve sustainable development.
    • Adding a new water source implies:
      • Higher costs.
      • Environmental damage.
      • Poorer quality.
    • “The largest, least expensive, and most environmentally sound source of water […] is the water currently being wasted in every sector of our economy.” [1]
    [1] Gleick, P. et al. (2003). Waste Not, Want Not: The Potential for Urban Water Conservation
  2. Motivation: End uses of water
    • Residential water use -> 70% of total water consumption.
    • A good understanding of the demand and its characterization could be very useful for creating good management policies.
    • Several problems can be addressed using AI techniques:
      • End-use classification (dishwasher, toilet, irrigation, taps).
      • Water demand forecasting.
  3. Motivation: The problem
    • Installing a meter on each water device is very expensive and intrusive.
    • To overcome this problem, a single precision meter can be installed at the home's main water connection.
    • Predictive models can read these meters and make predictions:
      • End use: classification problem.
      • Forecasting: regression problem.
  4. Motivation: Data source
    • Canal de Isabel II has monitored a sample of 300 homes spread over the region of Madrid since 2008.
    • 15 million hours monitored over 9 years.
    • 35 million events.
    • The sample is stratified and spread across different geographical areas of the region so that it can be considered representative of the domestic users of Madrid.
    • The goal is to study consumption patterns and end uses of urban water.
  5. Motivation: Project information
    PROJECT TITLE: Pattern Recognition in Residential End Uses of Water
    RESEARCH LINE: Assurance of the balance (availability / demand)
    CLIENT: Canal de Isabel II
    CONSORTIUM: Exeleria (preprocessing tasks); Treelogic (machine learning tasks)
    GOAL: Developing an automatic system for identifying the end uses of water in domestic applications, from the signals registered by water meters, using advanced machine learning techniques such as artificial neural networks (ANN) or other statistical methods
  6. Starting Point
    • Data was labeled by operators (experts) who classify water use events using specialized software.
    • This task involves a considerable number of man-hours.
    • An operator needs 1 hour to analyse a two-week period of data from each installation.
  7. Starting Point: 8 types of events
    • SHOWERS (INCLUDING BATHTUBS)
    • DISHWASHER
    • WASHING MACHINE
    • CISTERNS
    • LEAKS
    • FAUCETS
    • POOL
    • IRRIGATION
  8. Previous analysis and visualization: Pulse to Flow 1
    BASELINE INFORMATION
    • Date
    • Number of pulses
    DATE                  COUNTER (number of accumulated pulses)
    01/06/2008 0:47:35    31542
    01/06/2008 0:48:13    31543
    01/06/2008 0:48:55    31544
    01/06/2008 0:49:38    31545
    01/06/2008 1:20:29    31546
    01/06/2008 1:20:46    31547
    01/06/2008 1:21:03    31548
    01/06/2008 1:21:20    31549
    …                     …
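
A pulse log like the one on this slide can be turned into a flow signal by dividing the volume each pulse represents by the time elapsed since the previous pulse. The sketch below is only an illustration under stated assumptions: `LITRES_PER_PULSE` and `pulses_to_flow` are hypothetical names, the 1 L per pulse value is a placeholder (the slides do not say which meter produced this log), and the dates are read as day/month/year.

```python
from datetime import datetime

# Assumption: each pulse stands for a fixed volume. 1.0 L is a placeholder;
# the slides do not say which meter resolution produced this log.
LITRES_PER_PULSE = 1.0

# First rows of the pulse log shown on the slide (date, accumulated pulses).
PULSE_LOG = [
    ("01/06/2008 0:47:35", 31542),
    ("01/06/2008 0:48:13", 31543),
    ("01/06/2008 0:48:55", 31544),
    ("01/06/2008 0:49:38", 31545),
    ("01/06/2008 1:20:29", 31546),
    ("01/06/2008 1:20:46", 31547),
]


def pulses_to_flow(log, litres_per_pulse=LITRES_PER_PULSE):
    """Return (interval_end, litres_per_minute) for consecutive log rows."""
    rows = [(datetime.strptime(ts, "%d/%m/%Y %H:%M:%S"), n) for ts, n in log]
    flows = []
    for (t0, n0), (t1, n1) in zip(rows, rows[1:]):
        seconds = (t1 - t0).total_seconds()
        if seconds <= 0:
            continue  # skip duplicated or out-of-order timestamps
        litres = (n1 - n0) * litres_per_pulse
        flows.append((t1, litres / seconds * 60.0))
    return flows


flow = pulses_to_flow(PULSE_LOG)
for end, lpm in flow:
    print(end.time(), round(lpm, 2), "L/min")
```

The 31-minute gap between 0:49:38 and 1:20:29 yields a near-zero average flow. A real pipeline would presumably treat such a gap as no consumption, but the slides give no threshold for that.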
  9. Previous analysis and visualization: Episodes
    • An episode is a period of time during which the flow is non-zero, bounded by two zero-flow instants.
    • An episode may consist of one or more events.
    • An event belongs to exactly one episode.
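
The episode definition maps directly onto a scan for maximal runs of non-zero flow. This is a minimal sketch, assuming an evenly sampled flow series; `split_episodes` and the sample values are hypothetical and not taken from the talk.

```python
def split_episodes(flow):
    """Return (start, end) index pairs, end exclusive, of non-zero runs."""
    episodes = []
    start = None
    for i, q in enumerate(flow):
        if q > 0 and start is None:
            start = i                      # flow leaves zero: episode opens
        elif q == 0 and start is not None:
            episodes.append((start, i))    # flow returns to zero: episode closes
            start = None
    if start is not None:
        # The series ended mid-episode; keep the open run rather than drop it.
        episodes.append((start, len(flow)))
    return episodes


# Hypothetical flow samples, used only to exercise the function.
flow = [0, 0, 6, 6, 0, 0, 4, 4, 12, 12, 4, 0, 9, 9, 0]
episodes = split_episodes(flow)
print(episodes)  # [(2, 4), (6, 11), (12, 14)]
```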
  10. Previous analysis and visualization: Events
    • An event is an elementary unit of consumption that lasts long enough for its instantaneous flow to be clearly differentiated from the rest.
    • A particular domestic use may consist of one or more events.
    • One or several events that coincide in time form an episode.
  11. 3 domestic uses which involve 4 events
    [Figure: flow Q versus time T] FAUCETS: 1 event • CISTERNS: 1 event • WASHING MACHINE (cycle 1 and cycle 2): 2 events
  12. 3 domestic uses which involve 4 events and 3 episodes
    [Figure: flow Q versus time T, divided into EPISODE 1, EPISODE 2 and EPISODE 3] FAUCETS: 1 event • CISTERNS: 1 event • WASHING MACHINE: cycle 1 and cycle 2
  13. Previous analysis and visualization: Events identification
    • When an episode consists of more than one event, the events overlap.
    • Graphically, the events are "stacked" on top of each other like the steps of a ladder.
    • How do we discriminate events?
      o It is the same event if…
        ⁻ The flow rate stays constant or the change is not significant.
      o It is a different event if…
        ⁻ There is a significant change in the flow rate.
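
One possible reading of the rule above: a significant rise in flow opens a new event, and a significant fall closes one. The sketch below is not the talk's algorithm. The threshold `min_step`, the function `split_events`, and the rule for pairing a fall with the open event of the closest height are all assumptions made for illustration.

```python
def split_events(flow, min_step=1.0):
    """Decompose one episode into events as (start, end, height) tuples.

    A rise larger than min_step opens an event whose height is the size of
    the rise. A fall larger than min_step closes the open event whose height
    is closest to the size of the fall. Smaller changes are ignored.
    """
    padded = [0.0] + list(flow) + [0.0]   # an episode is bounded by zero flow
    open_events = []                      # (start_index, height)
    events = []
    for i in range(1, len(padded)):
        step = padded[i] - padded[i - 1]
        if step > min_step:
            open_events.append((i - 1, step))
        elif step < -min_step and open_events:
            match = min(open_events, key=lambda e: abs(e[1] + step))
            open_events.remove(match)
            events.append((match[0], i - 1, match[1]))
    return sorted(events)


# Hypothetical episode: a 4 L/min use with an 8 L/min use stacked on top.
episode = [4.0, 4.0, 12.0, 12.0, 4.0]
events = split_events(episode)
print(events)  # [(0, 5, 4.0), (2, 4, 8.0)]
```

Small fluctuations such as `[4.0, 4.3, 4.1, 4.0]` stay below the threshold and are kept as a single event.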
  14. Approach: Feature Extraction 2
    37 FEATURES WERE EXTRACTED FROM EVERY EVENT: duration, volume, maximum flow, initial gradient, …
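
The four features named on the slide could be computed from an event's flow samples roughly as follows. The exact definitions used in the project are not given, so the formulas, the function `extract_features`, the units, and the sampling interval `dt` are assumptions.

```python
def extract_features(flow, dt=1.0):
    """Compute four illustrative features for one event.

    flow: evenly spaced flow samples in litres per minute (assumed unit).
    dt: sampling interval in seconds (assumed; not stated in the slides).
    """
    duration = len(flow) * dt                    # seconds
    volume = sum(flow) * dt / 60.0               # litres, rectangle rule
    max_flow = max(flow)                         # litres per minute
    # Slope between the first two samples, in (L/min) per second.
    initial_gradient = (flow[1] - flow[0]) / dt if len(flow) > 1 else 0.0
    return {
        "duration": duration,
        "volume": volume,
        "max_flow": max_flow,
        "initial_gradient": initial_gradient,
    }


# Hypothetical event sampled every 10 seconds.
features = extract_features([2.0, 6.0, 6.0, 6.0, 3.0], dt=10.0)
print(features)
```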
  15. Approach: Deep Neural Networks
    • Deep Learning (DL) is a major breakthrough in artificial intelligence with a high potential for predictive applications.
    • MIT Technology Review has named it one of its ten breakthrough technologies.
    • DL has gone from being considered an academic field to being applied in engineering thanks to frameworks like TensorFlow or CNTK.
    • Deep neural networks are very powerful: they can solve very complex tasks.
    • They require a large amount of data.
    • Training times are long, and complex tasks require specialized hardware.
    • Slow classifiers.
  16. Approach: Speedup (SDAs)
    • A disadvantage of the backpropagation algorithm is that training is fast in the last layers (near the output) but very slow in layers far from the output.
    • If we don't have enough training data to perform a high number of backpropagation iterations, we only train the layers near the output.
    • If we can initialize the neural network with useful weights in the first layers, the training procedure will speed up.
    • If that initialization is unsupervised, we can use unlabeled data.
  17. Approach: Speedup (SDAs)
    • Imagine a neural network that has one hidden layer, with the same number of neurons in the input as in the output.
    • We add noise to the input and train the network to recover the original input.
    • The network will learn to generalize because it receives different noisy inputs that map to the same output.
    • The network will learn to identify useful features of the input.
  18. Approach: Speedup (SDAs)
    • How can I initialize an MLP using autoencoders?
    • By stacking them.
    • We can remove the decoding layer and attach another autoencoder to the output.
    • A single autoencoder can only find basic useful weights.
    • The idea of autoencoders in Deep Learning is to train several autoencoders sequentially, using the hidden layer of one as the input of the next.
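
The stacking procedure can be sketched with a small pure-Python denoising autoencoder. This is only an illustration, not the project's implementation: the `DenoisingAutoencoder` class, the layer sizes, the masking noise, the learning rate, and the toy binary data are all assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class DenoisingAutoencoder:
    """One hidden layer of sigmoid units, trained to undo masking noise."""

    def __init__(self, n_in, n_hidden, rng):
        self.rng = rng
        self.w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                      for _ in range(n_hidden)]
        self.b_enc = [0.0] * n_hidden
        self.w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                      for _ in range(n_in)]
        self.b_dec = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.w_enc, self.b_enc)]

    def decode(self, h):
        return [sigmoid(sum(w * v for w, v in zip(row, h)) + b)
                for row, b in zip(self.w_dec, self.b_dec)]

    def loss(self, data):
        """Mean squared reconstruction error on clean inputs."""
        total = 0.0
        for x in data:
            y = self.decode(self.encode(x))
            total += sum((a - b) ** 2 for a, b in zip(y, x)) / len(x)
        return total / len(data)

    def train(self, data, epochs=200, lr=0.5, noise=0.3):
        for _ in range(epochs):
            for x in data:
                # Corrupt the input: zero each value with probability `noise`.
                noisy = [0.0 if self.rng.random() < noise else v for v in x]
                h = self.encode(noisy)
                y = self.decode(h)
                # Backpropagate the squared error against the CLEAN input.
                d_out = [(yi - xi) * yi * (1 - yi) for yi, xi in zip(y, x)]
                d_hid = [hj * (1 - hj) * sum(d_out[i] * self.w_dec[i][j]
                                             for i in range(len(x)))
                         for j, hj in enumerate(h)]
                for i, d in enumerate(d_out):
                    for j, hj in enumerate(h):
                        self.w_dec[i][j] -= lr * d * hj
                    self.b_dec[i] -= lr * d
                for j, d in enumerate(d_hid):
                    for i, vi in enumerate(noisy):
                        self.w_enc[j][i] -= lr * d * vi
                    self.b_enc[j] -= lr * d


rng = random.Random(0)
# Toy unlabeled data: 30 samples drawn from three binary patterns.
prototypes = [[1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0], [0, 0, 0, 0, 1, 1]]
data = [[float(v) for v in rng.choice(prototypes)] for _ in range(30)]

# First autoencoder learns from the raw inputs.
dae1 = DenoisingAutoencoder(6, 4, rng)
before1 = dae1.loss(data)
dae1.train(data)
after1 = dae1.loss(data)

# Drop its decoder and train a second autoencoder on its hidden codes.
codes = [dae1.encode(x) for x in data]
dae2 = DenoisingAutoencoder(4, 2, rng)
dae2.train(codes)

# The encoder weights would initialize the first two layers of an MLP.
stack = [dae1.w_enc, dae2.w_enc]
print(round(before1, 3), round(after1, 3))
```

After this unsupervised step, a supervised output layer would be added on top and the whole network fine-tuned with labeled events.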
  19. Results: Benchmark
    ACCURACY OF DEEP NEURAL NETWORKS
    • 81.78% in 1 L meters
    • 91.19% in 0.1 L meters
    ACCURACY OF SVMs
    • 67.41% in 1 L meters
    • 84.78% in 0.1 L meters
  20. What else…? Time Series
    • Water supply companies are also interested in:
      • Water demand forecasting.
      • Weather or quantitative precipitation forecast:
        o Volume of water in reservoirs.
        o Alert systems.
    • Time series forecasting.
  21. What else…? RNN 3
    • Traditional NNs assume that inputs are independent of each other.
    • RNNs incorporate a memory that captures the essence of what has happened previously.
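
The "memory" of an RNN is a hidden state that is fed back at every step. The sketch below uses a one-unit Elman-style recurrence with fixed, hypothetical weights (`w_x`, `w_h`, `b` are placeholders, not learned values) to show that the same final input gives a different state depending on what came before.

```python
import math


def rnn_states(inputs, w_x=0.8, w_h=0.5, b=0.0):
    """Run a one-unit recurrent cell and return its hidden state per step."""
    h = 0.0
    states = []
    for x in inputs:
        # The new state mixes the current input with the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


after_zeros = rnn_states([0.0, 0.0, 1.0])
after_ones = rnn_states([1.0, 1.0, 1.0])
# Both sequences end with the same input, yet the final states differ.
print(round(after_zeros[-1], 3), round(after_ones[-1], 3))
```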
  22. What else…? LSTM 3
    • A variant of RNN, capable of learning long-term dependencies.
    • Its internal architecture is more complex than that of a simple RNN.
    • The most widely used type of RNN.
  23. Solution
    • LSTM network
      o Input – 20 timesteps, 1 feature
      o Hidden Layer 1 – 20 LSTM
      o Output – 1 neuron
    • MSE -> 16,94 4
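
An input of 20 timesteps with 1 feature and a single output neuron implies that the series is cut into sliding windows, each paired with the value that follows it. The sketch below covers only that data preparation and the MSE metric. The series is made up, the helper names are hypothetical, and the persistence baseline is a stand-in for a model, not the LSTM from the talk.

```python
def make_windows(series, timesteps=20):
    """Return (X, y) for next-step forecasting.

    X[i] holds `timesteps` consecutive values, each wrapped as a 1-feature
    vector; y[i] is the value that follows the window.
    """
    X, y = [], []
    for i in range(len(series) - timesteps):
        X.append([[v] for v in series[i:i + timesteps]])
        y.append(series[i + timesteps])
    return X, y


def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Hypothetical demand series, used only to exercise the functions.
series = [float(i % 7) for i in range(50)]
X, y = make_windows(series)

# Persistence baseline: predict the last value of each window.
baseline = [window[-1][0] for window in X]
print(len(X), len(X[0]), len(X[0][0]), mse(y, baseline))
```

A recurrent model would take `X` as its input tensor of shape (samples, 20, 1) and be trained to minimise the same MSE against `y`.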
  24. CONCLUSIONS
    01 Data science can help us to UNDERSTAND the water demand and its characterization.
    02 Deep Learning models can achieve very good results in terms of ACCURACY when trained on large enough datasets.
    03 This METHODOLOGY is currently in use for processing data from the "Panel for residential consumption patterns assessment and end-uses monitoring" project of Canal de Isabel II in Madrid.
    04 It could be very USEFUL for creating good management policies.
  25. THANKS!
    Roberto Díaz, LEADER OF THE DATA SCIENCE RESEARCH
    José Antonio Sánchez, SENIOR R&D ENGINEER
  26. Contact
    Parque Tecnológico de Asturias, Parcela 30, E33428 Llanera, Asturias, ESPAÑA
    Avda. Manoteras, 38, Oficina D614, E28050 Madrid, ESPAÑA
    T +34 902 286 386
    [email protected]
    www.Treelogic.com