
State of the art time-series analysis with deep learning by Javier Ordóñez at Big Data Spain 2017


Time-series problems have traditionally been solved using engineered features obtained through heuristic processes.

https://www.bigdataspain.org/2017/talk/state-of-the-art-time-series-analysis-with-deep-learning

Big Data Spain 2017
November 16th - 17th

Big Data Spain

December 04, 2017



Transcript

  1. What is this about? An approach to time series analysis using deep neural nets. What we are going to see: •Brief introduction •Deep learning concepts •Model •Use case. Core ref: “Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition”, FJ Ordóñez et al.
  2. Time series. A time series is a sequence of regular, time-ordered observations, e.g. stock prices, weather readings, smartphone sensor data, health monitoring data. Typical problems: time series classification, time series forecasting, ECG anomaly detection, energy demand prediction, human activity recognition, stock market prediction. “Traditional” approaches to time series analysis are based on autoregressive models. Challenges: feature design must be tackled by hand, usually only a single signal is involved, etc. (A minimal sketch of the autoregressive baseline follows below.)
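The “traditional” baseline mentioned above can be sketched in a few lines. This is a minimal autoregressive (AR) model fit by least squares; the toy signal, the order p=5, and the helper names are illustrative, not from the talk:

```python
# Minimal sketch of a "traditional" autoregressive (AR) model, fit by
# least squares with numpy. The signal and the order p are illustrative.
import numpy as np

def fit_ar(x, p):
    """Fit x[t] ~ c + a1*x[t-1] + ... + ap*x[t-p] by least squares."""
    # Build the lagged design matrix: one row per prediction target.
    rows = [x[t - p:t][::-1] for t in range(p, len(x))]
    X = np.column_stack([np.ones(len(rows)), np.array(rows)])
    y = x[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [c, a1, ..., ap]

def forecast_one(x, coef):
    """Predict the next value from the last p observations."""
    p = len(coef) - 1
    return coef[0] + coef[1:] @ x[-p:][::-1]

t = np.arange(500)
x = np.sin(0.1 * t) + 0.1 * np.random.randn(500)  # toy series
coef = fit_ar(x, p=5)
print("one-step forecast:", forecast_one(x, coef))
```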
  3. Why deep learning? •State of the art in speech recognition and computer vision •Capable of automatically learning features ◦Doesn’t require much domain knowledge •Works better when you have lots of labelled data. © David Yanofsky
  4. Artificial neural nets. A model that learns by example: •using many examples •defined as a series of hierarchically connected functions (layers) •can be very complex (deep!)
  5. Artificial neural nets (continued). Same definition, now with a diagram: Input → Hidden layer → Output.
  6. What does it know? Artificial neural nets are •composed of units (neurons), distributed in layers, which control whether the data flow should continue (activation level) •controlled by “weights” and nonlinear functions. Diagram: Input → Hidden layer → Output.
  7. How does it learn? By correcting the errors: backpropagation! The weights are adjusted and readjusted, layer by layer, until the network makes the fewest possible errors. Diagram: Input → Hidden layer → Output. (A minimal sketch follows below.)
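A minimal numpy sketch of backpropagation as described on this slide: one hidden layer, sigmoid activations, trained on a toy task. All sizes, the learning rate, and the XOR data are our illustrative choices:

```python
# Minimal backpropagation sketch: input -> hidden layer -> output,
# trained on a toy task (XOR). All sizes and constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)           # XOR labels

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)            # input -> hidden
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)            # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(5000):
    # Forward pass: each layer is a function applied to the previous one.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the error back, layer by layer.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Adjust the weights to reduce the error.
    lr = 0.5
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)

print(out.round(3).ravel())  # should approach [0, 1, 1, 0]
```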
  8. Case: image processing. •Classical problem: the MNIST dataset ◦It’s the “Hello World” of image processing •Recognition of handwritten digits •Training: 60,000 pictures to learn the picture–label relation. (A minimal sketch follows below.)
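A minimal sketch of the MNIST case in Keras. The talk does not name a framework, so the library choice and the small architecture here are assumptions:

```python
# Minimal MNIST sketch in Keras (the framework choice is ours, not the
# talk's): learn the picture -> label relation from 60,000 examples.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # one unit per digit
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))
```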
  9. Convnets. •Convolutional nets are less dense = fewer weights •Focus on local patterns, assuming that neighbouring variables are locally correlated ◦Images: pixels that are close together •One simple operation is repeated over and over, starting from the raw input data •They work very well: state-of-the-art results in different fields.
  10. Convnets: dimensions. •The same principle works in any number of dimensions •Time series use 1D filters (or kernels) •They are also feature extractors. © Wikipedia. (A 1D example follows below.)
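A tiny numpy illustration of a 1D filter acting as a feature extractor on a time series; the signal and the kernel values are made up for illustration:

```python
# Sketch of a 1D convolutional filter acting as a feature extractor:
# sliding a small kernel over a signal yields a feature map. The signal
# and the kernel here are illustrative.
import numpy as np

signal = np.sin(np.linspace(0, 6 * np.pi, 100))
signal[50] += 2.0                        # inject a spike to detect

kernel = np.array([-1.0, 2.0, -1.0])     # responds to sharp local changes
feature_map = np.convolve(signal, kernel, mode="valid")

print("spike detected near index:", np.argmax(np.abs(feature_map)))
```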
  11. Convnets: signals. Same principles: ◦Operations applied in a hierarchy ◦Each filter defines a feature map ◦As many feature maps as filters ◦Each filter captures a pattern. The result is another sequence/signal, transformed by the operations. Diagram: 1st layer → 2nd layer → 3rd layer.
  12. Recurrent neural nets. •Designed to learn from time-related data •Units with persistence •The input includes the output from the previous timestep. © deeplearning4j
  13. Long short-term memory. Memory cells which can maintain their state over time, and non-linear gating units which regulate the information flow into and out of the cell. Ref: “Generating Sequences With Recurrent Neural Networks”. (A single-step sketch follows below.)
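One time step of a standard LSTM cell, sketched in numpy along the usual gating equations (input, forget, and output gates plus a candidate update); the sizes and random weights are illustrative:

```python
# One time step of a standard LSTM cell in numpy: gates regulate what
# flows into and out of the memory cell state. Sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
W = rng.normal(scale=0.1, size=(4, n_in + n_hid, n_hid))  # i, f, o, g weights
b = np.zeros((4, n_hid))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev])
    i = sigmoid(z @ W[0] + b[0])      # input gate: what to write
    f = sigmoid(z @ W[1] + b[1])      # forget gate: what to keep
    o = sigmoid(z @ W[2] + b[2])      # output gate: what to expose
    g = np.tanh(z @ W[3] + b[3])      # candidate values
    c = f * c_prev + i * g            # cell state maintained over time
    h = o * np.tanh(c)                # hidden state (the cell's output)
    return h, c

h = np.zeros(n_hid); c = np.zeros(n_hid)
for x_t in rng.normal(size=(24, n_in)):   # 24 steps, as in the talk's setup
    h, c = lstm_step(x_t, h, c)
print(h.round(3))
```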
  14. LSTM: layers. •Also arranged in a hierarchy: the output of layer l is the input of layer l+1 •Can model more complex time relations. Ref: “Recurrent Neural Network Regularization”, Zaremba, W.
  15. DeepConvLSTM. A deep framework based on convolutional and LSTM recurrent units: •The convolutional layers are feature extractors and provide abstract representations of the input data in feature maps •The recurrent layers model the temporal dynamics of the activation of the feature maps. https://github.com/sussexwearlab/DeepConvLSTM
  16. DeepConvLSTM. Parameters are learnt automatically, but what about the hyperparameters? •Architecture ◦How many layers ◦How many nodes/filters ◦Which type •Data ◦Batch size ◦Size of filters ◦Number of steps the memory cells will learn •Training ◦Regularization ◦Learning rate ◦Gradient expressions ◦Init policy
  17. DeepConvLSTM: hyperparams. •Architecture ◦Layers: Conv(64)−Conv(64)−Conv(64)−Conv(64)−LSTM(128)−LSTM(128) ◦Type: ReLU units for the conv layers •Data ◦Batch size: 100 (careful with the GPU memory) ◦Size of filters: 5 samples ◦Number of steps the memory cells will learn: 24 samples •Training ◦Regularization: dropout in the conv layers ◦Learning rate: small (0.0001) ◦Gradient expressions: RMSProp, usually a good choice for RNNs
  18. DeepConvLSTM: architecture. Input → 1st conv (64) → 2nd conv (64) → 3rd conv (64) → 4th conv (64) → 1st LSTM (128) → 2nd LSTM (128) → Output. (A rough sketch in code follows below.)
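The published DeepConvLSTM implementation uses Lasagne/Theano (see the repo linked on slide 15). Below is a rough Keras approximation of the stated stack, wired with the hyperparameters from slide 17; the input shape (24 timesteps × 113 channels) follows the dataset slide, and the 18-class output (17 activities plus a null class) is our assumption:

```python
# Rough Keras sketch of the stated DeepConvLSTM architecture. The
# published implementation uses Lasagne/Theano (repo linked above);
# this is our approximation of the same stack:
# Conv(64) x 4 -> LSTM(128) x 2, filter size 5, dropout, RMSProp.
import tensorflow as tf

N_TIMESTEPS, N_CHANNELS = 24, 113
N_CLASSES = 18   # 17 activities + a null class (our assumption)

model = tf.keras.Sequential([
    # Four convolutional layers act as feature extractors (ReLU units),
    # with dropout in the conv layers as the hyperparameter slide states.
    tf.keras.layers.Conv1D(64, 5, padding="same", activation="relu",
                           input_shape=(N_TIMESTEPS, N_CHANNELS)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Conv1D(64, 5, padding="same", activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Conv1D(64, 5, padding="same", activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Conv1D(64, 5, padding="same", activation="relu"),
    tf.keras.layers.Dropout(0.5),
    # Two recurrent layers model the temporal dynamics of the feature maps.
    tf.keras.layers.LSTM(128, return_sequences=True),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()   # train with batch_size=100, per the hyperparameter slide
```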
  19. Activity recognition. Input: sensor signals. Output: an activity label (e.g. Stand, Run, Walk) over time. A supervised classification problem: find the most likely activity label according to the sensor signal at each time instant. (A windowing sketch follows below.)
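Continuous sensor streams are typically cut into fixed-length windows before classification. A numpy sketch, using the 24-sample window length from slide 17; the step size and the label-at-window-end rule are our assumptions:

```python
# Sketch: turn a continuous sensor stream into fixed-length windows for
# supervised classification. Window length 24 matches the earlier slide;
# the step size and the labelling rule (label of the last sample) are
# our assumptions.
import numpy as np

def sliding_windows(signals, labels, length=24, step=12):
    """signals: (T, channels) array; labels: (T,) per-timestep labels."""
    xs, ys = [], []
    for start in range(0, len(signals) - length + 1, step):
        xs.append(signals[start:start + length])
        ys.append(labels[start + length - 1])   # label at the window's end
    return np.stack(xs), np.array(ys)

T, C = 1000, 113
stream = np.random.randn(T, C)                  # stand-in for sensor data
stream_labels = np.random.randint(0, 18, size=T)
X, y = sliding_windows(stream, stream_labels)
print(X.shape, y.shape)                         # (82, 24, 113) (82,)
```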
  20. OPPORTUNITY dataset. •113 sensor channels •30 Hz •6 hours of data •17 home activities: ◦Open/Close fridge ◦Clean table ◦Toggle switch ◦Open/Close dishwasher ◦Open/Close drawers ◦Open/Close doors ◦Drink from cup. Ref: “Collecting complex activity datasets in highly rich networked sensor environments”, Roggen, D. et al.
  21. Metrics. F-score: •Considers all errors equally important •Combines precision and recall •Value between 0 and 1 •The higher the F-score, the better the model. Loss: •Measures the model’s error •The value optimized during the learning process •The lower the loss, the better the model. (A short F-score sketch follows below.)
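A short sketch of the F-score from precision and recall; scikit-learn and the weighted average are our choices here:

```python
# Sketch: the F-score combines precision and recall into a single value
# between 0 and 1. scikit-learn and the weighted average are our choices.
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [0, 1, 1, 2, 2, 2, 0, 1]    # illustrative labels
y_pred = [0, 1, 2, 2, 2, 1, 0, 1]

p = precision_score(y_true, y_pred, average="weighted")
r = recall_score(y_true, y_pred, average="weighted")
f = f1_score(y_true, y_pred, average="weighted")
print(f"precision={p:.2f} recall={r:.2f} f-score={f:.2f}")
```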
  22. Performance. •Benchmark: the OPPORTUNITY challenge •~1M parameters •Single GPU (1664 cores) •Training takes ~6 h to converge •Classification takes ~6 seconds. [Chart: F-score compared against the challenge baselines: kNN, SVM, kNN + SVM.]
  23. Summary. •Automatic feature learning: a convolutional filter captures a specific salient pattern and acts as a feature detector •Recurrent layers can learn the temporal dynamics of such features •State-of-the-art performance with restrained nets (~1M params); capable of real-time processing •We still have to deal with the hyperparameters (“Learning to learn by gradient descent by gradient descent”, Andrychowicz, M.). Core ref: “Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition”, FJ Ordóñez et al.