
State of the art time-series analysis with deep learning by Javier Ordóñez at Big Data Spain 2017


Time-series problems have traditionally been solved using engineered features obtained through heuristic processes.

https://www.bigdataspain.org/2017/talk/state-of-the-art-time-series-analysis-with-deep-learning

Big Data Spain 2017
November 16th - 17th

Big Data Spain

December 04, 2017



Transcript

  1. What is this about? An approach to time series analysis using deep neural nets. What we are going to see: •Brief introduction •Deep learning concepts •Model •Use case. Core ref: “Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition”, FJ Ordóñez et al.
  2. Time series. A time series is a sequence of regular, time-ordered observations, e.g. stock prices, weather readings, smartphone sensor data, health monitoring data. Typical problems: time series classification, time series forecasting, ECG anomaly detection, energy demand prediction, human activity recognition, stock market prediction. “Traditional” approaches to time series analysis are based on autoregressive models. Challenges: feature design must be tackled by hand, usually only a single signal is involved, etc. (A minimal sketch of the autoregressive baseline follows below.)
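The “traditional” baseline mentioned above can be sketched in a few lines. This is a minimal autoregressive (AR) model fit by least squares; the toy signal, the order p=5, and the helper names are illustrative, not from the talk:

```python
# Minimal sketch of a "traditional" autoregressive (AR) model, fit by
# least squares with numpy. The signal and the order p are illustrative.
import numpy as np

def fit_ar(x, p):
    """Fit x[t] ~ c + a1*x[t-1] + ... + ap*x[t-p] by least squares."""
    # Build the lagged design matrix: one row per prediction target.
    rows = [x[t - p:t][::-1] for t in range(p, len(x))]
    X = np.column_stack([np.ones(len(rows)), np.array(rows)])
    y = x[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [c, a1, ..., ap]

def forecast_one(x, coef):
    """Predict the next value from the last p observations."""
    p = len(coef) - 1
    return coef[0] + coef[1:] @ x[-p:][::-1]

t = np.arange(500)
x = np.sin(0.1 * t) + 0.1 * np.random.randn(500)  # toy series
coef = fit_ar(x, p=5)
print("one-step forecast:", forecast_one(x, coef))
```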
  3. Why deep learning? •State of the art in speech recognition and computer vision •Capable of automatically learning features ◦Doesn’t require much domain knowledge •Works better when you have lots of labelled data. © David Yanofsky
  4. Artificial neural nets. A model that learns by example: •using many examples •defined as a series of hierarchically connected functions (layers) •can be very complex (deep!)
  5. Artificial neural nets (continued). Same definition, now with a diagram: Input → Hidden layer → Output.
  6. What does it know? Artificial neural nets are •composed of units (neurons), distributed in layers, which control whether the data flow should continue (activation level) •controlled by “weights” and nonlinear functions. Diagram: Input → Hidden layer → Output.
  7. How does it learn? By correcting the errors: backpropagation! The weights are adjusted and readjusted, layer by layer, until the network makes the fewest possible errors. Diagram: Input → Hidden layer → Output. (A minimal sketch follows below.)
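A minimal numpy sketch of backpropagation as described on this slide: one hidden layer, sigmoid activations, trained on a toy task. All sizes, the learning rate, and the XOR data are our illustrative choices:

```python
# Minimal backpropagation sketch: input -> hidden layer -> output,
# trained on a toy task (XOR). All sizes and constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)           # XOR labels

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)            # input -> hidden
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)            # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(5000):
    # Forward pass: each layer is a function applied to the previous one.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the error back, layer by layer.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Adjust the weights to reduce the error.
    lr = 0.5
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)

print(out.round(3).ravel())  # should approach [0, 1, 1, 0]
```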
  8. Case: image processing. •Classical problem: the MNIST dataset ◦It’s the “Hello World” of image processing •Recognition of handwritten digits •Training: 60,000 pictures to learn the picture–label relation. (A minimal sketch follows below.)
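A minimal sketch of the MNIST case in Keras. The talk does not name a framework, so the library choice and the small architecture here are assumptions:

```python
# Minimal MNIST sketch in Keras (the framework choice is ours, not the
# talk's): learn the picture -> label relation from 60,000 examples.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # one unit per digit
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))
```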
  9. Convnets. •Convolutional nets are less dense = fewer weights •Focus on local patterns, assuming that neighbouring variables are locally correlated ◦Images: pixels that are close together •One simple operation is repeated over and over, starting from the raw input data •They work very well: state-of-the-art results in different fields.
  10. Convnets: dimensions. •The same principle works in any number of dimensions •Time series use 1D filters (or kernels) •They are also feature extractors. © Wikipedia. (A 1D example follows below.)
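A tiny numpy illustration of a 1D filter acting as a feature extractor on a time series; the signal and the kernel values are made up for illustration:

```python
# Sketch of a 1D convolutional filter acting as a feature extractor:
# sliding a small kernel over a signal yields a feature map. The signal
# and the kernel here are illustrative.
import numpy as np

signal = np.sin(np.linspace(0, 6 * np.pi, 100))
signal[50] += 2.0                        # inject a spike to detect

kernel = np.array([-1.0, 2.0, -1.0])     # responds to sharp local changes
feature_map = np.convolve(signal, kernel, mode="valid")

print("spike detected near index:", np.argmax(np.abs(feature_map)))
```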
  11. Convnets: signals. Same principles: ◦Operations applied in a hierarchy ◦Each filter defines a feature map ◦As many feature maps as filters ◦Each filter captures a pattern. The result is another sequence/signal, transformed by the operations. Diagram: 1st layer → 2nd layer → 3rd layer.
  12. Recurrent neural nets. •Designed to learn from time-related data •Units with persistence •The input includes the output from the previous timestep. © deeplearning4j
  13. Long short-term memory. Memory cells which can maintain their state over time, and non-linear gating units which regulate the information flow into and out of the cell. Ref: “Generating Sequences With Recurrent Neural Networks”. (A single-step sketch follows below.)
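One time step of a standard LSTM cell, sketched in numpy along the usual gating equations (input, forget, and output gates plus a candidate update); the sizes and random weights are illustrative:

```python
# One time step of a standard LSTM cell in numpy: gates regulate what
# flows into and out of the memory cell state. Sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
W = rng.normal(scale=0.1, size=(4, n_in + n_hid, n_hid))  # i, f, o, g weights
b = np.zeros((4, n_hid))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev])
    i = sigmoid(z @ W[0] + b[0])      # input gate: what to write
    f = sigmoid(z @ W[1] + b[1])      # forget gate: what to keep
    o = sigmoid(z @ W[2] + b[2])      # output gate: what to expose
    g = np.tanh(z @ W[3] + b[3])      # candidate values
    c = f * c_prev + i * g            # cell state maintained over time
    h = o * np.tanh(c)                # hidden state (the cell's output)
    return h, c

h = np.zeros(n_hid); c = np.zeros(n_hid)
for x_t in rng.normal(size=(24, n_in)):   # 24 steps, as in the talk's setup
    h, c = lstm_step(x_t, h, c)
print(h.round(3))
```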
  14. LSTM: layers. •Also arranged in a hierarchy: the output of layer l is the input of layer l+1 •Can model more complex time relations. Ref: “Recurrent Neural Network Regularization”, Zaremba, W.
  15. DeepConvLSTM. A deep framework based on convolutional and LSTM recurrent units: •The convolutional layers are feature extractors and provide abstract representations of the input data in feature maps •The recurrent layers model the temporal dynamics of the activation of the feature maps. https://github.com/sussexwearlab/DeepConvLSTM
  16. DeepConvLSTM. Parameters are learnt automatically, but what about the hyperparameters? •Architecture ◦How many layers ◦How many nodes/filters ◦Which type •Data ◦Batch size ◦Size of filters ◦Number of steps the memory cells will learn •Training ◦Regularization ◦Learning rate ◦Gradient expressions ◦Init policy
  17. DeepConvLSTM: hyperparams. •Architecture ◦Layers: Conv(64)−Conv(64)−Conv(64)−Conv(64)−LSTM(128)−LSTM(128) ◦Type: ReLU units for the conv layers •Data ◦Batch size: 100 (careful with the GPU memory) ◦Size of filters: 5 samples ◦Number of steps the memory cells will learn: 24 samples •Training ◦Regularization: dropout in the conv layers ◦Learning rate: small (0.0001) ◦Gradient expressions: RMSProp, usually a good choice for RNNs
  18. DeepConvLSTM: architecture. Input → 1st conv (64) → 2nd conv (64) → 3rd conv (64) → 4th conv (64) → 1st LSTM (128) → 2nd LSTM (128) → Output. (A rough sketch in code follows below.)
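The published DeepConvLSTM implementation uses Lasagne/Theano (see the repo linked on slide 15). Below is a rough Keras approximation of the stated stack, wired with the hyperparameters from slide 17; the input shape (24 timesteps × 113 channels) follows the dataset slide, and the 18-class output (17 activities plus a null class) is our assumption:

```python
# Rough Keras sketch of the stated DeepConvLSTM architecture. The
# published implementation uses Lasagne/Theano (repo linked above);
# this is our approximation of the same stack:
# Conv(64) x 4 -> LSTM(128) x 2, filter size 5, dropout, RMSProp.
import tensorflow as tf

N_TIMESTEPS, N_CHANNELS = 24, 113
N_CLASSES = 18   # 17 activities + a null class (our assumption)

model = tf.keras.Sequential([
    # Four convolutional layers act as feature extractors (ReLU units),
    # with dropout in the conv layers as the hyperparameter slide states.
    tf.keras.layers.Conv1D(64, 5, padding="same", activation="relu",
                           input_shape=(N_TIMESTEPS, N_CHANNELS)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Conv1D(64, 5, padding="same", activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Conv1D(64, 5, padding="same", activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Conv1D(64, 5, padding="same", activation="relu"),
    tf.keras.layers.Dropout(0.5),
    # Two recurrent layers model the temporal dynamics of the feature maps.
    tf.keras.layers.LSTM(128, return_sequences=True),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()   # train with batch_size=100, per the hyperparameter slide
```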
  19. Activity recognition. Input: sensor signals. Output: an activity label (e.g. Stand, Run, Walk) over time. A supervised classification problem: find the most likely activity label according to the sensor signal at each time instant. (A windowing sketch follows below.)
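Continuous sensor streams are typically cut into fixed-length windows before classification. A numpy sketch, using the 24-sample window length from slide 17; the step size and the label-at-window-end rule are our assumptions:

```python
# Sketch: turn a continuous sensor stream into fixed-length windows for
# supervised classification. Window length 24 matches the earlier slide;
# the step size and the labelling rule (label of the last sample) are
# our assumptions.
import numpy as np

def sliding_windows(signals, labels, length=24, step=12):
    """signals: (T, channels) array; labels: (T,) per-timestep labels."""
    xs, ys = [], []
    for start in range(0, len(signals) - length + 1, step):
        xs.append(signals[start:start + length])
        ys.append(labels[start + length - 1])   # label at the window's end
    return np.stack(xs), np.array(ys)

T, C = 1000, 113
stream = np.random.randn(T, C)                  # stand-in for sensor data
stream_labels = np.random.randint(0, 18, size=T)
X, y = sliding_windows(stream, stream_labels)
print(X.shape, y.shape)                         # (82, 24, 113) (82,)
```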
  20. OPPORTUNITY dataset. •113 sensor channels •30 Hz •6 hours of data •17 home activities: ◦Open/Close fridge ◦Clean table ◦Toggle switch ◦Open/Close dishwasher ◦Open/Close drawers ◦Open/Close doors ◦Drink from cup. Ref: “Collecting complex activity datasets in highly rich networked sensor environments”, Roggen, D. et al.
  21. Metrics. F-score: •Considers all errors equally important •Combines precision and recall •Value between 0 and 1 •The higher the F-score, the better the model. Loss: •Measures the model’s error •The value optimized during the learning process •The lower the loss, the better the model. (A short F-score sketch follows below.)
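A short sketch of the F-score from precision and recall; scikit-learn and the weighted average are our choices here:

```python
# Sketch: the F-score combines precision and recall into a single value
# between 0 and 1. scikit-learn and the weighted average are our choices.
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [0, 1, 1, 2, 2, 2, 0, 1]    # illustrative labels
y_pred = [0, 1, 2, 2, 2, 1, 0, 1]

p = precision_score(y_true, y_pred, average="weighted")
r = recall_score(y_true, y_pred, average="weighted")
f = f1_score(y_true, y_pred, average="weighted")
print(f"precision={p:.2f} recall={r:.2f} f-score={f:.2f}")
```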
  22. Performance. •Benchmark: the OPPORTUNITY challenge •~1M parameters •Single GPU (1664 cores) •Training takes ~6 h to converge •Classification takes ~6 seconds. [Chart: F-score compared against the challenge baselines: kNN, SVM, kNN + SVM.]
  23. Summary. •Automatic feature learning: a convolutional filter captures a specific salient pattern and acts as a feature detector •Recurrent layers can learn the temporal dynamics of such features •State-of-the-art performance with restrained nets (~1M params); capable of real-time processing •We still have to deal with the hyperparameters (“Learning to learn by gradient descent by gradient descent”, Andrychowicz, M.). Core ref: “Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition”, FJ Ordóñez et al.