Paris Time Series Meetup #9 - How to manage time-series labeling and anomaly detection with InfluxDB?

How to Improve Data Labels and Feedback Loops in time-series using InfluxDB

Julien Muller
AI expert
Ex-IBM Big Data architect
https://www.linkedin.com/in/mullerjulien/
CTO at Ezako
Creator of Upalgo
Download our whitepaper: https://ezako.com/en/time-series-labeling/

We are Ezako
Based in Paris and in Sophia-Antipolis on the French Riviera.
Startup specialized in AI and time-series data.
Expertise in Machine Learning.
Creator of Upalgo.
Aerospace, Automotive, Telecom.
Sensor, telemetry and IoT data.
Ezako offices in Sophia-Antipolis

Why Upalgo?
Upalgo is a time series management suite.
- Anomaly Detection
- Labeling
Time series & Machine Learning:
- Large datasets
- Temporality matters
- We don't know the ground truth

InfluxDB and Ezako
Using InfluxDB since 2016.
InfluxDB is the 4th system (relational database, NoSQL, Hadoop ...) we use for storage of time-series data.
Our issues were:
- Big data (sampling) & high frequency
- Slow access
- Need for specific elements in the engine: windows & features
- Need for a community to get answers (as this is a very specific field)
Why did we choose InfluxDB?
- Storage adapted to time-series data
- Better performance
- Native nanosecond handling
- No schema
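To make "native nanosecond handling" and "no schema" concrete, here is a minimal sketch that builds InfluxDB line protocol strings by hand, with integer nanosecond timestamps. The measurement, tag and field names (`vibration`, `sensor`, `value`, `rpm`) and the helper functions are hypothetical and do not come from the talk; a real deployment would more likely use an InfluxDB client library.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_ns(dt):
    """Convert an aware datetime to integer nanoseconds since the Unix epoch."""
    delta = dt - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000_000 + delta.microseconds * 1_000

def _esc(text, chars):
    """Backslash-escape the given special characters."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text

def _field_value(v):
    """Format a field value: booleans, integers (i suffix), floats, quoted strings."""
    if isinstance(v, bool):
        return "true" if v else "false"
    if isinstance(v, int):
        return f"{v}i"
    if isinstance(v, float):
        return repr(v)
    return '"' + str(v).replace("\\", "\\\\").replace('"', '\\"') + '"'

def to_line(measurement, tags, fields, ts_ns):
    """Build one line: measurement,tag=value field=value timestamp_ns."""
    head = _esc(measurement, ", ")
    for k in sorted(tags):
        head += f",{_esc(k, ',= ')}={_esc(tags[k], ',= ')}"
    body = ",".join(f"{_esc(k, ',= ')}={_field_value(v)}" for k, v in fields.items())
    return f"{head} {body} {ts_ns}"

# Hypothetical sensor: two points 20 000 ns apart. The second point carries an
# extra field, which needs no schema change.
t0 = to_ns(datetime(2020, 1, 1, tzinfo=timezone.utc))
print(to_line("vibration", {"sensor": "s1"}, {"value": 0.5}, t0))
print(to_line("vibration", {"sensor": "s1"}, {"value": 0.7, "rpm": 1200}, t0 + 20_000))
```

Python's `datetime` only resolves microseconds, so the sketch keeps timestamps as plain integers and adds nanosecond offsets directly.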

Upalgo architecture
Our data challenges:
- Continuous writes
- Intensive reads at learning phases
The architectural solution:
- InfluxDB

Machine Learning with InfluxDB
Machine Learning is challenging because:
- Continuous data inserts (often from sensors sampling at 1 kHz to 50 kHz)
- Intensive metadata / feature calculations
- Learning on huge datasets
- Fast detection on small datasets
- You don't know the ground truth
InfluxDB brings a solution to these limitations.
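To give a sense of scale for the 1 kHz to 50 kHz range quoted above, the following back-of-the-envelope sketch computes the sampling period and the number of points produced per sensor per day. The function names are illustrative only.

```python
SECONDS_PER_DAY = 86_400
NS_PER_SECOND = 1_000_000_000

def sample_period_ns(rate_hz):
    """Time between two consecutive samples, in nanoseconds."""
    return NS_PER_SECOND // rate_hz

def points_per_day(rate_hz, sensors=1):
    """Number of points written per day for a given sampling rate."""
    return rate_hz * SECONDS_PER_DAY * sensors

for rate in (1_000, 50_000):
    print(f"{rate} Hz: one point every {sample_period_ns(rate)} ns, "
          f"{points_per_day(rate):,} points per sensor per day")
```

At 50 kHz a single sensor already produces several billion points per day, which is why continuous writes and intensive reads both have to be handled.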

An Anomaly Detection workflow
Anomaly Detection in time-series is hard because two users won't have the same definition of an anomaly.
A solid workflow is essential for good Anomaly Detection:
➔ insert data
➔ calculate features
➔ understand your data
➔ learn a model
➔ detect
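The workflow steps above can be sketched end to end on a toy series. This is not Upalgo's method: the feature (a rolling mean), the "model" (mean and standard deviation of the training features) and the threshold `k` are deliberately simple stand-ins chosen for illustration.

```python
from statistics import mean, pstdev

def rolling_mean(values, window):
    """Calculate features: one rolling mean per full sliding window."""
    return [mean(values[i - window:i]) for i in range(window, len(values) + 1)]

def learn_model(features):
    """Learn a model: here simply the mean and spread of the training features."""
    return {"mu": mean(features), "sigma": pstdev(features)}

def detect(features, model, k=3.0):
    """Detect: indices of features further than k standard deviations from the mean."""
    sigma = model["sigma"] or 1e-12  # avoid a zero threshold on constant data
    return [i for i, f in enumerate(features) if abs(f - model["mu"]) > k * sigma]

# Insert data: a training series assumed to be normal, then new data with a spike.
train = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 1.0]
new = [1.0, 1.0, 9.0, 1.0, 1.0]

model = learn_model(rolling_mean(train, window=2))
print(detect(rolling_mean(new, window=2), model))  # windows that contain the spike
```

The "understand your data" step has no code here: it is where an expert decides which features and which definition of an anomaly make sense, and that decision changes the other steps.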

InfluxDB as intermediary storage
Raw data must be stored (reference).
Adjusted data is useful.
➔ We store several calculated time series for each raw time series.
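As an illustration of keeping several calculated series next to each raw series, the sketch below derives a smoothed series and a first-difference series while preserving the raw timestamps, so that each result could be written back to storage alongside the reference data. The series names (`raw`, `smoothed`, `diff`) and the choice of calculations are assumptions, not the ones used in Upalgo.

```python
def derive_series(points, window=3):
    """points: list of (timestamp_ns, value) sorted by time.

    Returns a dict mapping a series name to its own list of points.
    The raw series is kept untouched as the reference.
    """
    values = [v for _, v in points]

    # Trailing moving average, aligned on the timestamp of the latest sample.
    smoothed = []
    for i, (ts, _) in enumerate(points):
        w = values[max(0, i - window + 1):i + 1]
        smoothed.append((ts, sum(w) / len(w)))

    # First difference, defined from the second sample onwards.
    diff = [(points[i][0], values[i] - values[i - 1]) for i in range(1, len(points))]

    return {"raw": list(points), "smoothed": smoothed, "diff": diff}

series = derive_series([(0, 1.0), (20_000, 2.0), (40_000, 4.0)], window=2)
for name, pts in series.items():
    print(name, pts)
```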

[Figure: An Anomaly Detection workflow. Raw Data]

What is Labeling?
Labeling is the activity of attaching one or more labels to data to identify certain of its properties or characteristics.
Labeled data produces a considerable improvement in learning accuracy.
Labeling is a time-consuming process and a crucial part of training machine learning algorithms.
Data scientists and experts spend most of their time on this repetitive task.

Challenge 1
How do you put 2 000 labels on 20 million data points in a few minutes?
1. User-friendly UI
2. Auto label spreading with Machine Learning

Labeling is interesting because
➔ Experts want more information on their data
➔ Supervised Machine Learning needs labels
➔ Manual labeling is exhausting

Ergonomics can increase the speed of labeling 15-fold

AI-based label conflict management
All the labels are checked for conflicts.
Benefits: fewer labeling errors.
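The slide does not say how conflicts are detected. As a plain, rule-based stand-in for the AI-based check, the sketch below flags any two labels whose time intervals overlap but whose names differ. The `Label` type and its fields are hypothetical.

```python
from typing import NamedTuple

class Label(NamedTuple):
    start: int  # inclusive, e.g. a timestamp in nanoseconds
    end: int    # exclusive
    name: str

def find_conflicts(labels):
    """Return pairs of labels that overlap in time but disagree on the name."""
    ordered = sorted(labels, key=lambda l: (l.start, l.end))
    conflicts = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start >= a.end:
                break  # sorted by start: no later label can overlap a
            if a.name != b.name:
                conflicts.append((a, b))
    return conflicts

labels = [
    Label(0, 10, "normal"),
    Label(5, 15, "anomaly"),   # overlaps the first label with a different name
    Label(20, 30, "anomaly"),
    Label(8, 12, "anomaly"),   # also overlaps the first label
]
for a, b in find_conflicts(labels):
    print(f"conflict: {a} vs {b}")
```

Surfacing these pairs to the person labeling, rather than silently picking one, is what lets the check reduce labeling errors.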

Label propagation can increase labeling speed 15-fold
The idea is to label the entire dataset with AI-based automatic label propagation.
Benefits: much faster labeling.
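The talk does not describe the propagation algorithm. One simple way to picture the idea is nearest-neighbour propagation: each unlabeled window takes the label of the closest manually labeled window, provided it is close enough in feature space. The sketch below is that stand-in, not Ezako's implementation; the feature vectors, `seeds` and `max_distance` are illustrative assumptions.

```python
import math

def propagate_labels(features, seeds, max_distance):
    """Spread a few manual labels to similar windows.

    features: list of feature vectors, one per window.
    seeds: dict mapping a window index to its manual label.
    Returns a dict index -> label covering the seeds plus every window
    within max_distance of its nearest seed; other windows stay unlabeled.
    """
    labeled = dict(seeds)
    if not seeds:
        return labeled
    for i, f in enumerate(features):
        if i in seeds:
            continue
        nearest = min(seeds, key=lambda j: math.dist(f, features[j]))
        if math.dist(f, features[nearest]) <= max_distance:
            labeled[i] = seeds[nearest]
    return labeled

# Hypothetical (mean, std) features for five windows; two were labeled by hand.
features = [(1.0, 0.1), (1.1, 0.1), (5.0, 2.0), (5.2, 2.1), (20.0, 0.0)]
seeds = {0: "normal", 2: "anomaly"}
print(propagate_labels(features, seeds, max_distance=0.5))
```

The distance threshold keeps windows that resemble no labeled example unlabeled, so they can be sent back to an expert instead of receiving a guess.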

[Figure: Label propagation]

To sum up
Time-series labeling and feedback management is very complex and difficult.
The solution is to:
- adopt a time-series database such as InfluxDB
- create a user-friendly UI
- use algorithms to speed up labeling
- implement an efficient workflow
Our experience with InfluxDB:
- pretty smooth
- plug and forget mentality

Julien Muller [email protected] +33 6 65 06 64 66 www.ezako.com


TimeSeriesFr
