Slide 1

Slide 1 text

How to Improve Data Labels and Feedback Loops in time-series using InfluxDB

Slide 2

Slide 2 text

Julien Muller
- AI expert
- Ex-IBM Big Data architect
- CTO at Ezako
- Creator of Upalgo
https://www.linkedin.com/in/mullerjulien/
Download our whitepaper: https://ezako.com/en/time-series-labeling/

Slide 3

Slide 3 text

We are Ezako
- Based in Paris and in Sophia-Antipolis, on the French Riviera.
- Startup specialized in AI and time-series data.
- Expertise in Machine Learning.
- Creator of Upalgo.
- Aerospace, Automotive, Telecom.
- Sensor, telemetry and IoT data.
(Image: Ezako offices in Sophia-Antipolis)

Slide 4

Slide 4 text

Why Upalgo?
Upalgo is a time series management suite.
- Anomaly Detection
- Labeling
Time series & Machine Learning:
- Large datasets
- Temporality matters
- We don’t know the ground truth

Slide 5

Slide 5 text

InfluxDB and Ezako
Using InfluxDB since 2016.
InfluxDB is the 4th system (after relational databases, NoSQL, Hadoop ...) we have used to store TS data.
Our issues were:
- Big data (sampling) & high frequency
- Slow access
- Need for specific elements in the engine: windows & features
- Need for a community to get answers (as this is a very specific field)
Why did we choose InfluxDB?
- Storage adapted to TS data
- Better performance
- Native nanosecond handling
- No schema

Slide 6

Slide 6 text

Upalgo architecture
Our data challenges:
- Continuous writes
- Intensive reads during learning phases
The architectural solution:
- InfluxDB

Slide 7

Slide 7 text

Machine Learning with InfluxDB
Machine Learning is challenging because:
- Continuous data inserts (often from 1 kHz to 50 kHz sensors)
- Intensive metadata / feature calculations
- Learning on huge datasets
- Fast detection on small datasets
- You don’t know the ground truth
InfluxDB brings a solution to these limitations.
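To make the high-frequency insert challenge concrete, here is a minimal sketch of how samples from a 50 kHz sensor could be turned into InfluxDB line-protocol lines with nanosecond timestamps. The measurement name `vibration`, the tag `sensor`, the field `value` and both helper functions are assumptions made for illustration; they are not taken from Upalgo, and the sketch does not escape special characters or send anything to a server.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as a line-protocol line: measurement,tags fields timestamp.

    Simplification: no escaping of spaces or commas, float fields only.
    """
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={float(v)}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


def sample_batch(start_ns, rate_hz, values, sensor_id):
    """Build lines for evenly spaced samples from one (hypothetical) sensor."""
    lines = []
    for i, v in enumerate(values):
        # Integer arithmetic keeps the nanosecond timestamps exact.
        ts = start_ns + i * 1_000_000_000 // rate_hz
        lines.append(to_line_protocol("vibration", {"sensor": sensor_id}, {"value": v}, ts))
    return lines


# Three samples at 50 kHz: consecutive timestamps are 20,000 ns apart.
batch = sample_batch(1_600_000_000_000_000_000, 50_000, [0.1, 0.2, 0.3], "s1")
print("\n".join(batch))
```

At 50 kHz the sampling period is 20 microseconds, which is why a store that only keeps millisecond timestamps would collapse many samples onto the same instant.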

Slide 8

Slide 8 text

An Anomaly Detection workflow
Anomaly Detection in time-series is hard because two users won’t have the same definition of an anomaly. A solid workflow is essential for good Anomaly Detection:
➔ insert data
➔ calculate features
➔ understand your data
➔ learn a model
➔ detect
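The five workflow steps can be sketched end to end on a toy series. This is only an illustration under simple assumptions: the data lives in an in-memory list instead of a database, the features are per-window mean and standard deviation, and the "model" is a plain deviation threshold on the window mean. None of the function names or the threshold come from Upalgo.

```python
from statistics import mean, pstdev


def window_features(series, size):
    """Step 2: split into non-overlapping windows and compute (mean, std) for each."""
    windows = [series[i:i + size] for i in range(0, len(series) - size + 1, size)]
    return [(mean(w), pstdev(w)) for w in windows]


def learn_model(features):
    """Step 4: learn the usual level and spread of the window means."""
    means = [m for m, _ in features]
    # Guard against a zero spread when all training windows are identical.
    return {"center": mean(means), "spread": pstdev(means) or 1e-9}


def detect(features, model, threshold=3.0):
    """Step 5: indices of windows whose mean lies more than `threshold` spreads away."""
    return [
        i for i, (m, _) in enumerate(features)
        if abs(m - model["center"]) / model["spread"] > threshold
    ]


# Step 1: insert data (toy values standing in for stored sensor readings).
training = [1.0, 1.1, 0.9, 1.0, 1.2, 1.1, 1.0, 1.1, 0.9, 0.8, 1.0, 0.9] * 5
new_data = [1.0, 1.1, 0.9, 1.0, 5.0, 5.2, 4.9, 5.1, 1.0, 0.9, 1.1, 1.0]

train_features = window_features(training, 4)
# Step 3: understand your data, here by simply inspecting a few features.
print(train_features[:3])

model = learn_model(train_features)
anomalies = detect(window_features(new_data, 4), model)
print(anomalies)  # index of the window around the 5.x values
```

Because two users rarely agree on what an anomaly is, the `threshold` parameter is the kind of setting that expert feedback would adjust.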

Slide 9

Slide 9 text

InfluxDB as intermediary storage
Raw data must be stored (as a reference). Adjusted data is useful.
➔ We store several calculated time series for each raw time series.
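As a rough illustration of keeping several calculated series next to each raw series, the sketch below derives a moving average and a first difference and files them under names built from the raw series name. The `__ma3` and `__diff` suffixes, the choice of these two calculations and the dictionary used as storage are all assumptions for the example, not the scheme Upalgo uses.

```python
def moving_average(values, size):
    """Trailing moving average; the first size-1 points use a shorter window."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - size + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


def first_difference(values):
    """Point-to-point change; the first point has no predecessor and gets 0.0."""
    if not values:
        return []
    return [0.0] + [b - a for a, b in zip(values, values[1:])]


def derive_series(name, values):
    """Return the untouched raw series plus its calculated series (hypothetical names)."""
    return {
        name: list(values),  # raw data is kept as the reference
        f"{name}__ma3": moving_average(values, 3),
        f"{name}__diff": first_difference(values),
    }


store = derive_series("temperature", [10.0, 12.0, 11.0, 15.0])
for key, series in store.items():
    print(key, series)
```

Keeping the raw series unchanged means any calculated series can be recomputed later if the calculation itself is revised.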

Slide 10

Slide 10 text

An Anomaly Detection workflow
Raw Data

Slide 11

Slide 11 text

What is Labeling?
Labeling is the activity of attaching one or more labels to data to identify certain of its properties or characteristics. Labeled data produces a considerable improvement in learning accuracy. Labeling is a time-consuming process and a crucial part of training machine learning algorithms. Data scientists and experts spend most of their time on this repetitive task.

Slide 12

Slide 12 text

Challenge 1
How do you put 2,000 labels on 20 million data points in a few minutes?
1. User-friendly UI
2. Auto label spreading with Machine Learning

Slide 13

Slide 13 text

Labeling is interesting because
➔ Experts want more information on their data
➔ Supervised Machine Learning needs labels
➔ Manual labeling is exhausting

Slide 14

Slide 14 text

Ergonomics can increase the speed of labeling by 15 times

Slide 15

Slide 15 text

AI based label conflict management
All the labels are checked for conflicts.
Benefits: fewer labeling errors.
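The slide does not describe how conflicts are found, so the following is only a simple rule-based stand-in, not the AI-based approach it mentions. It assumes labels are stored as time intervals and treats two labels as conflicting when they overlap in time but carry different names. The `Label` type and `find_conflicts` function are hypothetical.

```python
from typing import NamedTuple


class Label(NamedTuple):
    start: int  # inclusive timestamp
    end: int    # exclusive timestamp
    name: str


def find_conflicts(labels):
    """Return pairs of labels that overlap in time but have different names."""
    ordered = sorted(labels, key=lambda label: label.start)
    conflicts = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start >= a.end:
                break  # sorted by start, so no later label can overlap `a`
            if a.name != b.name:
                conflicts.append((a, b))
    return conflicts


labels = [
    Label(0, 10, "normal"),
    Label(5, 15, "anomaly"),
    Label(20, 30, "normal"),
    Label(8, 12, "anomaly"),
]
conflicts = find_conflicts(labels)
for a, b in conflicts:
    print(f"{a.name} [{a.start}, {a.end}) conflicts with {b.name} [{b.start}, {b.end})")
```

A check like this can run each time a label is added, so contradictory labels are flagged before they reach the training set.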

Slide 16

Slide 16 text

Label propagation can increase the labeling speed by 15 times
The idea is to label the entire dataset with AI-based auto label propagation.
Benefits: much faster labeling.
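One simple way to picture label propagation is nearest-neighbor spreading: an expert labels a few windows, and every other window takes the label of the closest labeled window in feature space. The sketch below uses that technique as an assumption, since the slides do not say which algorithm Upalgo applies. The function name, the feature vectors and the `max_distance` cutoff are all illustrative.

```python
import math


def propagate_labels(features, seed_labels, max_distance):
    """Give each unlabeled window the label of its nearest labeled window.

    features: list of feature vectors, one per window
    seed_labels: dict {window index: label} set manually by an expert
    max_distance: windows farther than this from every seed stay unlabeled (None)
    """
    result = []
    for i, vec in enumerate(features):
        if i in seed_labels:
            result.append(seed_labels[i])  # manual labels are never overwritten
            continue
        best_label, best_dist = None, max_distance
        for j, label in seed_labels.items():
            d = math.dist(vec, features[j])
            if d <= best_dist:
                best_label, best_dist = label, d
        result.append(best_label)
    return result


# Five windows described by (mean, std); only two were labeled by hand.
features = [(1.0, 0.1), (1.1, 0.1), (5.0, 0.2), (5.1, 0.3), (9.0, 0.1)]
seeds = {0: "normal", 2: "anomaly"}
propagated = propagate_labels(features, seeds, max_distance=1.0)
print(propagated)
```

Windows that stay unlabeled (`None`) are natural candidates to show the expert next, which is where the feedback loop comes in.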

Slide 17

Slide 17 text

Label propagation

Slide 18

Slide 18 text

To sum up
Time-series labeling and feedback management is very complex and difficult. The solution is to:
- adopt a TS database such as InfluxDB
- create a user-friendly UI
- use algorithms to speed up labeling
- implement an efficient workflow
Our experience with InfluxDB:
- pretty smooth
- plug and forget mentality

Slide 19

Slide 19 text

Julien Muller [email protected] +33 6 65 06 64 66 www.ezako.com