Paris Time Series Meetup #9 - How to handle time-series labeling and anomaly detection with InfluxDB?
Presentation slides by Jean Muller, CTO of Ezako, on labeling and anomaly detection on time series with machine learning and InfluxDB.
Presented at the 9th edition of the Paris Time Series Meetup.
the French Riviera. Startup specialized in AI and time-series data. Expertise in Machine Learning. Creator of Upalgo. Aerospace, Automotive, Telecom. Sensor, telemetry and IoT data. 3 Ezako offices in Sophia-Antipolis
Download our whitepaper: https://ezako.com/en/time-series-labeling/
Anomaly Detection
Labeling
Time series & Machine Learning:
- Large datasets
- Temporality matters
- We don’t know the ground truth
the 4th (relational database, NoSQL, Hadoop ...) system we use for storing TS data.
Our issues were:
- Big data (sampling) & high frequency
- Slow access
- Need for specific elements in the engine: windows & features
- Need for a community to get answers (as this is a very specific field)
Why did we choose InfluxDB?
- Storage adapted to TS data
- Better performance
- Native nanosecond handling
- No schema
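To illustrate why native nanosecond handling matters for high-frequency sensors, here is a minimal sketch that formats points in InfluxDB line protocol with integer nanosecond timestamps. The function names, the measurement name `vibration`, and the tag `sensor` are hypothetical and chosen only for illustration. The sketch assumes that names and tag values contain no spaces, commas, or equals signs, since it does not perform line-protocol escaping.

```python
def sample_timestamps_ns(start_ns, rate_hz, count):
    """Return integer nanosecond timestamps for `count` samples at `rate_hz`.

    Integer arithmetic avoids the rounding drift that float seconds would add.
    """
    return [start_ns + (i * 1_000_000_000) // rate_hz for i in range(count)]


def to_line_protocol(measurement, tags, fields, ts_ns):
    """Format one point as: measurement,tag=value field=value timestamp.

    Assumption: no escaping is done, so names must not contain
    spaces, commas, or equals signs.
    """
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={float(v)!r}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {ts_ns}"


# Hypothetical 50 kHz sensor: consecutive samples are 20,000 ns apart,
# which a millisecond-resolution timestamp could not distinguish.
start_ns = 1_600_000_000 * 1_000_000_000
timestamps = sample_timestamps_ns(start_ns, 50_000, 3)
lines = [
    to_line_protocol("vibration", {"sensor": "s1"}, {"value": v}, t)
    for v, t in zip([0.5, 0.75, 1.0], timestamps)
]
```

Because no schema has to be declared up front, adding a new field or tag to a point is only a matter of adding a key to the dictionaries passed in.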
- Continuous data insertion (often from 1 kHz to 50 kHz sensors)
- Intensive metadata / feature calculations
- Learning on huge datasets
- Fast detection on small datasets
- You don’t know the ground truth
InfluxDB brings a solution to these limitations.
hard because two users won’t have the same definition of an anomaly.
A solid workflow is essential for good anomaly detection:
➔ insert data
➔ calculate features
➔ understand your data
➔ learn a model
➔ detect
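The workflow steps can be sketched end to end in a few lines. This is not Ezako's or Upalgo's method: it swaps in a deliberately simple technique (per-window mean and standard deviation as features, and a threshold on the deviation of window means as the "model") so that each step is visible. All function names, the window size, and the threshold are assumptions made for illustration.

```python
from statistics import mean, stdev


def window_features(values, size):
    """Calculate features: (mean, std) for each non-overlapping window."""
    feats = []
    for start in range(0, len(values) - size + 1, size):
        window = values[start:start + size]
        feats.append((mean(window), stdev(window)))
    return feats


def learn_model(features):
    """Learn a baseline: center and spread of the window means on reference data."""
    means = [m for m, _ in features]
    return mean(means), stdev(means)


def detect(features, model, threshold=3.0):
    """Flag windows whose mean is more than `threshold` spreads from the center."""
    center, spread = model
    return [abs(m - center) > threshold * spread for m, _ in features]


# Reference data assumed to be normal (the "insert data" step is simulated by lists).
reference = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7, 1.0, 1.1, 0.9, 1.2]
model = learn_model(window_features(reference, size=4))

# New data: one ordinary window followed by one window with a level shift.
incoming = [1.0, 1.1, 0.9, 1.0, 5.0, 5.2, 4.9, 5.1]
flags = detect(window_features(incoming, size=4), model)
```

The threshold is where the "two users, two definitions" problem shows up in practice: a stricter or looser value changes which windows count as anomalies.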
(reference). Adjusted data is useful.
➔ We store several calculated time series for each raw time series.
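As a minimal sketch of keeping several calculated series next to each raw series, the example below derives a trailing moving average and a first difference, each aligned point for point with the raw values. The choice of these two derived series, the function names, and the dictionary keys are assumptions for illustration; the slides do not say which calculated series are stored.

```python
def moving_average(values, size):
    """Trailing moving average; early points average the samples seen so far."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= size:
            running -= values[i - size]  # drop the sample leaving the window
        out.append(running / min(i + 1, size))
    return out


def first_difference(values):
    """Difference with the previous sample; the first point gets 0.0."""
    if not values:
        return []
    return [0.0] + [b - a for a, b in zip(values, values[1:])]


def derived_series(raw, size=3):
    """Return the raw series plus its calculated companions, all the same length."""
    return {
        "raw": list(raw),
        "moving_avg": moving_average(raw, size),
        "diff": first_difference(raw),
    }


series = derived_series([1.0, 2.0, 3.0, 4.0], size=3)
```

Since every derived series has the same length as the raw one, each can share the raw timestamps and be written as an extra field or measurement alongside it.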
one or more labels to identify certain properties or characteristics of data. Labeled data produces a considerable improvement in learning accuracy. Labeling is a time-consuming process and a crucial part of training machine learning algorithms. Data scientists and experts spend most of their time on this repetitive task.
spreading with Machine Learning
How do you put 2,000 labels on 20 million data points in a few minutes?
on their data
➔ Supervised Machine Learning needs labels
➔ Manual labeling is exhausting
The idea is to label the entire dataset with AI-based automatic label propagation. Benefit: much faster labeling.
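The idea of propagating a handful of manual labels to the whole dataset can be sketched with a nearest-neighbor rule: each unlabeled window receives the label of the closest manually labeled window in feature space. This is a stand-in technique for illustration only, not the propagation algorithm used by Ezako; the function name, the feature tuples, and the label strings are all hypothetical.

```python
import math


def propagate_labels(features, seed_labels):
    """Spread a few manual labels to every window.

    features: list of numeric feature tuples, one per window.
    seed_labels: dict {window_index: label} for the manually labeled windows.
    Returns one label per window, copied from the nearest labeled window.
    """
    if not seed_labels:
        raise ValueError("at least one labeled window is required")
    labels = []
    for i, feat in enumerate(features):
        if i in seed_labels:
            labels.append(seed_labels[i])  # keep the expert's own label
            continue
        nearest = min(seed_labels, key=lambda j: math.dist(feat, features[j]))
        labels.append(seed_labels[nearest])
    return labels


# Hypothetical (mean, std) features for five windows; only two are labeled by hand.
features = [(1.0, 0.1), (1.1, 0.1), (5.0, 0.2), (0.9, 0.1), (5.2, 0.3)]
seeds = {0: "normal", 2: "anomaly"}
all_labels = propagate_labels(features, seeds)
```

The cost grows with the number of windows times the number of seed labels, so a real system working on millions of points would need an index or a trained classifier instead of this brute-force search.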
complex and difficult.
The solution is to:
- adopt a TS database such as InfluxDB
- create a user-friendly UI
- use algorithms to speed up labeling
- implement an efficient workflow
Our experience with InfluxDB:
- pretty smooth
- plug-and-forget mentality