Paris Time Series Meetup #9 - How to handle time-series labeling and anomaly detection with InfluxDB?

Slides by Julien Muller, CTO of Ezako, on labeling and anomaly detection on time series with machine learning and InfluxDB.

Presented at the 9th edition of the Paris Time Series Meetup.

TimeSeriesFr

January 19, 2021
Transcript

  1. Julien Muller

    AI expert. Ex-IBM Big Data architect. CTO at Ezako. Creator of Upalgo. https://www.linkedin.com/in/mullerjulien/ Download our whitepaper: https://ezako.com/en/time-series-labeling/
  2. We are Ezako

    Based in Paris and in Sophia-Antipolis on the French Riviera. Startup specialized in AI and time-series data. Expertise in Machine Learning. Creator of Upalgo. Aerospace, Automotive, Telecom. Sensor, telemetric and IoT data. Ezako offices in Sophia-Antipolis.
  3. Why Upalgo?

    Upalgo is a time series management suite: Anomaly Detection, Labeling.

    Time series & Machine Learning:
    - Large datasets
    - Temporality matters
    - We don’t know the ground truth
  4. InfluxDB and Ezako

    Using InfluxDB since 2016. Influx is the 4th system (relational database, NoSQL, Hadoop ...) we use for storage of TS data.

    Our issues were:
    - Big data (sampling) & high frequency
    - Slow access
    - Need for specific elements in the engine (windows & features)
    - Need for a community to get answers (as this is a very specific field)

    Why did we choose InfluxDB?
    - Storage adapted to TS data
    - Better performance
    - Native nanosecond handling
    - No schema
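To make "native nanosecond handling" and "no schema" concrete, here is a minimal sketch that formats points in InfluxDB line protocol with nanosecond timestamps. The measurement, tag and field names (`vibration`, `sensor`, `value`, `temperature`) and the helper `to_line_protocol` are illustrative assumptions, not taken from the slides; escaping of special characters is deliberately left out.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as an InfluxDB line protocol string.

    Assumes float fields only and names/values without spaces or commas
    (no escaping is performed in this sketch).
    """
    tag_part = "".join(f",{key}={value}" for key, value in sorted(tags.items()))
    field_part = ",".join(f"{key}={float(value)!r}" for key, value in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


# Two samples from a hypothetical 50 kHz sensor: 20 microseconds apart,
# which only an integer nanosecond timestamp can represent exactly.
t0 = 1_611_014_400_000_000_000
step_ns = 1_000_000_000 // 50_000  # 20_000 ns between samples

lines = [
    to_line_protocol("vibration", {"sensor": "s1"}, {"value": 0.42}, t0),
    # A new field appears on the second point without any schema change.
    to_line_protocol("vibration", {"sensor": "s1"},
                     {"value": 0.43, "temperature": 21.5}, t0 + step_ns),
]
```

Each string can then be sent to InfluxDB with whichever client or HTTP write endpoint is in use.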
  5. Upalgo architecture

    Our data challenges:
    - Continuous writes
    - Intensive reads at learning phases

    The architectural solution:
    - InfluxDB
  6. Machine Learning with InfluxDB

    Machine Learning is challenging because:
    - Continuous data inserts (often from sensors sampling between 1 kHz and 50 kHz)
    - Intensive metadata / feature calculations
    - Learning on huge datasets
    - Fast detection on small data sets
    - You don’t know the ground truth

    InfluxDB brings a solution to these limitations.
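As a rough order of magnitude for the sampling rates quoted above, the sketch below counts the points a single sensor produces per day. It assumes one value per sample and uninterrupted acquisition, which the slides do not state.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86_400


def samples_per_day(rate_hz):
    """Number of points one sensor writes in a day at a constant rate."""
    return rate_hz * SECONDS_PER_DAY


low = samples_per_day(1_000)    # 86_400_000 points per day at 1 kHz
high = samples_per_day(50_000)  # 4_320_000_000 points per day at 50 kHz
```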
  7. An Anomaly Detection workflow

    Anomaly Detection in time-series is hard because two users won’t have the same definition of an anomaly. A solid workflow is essential to perform good Anomaly Detection:
    ➔ insert data
    ➔ calculate features
    ➔ understand your data
    ➔ learn a model
    ➔ detect
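The five steps above can be sketched end to end in a few lines. This is a toy illustration under stated assumptions, not Upalgo's method: the feature is the first difference between consecutive samples, the "model" is just the mean and standard deviation of that feature on reference data, and detection is a 3-sigma rule. All function names and sample values are hypothetical.

```python
from statistics import mean, stdev


def first_differences(values):
    """Feature step: change between consecutive samples."""
    return [b - a for a, b in zip(values, values[1:])]


def summarize(values):
    """Understanding step: basic descriptive statistics."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def learn_model(reference_values):
    """Learning step: mean and spread of the feature on reference data."""
    features = first_differences(reference_values)
    return {"mu": mean(features), "sigma": stdev(features)}


def detect(model, values, threshold=3.0):
    """Detection step: indices of samples reached by an unusually large change."""
    features = first_differences(values)
    return [i + 1 for i, f in enumerate(features)
            if abs(f - model["mu"]) > threshold * model["sigma"]]


# Insert step: in practice these lists would be read back from the database.
reference = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 10.0]
incoming = [10.0, 10.1, 15.0, 10.0]

model = learn_model(reference)
anomalies = detect(model, incoming)  # [2, 3]
```

Both the jump to 15.0 (index 2) and the return to 10.0 (index 3) are flagged, because the feature measures change rather than level. Whether that counts as one anomaly or two is exactly the kind of definition two users may disagree on.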
  8. InfluxDB as intermediary storage

    Raw data must be stored (reference). Adjusted data is useful.
    ➔ We store several calculated time series for each raw time series.
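One way to picture "several calculated time series for each raw time series" is the sketch below: the raw values are kept untouched as the reference, and two derived series are computed next to them. The choice of derived series (a trailing mean and its residual) and the names used are assumptions for illustration only.

```python
def derive_series(raw, window):
    """Return the raw series plus two calculated series of the same length."""
    smoothed = []
    for i in range(len(raw)):
        # Trailing window: up to `window` most recent values, including raw[i].
        chunk = raw[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    residual = [value - smooth for value, smooth in zip(raw, smoothed)]
    return {"raw": list(raw), "smoothed": smoothed, "residual": residual}


series = derive_series([1.0, 2.0, 3.0, 4.0], window=2)
# series["smoothed"] == [1.0, 1.5, 2.5, 3.5]
# series["residual"] == [0.0, 0.5, 0.5, 0.5]
```

Each entry of the returned dictionary could be written back as its own series, so later steps read precomputed values instead of recalculating them.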
  9. An Anomaly Detection workflow

    Raw Data
  10. What is Labeling?

    Labeling is the activity of attaching one or more labels to data to identify certain properties or characteristics. Labeled data produces a considerable improvement in learning accuracy. Labeling is a time-consuming process and a crucial part of training machine learning algorithms. Data Scientists and experts spend most of their time on this repetitive task.
  11. Challenge 1

    1. User-friendly UI
    2. Auto label spreading with Machine Learning

    How do you put 2 000 labels on 20 million data points in a few minutes?
  12. Labeling is interesting because

    ➔ Experts want more information on their data
    ➔ Supervised Machine Learning needs labels
    ➔ Manual labeling is exhausting
  13. Ergonomics can increase labeling speed by a factor of 15
  14. AI-based label conflict management

    All labels are checked for conflicts. Benefit: fewer labeling errors.
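The slides do not describe how conflicts are detected. As a point of comparison, here is a plain rule-based check (not the AI-based approach named above) that flags two labels as conflicting when their time intervals overlap and their names differ. The `Label` type and the half-open interval convention are assumptions.

```python
from typing import NamedTuple


class Label(NamedTuple):
    start: int  # inclusive, e.g. a timestamp or sample index
    end: int    # exclusive
    name: str


def find_conflicts(labels):
    """Return pairs of overlapping labels that carry different names."""
    ordered = sorted(labels)  # sorted by start, then end, then name
    conflicts = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.start >= first.end:
                break  # later labels start even further right: no overlap
            if first.name != second.name:
                conflicts.append((first, second))
    return conflicts


labels = [
    Label(0, 10, "normal"),
    Label(5, 15, "anomaly"),   # overlaps the first label with another name
    Label(20, 30, "anomaly"),
]
conflicts = find_conflicts(labels)
# [(Label(0, 10, "normal"), Label(5, 15, "anomaly"))]
```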
  15. Label propagation can increase labeling speed by a factor of 15

    The idea is to label the entire dataset with AI-based auto label propagation. Benefit: much faster labeling.
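The propagation algorithm itself is not given in the slides. A minimal stand-in, assuming each segment of the series is already summarized by a feature vector, is nearest-neighbour propagation: every unlabeled segment takes the label of the closest manually labeled one, provided it is close enough. The function name, the feature vectors and the distance cut-off are all hypothetical.

```python
import math


def propagate_labels(features, seed_labels, max_distance):
    """Spread a few manual labels to unlabeled segments.

    features:     list of equal-length numeric tuples, one per segment
    seed_labels:  dict mapping segment index -> manually assigned label
    max_distance: segments farther than this from every seed stay unlabeled
    """
    result = dict(seed_labels)
    for index, vector in enumerate(features):
        if index in seed_labels:
            continue
        nearest = min(seed_labels, key=lambda j: math.dist(vector, features[j]))
        if math.dist(vector, features[nearest]) <= max_distance:
            result[index] = seed_labels[nearest]
    return result


features = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.2, 4.9), (20.0, 20.0)]
seeds = {0: "normal", 2: "anomaly"}  # two manual labels
labels = propagate_labels(features, seeds, max_distance=1.0)
# {0: "normal", 2: "anomaly", 1: "normal", 3: "anomaly"}; segment 4 stays unlabeled
```

Leaving distant segments unlabeled keeps the expert in the loop for the cases the propagation is least sure about.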
  16. To sum up

    Time-series labeling and feedback management is very complex and difficult. The solution is to:
    - adopt a TS database such as InfluxDB
    - create a user-friendly UI
    - use algorithms to speed up labeling
    - implement an efficient workflow

    Our experience with InfluxDB:
    - pretty smooth
    - plug and forget mentality