AIOps leveraged by Deep Learning and Knowledge Graph data representations - Florian Krausbeck

AIOps is one of the most promising fields in which machine learning, and in particular deep learning, is starting to play an increasingly dominant role. Besides streamlining different tasks, machine learning algorithms can give additional insights into complex business processes, which in most cases can no longer be maintained by a human being without automation. One of the reasons is the number of interdependent system components that interact with each other in order to fulfil a certain task.
In this talk, I will show how a specific data representation, namely Knowledge Graphs, can help to speed up and optimise different areas within the field of AIOps. Several hands-on examples will be shown, and an outlook will be given on what other impacts AI research will have on the AIOps community.

DevOpsDays Zurich

May 15, 2019

  1. • 95 percent of ATM swipes rely on COBOL code
    • 220’000’000’000 lines of COBOL in use today
  2. Testing Processes Will Change Completely in the Near Future
    Autonomous teams and continuous deployment in production, as done today by internet giants, require ...
    • metric-driven development and testing approaches
    • strong analytics of events, errors, crashes, usage counters, API success rates and many other metrics
    • ML approaches
    … in order to handle increasing complexity while reducing costs
  3. Analysing Log Files - An Ill-Posed Problem
    • log relevancy is user-specific
    • people tend to search for known issues
    • there are also unknown unknowns
    • labels are potentially very tedious to acquire
    How to get labels then?
    • implicit/explicit user behaviour (e.g. opening Kibana, flagging a log)
    • inter-user similarities
    • public knowledge bases
  4. What are the Requirements of an “Artificial Operational Intelligence Team Member”?
    • takes “annoying” and time-consuming tasks away from people and prevents human errors in repetitive work
    • is part of, and also extends, the existing workflows and infrastructure of QAT
    • can work in different environments for developers, QA engineers and managers at the same time
    • is continuously learning
  5. What Exactly is Deep Learning?
    Every neuron implements a relatively simple mathematical function. However, the composition of millions of such functions is very powerful.
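
To make the point about simple functions concrete, here is a minimal sketch in plain Python. It is not taken from the talk: the sigmoid activation, the function names and the hand-picked weights in `XOR_LAYERS` are all assumptions chosen for illustration. A single neuron of this kind cannot compute XOR, but a composition of three of them can.

```python
import math


def neuron(inputs, weights, bias):
    """One neuron: a weighted sum followed by a sigmoid nonlinearity."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weight_rows, biases):
    """A layer applies several neurons to the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def network(inputs, layers):
    """A network feeds the output of each layer into the next one."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer(activations, weight_rows, biases)
    return activations


# Hand-picked (not learned) weights, assumed for illustration:
# hidden neuron 1 acts like OR, hidden neuron 2 like NAND,
# and the output neuron like AND of the two hidden neurons.
XOR_LAYERS = [
    ([[20.0, 20.0], [-20.0, -20.0]], [-10.0, 30.0]),
    ([[20.0, 20.0]], [-30.0]),
]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(network([a, b], XOR_LAYERS)[0]))
```

In a real deep learning model the weights are learned from data rather than set by hand, and there are far more neurons and layers.
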
  6. Defining an Entire System State with Autoencoders
    Why Autoencoders?
    • Modern software infrastructure consists of thousands of parallel services working together
    • Monitoring all services becomes impossible, even more so for identifying strange behaviour
    • Autoencoders automatically generate a reduced representation of all signals that encompasses any variability
    • This representation can more easily be used to identify errors and strange behaviour
    [Figure: autoencoder diagram; labels: Input, Input/Output state]
  7. Automated Detection of Infrastructure Degradations
    Using metrics from the entire software architecture, a generalised system state can be derived which detects failures as well as predicts future behaviour.
    [Figure labels: [State Vector], Error Detection, Infrastructure Metrics, Autoencoder, Classification and prediction models]
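
The idea of compressing metrics into a state and flagging inputs that reconstruct poorly can be sketched with a deliberately tiny linear autoencoder in plain Python. This is an illustration under stated assumptions, not the model used in the talk: the two synthetic metrics, the one-dimensional code, the learning rate and the threshold rule (three times the largest training error) are all made up for the example.

```python
import random


def encode(x, w):
    """Compress a metric vector into a single number (the 'state')."""
    return sum(wi * xi for wi, xi in zip(w, x))


def decode(code, v):
    """Reconstruct the metric vector from the state."""
    return [vi * code for vi in v]


def reconstruction_error(x, w, v):
    x_hat = decode(encode(x, w), v)
    return sum((a - b) ** 2 for a, b in zip(x_hat, x))


def train(samples, epochs=200, lr=0.1):
    """Stochastic gradient descent on half the squared reconstruction error."""
    w = [0.5, 0.5]  # encoder weights (assumed starting point)
    v = [0.5, 0.5]  # decoder weights (assumed starting point)
    for _ in range(epochs):
        for x in samples:
            code = encode(x, w)
            err = [vi * code - xi for vi, xi in zip(v, x)]
            grad_code = sum(e * vi for e, vi in zip(err, v))
            v = [vi - lr * e * code for vi, e in zip(v, err)]
            w = [wi - lr * grad_code * xi for wi, xi in zip(w, x)]
    return w, v


# Synthetic "healthy" data (assumption): the second metric tracks
# half of the first metric, plus a little noise.
rng = random.Random(0)
samples = []
for _ in range(50):
    load = rng.random()
    samples.append([load, 0.5 * load + rng.gauss(0.0, 0.01)])

w, v = train(samples)
threshold = 3 * max(reconstruction_error(x, w, v) for x in samples)

anomaly = [0.2, 0.9]  # second metric far too high for this load
print(reconstruction_error(anomaly, w, v) > threshold)
```

A production system would use a nonlinear autoencoder over many more signals and would pass the learned state vector on to classification and prediction models, as the slide describes.
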
  8. Constantly Changing Software Versions Require Adaptive Machine Learning Models
    Supervisor Workflow
    • DETECTION: Models detect anomalies in time series data
    • FEEDBACK: Domain experts evaluate and label predicted anomalies
    • IMPROVE: Supervisor Workflow automatically retrains with new data & evaluates models
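
One possible shape of such a detection, feedback and improve loop is sketched below. Everything here is hypothetical: the `ZScoreDetector` class, the `supervise` function, the candidate thresholds and the latency numbers are assumptions for illustration, and expert feedback is simulated by a hard-coded list of labels.

```python
import statistics


class ZScoreDetector:
    """Flags a value as anomalous if it is far from the training mean."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold
        self.mean = 0.0
        self.stdev = 1.0

    def fit(self, values):
        self.mean = statistics.mean(values)
        self.stdev = statistics.pstdev(values) or 1.0
        return self

    def predict(self, value):
        return abs(value - self.mean) / self.stdev > self.threshold


def accuracy(detector, labelled):
    hits = sum(detector.predict(v) == is_anomaly for v, is_anomaly in labelled)
    return hits / len(labelled)


def supervise(current, labelled, candidates=(2.0, 2.5, 3.0, 3.5)):
    """IMPROVE step: retrain on expert-labelled data, keep the best model."""
    normal = [v for v, is_anomaly in labelled if not is_anomaly]
    best, best_score = current, accuracy(current, labelled)
    for threshold in candidates:
        candidate = ZScoreDetector(threshold).fit(normal)
        score = accuracy(candidate, labelled)
        if score > best_score:
            best, best_score = candidate, score
    return best


# DETECTION: a model trained on latencies of the old software version.
old_model = ZScoreDetector().fit([100, 102, 98, 101, 99, 103, 97, 100])

# FEEDBACK: after a release, experts label what the model flagged.
# Only the 400 ms spike is a real anomaly in this made-up data.
labelled = [(200, False), (204, False), (196, False), (202, False),
            (198, False), (201, False), (199, False), (400, True)]

new_model = supervise(old_model, labelled)
print(accuracy(old_model, labelled), accuracy(new_model, labelled))
```

The old model flags every value of the new version, so its accuracy on the labelled data is poor. The retrained model is only adopted because it scores better on the same labels, which is the evaluation step of the workflow.
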
  9. Adaptive model training using AutoML
    • Using AutoML approaches for classifiers/anomaly detectors allows models to adapt much better to changes in data
    • Allows model selection and hyperparameter tuning to be automated within the supervisor workflow
    • Saves weeks of time through a reduced data science workflow
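
As a rough illustration of automated model selection and hyperparameter tuning, the sketch below runs an exhaustive search over two simple anomaly detector families and a few thresholds, scoring each by F1 on labelled validation data. Real AutoML tools use far more sophisticated search strategies. The detector families, the search space and the data are assumptions made up for this example.

```python
import itertools
import statistics


def zscore_model(train, threshold):
    """Detector family 1: distance from the mean in standard deviations."""
    mean = statistics.mean(train)
    stdev = statistics.pstdev(train) or 1.0
    return lambda x: abs(x - mean) / stdev > threshold


def mad_model(train, threshold):
    """Detector family 2: distance from the median in units of MAD."""
    median = statistics.median(train)
    mad = statistics.median([abs(x - median) for x in train]) or 1.0
    return lambda x: abs(x - median) / mad > threshold


def f1_score(model, labelled):
    tp = sum(model(x) and y for x, y in labelled)
    fp = sum(model(x) and not y for x, y in labelled)
    fn = sum((not model(x)) and y for x, y in labelled)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


SEARCH_SPACE = {
    "family": [zscore_model, mad_model],
    "threshold": [2.0, 3.0, 4.0],
}


def auto_select(train, validation):
    """Try every family/threshold combination and keep the best F1."""
    best = None
    for family, threshold in itertools.product(
            SEARCH_SPACE["family"], SEARCH_SPACE["threshold"]):
        score = f1_score(family(train, threshold), validation)
        if best is None or score > best[0]:
            best = (score, family.__name__, threshold)
    return best


# Training data contaminated by one outlier (60), which inflates the
# standard deviation and makes the z-score family miss real anomalies.
train = [10, 11, 9, 10, 12, 8, 10, 11, 9, 60]
validation = [(10, False), (11, False), (9, False),
              (30, True), (35, True), (12, False)]

print(auto_select(train, validation))
```

Inside a supervisor workflow, a search like this would be rerun whenever new labelled data arrives, so the chosen model family and threshold can change as the data changes.
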
  10. • Community Detection: detects group clustering or partition options
    • Centrality/Importance: determines the importance of distinct nodes in the network
    • Pathfinding & Search: finds the optimal paths or evaluates route availability and quality
    • Similarity: evaluates how alike nodes are
    • Embeddings: learned representations of connectivity or topology
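
Four of these algorithm families can be sketched on a small, made-up service dependency graph using only the Python standard library. The graph, the service names and the function names are assumptions for illustration. Connected components are used here as the simplest stand-in for community detection, and embeddings are left out because they require a learned model.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Undirected graph stored as adjacency sets."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Centrality: fraction of the other nodes each node touches."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def shortest_path(graph, start, goal):
    """Pathfinding: breadth-first search, returns None if unreachable."""
    queue, parent = deque([start]), {start: None}
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in graph[node]:
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None


def jaccard(graph, a, b):
    """Similarity: overlap of the two nodes' neighbourhoods."""
    union = graph[a] | graph[b]
    return len(graph[a] & graph[b]) / len(union) if union else 0.0


def components(graph):
    """Simplest grouping: sets of nodes that can reach each other."""
    seen, groups = set(), []
    for node in graph:
        if node in seen:
            continue
        group, stack = set(), [node]
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(graph[current] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical service dependencies.
graph = build_graph([
    ("web", "api"), ("api", "auth"), ("api", "db"), ("auth", "db"),
    ("batch", "queue"), ("queue", "worker"),
])

print(degree_centrality(graph)["api"])
print(shortest_path(graph, "web", "db"))
print(jaccard(graph, "auth", "db"))
print(sorted(sorted(group) for group in components(graph)))
```

In an AIOps setting, questions such as "which service is most central" or "is there a path from a failing component to a user-facing one" map directly onto queries like these.
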
  11. Advantages of Machine Learning and Knowledge Graphs in DevOps
    • can sift through large amounts of data
    • can evolve and react to changes in data
    • can figure out errors in QA results and present better testing patterns
    • can detect issues with new releases based on available logs
    • can plan the best configuration to maximize performance
    • can provide a visual overview of the current situation
  12. Disadvantages of Machine Learning in DevOps
    • Uncertainty → stochastic vs. deterministic models
    • requires large amounts of data to be effective
    • “technical debt” - accumulation of technical faults