Slide 1

Slide 1 text

Machine Learning to Predict Chaos

Slide 2

Slide 2 text

YURY NIÑO Site Reliability Engineer Chaos Engineering Advocate @yurynino https://www.yurynino.com/

Slide 3

Slide 3 text

AGENDA
● Cloud Native Transformation
● Current status
● Next: ML to predict Chaos
● Machine Learning Foundations
● ML to predict Chaos
● Failures Taxonomy
● Predicting failures
www.yurynino.com

Slide 4

Slide 4 text

CLOUD NATIVE Transformation

Slide 5

Slide 5 text

Cloud Native is the future of computing. It’s going to allow you to deliver software more quickly and more cheaply. It looks like it’ll save the planet!

Slide 6

Slide 6 text

Cloud native is more than a toolset. It is a complete architecture, a philosophical approach to building applications that take advantage of cloud computing. Cloud native is an architecture for assembling cloud-based components in a way that is optimized for the cloud environment. It’s not about the servers, but about the services.

Slide 7

Slide 7 text


Slide 8

Slide 8 text

It is the future! Humans are defining the starting point for machines to learn and constantly improve. Self-healing is the optimal way for systems to be operated and maintained. It is faster, more secure, and more reliable.

Slide 9

Slide 9 text

MACHINE LEARNING FOUNDATIONS

Slide 10

Slide 10 text

… what we want is a machine that can learn from experience ... Alan Turing, 1947

Slide 11

Slide 11 text

Machine Learning Foundations

Slide 12

Slide 12 text

Machine Learning Foundations
Machine learning is a research field at the intersection of statistics, artificial intelligence, and computer science for extracting knowledge from data.

Slide 13

Slide 13 text


Slide 14

Slide 14 text

https://www.wordstream.com/blog/ws/2017/07/28/machine-learning-applications

Slide 15

Slide 15 text

Machine learning has a lot of applications in systems that we use daily!

Slide 16

Slide 16 text

Chaos Theory
Half a century ago, the pioneers of chaos theory discovered that the “butterfly effect” makes long-term prediction impossible.
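The butterfly effect is sensitive dependence on initial conditions. As a minimal sketch (our choice of example, not from the talk), the logistic map x(n+1) = r·x(n)·(1 − x(n)) with r = 4 is a textbook chaotic system. In it, two starting points that differ by one part in a billion soon end up far apart, because the gap roughly doubles at every step on average. The function names are illustrative.

```python
def logistic_map(x0: float, r: float = 4.0, steps: int = 60) -> list[float]:
    """Iterate x_{n+1} = r * x_n * (1 - x_n) and return the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def divergence(x0: float, eps: float = 1e-9, steps: int = 60) -> list[float]:
    """Absolute gap, step by step, between two trajectories whose starts differ by eps."""
    a = logistic_map(x0, steps=steps)
    b = logistic_map(x0 + eps, steps=steps)
    return [abs(p - q) for p, q in zip(a, b)]


if __name__ == "__main__":
    gaps = divergence(0.2)
    for n in (0, 10, 20, 30, 40):
        print(f"step {n:2d}: gap = {gaps[n]:.2e}")
```

The model here is perfect and deterministic. The only "error" is the ninth decimal place of the starting point, and that alone is enough to make long-range forecasts meaningless.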

Slide 17

Slide 17 text

NOW! Machine learning could change this! The effectiveness of machine learning for model-free prediction of chaotic systems has been documented in several studies. Jaideep Pathak of the University of Maryland has shown that ML is a powerful tool for predicting chaos.

Slide 18

Slide 18 text

“The machine-learning technique is almost as good as knowing the truth.” Holger Kantz
“If we have ignorance, we should use the machine learning to fill in the gaps where the ignorance resides.” Edward Ott

Slide 19

Slide 19 text

3 Steps for Machine Learning
1. Get data input, e.g., measure the height of a flame at n different points.
2. Make the neural network learn the dynamics of the evolving flame! Feed the data streams into randomly connected artificial neurons.
3. The neural network essentially asks itself what will happen: outputs are fed back in as new inputs.
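These three steps describe reservoir computing, the technique behind Jaideep Pathak's chaos predictions. A data stream drives a fixed network of randomly connected neurons, only a linear readout is trained, and at prediction time the outputs are fed back in as new inputs. Below is a minimal pure-Python sketch of an echo state network, one kind of reservoir computer. It rests on our own assumptions: a sampled sine wave stands in for the flame-height measurements, and the reservoir size (40), weight scaling, and ridge penalty are illustrative values, not taken from the talk.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


class EchoStateNetwork:
    """Minimal reservoir computer: a fixed random reservoir plus a trained linear readout."""

    def __init__(self, n_reservoir=40, scale=0.9, ridge=1e-6, seed=0):
        rng = random.Random(seed)
        self.n, self.ridge = n_reservoir, ridge
        # Randomly connected neurons: these weights are drawn once and never trained.
        self.w_in = [rng.uniform(-0.5, 0.5) for _ in range(self.n)]
        w = [[rng.uniform(-1.0, 1.0) for _ in range(self.n)] for _ in range(self.n)]
        # Rescale so the largest absolute row sum is `scale` < 1; with tanh this makes
        # the reservoir contractive, so it gradually forgets its initial state.
        norm = max(sum(abs(v) for v in row) for row in w)
        self.w = [[v * scale / norm for v in row] for row in w]
        self.state = [0.0] * self.n
        self.w_out = [0.0] * (self.n + 1)
        self.last_input = 0.0

    def _step(self, u):
        """Drive the reservoir with one input value; return the state plus a bias feature."""
        self.state = [
            math.tanh(self.w_in[i] * u + sum(wij * s for wij, s in zip(self.w[i], self.state)))
            for i in range(self.n)
        ]
        return self.state + [1.0]

    def fit(self, series, washout=100):
        """Step 2: feed the data stream in, then fit only the readout (ridge regression)."""
        feats, targets = [], []
        for t in range(len(series) - 1):
            x = self._step(series[t])
            if t >= washout:  # skip the transient while the reservoir warms up
                feats.append(x)
                targets.append(series[t + 1])
        m = self.n + 1
        # Normal equations: (X^T X + ridge * I) w_out = X^T y
        a = [[sum(f[i] * f[j] for f in feats) + (self.ridge if i == j else 0.0)
              for j in range(m)] for i in range(m)]
        b = [sum(f[i] * y for f, y in zip(feats, targets)) for i in range(m)]
        self.w_out = solve(a, b)
        self.last_input = series[-1]

    def predict(self, steps):
        """Step 3: run closed loop, feeding each output back in as the next input."""
        out, u = [], self.last_input
        for _ in range(steps):
            x = self._step(u)
            u = sum(w * xi for w, xi in zip(self.w_out, x))
            out.append(u)
        return out


if __name__ == "__main__":
    # Step 1 stand-in: a sampled sine wave plays the role of the flame height.
    series = [math.sin(0.3 * t) for t in range(400)]
    esn = EchoStateNetwork()
    esn.fit(series)
    for k, p in enumerate(esn.predict(5)):
        print(f"t={400 + k}: predicted {p:+.3f}, actual {math.sin(0.3 * (400 + k)):+.3f}")
```

Training touches only `w_out`. The random reservoir weights stay fixed, which is what makes reservoir computing cheap to train compared with backpropagating through a recurrent network.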

Slide 20

Slide 20 text


Slide 21

Slide 21 text


Slide 22

Slide 22 text


Slide 23

Slide 23 text

Humans are defining the starting point for machines to learn and constantly improve. So, can we extrapolate this to predict the chaos in our own systems?

Slide 24

Slide 24 text

Classifying Operations Failures
Identify and measure the characteristics of a failure:
● Are we using an observability tool?
● Do we have observability of our SLOs?
● Are we following an incident management (IM) methodology?
● Are we writing postmortems?
● Do we practice gamedays?
Step 1: Get data input, e.g., measure the height of a flame at n different points.
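Step 1 turns the checklist into measurable inputs. Here is a minimal sketch, assuming each question is answered yes or no per project. `OperationalProfile` and its field names are hypothetical, chosen to mirror the five questions above.

```python
from dataclasses import dataclass, astuple


@dataclass
class OperationalProfile:
    """Yes/no answers to the checklist, one instance per project (hypothetical fields)."""
    uses_observability_tool: bool
    observes_slos: bool
    follows_im_methodology: bool
    writes_postmortems: bool
    practices_gamedays: bool

    def to_features(self) -> list[float]:
        """Encode each answer as 1.0 (yes) or 0.0 (no), in field order, for a model."""
        return [1.0 if answer else 0.0 for answer in astuple(self)]
```

For example, a project with an observability tool, an IM methodology, and postmortems, but with neither SLO observability nor gamedays, encodes as [1.0, 0.0, 1.0, 1.0, 0.0].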

Slide 25

Slide 25 text

Training Operations Failures
Provide examples of previous projects:
● If we don’t use an observability tool, the response time is ...
● If we cannot follow SLOs in an observability tool, our KPIs are ...
● If we don’t have an IM methodology, we keep attending to the same incidents.
Step 2: Make the neural network learn the dynamics of the evolving flame!
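Step 2 fits a model to examples from previous projects. The talk uses a neural network. To keep this sketch short and dependency-free, we swap in plain linear regression trained by gradient descent on the mean squared error, mapping the yes/no checklist answers to a metric such as response time. `train_linear_model`, its learning rate, and its epoch count are hypothetical choices, and the sketch contains no real project data.

```python
def train_linear_model(rows, targets, lr=0.1, epochs=3000):
    """Fit target ≈ w·x + b by full-batch gradient descent on half the mean squared error."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, targets):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y  # prediction error
            for i, xi in enumerate(x):
                grad_w[i] += err * xi / n
            grad_b += err / n
        # Move every parameter a small step against its gradient.
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b
```

Each row is one past project's feature vector, and each target is the metric observed for that project. Given enough varied projects, each learned weight estimates how much one practice shifts the metric. A neural network could additionally capture interactions between practices.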

Slide 26

Slide 26 text

Predicting Operations Failures
Ask the neural network:
● If I lose access to the observability tool, what will the impact on response times be ...
● If I write postmortems, how could my response times improve …
● If I invest X money and time in a gameday next month, what will my revenue ...
Step 3: The neural network essentially asks itself what will happen.
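Step 3 asks the trained model what-if questions. Here is a minimal sketch, assuming a trained linear model given as weights `w` and bias `b` over the five yes/no checklist features. `predict` and `what_if` are hypothetical helpers: `what_if` changes one answer and reports how the predicted metric would move.

```python
def predict(w: list[float], b: float, features: list[float]) -> float:
    """Linear model prediction: w·x + b."""
    return sum(wi * xi for wi, xi in zip(w, features)) + b


def what_if(w: list[float], b: float, features: list[float],
            index: int, new_value: float) -> float:
    """Predicted change in the metric if feature `index` is set to `new_value`."""
    changed = list(features)
    changed[index] = new_value
    return predict(w, b, changed) - predict(w, b, features)
```

"I lost access to the observability tool" corresponds to setting feature 0 from 1.0 to 0.0. With a linear model, the answer is just the negated weight. A neural network, as in the talk, could make the answer depend on the other practices too. The revenue question would need a model trained on a different target.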

Slide 27

Slide 27 text

NEXT: Challenges

Slide 28

Slide 28 text

Artificial intelligence is being used to provide resilience and reliability, but is AI itself reliable and resilient?

Slide 29

Slide 29 text

Big Data is the fuel of AI. Organizations are using it to manage the customer experience, transform their products, and deliver digital services. This data must be reliable! Challenge 1

Slide 30

Slide 30 text

Robust ML systems and hardware architectures are required to generate reliable and trustworthy results in the presence of hardware-level faults while also preserving security and privacy. Challenge 2
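Hardware-level faults often surface as single bit flips in memory. Here is a minimal sketch of injecting one into a model weight stored as an IEEE 754 single-precision (32-bit) float, using only Python's standard `struct` module. `flip_bit` is a hypothetical helper. In that format, bit 31 is the sign, bits 23 to 30 are the exponent, and bits 0 to 22 are the mantissa.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` encoded as a float32 with one bit (0 = least significant) inverted."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31]")
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))  # reinterpret bits as uint32
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


if __name__ == "__main__":
    weight = 0.5
    for bit in (0, 23, 30, 31):
        print(f"bit {bit:2d}: {weight} -> {flip_bit(weight, bit)}")
```

Flipping the lowest mantissa bit of 0.5 barely changes it. Flipping the top exponent bit turns it into 2^127 (about 1.7 × 10^38). That is why robust ML systems need to detect or tolerate such faults.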

Slide 31

Slide 31 text

Natural Language Processing! Ever since computers have existed, people have tried to teach them to process human language. However, the inconsistency and volatility of human language make NLP a complex task that is prone to failure. Challenge 3

Slide 32

Slide 32 text

The usual approach to predicting a chaotic system is to measure its conditions at one moment as accurately as possible, use these data to calibrate a physical model, and then evolve the model forward. Natalie Wolchover
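The quote describes model-based forecasting: measure, calibrate, evolve forward. Here is a minimal sketch under several assumptions. The Lorenz system with its classic parameters (σ = 10, ρ = 28, β = 8/3) serves as the already calibrated physical model, a classical fourth-order Runge-Kutta integrator evolves it forward, and one coordinate carries an illustrative measurement error of 1e-8. The forecast error starts tiny and grows by many orders of magnitude, which is why this approach cannot see far ahead.

```python
import math


def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (the 'physical model')."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Evolve the model forward by one classical fourth-order Runge-Kutta step."""
    def nudge(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = lorenz(state)
    k2 = lorenz(nudge(state, k1, dt / 2))
    k3 = lorenz(nudge(state, k2, dt / 2))
    k4 = lorenz(nudge(state, k3, dt))
    return tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def forecast_errors(t_end, dt=0.01, measurement_error=1e-8):
    """Distance, at every step, between forecasts from the true and the measured state."""
    truth = (1.0, 1.0, 1.0)
    measured = (1.0 + measurement_error, 1.0, 1.0)  # "as accurately as possible"
    errors = []
    for _ in range(round(t_end / dt)):
        truth, measured = rk4_step(truth, dt), rk4_step(measured, dt)
        errors.append(math.dist(truth, measured))
    return errors


if __name__ == "__main__":
    errs = forecast_errors(40.0)
    for t in (1, 10, 20, 30, 40):
        print(f"t = {t:2d}: forecast error = {errs[round(t / 0.01) - 1]:.2e}")
```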

Slide 33

Slide 33 text

Attack! Architectures & Models
● Test models by injecting failures.
● Chaos as functions.
● Use artificial intelligence to classify postmortems!
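One way to test models by injecting failures, and to treat chaos as functions, is to wrap a model's prediction function in a decorator that injects faults. Here is a minimal sketch, assuming the model is a plain Python callable. `inject_chaos`, `InjectedFault`, `predict_response_time`, and `success_ratio` are hypothetical names, and the two fault types (a raised exception and added latency) are illustrative choices.

```python
import functools
import random
import time


class InjectedFault(RuntimeError):
    """Raised by the chaos wrapper to simulate a failing dependency."""


def inject_chaos(failure_rate=0.1, latency_s=0.0, seed=None):
    """Decorator that makes the wrapped function fail at random and optionally slow down."""
    rng = random.Random(seed)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if rng.random() < failure_rate:
                raise InjectedFault(f"chaos: {func.__name__} failed")
            if latency_s:
                time.sleep(latency_s)  # simulated slow dependency
            return func(*args, **kwargs)
        return wrapper
    return decorator


@inject_chaos(failure_rate=0.3, seed=42)
def predict_response_time(load: float) -> float:
    """Toy stand-in for a model (hypothetical)."""
    return 50.0 + 2.0 * load


def success_ratio(calls=1000):
    """Call the wrapped model repeatedly and measure how often it answers."""
    ok = 0
    for _ in range(calls):
        try:
            predict_response_time(10.0)
            ok += 1
        except InjectedFault:
            pass
    return ok / calls
```

Seeding the random generator makes a chaos experiment repeatable. The same wrapper can be applied to any function a model depends on, such as a data-loading step.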

Slide 34

Slide 34 text

Remember! Failures are an inevitable part of building software products and services; however, we do not have to repeat the mistakes of the past.

Slide 35

Slide 35 text

We are here! Humans are defining the starting point for machines to learn and constantly improve. Self-healing is the optimal way for systems to be operated and maintained. It is faster, more secure, and more reliable. Systems learn on their own how to prevent failures, for instance by automatically scaling up capacity.
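Automatically scaling up capacity is one of the simplest self-healing loops. Here is a minimal sketch of the proportional rule used by the Kubernetes Horizontal Pod Autoscaler: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). `desired_replicas` is a hypothetical helper. Its tolerance band (the HPA has a comparable setting) and its minimum and maximum replica counts are illustrative parameters, not values from the talk.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     tolerance: float = 0.1, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Proportional scaling: ceil(current_replicas * current_metric / target_metric)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: avoid flapping
    proposed = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, proposed))
```

For example, 4 replicas running at 90% CPU against a 60% target scale out to 6. The clamp to `max_replicas` keeps a runaway metric from exhausting the cluster.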

Slide 36

Slide 36 text

Artificial intelligence is being used to provide resilience and reliability, but is AI itself reliable and resilient?

Slide 37

Slide 37 text

Thank you! @yurynino https://www.yurynino.com