The Next Wave of Reliability Engineering (Interop ITX 2018)

Michael Kehoe, Staff Site Reliability Engineer

Today’s agenda
1. Introductions
2. Where have we come from
3. What is Reliability Engineering
4. Where are we going
5. The Future of Reliability Engineering
6. Key Takeaways
7. Q&A

Introduction

$ WHOAMI: Michael Kehoe
• Staff Site Reliability Engineer @ LinkedIn
• Production-SRE Team
• Funny accent = Australian + 4 years American
• Former Network Engineer at the University of Queensland

$ WHOAMI: Production-SRE Team @ LinkedIn
• Disaster Recovery: Planning & Automation
• Incident Response: Process & Automation
• Visibility Engineering: Making use of operational data
• Reliability Principles: Defining best practice & automating it

Where have we come from

Traditional Development/Operations Bottlenecks
• Department silos
• Slow release cycles
• High toil workloads
• Poor operational visibility

What is Reliability Engineering

“What happens when a software engineer is tasked with what used to be called operations”
Ben Treynor Sloss

“Helping Product and Engineering deliver the best experience possible for the end user from an operations perspective”

What is Reliability Engineering

DevOps Concepts
• Reduce operational silos
• Measure everything
• Accept failure as normal
• Implement gradual changes
• Leverage tooling and automation

DevOps Concepts: Reduce Operational Silos
• Shared ownership of code & infrastructure
• Sharing of tools
• Expectation of collaboration

DevOps Concepts: Accept Failure as Normal
• Expect & embrace risk
• Quantify failure via SLOs
• Blameless postmortems
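One way to read "quantify failure via SLOs" is as error-budget arithmetic: the SLO target implies a number of failures that is acceptable in a window, and observed failures are compared against it. The sketch below is illustrative only. The function name `error_budget`, its parameters, and the request-based availability definition are assumptions, not something taken from the talk.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize error-budget consumption for one SLO window (hypothetical helper)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # Assumption: availability is measured as the fraction of successful requests.
    availability = 1.0 - failed_requests / total_requests
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests

    return {
        "availability": availability,
        "allowed_failures": allowed_failures,
        "budget_remaining": allowed_failures - failed_requests,
        "slo_met": availability >= slo_target,
    }


if __name__ == "__main__":
    # A 99.9% target over one million requests with 400 failures.
    print(error_budget(0.999, 1_000_000, 400))
```

A negative `budget_remaining` means the window has already failed more often than the SLO allows, which is the kind of signal a team might use to slow down risky changes.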

DevOps Concepts: Implement Gradual Change
• Encourage organization to move quickly
• Lower the cost of failure
• Manage risk

DevOps Concepts: Leverage Tooling and Automation
• Automate toil away
• Reduce ‘Human Touch’

DevOps Concepts: Measure Everything
• Measure all aspects of systems
• Availability
• Errors
• Incident statistics

Where are we going?

• Increased agility
• Measure everything
• Failure is the new normal
• Automation is ubiquitous
• Observe in depth

The Next Wave of Reliability Engineering

The Future of Reliability Engineering
• Evolution of the Network Engineer
• Observe and measure
• Failure is the new normal
• Automation as a Service
• Cloud is king

Dawn of the Network Reliability Engineer
Making the network follow SRE practices
https://forums.juniper.net/t5/SDN-and-NFV-Era/2018-and-the-Dawn-of-Network-Reliability-Engineering-NRE/ba-p/316915

Evolution of Network Automation
1. Manual Operations
2. Automation
3. Visibility & Visualization
4. Data Analysis & realization
5. Reactive, Predictive Self Operation
Credit: Greg Ferro (Packet Pushers) http://packetpushers.net/taxonomy-five-levels-intent-based-networking-beta/

Failure is the new Normal
Downgrade failures from exceptional to expected
https://azure.microsoft.com/en-us/blog/inside-azure-search-chaos-engineering/

Failure is the new normal
• Accept failure as normal
• Test for failure:
  • Application
  • Local Infrastructure
  • Global Infrastructure
• Continuous experimentation
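Testing for failure at the application level can be as small as wrapping a dependency so that it fails on purpose, then checking that the caller still copes. The sketch below is a minimal illustration, not a description of any tooling from the talk. `FlakyDependency`, `call_with_retries`, and `run_experiment` are hypothetical names, and a simple retry loop stands in for whatever resilience mechanism the real application uses.

```python
import random


class FlakyDependency:
    """Wraps a callable and fails a configurable fraction of calls (hypothetical)."""

    def __init__(self, func, failure_rate, rng):
        self.func = func
        self.failure_rate = failure_rate
        self.rng = rng
        self.injected = 0  # number of failures injected so far

    def __call__(self, *args, **kwargs):
        if self.rng.random() < self.failure_rate:
            self.injected += 1
            raise ConnectionError("injected failure")
        return self.func(*args, **kwargs)


def call_with_retries(dep, attempts, *args):
    """Call dep up to `attempts` times, re-raising the last injected error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return dep(*args)
        except ConnectionError as exc:
            last_error = exc
    raise last_error


def run_experiment(failure_rate, attempts, trials, seed=0):
    """Return (success ratio, injected failure count) for one experiment."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    dep = FlakyDependency(lambda x: x * 2, failure_rate, rng)
    successes = 0
    for i in range(trials):
        try:
            if call_with_retries(dep, attempts, i) == i * 2:
                successes += 1
        except ConnectionError:
            pass  # the caller exhausted its retries
    return successes / trials, dep.injected


if __name__ == "__main__":
    for attempts in (1, 3):
        ratio, injected = run_experiment(0.3, attempts, trials=1000)
        print(f"attempts={attempts} success={ratio:.3f} injected={injected}")
```

Running the same experiment continuously, with varying failure rates, is one modest way to practice the "continuous experimentation" bullet before moving on to infrastructure-level failures.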

Automation as a Service
Automation & Orchestration will be a part of all systems

Automation is ubiquitous
• Automation is expected
• Automation is unified
• No more one-off scripts
• Automation extends to monitoring, triage & automation
• Automation drives down:
  • Time to Detect
  • Time to Resolve
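To show that automation is driving Time to Detect and Time to Resolve down, those two numbers first have to be computed from incident records. The sketch below is one possible way to do that. The `Incident` record and its field names are assumptions, and time to resolve is measured from the start of the incident here, although some teams measure it from detection instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    """Hypothetical incident record with three timestamps."""
    started: datetime
    detected: datetime
    resolved: datetime


def mean_minutes(deltas):
    """Average a list of timedeltas and return the result in minutes."""
    return mean(d.total_seconds() for d in deltas) / 60.0


def incident_stats(incidents):
    """Return mean time to detect and mean time to resolve, in minutes."""
    if not incidents:
        raise ValueError("at least one incident is required")
    return {
        "mean_time_to_detect_min": mean_minutes(
            [i.detected - i.started for i in incidents]
        ),
        # Assumption: resolution time is counted from the incident start.
        "mean_time_to_resolve_min": mean_minutes(
            [i.resolved - i.started for i in incidents]
        ),
    }


if __name__ == "__main__":
    t0 = datetime(2018, 5, 1, 12, 0)
    sample = [
        Incident(t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=30)),
        Incident(t0, t0 + timedelta(minutes=15), t0 + timedelta(minutes=90)),
    ]
    print(incident_stats(sample))
```

Tracking these two averages per quarter, before and after a piece of automation ships, gives a simple check on whether the automation is paying off.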

Cloud is King
Applications are built for the cloud
https://woodby.com/pricing-plans

Cloud is King
• Adoption of Private & Public Clouds will continue
• Most infrastructure will be ephemeral
• Applications will be engineered to be ‘Cloud Native’
• Engineering agility will continue to increase

Observe & Measure
Making the most of operational data
https://www.acronis.com/en-us/blog/posts/web-application-monitoring-basic-framework

Observe and measure
• Machine driven triaging using tracing and advanced learning
• Advanced analytics on performance to drive infrastructure optimization
• Use of incident data to drive feedback loops

Key Takeaways

Key Takeaways: DevOps Concepts
• Reduce operational silos
• Measure everything
• Accept failure as normal
• Implement gradual change
• Leverage tooling and automation

Key Takeaways: The Future of Reliability Engineering
• Evolution of the Network Engineer
• Observe and measure
• Failure is the new normal
• Automation is ubiquitous
• Cloud is king
