Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Supporting SRE with Azure Services

Supporting SRE with Azure Services

Yury Nino

April 17, 2021
Tweet

More Decks by Yury Nino

Other Decks in Technology

Transcript

  1. Agenda • What is SRE • SRE Principles ◦ Azure

    Services • SRE Practices ◦ Azure Services • Resources www.ingenieriadelcaos.com .
  2. Those systems were not RELIABLE! Reliability describes the ability of

    a system or component to function under stated conditions for a specified period of time.
  3. Site Reliability Engineering is what you get when you treat

    operations as if it’s a software problem. Our mission is to protect, provide for, and progress the software and systems with an ever-watchful eye on their availability, latency, performance, and capacity.
  4. 7 SRE Principles Embracing Risk Service Level Objectives Eliminating Toiling

    Monitoring Automation Release Engineering Simplicity www.ingenieriadelcaos.com .
  5. 7 SRE Principles Embracing Risk Service Level Objectives Eliminating Toiling

    Monitoring Automation Release Engineering Simplicity www.ingenieriadelcaos.com .
  6. Monitoring Azure Management Portal Azure provides dashboards that display specific

    metrics for services that not be available through APM tools or profilers. For example, the average latency of Azure Storage, or the rate for messaging within an Azure IoT Hub. Azure Services Azure helps to maximize the availability and performance of applications and services. Azure delivers a solution for collecting, analyzing, and acting on telemetry from cloud and on-premises environments. www.ingenieriadelcaos.com .
  7. 7 SRE Principles Embracing Risk Service Level Objectives Eliminating Toiling

    Monitoring Automation Release Engineering Simplicity www.ingenieriadelcaos.com .
  8. Automation Automate ALL the Things! • User account creation. •

    Software or hardware installation preparation • Runtime configuration changes. • Rollouts of new software versions. Automate yourself? Automate the Resilience! www.ingenieriadelcaos.com .
  9. Automation www.ingenieriadelcaos.com . Chaos Engineering It is the discipline of

    experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production. https://principlesofchaos.org
  10. Eliminating Toiling Azure ARM Templates Azure Resource Manager templates (ARM

    templates) • ARM is Infrastructure as Code for Azure. • A template is a JavaScript Object Notation (JSON) file. • A template uses declarative syntax, which lets you state what you intend to deploy without having to write the sequence of programming commands to create it. • A template specify the resources to deploy and the properties for those resources www.ingenieriadelcaos.com .
  11. 7 SRE Principles Embracing Risk Service Level Objectives Eliminating Toiling

    Monitoring Automation Release Engineering Simplicity www.ingenieriadelcaos.com .
  12. Embracing Risk Continuous Build and Deployment! Release engineers have a

    solid understanding of source code management, compilers, build configuration languages, automated build tools, package managers, and installers. 4 principles: Self-Service Model, High Velocity, Hermetic Builds, Enforcement Policies and Procedures. You expect to build 100% reliable services—ones that never fail. However, increasing reliability is worse for a service rather than better! Extreme reliability comes at a cost! Embrace the Risk! www.ingenieriadelcaos.com .
  13. Release Engineering Azure Staging slots One way to implement blue-green

    deployments is using the staging slots available in Azure App Service to stage a deployment before moving it to production. www.ingenieriadelcaos.com .
  14. 18 SRE Practices Effective Troubleshooting Practical Alerting Emergency Response Being

    On-Call Postmortems Managing Incidents Tracking Outages Testing for Reliability www.ingenieriadelcaos.com .
  15. Practical Alerting Azure Alerts Using Azure Monitor it is possible

    to generate alert rules that captures the target and criteria for alerting. Target Resource Defines the scope and signals available for alerting. • Virtual machines. • Storage accounts. • Log Analytics workspace. • Application Insights. Signals can be of the following types: metric, activity log, Application Insights, and log. www.ingenieriadelcaos.com .
  16. 18 SRE Practices Effective Troubleshooting Practical Alerting Emergency Response Being

    On-Call Postmortems Managing Incidents Tracking Outages Testing for Reliability www.ingenieriadelcaos.com .
  17. Preparation • Training and support, general knowledge management • Needs

    constant iteration Analysis & Learning • Understand what happened and build new action plans and training • Siloed apps for reporting and analysis make it hard to summarize incidents, delaying post mortems Detection & Alerting • Monitoring systems send alerts for issues which need to be reviewed and escalated • Alerting systems generate noise which is lost in email, unclear where to escalate Remediation • Resolve the issues • Multiple tools, people and processes need to be coordinated for the implementation of long-term solutions for incidents. Process Overview & Challenges Containment • Data must be reviewed and the current situation assessed • Damage has to be contained quickly and efficiently to minimize the impact. Incident Management
  18. Incident Management Azure Security Incident Management It is a critical

    responsibility for Microsoft and represents an investment that any customer using Microsoft Online Services can count on. Azure implements the five-stage process. Detect Assess Diagnose Stabilize Close www.ingenieriadelcaos.com .
  19. 18 SRE Practices Effective Troubleshooting Practical Alerting Emergency Response Being

    On-Call Postmortems Managing Incidents Tracking Outages Testing for Reliability
  20. 18 SRE Practices Handling Overload Software Engineering Cascading Failures Load

    Balancing Distributed Consensus Data Pipelines Data Integrity Distributed Scheduling www.ingenieriadelcaos.com .
  21. 18 SRE Practices Handling Overload Software Engineering Cascading Failures Load

    Balancing Distributed Consensus Data Pipelines Data Integrity Distributed Scheduling www.ingenieriadelcaos.com .
  22. Load Balancing Azure Load Balancer Azure delivers high availability and

    network performance: • Instantly add scale to your applications • Load balance Internet and private network traffic • Improve application reliability via health checks • Flexible NAT rules for better security • Directly integrated into virtual machines and cloud services www.ingenieriadelcaos.com .
  23. 18 SRE Practices Handling Overload Software Engineering Cascading Failures Load

    Balancing Distributed Consensus Data Pipelines Data Integrity Distributed Scheduling www.ingenieriadelcaos.com .
  24. 18 SRE Practices Handling Overload Software Engineering Cascading Failures Load

    Balancing Distributed Consensus Data Pipelines Data Integrity Distributed Scheduling
  25. Other Services DEMO Azure IaC con Terraform Bondades de la

    IaC Principales opciones: terrafor, Pulumi, Crossplane, Azure Resource Manager, CLI, entre otros…. www.ingenieriadelcaos.com .
  26. Swag and more • Claim your attendee Learner Badge here:

    • 30 Days to learn it: aka.ms/global-azure/30D2L • Virtual background and ANOTHER Badge: blog.globalazure.net/Swag