Introduction to Chaos Engineering with Kubernetes

Spiros Economakis

October 15, 2021

Transcript

  1. Chaos Engineering with Kubernetes, 10 Jun 2021

    Spiros Economakis, Site Reliability Engineering Lead @ Mattermost Cloud
  2. “Chaos doesn’t cause problems. It reveals them.”

    Nora Jones, CEO of Jeli, (ex) Head of Chaos Engineering @Slack, @Netflix
  3. Chaos Engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production.

    https://principlesofchaos.org/
  4. Chaos Toolkit

    version: 1.0.0
    title: Pod should be automatically killed and restarted when unhealthy
    description: Can we trust Kubernetes to restart our microservice when it detects it is unhealthy?
    tags:
      - microservice
      - kubernetes
      - python
    configuration:
      webapp_service_url:
        type: env
        key: WEBAPP_SERVICE_ADDR
      prometheus_base_url:
        type: env
        key: PROMETHEUS_ADDR
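The `configuration` block on this slide declares two values that come from environment variables. As a rough illustration of what that implies (a sketch under assumptions, not Chaos Toolkit's actual loader), the hypothetical `resolve_configuration` helper below looks up each `type: env` entry in a mapping of environment variables. The example addresses are made up.

```python
import os
from typing import Any, Dict, Mapping, Optional


def resolve_configuration(
    config: Mapping[str, Any], env: Optional[Mapping[str, str]] = None
) -> Dict[str, Any]:
    """Resolve `type: env` entries against environment variables.

    Hypothetical helper for illustration only; plain values pass through.
    """
    env = os.environ if env is None else env
    resolved: Dict[str, Any] = {}
    for name, spec in config.items():
        if isinstance(spec, dict) and spec.get("type") == "env":
            key = spec["key"]
            if key not in env:
                # Fail loudly so a missing variable is caught before the run.
                raise KeyError(f"environment variable {key!r} is not set")
            resolved[name] = env[key]
        else:
            resolved[name] = spec
    return resolved


configuration = {
    "webapp_service_url": {"type": "env", "key": "WEBAPP_SERVICE_ADDR"},
    "prometheus_base_url": {"type": "env", "key": "PROMETHEUS_ADDR"},
}
# Made-up addresses, used only for this example.
example_env = {
    "WEBAPP_SERVICE_ADDR": "http://webapp.example",
    "PROMETHEUS_ADDR": "http://prometheus.example",
}
resolved = resolve_configuration(configuration, example_env)
```

Passing the environment as an argument keeps the sketch testable without touching the real process environment.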
  5. Chaos Toolkit

    steady-state-hypothesis:
      title: Services are all available and healthy
      probes:
        - type: probe
          name: all-services-are-healthy
          tolerance: true
          provider:
            type: python
            module: chaosk8s.probes
            func: all_microservices_healthy
        - type: probe
          name: webapp-is-available
          tolerance: true
          provider:
            type: python
            module: chaosk8s.probes
            func: microservice_available_and_healthy
            arguments:
              name: webapp-app
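The steady-state hypothesis pairs each probe with a `tolerance`, the value the probe is expected to return while the system is healthy. A minimal sketch of that idea follows, assuming a tolerance is a plain value compared by equality (Chaos Toolkit supports richer tolerance types). The probe functions are stand-in stubs, not the real `chaosk8s.probes` functions, and `hypothesis_holds` is a hypothetical name.

```python
from typing import Any, Dict, List


def all_microservices_healthy_stub() -> bool:
    """Stand-in for a cluster-wide health probe; always healthy here."""
    return True


def microservice_available_and_healthy_stub(name: str) -> bool:
    """Stand-in probe: pretends only `webapp-app` is deployed and healthy."""
    return name == "webapp-app"


def hypothesis_holds(probes: List[Dict[str, Any]]) -> bool:
    """Return True only if every probe's result equals its tolerance."""
    for probe in probes:
        result = probe["func"](**probe.get("arguments", {}))
        if result != probe["tolerance"]:
            return False
    return True


probes = [
    {
        "name": "all-services-are-healthy",
        "tolerance": True,
        "func": all_microservices_healthy_stub,
    },
    {
        "name": "webapp-is-available",
        "tolerance": True,
        "func": microservice_available_and_healthy_stub,
        "arguments": {"name": "webapp-app"},
    },
]
steady = hypothesis_holds(probes)
```

If any probe deviates from its tolerance, the experiment has no trustworthy baseline, so there is little point in injecting a fault.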
  6. Chaos Toolkit

    method:
      - type: action
        name: talk-to-webapp
        background: true
        provider:
          type: process
          path: vegeta
          timeout: 63
          arguments:
            attack: ''
            "-duration": 60s
            "-connections": '1'
            "-rate": '1'
            "-output": report.bin
            "-targets": urls.txt
      - type: action
        name: confirm-purchase
        provider:
          type: http
          url: "${webapp_service_url}/purchase/confirm"
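To make the two actions on this slide more concrete, the sketch below shows one plausible reading of them: the `arguments` mapping is flattened into a `vegeta attack` command line (each key followed by its value, empty values dropped), and `${webapp_service_url}` is filled in from the configuration. Both helpers are hypothetical illustrations rather than Chaos Toolkit's own code, and the URL value is made up.

```python
from string import Template
from typing import Dict, List


def build_command(path: str, arguments: Dict[str, str]) -> List[str]:
    """Flatten a mapping of arguments into an argv list.

    Each key is followed by its value; empty values are skipped so that
    a bare subcommand such as `attack` can be expressed as `attack: ''`.
    """
    argv = [path]
    for key, value in arguments.items():
        argv.append(str(key))
        if value not in (None, ""):
            argv.append(str(value))
    return argv


def substitute(text: str, configuration: Dict[str, str]) -> str:
    """Replace `${name}` placeholders with configuration values."""
    return Template(text).substitute(configuration)


vegeta_args = {
    "attack": "",
    "-duration": "60s",
    "-connections": "1",
    "-rate": "1",
    "-output": "report.bin",
    "-targets": "urls.txt",
}
command = build_command("vegeta", vegeta_args)

# Made-up address, used only for this example.
url = substitute(
    "${webapp_service_url}/purchase/confirm",
    {"webapp_service_url": "http://webapp.example"},
)
```

Under these assumptions the load generator sends one request per second for 60 seconds in the background, while the second action triggers the purchase confirmation endpoint.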
  7. Chaos Toolkit

      - type: probe
        name: collect-how-many-times-our-service-container-restarted-in-the-last-minute
        provider:
          type: python
          module: chaosprometheus.probes
          func: query_interval
          arguments:
            query: kube_pod_container_status_restarts{container="webapp-app"}
            start: 2 minutes ago
            end: now
        pauses:
          before: 45
      - type: probe
        name: read-webapp-logs-for-the-pod-that-was-killed
        provider:
          type: python
          module: chaosk8s.probes
          func: read_microservices_logs
          arguments:
            name: webapp-app
            from_previous: true
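The first probe on this slide collects the container restart counter over a time window. One way to read such a result, sketched below, is to take the difference between the last and first sample of each series in a Prometheus range-query response. The `restarts_in_window` helper is hypothetical, the sample response is made up, and the sketch assumes the counter does not reset inside the window.

```python
from typing import Any, Dict


def restarts_in_window(response: Dict[str, Any]) -> int:
    """Sum (last - first) counter values across all returned series.

    Expects the shape of a Prometheus range-query response with
    resultType "matrix". Assumes no counter reset within the window.
    """
    total = 0
    for series in response["data"]["result"]:
        values = series["values"]
        if len(values) < 2:
            continue  # a single sample cannot show a change
        first = float(values[0][1])
        last = float(values[-1][1])
        total += int(last - first)
    return total


# Made-up response for illustration: one restart during the window.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {
                "metric": {"container": "webapp-app"},
                "values": [
                    [1623312000, "0"],
                    [1623312030, "0"],
                    [1623312060, "1"],
                ],
            }
        ],
    },
}
restarts = restarts_in_window(sample_response)
```

A non-zero count here, together with the logs of the previous container, supports the hypothesis that Kubernetes restarted the unhealthy pod.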
  8. Fix & improve, otherwise Thanos will find you in an infinity loop of time travel