CD for infrastructure

If you're doing devops, you're almost certainly doing CD. Developers get their changes in front of customers sooner, operations gain consistency, repeatability, and auditability, and the business gets measurable value faster.

But most of the literature and case studies on CD focus on applications. What about infrastructure? The evidence clearly shows that CD of applications reduces failures and increases reliability, but operations teams generally don't manage infrastructure services like PaaS or monitoring with CD, and they're missing out on the same benefits.

For the last 6 months, we have been building our next-generation monitoring metrics storage platform to handle over 350,000 metrics updated every 10 seconds, with at least 6 months of retention.

Because monitoring infrastructure needs to be at least as reliable as (if not more reliable than) the things being monitored, we opted to design this infrastructure service to be Continuously Deployed. By designing CD-first, we have been able to adapt to changing business and engineering requirements, and scale the infrastructure faster and more reliably than would otherwise have been possible.

In this talk we'll cover a toolchain for Continuously Deploying changes to infrastructure services, compare and contrast the CD challenges for applications vs infrastructure, and explore how to get good feedback on changes to infrastructure that complements and assists monitoring.

Lindsay Holmwood

July 16, 2015

Transcript

  1. Continuous Delivery: code done → unit tests (auto) → integrate (auto) → acceptance tests (auto) → deploy to production (manual)

    Continuous Deployment: code done → unit tests (auto) → integrate (auto) → acceptance tests (auto) → deploy to production (auto)
  2. API

  3. API

  4. Fast feedback: 1. Validate quickly 2. Limit technical debt 3. Make it work, make it fast, make it right
  5. API

  6. API

  7. API

  8. 1. Change app 2. Change DB 3. Change proxy 4. Test app 5. Test DB 6. Test proxy
  9. 1. Change app 2. Test app 3. Change DB 4. Test DB 5. Change proxy 6. Test proxy (fail early)
  10. 1. Service running? 2. Can I do a simple query? 3. Obviously bad log messages? 4. Significant statistical deviation in metrics?
  11. { "id": "coco-expvars", "name": "Coco expvars at :9090", "http": "http://127.0.0.1:9090/debug/vars", "interval": "10s", "timeout": "1s" },

    { "id": "lookup", "name": "Coco hash lookup at :9090", "http": "http://127.0.0.1:9090/lookup?name=hello", "interval": "10s", "timeout": "1s" },

    { "id": "anomalous_coco_errors", "name": "anomalous_coco_errors", "script": "anomalous_coco_errors --host coco.example --window 10m", "interval": "10s", "timeout": "5s" }
  12. How to CD successfully

    • Optimise for fast feedback
    • Chunk your changes
    • Constantly eliminate bottlenecks
    • Get iteration time down
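
The ordering on slide 9 (change one component, test it, then move on, failing early) can be sketched as a small loop. This is a minimal illustration and not the toolchain from the talk: `Step`, `deploy`, and the stubbed change/test callables are hypothetical names introduced here.

```python
from typing import Callable, List, NamedTuple


class Step(NamedTuple):
    """One component in the rollout (hypothetical structure)."""
    name: str                   # e.g. "app", "DB", "proxy"
    change: Callable[[], None]  # applies the change to this component
    test: Callable[[], bool]    # True if the component still looks healthy


def deploy(steps: List[Step]) -> List[str]:
    """Change and test one component at a time, stopping at the first failure.

    Returns a log of the actions taken, so the stopping point is visible.
    """
    log = []
    for step in steps:
        step.change()
        log.append(f"change {step.name}")
        ok = step.test()
        log.append(f"test {step.name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            # Fail early: components later in the list are never touched.
            break
    return log


if __name__ == "__main__":
    noop = lambda: None
    steps = [
        Step("app", noop, lambda: True),
        Step("DB", noop, lambda: False),  # simulate a bad DB change
        Step("proxy", noop, lambda: True),
    ]
    for line in deploy(steps):
        print(line)
```

Run as a script, the loop stops after the DB test fails and never changes the proxy. With the ordering on slide 8, all three components would already have been changed before the first test ran.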
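
Check 4 on slide 10 and the `anomalous_coco_errors` script check on slide 11 both depend on spotting a significant statistical deviation in a metric. The transcript does not show how that script works, so the following is only one plausible approach: a z-score test of the latest sample against a recent window. The function names, the threshold of 3 standard deviations, and the exit-code convention (0 for passing, 2 for failing, as many check runners expect) are all assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous(window: Sequence[float], latest: float,
                 threshold: float = 3.0) -> bool:
    """True if `latest` is more than `threshold` sample standard deviations
    away from the mean of `window`."""
    if len(window) < 2:
        return False  # not enough history to judge
    mu = mean(window)
    sigma = stdev(window)
    if sigma == 0:
        # A perfectly flat history: any change at all counts as a deviation.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


def check_exit_code(window: Sequence[float], latest: float) -> int:
    """Map the result to a script-check style exit code (assumed convention)."""
    return 2 if is_anomalous(window, latest) else 0


if __name__ == "__main__":
    errors_per_interval = [10, 11, 9, 10, 12, 10, 11]  # made-up sample data
    print(check_exit_code(errors_per_interval, 11))
    print(check_exit_code(errors_per_interval, 50))
```

A check like this complements the simpler ones on slide 10: a service can be running and answering queries while its error rate has quietly jumped after a deploy.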