CD for infrastructure

Continuous Deployment of infrastructure services
Lindsay Holmwood

How to CD successfully
1. Optimise for fast feedback
2. Chunk your changes

Background

CD vs CD

Continuous Deployment:
code done → (auto) unit tests → (auto) integrate → (auto) acceptance tests → (auto) deploy to production

Continuous Delivery:
code done → (auto) unit tests → (auto) integrate → (auto) acceptance tests → (manual) deploy to production

What is an infrastructure service?

• CI
• DNS
• Metrics storage
• Database
• PaaS

What is Continuously Deployed infrastructure?

code change → CI → running service

The pipeline

commit → build → test → deploy → test

Executing the build:
make build || ./cibuild.sh
make deploy || ./cideploy.sh

Push build definition into repo

The service

Decide what guarantees you are providing

• Consistency
• Availability
• Partition tolerance
• Pick 2

Define your SLAs

Throughput

Availability

Data consistency

Examples:

95th percentile response time for monitoring metric queries in a one-hour window is < 1 second.

No gaps in metric data returned from queries.

A single storage node failure does not result in unavailability.

Codify your SLAs as tests and checks
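As a minimal sketch of what codifying the first two example SLAs could look like, assuming query durations and metric timestamps have already been collected for the window. The function names and the nearest-rank percentile method are illustrative choices, not something the talk prescribes.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_response_time_sla(durations_s, limit_s=1.0):
    """True when the 95th percentile of the window's query durations is under the limit."""
    return percentile(durations_s, 95) < limit_s


def find_gaps(timestamps, interval_s):
    """Return (previous, next) timestamp pairs that are more than one interval apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > interval_s]
```

A check like this can run both as a pipeline test after deploy and as a monitoring check, failing whenever `check_response_time_sla` returns False or `find_gaps` returns a non-empty list.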

Know your data

How does data flow?

Eliminate the state

Define interfaces

Examples:

Requests to the API should conform to JSONAPI spec.

Metrics are received on UDP 25826 in collectd network protocol
format.
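An interface definition like the JSON:API one can be turned into a cheap test. The sketch below checks only two top-level rules from the JSON:API specification (a document must contain at least one of `data`, `errors`, `meta`, and `data` and `errors` must not coexist). The function name is hypothetical, and a real conformance check would cover far more of the spec.

```python
import json


def jsonapi_top_level_problems(body):
    """Return a list of top-level JSON:API problems; an empty list means none were found."""
    try:
        doc = json.loads(body)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError
        return [f"not valid JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["top level must be a JSON object"]
    problems = []
    if not doc.keys() & {"data", "errors", "meta"}:
        problems.append("must contain at least one of data, errors, meta")
    if "data" in doc and "errors" in doc:
        problems.append("data and errors must not coexist")
    return problems
```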

How to CD successfully
1. Optimise for fast feedback
2. Chunk your changes

Making it fast

Fast feedback
1. Validate quickly
2. Limit technical debt
3. Make it work, make it fast, make it right

Constantly identify & eliminate bottlenecks

Get iteration time down

commit → build → test → deploy → test
Measure individual duration of each stage
Measure total duration

< 5 minutes
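One way to measure both per-stage and total duration is to wrap every stage in a timer. This is a sketch only: the stage callables and the `run_timed` name are hypothetical, and the 300-second default mirrors the "< 5 minutes" target above.

```python
import time


def run_timed(stages, budget_s=300.0, clock=time.monotonic):
    """Run (name, callable) stages in order and record each duration plus the total."""
    durations = []  # a list, because a stage name such as "test" can appear twice
    start = clock()
    for name, stage in stages:
        stage_start = clock()
        stage()
        durations.append((name, clock() - stage_start))
    total = clock() - start
    return {"stages": durations, "total": total, "within_budget": total < budget_s}
```

Passing the clock in as a parameter keeps the timer itself testable without sleeping.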

Track cycle time (min, max, median, 95th percentile)
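Given a list of recorded pipeline durations, the four numbers can be summarised in a few lines. A sketch, assuming durations are kept in seconds and using a nearest-rank 95th percentile; the function name is illustrative.

```python
import math
import statistics


def cycle_time_summary(durations_s):
    """Summarise pipeline cycle times as min, max, median and 95th percentile."""
    if not durations_s:
        raise ValueError("no cycle times recorded")
    ordered = sorted(durations_s)
    rank = max(1, math.ceil(95 * len(ordered) / 100))  # nearest-rank method
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
    }
```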

time(1)

Get CI close to the action

Eliminate latency

1. Does the thing exist?
2. Maybe make a change
3. Get info about the thing

API These steps are mandatory
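A back-of-the-envelope calculation shows why network proximity to the API matters: every managed resource costs several round trips, so round-trip time multiplies quickly. The resource counts and latencies below are made-up illustrative numbers, not measurements from the talk.

```python
def api_wait_seconds(resources, round_trips_per_resource, rtt_ms):
    """Time spent purely waiting on API round trips, in seconds."""
    return resources * round_trips_per_resource * rtt_ms / 1000


# Hypothetical: 200 resources, 3 calls each (exists? change? get info).
far = api_wait_seconds(200, 3, 150)   # CI far from the API
near = api_wait_seconds(200, 3, 2)    # CI in the same network as the API
```

Under these assumptions the far case spends 90 seconds waiting and the near case 1.2 seconds, before any real work is done.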

Push all changes through CI

1 workflow

Divergence & Confidence

This will be painful! (But worth it for audit trail & reduced latency)

jenkins-cli from Netflix Skunkworks

git push && jenkins start && jenkins tail

One-off jobs

• Data fixups
• Software patches
• Benchmarks

Add one-off jobs to build or deploy steps

commit → build → test → deploy → test
Inject one-off jobs at the build or deploy step

Push build definition into repo

Chunking the changes

Chunk your changes
1. Change one, test one
2. Limit WIP

Change 1, Test 1

Ordering matters

1. Change app
2. Change DB
3. Change proxy
4. Test app
5. Test DB
6. Test proxy

1. Change app
2. Test app
3. Change DB
4. Test DB
5. Change proxy
6. Test proxy

Fail early
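The interleaved ordering can be sketched as a loop that tests each change immediately and stops at the first failure, so later components are never touched. The `apply_in_chunks` name and the shape of `steps` are hypothetical.

```python
def apply_in_chunks(steps):
    """steps is a list of (name, change, test) tuples.

    Apply one change, test it straight away, and stop at the first failed test.
    """
    applied = []
    for name, change, test in steps:
        change()
        applied.append(name)
        if not test():
            return {"ok": False, "failed": name, "applied": applied}
    return {"ok": True, "failed": None, "applied": applied}
```

If the DB test fails, the proxy change is never applied, which keeps the blast radius to the one component that was just changed.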

Tests finish quickly

< 10 seconds

1. Service running?
2. Can I do a simple query?
3. Obviously bad log messages?
4. Significant statistical deviation in metrics?
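A sketch of quick post-change checks that fit inside the 10-second budget. It covers "service running?" for a service that listens on TCP, plus a scan for obviously bad log lines. The log patterns and function names are assumptions for illustration.

```python
import re
import socket
import time

# Hypothetical patterns for "obviously bad" log messages.
BAD_LOG = re.compile(r"\b(ERROR|FATAL|panic|Traceback)\b")


def port_open(host, port, timeout_s=1.0):
    """Service running? True if a TCP connection succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def bad_log_lines(lines):
    """Return the log lines that match an obviously bad pattern."""
    return [line for line in lines if BAD_LOG.search(line)]


def run_smoke_checks(checks, budget_s=10.0):
    """Run (name, callable) checks in order; stop on first failure or a spent budget."""
    start = time.monotonic()
    for name, check in checks:
        if time.monotonic() - start > budget_s:
            return False, f"budget of {budget_s}s exceeded before {name}"
        if not check():
            return False, f"{name} failed"
    return True, "all checks passed"
```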

twitter/BreakoutDetection

Goodness of fit tests:
• Kolmogorov-Smirnov
• Kuiper’s
• Anderson-Darling
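To make the goodness-of-fit idea concrete, here is a plain-Python sketch of the two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical distributions of a metric before and after a change. Comparing "before" and "after" samples this way is an assumption about how the test would be applied, and the threshold for "significant" is left to the reader.

```python
from bisect import bisect_right


def ks_statistic(baseline, current):
    """Two-sample Kolmogorov-Smirnov statistic (0.0 = identical, 1.0 = fully separated)."""
    a, b = sorted(baseline), sorted(current)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # The largest gap between two step-shaped empirical CDFs occurs at a sample point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap
```

A value near 0 suggests the metric's distribution did not move after the change; a value near 1 suggests it moved almost entirely.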

Make feedback visual

1. checkout
2. build
3. test
4. notify

Continuous Integration
Monitoring

Level up: Run tests constantly

{
  "service": {
    "name": "collectd",
    "tags": [],
    "port": 25826,
    "checks": [ … ]
  }
}

{
  "id": "coco-expvars",
  "name": "Coco expvars at :9090",
  "http": "http://127.0.0.1:9090/debug/vars",
  "interval": "10s",
  "timeout": "1s"
},
{
  "id": "lookup",
  "name": "Coco hash lookup at :9090",
  "http": "http://127.0.0.1:9090/lookup?name=hello",
  "interval": "10s",
  "timeout": "1s"
},
{
  "id": "anomalous_coco_errors",
  "name": "anomalous_coco_errors",
  "script": "anomalous_coco_errors --host coco.example --window 10m",
  "interval": "10s",
  "timeout": "5s"
}

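The definitions above look like Consul-style service checks. Assuming that is the case, a script check reports its result through its exit code (0 passing, 1 warning, anything else critical). The sketch below shows how a script such as `anomalous_coco_errors` might decide on a status; the real script is not shown in the talk, so the logic, names and thresholds here are entirely hypothetical.

```python
import statistics

# Exit codes, assuming Consul's script-check convention.
PASSING, WARNING, CRITICAL = 0, 1, 2


def error_rate_status(history, latest, warn_sigma=2.0, crit_sigma=3.0):
    """Compare the latest error count against recent history using standard deviations."""
    mean = statistics.mean(history)
    spread = statistics.pstdev(history)
    if spread == 0:
        # Flat history: any increase at all is treated as anomalous.
        return PASSING if latest <= mean else CRITICAL
    deviation = (latest - mean) / spread
    if deviation >= crit_sigma:
        return CRITICAL
    if deviation >= warn_sigma:
        return WARNING
    return PASSING
```

A wrapper script would fetch the error counts for the window and call `sys.exit(error_rate_status(history, latest))`.
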
1. Change app
2. Test app
3. Change DB
4. Test DB
5. Change proxy
6. Test proxy

CD of apps vs CD of infrastructure

• Faster builds?
  • Apps: More compute
  • Infrastructure: Change IaaS

• Testing?
  • Apps: xUnit
  • Infrastructure: Serverspec & DIY

• Latency?
  • Apps: More compute, SSDs, memory.
  • Infrastructure: Network proximity.

• Lingering state?
  • Apps: Database transactions. Wipe database contents.
  • Infrastructure: Enjoy your complete rebuild.

• How to CD successfully
• Optimise for fast feedback
• Chunk your changes
• Constantly eliminate bottlenecks
• Get iteration time down

I’m Lindsay @auxesis

Thank you! Questions? Feedback on the talk? Let @auxesis know!
