Improving Customer Experience through Infrastructure Automation

Slide 1

Slide 1 text

Improving Customer Experience through Infrastructure Automation. Brandon Burton, @solarce, Travis CI, travis-ci.org

Slide 2

Slide 2 text

greetings (thx joe)

Slide 3

Slide 3 text

who am I?

Slide 4

Slide 4 text

Brandon Burton, Engineering Manager, Build Infrastructure, Travis CI, @solarce

Slide 5

Slide 5 text

also memes

Slide 6

Slide 6 text

also memes

Slide 7

Slide 7 text

also memes

Slide 8

Slide 8 text

also memes

Slide 9

Slide 9 text

infrastructure automation?

Slide 10

Slide 10 text

Tools?

Slide 11

Slide 11 text

Tools! Chef, Terraform, Packer, Docker, Kubernetes, Mesos, Swarm, Nomad

Slide 12

Slide 12 text

Tools!

Slide 13

Slide 13 text

What problems are we solving?

Slide 14

Slide 14 text

We want to make things better

Slide 15

Slide 15 text

But better for who?

Slide 16

Slide 16 text

Ops? Devs? Sales? Finance? Support? Users? Paying Users? Free Users?

Slide 17

Slide 17 text

Unconscious constraints?

Slide 18

Slide 18 text

Unconscious constraints?

Slide 19

Slide 19 text

cultivate a holistic view of the desired outcome of our automation?

Slide 20

Slide 20 text

grow a product view?

Slide 21

Slide 21 text

At Travis CI?

Slide 22

Slide 22 text

our context and constraints

Slide 23

Slide 23 text

we manage: compute environments, build execution, build env images

Slide 24

Slide 24 text

compute: AWS EC2, Google Compute Engine (GCE), vCenter/vSphere

Slide 25

Slide 25 text

execution: backend services that create the VM/container, run the build over SSH, destroy the VM/container
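
The create, run-over-SSH, destroy lifecycle described on this slide could be sketched roughly as below. This is a minimal illustration, not Travis CI's actual worker: `FakeBackend`, `build_instance`, and `run_build` are hypothetical names, and the SSH step is simulated rather than performed. The point it tries to show is that the destroy step runs even when the build fails.

```python
from contextlib import contextmanager


class FakeBackend:
    """Hypothetical stand-in for a compute backend (EC2, GCE, vSphere)."""

    def __init__(self):
        self.live = set()
        self._next_id = 0

    def create(self, image):
        self._next_id += 1
        instance_id = f"{image}-{self._next_id}"
        self.live.add(instance_id)
        return instance_id

    def run_over_ssh(self, instance_id, script):
        # A real worker would open an SSH session here; this fake only
        # simulates an exit code (non-zero if the script contains "fail").
        return 1 if "fail" in script else 0

    def destroy(self, instance_id):
        self.live.discard(instance_id)


@contextmanager
def build_instance(backend, image):
    """Create an instance and guarantee it is destroyed afterwards."""
    instance_id = backend.create(image)
    try:
        yield instance_id
    finally:
        backend.destroy(instance_id)


def run_build(backend, image, script):
    """Create -> run over SSH -> destroy; returns True if the build passed."""
    with build_instance(backend, image) as instance_id:
        return backend.run_over_ssh(instance_id, script) == 0
```

Wrapping the instance in a context manager is one way to keep failed builds from leaking VMs or containers; a real service would also need timeouts and retries around each step.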

Slide 26

Slide 26 text

build environments: linux, osx

Slide 27

Slide 27 text

linux: Ubuntu 12.04 and 14.04; VMs (GCE); containers (Docker on EC2)

Slide 28

Slide 28 text

osx: 10.9, 10.10, 10.11, 10.12; Xcode 6.[1,2,3,4]; Xcode 7.[1,2,3]; Xcode 8.0, 8.1b; vSphere VMs

Slide 29

Slide 29 text

trying to apply the holistic view?

Slide 30

Slide 30 text

asking ourselves: how do we decide what to do when?

Slide 31

Slide 31 text

Because business goals can often conflict with what some users want

Slide 32

Slide 32 text

What users want can often conflict amongst different types of users

Slide 33

Slide 33 text

What users want can often conflict amongst different types of users

Slide 34

Slide 34 text

When we get feedback from users about our build environments

Slide 35

Slide 35 text

we hear that they want many things

Slide 36

Slide 36 text

Build environments that are up to date

Slide 37

Slide 37 text

But also have stability and predictability

Slide 38

Slide 38 text

While retaining the flexibility to customize the environment

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

two ways we are trying to apply this: build env maintenance, build execution start times

Slide 41

Slide 41 text

build env maintenance: customers want safe and reliable change

Slide 42

Slide 42 text

build env maintenance: new OS, OS updates, language updates, service updates, user-land updates

Slide 43

Slide 43 text

giving a better build environment experience for our users?

Slide 44

Slide 44 text

what we're doing: packer builds running under travis; templates are open source; users can open issues; we open issues on behalf of users

Slide 45

Slide 45 text

what we're doing: packer runs chef (bake the image); our chef repo is open source; users (already) contribute fixes and updates to our chef cookbooks

Slide 46

Slide 46 text

what we're doing: added serverspec testing; tests pass? packer publishes artifact; build passes? register artifact for opt-in testing group: edge
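
The two gates on this slide (tests pass, then publish; build passes, then register for the opt-in edge group) could be expressed as a small decision function. This is only a sketch under assumptions: `promote_image`, the `publish:`/`register:` step strings, and the shape of `spec_results` are made up for illustration and are not Packer or serverspec APIs.

```python
def promote_image(image_name, spec_results, build_passed):
    """Return the steps taken for a freshly baked image.

    spec_results: mapping of serverspec example name -> bool (passed?).
    build_passed: whether the overall CI build for the image passed.
    """
    steps = []
    if not spec_results or not all(spec_results.values()):
        return steps  # tests failed (or none ran): publish nothing
    steps.append(f"publish:{image_name}")
    if build_passed:
        # Only images whose whole build passed reach the opt-in edge group.
        steps.append(f"register:{image_name}:edge")
    return steps
```

Treating "no tests ran" the same as "tests failed" is a deliberate choice in this sketch, so that an image is never published without evidence that it was checked.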

Slide 47

Slide 47 text

still to be done? more integration testing; better unit testing; make it easier for external contributions to chef cookbooks and packer templates; get OS X under Packer and Chef and not ./doit5.sh; commit to release schedule for updates, e.g. stable: quarterly, rc: monthly, edge: if CI passes

Slide 48

Slide 48 text

what could it mean for users? more frequent updates, faster build times; more confidence that updates won't break their builds, builds trust; users are able to more directly impact future changes

Slide 49

Slide 49 text

how would we describe this in terms of user impact? improved reliability; growth of trust; better consistency; more user engagement; faster builds!

Slide 50

Slide 50 text

build execution start time

Slide 51

Slide 51 text

constraint: (today) VM creation is part of the build lifecycle; users have to wait on it right; boot times can be slow and can be highly variable in GCE and vSphere

Slide 52

Slide 52 text

how can we improve the time to build execution start? (from the user's perspective)

Slide 53

Slide 53 text

rub some auto-scaling on it?

Slide 54

Slide 54 text

building an auto-scaler?

Slide 55

Slide 55 text

building an auto-scaler? YES! WHY? existing metrics; experience using other auto-scaling products; experience making our own services to extend cloud APIs

Slide 56

Slide 56 text

auto-scaler needs: maintains pool of ready VMs based on VM image usage metrics; can take time windows into account for headroom calculations; v1 should be simple and naive; support multiple compute environments (EC2, GCE, vSphere, etc); mature life-cycle hook support
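
A "simple and naive" v1 of the pool sizing described on this slide might look like the sketch below. Everything here is an assumption made for illustration: the function names, the use of build starts per minute as the usage metric, and the `headroom` multiplier standing in for time-window adjustments. The idea is to keep enough ready VMs to cover the builds that arrive while a replacement VM is still booting.

```python
import math


def desired_pool_size(recent_starts_per_minute, boot_seconds, headroom=1.0):
    """Naive target count of ready VMs for one image.

    recent_starts_per_minute: observed build starts per minute for the image.
    boot_seconds: typical time to boot one VM of this image.
    headroom: multiplier for the current time window (e.g. 1.5 at peak).
    """
    if not recent_starts_per_minute:
        return 0
    avg_rate = sum(recent_starts_per_minute) / len(recent_starts_per_minute)
    # VMs consumed while a replacement is still booting.
    needed = avg_rate * (boot_seconds / 60.0) * headroom
    return math.ceil(needed)


def scaling_actions(usage_by_image, ready_by_image, boot_seconds, headroom=1.0):
    """Return {image: delta}; positive means boot VMs, negative means retire."""
    actions = {}
    for image, samples in usage_by_image.items():
        target = desired_pool_size(samples, boot_seconds, headroom)
        delta = target - ready_by_image.get(image, 0)
        if delta != 0:
            actions[image] = delta
    return actions
```

Because the functions only return numbers and deltas, the same logic could sit in front of any compute environment; the EC2, GCE, or vSphere calls that actually boot or retire VMs would live behind a separate interface.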

Slide 57

Slide 57 text

bespoke auto-scaler benefits? cloud agnostic; reduces user impact for types of failures; enables user contributions

Slide 58

Slide 58 text

described with user impact? we want every customer build to start within 20-30s of their `git push`

Slide 59

Slide 59 text

described with user impact? faster build times; improves feedback loop for users; inspire customers to test more existing code and new code

Slide 60

Slide 60 text

in conclusion: we've seen success in adapting to existing plans; we've seen success in making future plans this way; we try to improve incrementally; we are ok with having a long way to go still

Slide 61

Slide 61 text

No content

Slide 62

Slide 62 text

find me on twitter: @solarce. questions, feedback, stories of failure/success with these ideas? Travis CI, travis-ci.org