Improving Customer Experience through Infrastructure Automation

Brandon Burton
September 30, 2016

As many of us know, automation is one of the cornerstones of cultivating a "DevOps culture." We've seen how automation helps improve the lives of operations and development folks. But a "DevOps culture" is also about seeing the business as a whole and making "operations" work be seen as a critical part of the business value chain. We should be thinking about how to directly link our infrastructure automation initiatives back to the larger goals and objectives that improve the customer experience.

This talk will share some of the key automation objectives the build infrastructure engineering group at Travis CI is working on, the process and challenges we've encountered as we figure out how to incorporate the larger focus into work planning, and what's being done to measure the actual customer impact of our new infrastructure automation changes.

Transcript

  1. Improving Customer Experience through Infrastructure Automation, Brandon Burton, @solarce, Travis CI, travis-ci.org

  2. greetings (thx joe)

  3. who am I?

  4. Brandon Burton Engineering Manager Build Infrastructure Travis CI @solarce

  5. also memes

  6. also memes

  7. also memes

  8. also memes

  9. infrastructure automation?

  10. Tools?

  11. Tools! Chef, Terraform, Packer, Docker, Kubernetes, Mesos, Swarm, Nomad

  12. Tools!

  13. What problems are we solving?

  14. We want to make things better

  15. But better for who?

  16. Ops? Devs? Sales? Finance? Support? Users? Paying Users? Free Users?

  17. Unconscious constraints?

  18. Unconscious constraints?

  19. cultivate a holistic view of the desired outcome of our automation?

  20. grow a product view?

  21. At Travis CI?

  22. our context and constraints

  23. we manage: compute environments, build execution, build env images

  24. compute: AWS EC2, Google Compute Engine (GCE), vCenter/vSphere

  25. execution: backend services that create the VM/container, run the build over SSH, destroy the VM/container

  26. build environments: linux, osx

  27. linux: ubuntu 12.04 and 14.04; VMs (GCE); Containers (Docker on EC2)

  28. osx: 10.9, 10.10, 10.11, 10.12; Xcode 6.[1,2,3,4]; Xcode 7.[1,2,3]; Xcode 8.0, 8.1b; vSphere VMs

  29. trying to apply the holistic view?

  30. asking ourselves: how do we decide what to do when?

  31. Because, business goals can often conflict with what some users want

  32. What users want can often conflict amongst different types of users

  33. What users want can often conflict amongst different types of users

  34. When we get feedback from users about our build environments

  35. we hear that they want many things

  36. Build environments that are up to date

  37. But also have stability and predictability

  38. While retaining the flexibility to customize the environment

  40. two ways we are trying to apply this: build env maintenance, build execution start times

  41. build env maintenance: customers want safe and reliable change

  42. build env maintenance: new OS, OS updates, language updates, service updates, user-land updates

  43. giving a better build environment experience for our users?

  44. what we're doing: packer builds running under travis; templates are open source; users can open issues; we open issues on behalf of users

  45. what we're doing: packer runs chef (bake the image); our chef repo is open source; users (already) contribute fixes and updates to our chef cookbooks

  46. what we're doing: added serverspec testing; tests pass? packer publishes artifact; build passes? register artifact for opt-in testing, group: edge

  47. still to be done? more integration testing; better unit testing; make it easier for external contributions to chef cookbooks, packer templates; get OS X under Packer and Chef and not ./doit5.sh; commit to release schedule for updates, e.g. stable: quarterly, rc: monthly, edge: if CI passes

  48. what could it mean for users? more frequent updates, faster build times; more confidence that updates won't break their builds, builds trust; users are able to more directly impact future changes

  49. how would we describe this in terms of user impact? improved reliability; growth of trust; better consistency; more user engagement; faster builds!

  50. build execution start time

  51. constraint: (today) VM creation is part of the build lifecycle; users have to wait on it right; boot times can be slow and highly variable in GCE and vSphere

  52. how can we improve the time to build execution start? (from the user's perspective)

  53. rub some auto-scaling on it?

  54. building an auto-scaler?

  55. building an auto-scaler? YES! WHY? existing metrics; experience using other auto-scaling products; experience making our own services to extend cloud APIs

  56. auto-scaler needs: maintains pool of ready VMs based on VM image usage metrics; can take time windows into account for headroom calculations; v1 should be simple and naive; support multiple compute environments (EC2, GCE, vSphere, etc); mature life-cycle hook support

  57. bespoke auto-scaler benefits? cloud agnostic; reduces user impact for types of failures; enables user contributions

  58. described with user impact? we want every customer build to start within 20-30s of their `git push`

  59. described with user impact? faster build times improve the feedback loop for users; inspire customers to test more existing code and new code

  60. in conclusion: we've seen success in adapting to existing plans; we've seen success in making future plans this way; we try to improve incrementally; we are ok with having a long way to go still

  62. find me on twitter: @solarce; questions, feedback, stories of failure/success with these ideas? Travis CI travis-ci.org
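
The auto-scaler needs listed on slide 56 (a pool of ready VMs sized from VM image usage metrics, time windows for headroom, a simple and naive v1) can be sketched roughly as below. This is only an illustration and not Travis CI's implementation: the function names, the `PoolConfig` fields, and the sizing rule (peak build starts per minute in the window, multiplied by VM boot time, plus a headroom fraction) are all assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PoolConfig:
    """Hypothetical tuning knobs; none of these names come from the talk."""
    headroom: float = 0.25  # extra capacity on top of observed demand
    min_ready: int = 1      # never let the pool drop below this
    max_ready: int = 50     # upper bound, to keep cost predictable


def target_pool_size(starts_per_minute: Sequence[int],
                     boot_minutes: float,
                     config: PoolConfig = PoolConfig()) -> int:
    """Naive v1 sizing for one VM image.

    starts_per_minute: build starts per minute observed in a recent time window.
    boot_minutes: assumed time for a replacement VM to become ready.
    """
    if not starts_per_minute:
        return config.min_ready
    # Cover the peak demand that can arrive while replacements are booting.
    peak = max(starts_per_minute)
    needed = peak * boot_minutes * (1 + config.headroom)
    # Clamp between the configured floor and ceiling.
    return max(config.min_ready, min(config.max_ready, math.ceil(needed)))


def vms_to_create(ready_now: int, target: int) -> int:
    """How many VMs to boot now; scaling down is left out of this sketch."""
    return max(0, target - ready_now)


if __name__ == "__main__":
    target = target_pool_size([2, 4, 3], boot_minutes=1.5)
    print("target:", target, "create:", vms_to_create(ready_now=3, target=target))
```

Keeping the sizing rule a pure function of metrics and configuration means the same calculation could be reused per compute environment (EC2, GCE, vSphere), with only the code that actually creates VMs differing between them.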