You got a couple Microservices, now what? Adding SRE to DevOps

This talk goes over the infrastructure needed to run Microservices in production by answering the following questions:

* Why do I want to run my software in Containers?
* What is Kubernetes or Mesos?
* Am I going to need a DevOps or SRE team? What will they do?
* What will my Continuous Integration/Delivery look like?

Gonzalo Maldonado

November 16, 2016

Transcript

  1. You got a couple
    Microservices, now what?
    Adding SRE to DevOps
    Gonzalo Maldonado - MustWin

  2. TOC
    —Sysadmin -> DevOps -> SRE
    —Microservice Honeymoon
    —2 months later
    —Meanwhile, your team is doing ticket driven
    deployments.
    —The SRE-cret Sauce
    —References
    Microservice Honeymoon
    ^ Your microservice saved your homepage
    ^ Everyone loves working on the microservice
    2 months later
    ^ Is it still a microservice?
    ^ Why are we adding new stuff to the monolith?
    Can we get rid of ticket driven deployments?
    ^ What makes deploying a microservice so hard?
    ^ Where can we run this?
    ^ Monoliths seemed easier to maintain!
    ^ Datacenter 4.0
    ^ Dude, where's my container?
    ^ The promised land
    Sysadmin -> DevOps -> SRE
    ^ The SRE-cret Sauce
    ^ Resource & Container Management (Schedulers)
    ^ Service Discovery (Consul, Skydns & Etcd)

  3. Sysadmin -> DevOps -> SRE
    —SysAdmin: Manages 1 or 2 services manually.
    —DevOps Team: Manages ~10 services semi-
    programmatically.
    —SRE Team: Manages 100-1K services fully
    programmatically.

  4. Sysadmin -> DevOps -> SRE (Tech Stack)
    —SysAdmin: Bash, Perl or Python Scripts
    —DevOps Team: Chef, Puppet
    —SRE Team: Mesos, Swarm, Kubernetes, Consul,
    Vault

  5. We don't have 100 services, why should we care
    about the SRE tech stack?
    Because this stack:
    —Saves your team time configuring and deploying a
    service
    —Allows your engineering team to grow (a single
    engineer will be able to manage
    a couple dozen services)

  6. We don't have 100 services, why should we care
    about the SRE tech stack?
    Because this stack:
    —Prevents having to rewrite your infrastructure
    code as your app
    scales
    —Gives you elastic resources (saves you money
    on AWS).

  7. We don't have 100 services, why should we care
    about the SRE tech stack?
    —Because it makes deploying Microservices as
    easy as getting a Heroku app up (and you used to
    love microservices).

  8. When doing Microservices gets
    hard

  9. The Microservice
    Honeymoon: how a
    microservice saved your
    homepage
    Microservices are awesome.

  10. The Microservice Honeymoon: how a microservice
    saved your homepage
    —Your page load time dropped from 3 seconds to
    20 ms (Go is so fast!)

  11. The Microservice Honeymoon: how a microservice
    saved your homepage
    —Hacker News spikes are no longer a big deal
    (we're elastic!)

  12. The Microservice Honeymoon: how a microservice
    saved your homepage
    —Everyone loves working on the Microservice (It's
    only 500 lines!)

  13. 2 months later...

  14. 2 months later...
    —If it has 2K lines of code, is it still a microservice?

  15. 2 months later...
    —Why are people still adding stuff to the monolith?
    —The code is already there and they didn't want
    to rewrite it (duh.)
    —Debugging things is getting harder (You need to
    test in multiple
    places)
    —Getting a new microservice to prod is hard!
    (This.)

  16. Why is creating new Microservices so hard now?
    (monoliths felt easier)
    "Awesome analogy by @timallenwagner: monolithic
    architecture=carrying a 7ft beach ball,
    microservice=carrying 200 loose marbles"

  17. Why is creating new Microservices so hard now?
    (monoliths felt easier)
    —Configuration Management (You have to repeat
    recipes)
    —Service-inter-dependency-updates (You can't
    change a service address
    or port without affecting other services)
    —Credentials cannot be shared
    —Snowflake Runtime Environments (Can't run
    node.js code on the JVM box)

  18. Meanwhile, your team is doing ticket driven
    deployments
    —Deploys have become more complicated. When
    there was only a Monolith,
    you had only one deploy and one box.

  19. Meanwhile, your team is doing ticket driven
    deployments
    —It has gotten to the point where your team has
    decided they "need a
    ticket" for each deploy

  20. Where can we run this? Your Sys Admin asks...
    —If you're typing apt-get to get a new environment
    up, you're doing
    something wrong.
    —Chef, Puppet, Ansible are good replacements, but
    there's something
    better you probably already use on your dev
    machine.

  21. Your Datacenter has to
    change

  22. Datacenter 1.0[^1]
    "How do we use these machines?"
    "Can we automate?"
    "How can we integrate?"
    [^1] http://www.slideshare.net/SebastianWeigand/containers-and-customers-55262844

  23. Datacenter 2.0[^1]
    "We need bigger computers"
    "We need a microservice"
    "We need a SysAdmin"
    [^1] http://www.slideshare.net/SebastianWeigand/containers-and-customers-55262844

  24. Datacenter 3.0[^1]
    "We need some VMs."
    "We need microservices"
    "We need IT"
    [^1] http://www.slideshare.net/SebastianWeigand/containers-and-customers-55262844

  25. Datacenter 3.5[^1]
    "We have a lot of VMs"
    "We have lots of microservices"
    "We need DevOps"
    [^1] http://www.slideshare.net/SebastianWeigand/containers-and-customers-55262844

  26. Datacenter 4.0[^1]
    "We need to manage our VMs"
    "We need to manage our
    microservices"
    "We need SREs"
    [^1] http://www.slideshare.net/SebastianWeigand/containers-and-customers-55262844

  27. Dude, where is my container?
    Virtual Machines vs Docker
    You have already heard about Docker and why
    using containers that share OS
    resources is more efficient
    than using full virtual machines. But what
    else does Docker give
    you?

  28. Dude, where is my container?
    What else does Docker give you?
    * Contained instances (You can run multiple
    runtimes on one box)
    * Incremental images. (You can use an existing
    image as a base)
    * Immutable Instances (Your images are stateless)

  29. And this gets us to The Lean Staging
    $ git commit -am "The new cool feature"

  30. The Lean Staging
    $ git commit -am "The new cool feature"
    $ git push

  31. The Lean Staging
    $ git commit -am "The new cool feature"
    $ git push
    Running CI ...........................

  32. The Lean Staging
    $ git commit -am "The new cool feature"
    $ git push
    Running CI ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅
    CI done. Your branch is available at http://super-tesla.thunderdomes.co

  33. What do we need to get
    there?
    We need Site Reliability
    Engineers

  34. What do we need to get
    there?
    And what are those SRE guys
    going to build to achieve that?

  35. What do we need to get there?
    What's the SRE-cret sauce?
    —Code
    —Servers or The Cloud
    —A CI service
    —A deployment system

  36. What do we need to get there?
    Aka. The SRE-cret sauce
    —Code (We already have that)
    —Servers or The Cloud (Pick AWS, GCP or Azure)
    —A CI service (Pick Jenkins, Travis or CircleCI)
    —Deployment & Monitoring systems (Let's focus
    on this)

  37. The SRE-cret sauce.
    Those Deployment systems will do the following
    a. Container Management
    b. Service Discovery
    c. Configuration Management
    d. Authentication & Authorization

  38. Container Management
    This is what you want. Now that you have
    discovered Docker, you want to use it in production.
    While you could run all your containers on a single
    box, this would
    prevent you from scaling horizontally, and you would
    need downtime to add
    more memory to that box.

  39. Container Management: Enter the scheduler
    Like many things in the tech world, Google was one of the
    early adopters of schedulers. Schedulers are systems in charge
    of managing the cluster's resources by telling applications when
    to run.
    Architectures presented in the white paper
    concerning
    Google's Omega Scheduler.
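
To make the idea of a scheduler concrete, here is a minimal sketch of one placement decision. It is a toy first-fit strategy, not how Omega, Mesos or Kubernetes actually work. The `Node` class, the `schedule` function, the node names and the memory figures are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A machine in the cluster (hypothetical model, memory only)."""
    name: str
    free_mem_mb: int
    containers: list = field(default_factory=list)


def schedule(nodes, container, mem_mb):
    """Place a container on the first node with enough free memory (first-fit)."""
    for node in nodes:
        if node.free_mem_mb >= mem_mb:
            node.free_mem_mb -= mem_mb
            node.containers.append(container)
            return node.name
    # No node has room; a real scheduler would queue the job or scale out.
    return None


nodes = [Node("node-a", 512), Node("node-b", 2048)]
placements = [
    schedule(nodes, "homepage", 256),
    schedule(nodes, "search", 1024),
    schedule(nodes, "billing", 256),
    schedule(nodes, "reports", 4096),
]
print(placements)
```

Real schedulers weigh many more resources and constraints, but the core job is the same: the team asks for resources, and the scheduler decides which box runs the container.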

  40. Container Management: Scheduler Options
    —Mesosphere DCOS (Based on Apache Mesos)
    —Docker Swarm
    —Kubernetes
    —Nomad

  41. Each scheduler option has its own pros and
    cons and you will need to
    pick the one that best fits your team's needs.

  42. Container Management: Scheduler Options[^2]
    More info here:
    https://medium.com/@mustwin/a-handy-guide-to-the-mesos-kubernetes-swarm-jungle-ad6bc086c736#.6ji95fm7e

  43. Service Discovery
    Service discovery is a mechanism by which, when a
    new service instance is added,
    the rest of the services detect this change
    automatically.
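
A minimal sketch of that mechanism, assuming a toy in-memory registry standing in for etcd or Consul. The `Registry` class, its method names and the addresses are hypothetical. Real systems add health checks, TTLs and replication.

```python
class Registry:
    """Toy in-memory service registry (a stand-in for etcd or Consul)."""

    def __init__(self):
        self._instances = {}  # service name -> list of "host:port" strings

    def register(self, service, address):
        """Record a new instance; registering the same address twice is a no-op."""
        addresses = self._instances.setdefault(service, [])
        if address not in addresses:
            addresses.append(address)

    def deregister(self, service, address):
        """Forget an instance that has gone away."""
        if address in self._instances.get(service, []):
            self._instances[service].remove(address)

    def lookup(self, service):
        """Return the addresses currently registered for a service."""
        return list(self._instances.get(service, []))


registry = Registry()
registry.register("homepage", "10.0.0.1:8080")
before = registry.lookup("homepage")

# A second instance comes up. Callers see it on their next lookup,
# with no address hard-coded in their own configuration.
registry.register("homepage", "10.0.0.2:8080")
after = registry.lookup("homepage")
print(before, after)
```

Because callers ask the registry instead of hard-coding addresses, changing a service's address or port no longer means touching every dependent service.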

  44. Service Discovery: Options
    Load balancer + Highly available Storage.
    Using a load balancer like NGINX/HAProxy + etcd
    you can update service
    registrations dynamically. The Load Balancer takes
    care of DNS
    resolutions.

  45. Service Discovery: Options
    Etcd + SkyDNS
    SkyDNS performance is comparable to HAProxy, and
    it's easier to set up,
    although not as powerful

  46. Service Discovery: Options
    Consul
    Consul is a key/value store and service registry with built-in
    DNS support.

  47. Service Discovery: How to pick?
    a. Pick a scheduler
    * Kubernetes currently only supports etcd.
    * Mesos can use Etcd, Zookeeper or Consul.
    b. If you're using Consul you're done.
    c. For etcd:
    * Use HAProxy if you're already using it
    * Otherwise just use SkyDNS and call it a day

  48. Configuration Management
    We're going to assume your Microservices are
    already 12 Factor apps[^3].
    Where:
    * Service configuration happens in Environment
    variables
    * Backing services are attached resources (Service
    Discovery FTW)
    [^3] https://12factor.net/
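
As a sketch of what "configuration happens in environment variables" can look like in a service, the snippet below builds its settings from the environment only. The variable names (`PORT`, `DATABASE_URL`, `LOG_LEVEL`), the defaults and the example URL are assumptions chosen for illustration, not something the 12 Factor site prescribes.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    port: int
    database_url: str
    log_level: str


def load_config(env=os.environ):
    """Build the service configuration from environment variables only."""
    return Config(
        port=int(env.get("PORT", "8080")),
        database_url=env["DATABASE_URL"],  # required: raises KeyError if missing
        log_level=env.get("LOG_LEVEL", "info"),
    )


# Passing a plain dict instead of os.environ keeps the example self-contained.
config = load_config({
    "DATABASE_URL": "postgres://db.internal:5432/app",
    "PORT": "9000",
})
print(config)
```

The same image can then run in staging and production unchanged; only the environment the scheduler injects differs.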

  49. Configuration Management (Options)
    Most schedulers support this out of the box, with
    the caveat that most
    don't provide secret management
    (K8s does).

  50. Secret Management (Vault)
    For secret management we cannot recommend
    Vault enough because it
    provides:
    —Secure secret storage
    —Dynamic Secrets
    —Leasing and Renewal
    —Revocation
    —Auditing
    —Etc.
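
To illustrate what dynamic secrets, leasing, renewal and revocation mean in practice, here is a toy in-memory model. It is not Vault's API. The `LeaseStore` class, its methods and the injected clock are hypothetical and only mirror the ideas in the list above.

```python
import secrets


class LeaseStore:
    """Toy model of leased secrets; not Vault's API."""

    def __init__(self, clock):
        self._clock = clock   # callable returning the current time in seconds
        self._leases = {}     # lease id -> (secret value, expiry time)

    def issue(self, ttl):
        """Generate a fresh secret that is only valid for ttl seconds."""
        lease_id = secrets.token_hex(8)
        value = secrets.token_urlsafe(16)
        self._leases[lease_id] = (value, self._clock() + ttl)
        return lease_id, value

    def is_valid(self, lease_id):
        lease = self._leases.get(lease_id)
        return lease is not None and self._clock() < lease[1]

    def renew(self, lease_id, ttl):
        """Extend a lease that has not yet expired or been revoked."""
        if not self.is_valid(lease_id):
            raise KeyError("lease expired or revoked")
        value, _ = self._leases[lease_id]
        self._leases[lease_id] = (value, self._clock() + ttl)

    def revoke(self, lease_id):
        """Invalidate a secret immediately, before its lease runs out."""
        self._leases.pop(lease_id, None)


now = [0]  # fake clock so the example is deterministic
store = LeaseStore(clock=lambda: now[0])
lease_id, password = store.issue(ttl=60)
now[0] = 30
store.renew(lease_id, ttl=60)            # expiry moves from t=60 to t=90
now[0] = 80
still_valid = store.is_valid(lease_id)
store.revoke(lease_id)
after_revoke = store.is_valid(lease_id)
print(still_valid, after_revoke)
```

The point of the model: a credential that expires on its own and can be revoked centrally is far less dangerous to leak than a password shared across services.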

  51. Other things you will need
    —Monitoring: (Prometheus, Nagios, InfluxDB,
    Grafana)
    —An authentication Service or provider

  52. To Recap. To build The Lean Staging we will need to:
    —Set up a Scheduler (Kubernetes)
    —Set up a CI System (Drone, Jenkins or Travis)
    —Hook your GitHub/GitLab to that CI
    —Change the CI configuration to trigger a Container
    build & Deploy
    —Have fun!
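
The CI part of the recap boils down to running a fixed sequence of steps and stopping at the first failure. The sketch below shows that control flow only. The `run_pipeline` function and the step names are hypothetical, and the lambdas stand in for real commands such as the test runner, the container build and the scheduler's deploy call.

```python
def run_pipeline(steps):
    """Run (name, callable) steps in order; stop at the first failing step."""
    completed = []
    for name, step in steps:
        if not step():
            return completed, name  # name of the step that failed
        completed.append(name)
    return completed, None


# Hypothetical steps; each callable returns True on success.
steps = [
    ("test", lambda: True),
    ("build-image", lambda: True),
    ("deploy", lambda: True),
]
completed, failed = run_pipeline(steps)
print(completed, failed)
```

Nothing is built or deployed unless the tests pass, which is what lets the team drop the per-deploy ticket.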

  53. GitLab made a really good proof of
    concept of it
    https://about.gitlab.com/2016/11/14/idea-to-production/

  54. Recommended reading for SRE Teams:
    Distributed Systems fundamentals:
    —Notes on Distributed Systems for Young Bloods -
    Jeff Hodges
    —You Can’t Sacrifice Partition Tolerance - Coda
    Hale
    —The Raft Consensus Algorithm - Diego Ongaro

  55. Recommended reading for SRE Teams:
    Microservices
    —Building Microservices - Sam Newman
    SRE
    —Site Reliability Engineering - Beyer, et al.
    —Continuous Delivery - Jez Humble
    —The Principles of Product Development Flow -
    Reinertsen

  56. References
    —https://medium.com/@mustwin/a-handy-guide-to-the-mesos-kubernetes-swarm-jungle-ad6bc086c736#.a2mymzvsi
    —https://medium.com/@ArmandGrillet/comparison-of-container-schedulers-c427f4f7421#.uxtk80w35
    —https://about.gitlab.com/2016/11/14/idea-to-production/
    —https://about.gitlab.com/2016/09/14/gitlab-live-event-recap/
    —https://signalfx.com/library/slides-operationalizing-docker-scale-microservices-orchestration-zenefits/
    —https://medium.com/@mattheath/a-long-journey-into-a-microservice-world-a714992d2841#.jluhzvs34
    —https://engineering.zenefits.com/2016/09/sauron-ci-automation-at-zenefits/
    —https://news.ycombinator.com/item?id=12880917
    —http://patrobinson.github.io/2016/11/05/docker-in-production/
    —https://thehftguy.wordpress.com/2016/11/01/docker-in-production-an-history-of-failure/
    —https://medium.com/google-cloud/a-survival-guide-for-containerizing-your-infrastructure-part-1-why-switch-8e8dee9fc66#.sr5nct3p3
    —https://www.youtube.com/watch?v=WiCru2zIWWs
    —https://speakerdeck.com/mattheath/microservices-and-go-goto-copenhagen-2016

  57. Questions?
    Slides will be posted at
    medium.com/@mustwin
