SUCCESSFUL DEVOPS IMPLEMENTATION FOR SMALL TEAMS: A TRUE STORY

A case study of a successful "DevOps" transformation in a small company, with help from Kubernetes and the Cloud Native software stack. Is Kubernetes the answer to all of developers' problems? Can developers adopt Kubernetes and Cloud Native software without sysadmin help? Does it even benefit them? We answer these questions by showing how we introduced Cloud Native technology and Infrastructure as Code to a small team of developers, and how that affected our velocity, availability and ability to scale.

Jakub Paweł Głazik

March 19, 2019

Transcript

  1. Definition
    “A set of practices intended to reduce the time between committing a change to a system and the change being placed into normal production, while ensuring high quality” (Bass, Len; Weber, Ingo; Zhu, Liming. DevOps: A Software Architect's Perspective. ISBN 978-0134049847.)
  2. It’s not a new thing
    • already established in the industry - tons of job offerings confirm that
    • automation, automation, automation
    • containers eeeeverywheeereee
    • The Cloud (i.e. someone else’s computer)
  3. Simple test: “are you DevOps level over 9000?”
    • your answer to “how many servers do you have?” is “I have to check..”
    • you do multiple production deployments each day
    • your dev team can create a new (micro)service along with all supporting components without any ticket for the ops team
    • you can terminate any random instance in your infrastructure and the environment will self-heal
    • .. but let’s not even start with security-related topics
  4. DON’TS: how not to do “DevOps”
    • Post a job description for “DevOps Engineer” and hire a few
    • Put them “on-call”
    • Push developers away from directly interacting with the environment
  5. Effect?
    Apart from low velocity and quality you will get these:
    • “Hey, can you send me logs from my service?”
    • “Heey, can you purge Redis for me on staging?”
    • “Heeey, I clicked deploy on Jenkins and it’s stuck, HALP”
    • “Heeeeeeeeeeeeeey….”
  6. DO’S
    • Do enable developers
    • Streamline the deployment process
    • Streamline infrastructure management
    • Guide, advise, discuss
    • Hide complexity, but not too much
    • Treat yourself as a service provider - deliver products, not tickets
  7. It’s OK to hire a DevOps engineer
    • Brings experience and specialized focus
    • Communication skills are super important here
    • Tech requirements: good *nix skills, good Google skills and a sixth sense for sniffing out bad practices
    • Probably the first person to handle security in your new startup
  8. Starting point
    ❌
    • Production environment: two servers, a dozen microservices
    • Everything spun up manually through the AWS Console
    • Deployment meant ssh’ing to a server, downloading the new Docker image, stop+start (incurring downtime)
    • Monitoring? Just CloudWatch logs
    ✅
    • Spring Boot + Spring Cloud (Netflix)
    • Dockerized, built on Jenkins
    • Configured via environment variables
    • Stateless
    • Use of AWS
    • Use of managed services
    • Most important thing: a competent development team, eager to innovate
  9. 1. Kubernetes: fixing error-prone deployments
    • batteries-included approach
    • documentation
      ◦ courses, FAQs, examples
    • popular
    • reasonably sane
      ◦ apart from the millicores concept and a few others ;-)
    • Lots of progress in the past ~2 years
      ◦ stable
      ◦ reliable
      ◦ lots of know-how
      ◦ lots of lessons learned
      ◦ powerful CLI
  10. Kubernetes cont’d: tons of tools on top of it
    • Helm
    • Spinnaker
    • Jenkins integrations
    • Operators for complex deployments
    • Monitoring stack
    • Cloud offerings (GKE, EKS, Azure)
  11. YAAAML
    • 200-400 lines of YAML to describe a service..
    • Secrets management..
    • Even with Helm, deployment is a complex command
    • Taints, tolerations, affinity, heap vs total memory, exposing ports, scraping metrics .. and keeping it all consistent across a multitude of services
    • Tooling versioning
  12. Re: hide complexity, but not too much
    • A Jenkins deployment job is nice and all, up until it stops working
    • How can you expect proficiency with Kubernetes / kubectl if all developers ever do is push a Run button?
    • Enable them by making it easy to use CLI tools
      ◦ Prepare Helm, helm-secrets, helm-diff, all along with binaries, configs and a ./setup.sh script for easy installation
      ◦ Create one template for all services, supporting the most common configuration
      ◦ Add yet another abstraction layer for the most common tasks
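The "one template for all services" idea can be sketched in a few lines. The talk used a shared Helm template; the Python below is only an illustration of the same principle, where a service supplies a handful of values and the platform fills in the repetitive manifest structure. The function name `render_deployment`, its parameters and the example image are hypothetical, not taken from the talk.

```python
import json
from typing import Any, Dict, Optional


def render_deployment(
    name: str,
    image: str,
    replicas: int = 2,
    port: int = 8080,
    env: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Build a Kubernetes apps/v1 Deployment manifest as a plain dict.

    Hypothetical helper: every service shares this structure and only
    passes the few values that actually differ between services.
    """
    container = {
        "name": name,
        "image": image,
        "ports": [{"name": "http", "containerPort": port}],
        # Services are configured via environment variables (see slide 8).
        "env": [{"name": k, "value": v} for k, v in sorted((env or {}).items())],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    manifest = render_deployment(
        "orders",
        "registry.example.com/orders:1.4.2",  # hypothetical image
        env={"SPRING_PROFILES_ACTIVE": "prod"},
    )
    # kubectl accepts JSON as well as YAML manifests.
    print(json.dumps(manifest, indent=2))
```

The point of the design is that consistency across services comes from the shared function (or chart), not from copy-pasting hundreds of lines of YAML per service.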
  13. DevOps == Collaboration
    • Example: monitor performance of all microservices
      ◦ Example stack: Prometheus via Prometheus Operator
      ◦ Add ServiceMonitor objects to each deployment
    • A new application<->platform contract emerged: just expose Prometheus metrics on port N and you will see your service graphs in Grafana
      ◦ Developers are responsible for adjusting their services to obey the new contract and for making domain-specific dashboards
    • Good tools helped here: Kubernetes made it easy to deploy the stack, the Spring framework made it easy to expose metrics
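As a rough sketch of the platform side of that contract, the snippet below builds a Prometheus Operator `ServiceMonitor` manifest for a service. The `monitoring.coreos.com/v1` API group and the `selector` / `endpoints` fields are part of the Prometheus Operator API; the helper name, the port name `metrics`, the scrape interval and the `/actuator/prometheus` path (the usual Spring Boot Actuator endpoint) are assumptions for illustration, since the talk does not show its actual configuration.

```python
import json
from typing import Any, Dict


def render_service_monitor(
    name: str,
    port_name: str = "metrics",            # assumed name of the Service port
    path: str = "/actuator/prometheus",    # assumed Spring Boot Actuator path
    interval: str = "30s",                 # assumed scrape interval
) -> Dict[str, Any]:
    """Build a ServiceMonitor that tells Prometheus to scrape one service."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            # Select the Kubernetes Service that fronts the application pods.
            "selector": {"matchLabels": {"app": name}},
            # "port" refers to a named port on that Service.
            "endpoints": [{"port": port_name, "path": path, "interval": interval}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(render_service_monitor("orders"), indent=2))
```

With something like this in the shared template, a developer's half of the contract is reduced to exposing metrics on the agreed port.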
  14. 2. Infrastructure as Code: Terraform + Atlantis
    • Git-versioned infrastructure
    • Migrate/move or import existing resources
    • Set up Atlantis for audited and peer-reviewed infrastructure changes
    • Use the same tools to detect state drift (changes that were made outside of the Atlantis flow)
    • Optionally remove user permissions so that changes must go through Pull Requests
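The talk does not say how drift was detected. One common approach, sketched here as an assumption, is to run `terraform plan -detailed-exitcode` on a schedule against the main branch: that flag makes Terraform exit with 0 when there is nothing to change, 2 when the plan contains changes, and 1 on error. The function names below are hypothetical.

```python
import subprocess


def classify_plan_exit_code(code: int) -> str:
    """Map the exit code of `terraform plan -detailed-exitcode` to a status."""
    if code == 0:
        return "in-sync"   # real infrastructure matches the code
    if code == 2:
        return "drift"     # plan is non-empty: something differs
    return "error"         # exit code 1 (or anything unexpected)


def check_drift(workdir: str) -> str:
    """Run a read-only plan in `workdir` and report the result.

    Requires the terraform binary and an initialized working directory.
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false",
         "-lock=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit_code(result.returncode)
```

A non-empty plan only means drift if everything merged has already been applied, which is what the Atlantis flow is meant to guarantee.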
  15. Terraform: declarative infrastructure management
    • Define AWS resources
      ◦ Readable syntax
      ◦ Combine multiple resources into a reusable module
    • Plan
      ◦ Compare definition with current state
      ◦ Display detailed changeset
    • Apply
      ◦ Make changes to infrastructure
      ◦ Record state
    • Team workflow supported
      ◦ State in AWS S3
      ◦ Locks in DynamoDB
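The plan step ("compare definition with current state, display a changeset") can be illustrated with a toy model. This is not how Terraform is implemented; it only shows the idea of diffing a desired definition against recorded state, with resources reduced to plain dicts keyed by a made-up address.

```python
from typing import Any, Dict, List

Resources = Dict[str, Dict[str, Any]]


def plan(desired: Resources, current: Resources) -> Dict[str, List[str]]:
    """Toy changeset: which resource addresses to create, update or delete."""
    changes: Dict[str, List[str]] = {"create": [], "update": [], "delete": []}
    for address in sorted(desired.keys() | current.keys()):
        if address not in current:
            changes["create"].append(address)   # defined but not yet recorded
        elif address not in desired:
            changes["delete"].append(address)   # recorded but no longer defined
        elif desired[address] != current[address]:
            changes["update"].append(address)   # attributes differ
    return changes


if __name__ == "__main__":
    # Hypothetical resources for illustration only.
    desired = {
        "aws_instance.api": {"type": "t3.medium"},
        "aws_s3_bucket.logs": {"versioning": True},
    }
    current = {
        "aws_instance.api": {"type": "t3.small"},
        "aws_sqs_queue.old": {},
    }
    print(plan(desired, current))
```

Apply would then carry out the changeset and write the new state, which is why shared state storage and locking matter once more than one person runs it.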
  16. Atlantis: Pull Requests for infrastructure
    1. GitHub hook on each Pull Request to the Terraform repo
    2. Additional layer of locking so no other PR can touch the same parts of infrastructure
    3. Autoplan: show plan preview in PR comments
    4. Review & approve Pull Request
    5. Apply changes
    6. Remove locks and merge
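The locking in steps 2 and 6 can be modelled in a few lines. This is a simplified sketch of the idea (one Pull Request holds a part of the infrastructure until it is merged), not Atlantis's actual implementation; the class, its methods and the project paths are hypothetical.

```python
from typing import Dict, List


class ProjectLocks:
    """Toy model: at most one open PR may hold a given project at a time."""

    def __init__(self) -> None:
        self._owners: Dict[str, int] = {}  # project path -> PR number

    def acquire(self, project: str, pr: int) -> bool:
        """Called when a PR is planned; False means another PR holds the lock."""
        owner = self._owners.setdefault(project, pr)
        return owner == pr

    def release(self, pr: int) -> List[str]:
        """Called when a PR is merged or closed; frees everything it held."""
        released = sorted(p for p, owner in self._owners.items() if owner == pr)
        for project in released:
            del self._owners[project]
        return released


if __name__ == "__main__":
    locks = ProjectLocks()
    print(locks.acquire("prod/network", pr=12))  # first PR takes the lock
    print(locks.acquire("prod/network", pr=15))  # second PR is blocked
    print(locks.release(pr=12))                  # merge frees the project
    print(locks.acquire("prod/network", pr=15))  # now the second PR can plan
```

Holding the lock from plan until merge is what keeps the plan shown in the PR comments valid at the moment it is applied.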
  17. Demo?
    If time permits ;-) If time won’t permit: shout out to my friend Szymon W., who wrote a nice blog post about introducing Terraform and Atlantis across a whole company: https://lab.getbase.com/terraform-base/
  18. From my own experience
    Cosmose:
    • One “devops engineer”, seven contributors to the Terraform repo in a month, eleven now
    • > 10 production deployments per day
    • 3x more microservices since I joined (~6 months)
    • Infrastructure autoscaled 10x one time, when a dev wanted to “speed up his processing task” ;-)
    Base / Zendesk Sell:
    • Around 8 Ops and 42 (!) contributors to the Terraform repo
    • 30-50 deployments to prod daily
    • High level of ownership in dev teams, including expertise in running databases (e.g. Elasticsearch, MySQL) and building their own infrastructure stacks (QA Kubernetes)