SUCCESSFUL DEVOPS IMPLEMENTATION FOR SMALL TEAMS: A TRUE STORY

What is DevOps?

Definition
“A set of practices intended to reduce the time between committing a change to a system and the change being placed into normal production, while ensuring high quality”
Bass, Len; Weber, Ingo; Zhu, Liming. DevOps: A Software Architect's Perspective. ISBN 978-0134049847.

It’s not a new thing
Already established in the industry: tons of job offerings confirm that
• automation, automation, automation
• containers eeeeverywheeereee
• The Cloud (i.e. someone else’s computer)

Simple test: “are you DevOps level over 9000?”
• your answer for “how many servers do you have?” is “I have to check..”
• you do multiple production deployments each day
• your dev team can create a new (micro)service along with all supporting components without any ticket for the ops team
• you can terminate any random instance in your infrastructure and the environment will self-heal
• .. but let’s not even start with security-related topics

DON’TS
How not to do “DevOps”
• Post a job description for “DevOps Engineer” and hire a few
• Put them “on-call”
• Push developers away from directly interacting with the environment

Effect?
Apart from low velocity and quality you will get these:
• “Hey, can you send me logs from my service?”
• “Heey, can you purge Redis for me on staging?”
• “Heeey, I clicked deploy on Jenkins and it’s stuck, HALP”
• “Heeeeeeeeeeeeeey….”

Let’s try another approach

DO’S
• Do enable developers
• Streamline deployment process
• Streamline infrastructure management
• Guide, advise, discuss
• Hide complexity, but not too much
• Treat yourself as a service provider: deliver products, not tickets

BUT HOW ?!

It’s ok to hire a devops engineer
Brings experience and specialized focus
• Communication skills are super important here
• Tech requirements: good *nix skills, good Google skills and a sixth sense for sniffing out bad practices
• Probably the first person to handle security in your new startup

Starting point
• Production environment: two servers, a dozen microservices
• Everything spun up manually through the AWS Console
• Deployment meant ssh’ing to a server, downloading the new Docker image, stop+start (incurring downtime)
• Monitoring? Just CloudWatch logs ❌
• Spring Boot + Spring Cloud (Netflix)
• Dockerized, built on Jenkins
• Configured via environment variables
• Stateless
• Use of AWS
• Use of managed services
• Most important thing: competent development team, eager to innovate ✅

1. Kubernetes
Fixing error-prone deployments
• batteries-included approach
• documentation
  ◦ courses, FAQs, examples
• popular
• reasonably sane
  ◦ apart from the millicores concept and a few others ;-)
• Lots of progress in the past ~2 years
  ◦ stable
  ◦ reliable
  ◦ lots of know-how
  ◦ lots of lessons learned
  ◦ powerful CLI
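For readers who have not met the millicores concept yet: Kubernetes expresses CPU requests and limits in thousandths of a core, so `500m` means half a core. A minimal sketch of that conversion follows; the function name `cpu_to_cores` is made up for illustration and is not part of any Kubernetes client library.

```python
def cpu_to_cores(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "250m" or "2" to cores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        # The "m" suffix means millicores: 1/1000 of a core.
        return int(quantity[:-1]) / 1000
    # No suffix: the value is already expressed in whole or fractional cores.
    return float(quantity)


half_core = cpu_to_cores("500m")
two_cores = cpu_to_cores("2")
```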

Kubernetes cont’d
Tons of tools on top of it
• Helm
• Spinnaker
• Jenkins integrations
• Operators for complex deployments
• Monitoring stack
• Cloud offerings (GKE, EKS, Azure)

Still, not a silver bullet

YAAAML
• 200-400 lines of YAML to describe a service..
• Secrets management..
• Even with Helm, deployment is a complex command
• Taints, tolerations, affinity, heap vs total memory, exposing ports, scraping metrics .. and keeping it all consistent across a multitude of services
• Tooling versioning

Re: hide complexity, but not too much
• A Jenkins deployment job is nice and all, up until it stops working
• How can you expect proficiency with Kubernetes / kubectl if all developers ever do is push a Run button?
• Enable them by making it easy to use CLI tools
  ◦ Prepare Helm, helm-secrets, helm-diff, along with all binaries, configs and a ./setup.sh script for easy installation
  ◦ Create one template for all services, supporting the most common configuration
  ◦ Add yet another abstraction layer for the most common tasks
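The “one template for all services” idea can be pictured as shared defaults plus a small per-service override, merged key by key, similar in spirit to how Helm layers values files. The sketch below is only an illustration: the keys in `DEFAULTS` and the `merge_values` helper are hypothetical and not taken from the talk’s actual template.

```python
from copy import deepcopy

# Hypothetical shared defaults; the keys are illustrative only.
DEFAULTS = {
    "replicas": 2,
    "resources": {"cpu": "250m", "memory": "512Mi"},
    "metrics": {"enabled": True, "port": 8081},
}


def merge_values(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied; nested dicts merge recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# A service only states what differs from the defaults.
service_values = merge_values(DEFAULTS, {"replicas": 4, "resources": {"memory": "1Gi"}})
```

Keeping the overrides this small is what makes settings such as ports and metrics scraping stay consistent across many services.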

Demo: qp
~200 lines of Bash script as an abstraction layer on top of Helm
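The qp script itself is not reproduced in this transcript. As a rough idea of what such a wrapper might do, here is a sketch (in Python rather than Bash) that turns a short call into the longer Helm command. The action names, the chart path and the values-file layout are all assumptions made for illustration; only `helm upgrade --install` and the helm-diff plugin’s `helm diff upgrade` are real commands.

```python
import shlex

CHART = "./charts/service"  # hypothetical path to the shared chart


def helm_command(action: str, service: str, env: str) -> list[str]:
    """Build (but do not run) the Helm command behind a short wrapper call."""
    # Hypothetical layout: one values file per service and environment.
    common = [service, CHART, "--namespace", env, "-f", f"values/{service}/{env}.yaml"]
    if action == "deploy":
        return ["helm", "upgrade", "--install", *common]
    if action == "diff":
        # Requires the helm-diff plugin to be installed.
        return ["helm", "diff", "upgrade", *common]
    raise ValueError(f"unknown action: {action}")


# Show the command a developer would otherwise have to type by hand.
print(shlex.join(helm_command("deploy", "billing", "staging")))
```

Printing the command instead of hiding it fits the “hide complexity, but not too much” point: developers still see which Helm call runs on their behalf.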

[Figure: before / after]

DevOps == Collaboration
• Example: monitor performance of all microservices
  ◦ Example stack: Prometheus via Prometheus Operator
  ◦ Add ServiceMonitor objects to each deployment
• New application<->platform contract emerged: just expose Prometheus metrics on port N and you will see your service graphs in Grafana
  ◦ Developers are responsible for adjusting their services to obey the new contract and for making domain-specific dashboards
• Good tools helped here: Kubernetes made it easy to deploy the stack, Spring framework made it easy to expose metrics
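On the application side, the contract boils down to serving plain text in the Prometheus exposition format on the agreed port. In the talk’s setup Spring did this for the services; the sketch below only illustrates what the scraped payload looks like. The metric name, label and helper function are made up for illustration, and label-value escaping is left out.

```python
def render_metrics(request_counts: dict[str, int]) -> str:
    """Render one counter in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for path, count in sorted(request_counts.items()):
        # One sample per label value: name{label="value"} number
        lines.append(f'http_requests_total{{path="{path}"}} {count}')
    return "\n".join(lines) + "\n"


payload = render_metrics({"/orders": 3, "/health": 12})
```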

2. Infrastructure as Code
Terraform + Atlantis
• Git-versioned infrastructure
• Migrate/move or import existing resources
• Set up Atlantis for audited and peer-reviewed infrastructure changes
• Use the same tools to detect state drift (changes that were made outside of the Atlantis flow)
• Optionally remove user permissions so that changes must go through Pull Requests
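One possible way to detect drift with the same tools is to run `terraform plan -detailed-exitcode` against the main branch on a schedule: exit code 0 means no changes, 2 means the plan found differences, and 1 means an error. Whether the setup described in the talk did it this way is not stated, so treat the sketch below as an assumption; the helper names are hypothetical.

```python
import subprocess


def classify_plan_exit_code(code: int) -> str:
    """Map the exit code of `terraform plan -detailed-exitcode` to a status."""
    if code == 0:
        return "in-sync"  # plan succeeded, nothing to change
    if code == 2:
        return "drift"    # plan succeeded, real infrastructure differs from the code
    return "error"        # exit code 1 (or anything unexpected)


def check_drift(workdir: str) -> str:
    """Run a plan in workdir and report the status (needs the terraform binary)."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit_code(result.returncode)
```

On a branch where every merged change has already been applied, a non-empty plan points to a change made outside the Pull Request flow.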

Terraform
Declarative infrastructure management
• Define AWS resources
  ◦ Readable syntax
  ◦ Combine multiple resources into a reusable module
• Plan
  ◦ Compare definition with current state
  ◦ Display detailed changeset
• Apply
  ◦ Make changes to infrastructure
  ◦ Record state
• Team workflow supported
  ◦ State in AWS S3
  ◦ Locks in DynamoDB
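The plan/apply cycle can be illustrated with a toy model: compare the desired definition with the recorded state, produce a changeset, then record the new state. This is a deliberately simplified sketch, not Terraform’s actual algorithm, and the resource names are hypothetical.

```python
def plan(desired: dict, state: dict) -> dict:
    """Compare desired resources with recorded state and return a changeset."""
    return {
        "create": sorted(desired.keys() - state.keys()),
        "update": sorted(k for k in desired.keys() & state.keys() if desired[k] != state[k]),
        "delete": sorted(state.keys() - desired.keys()),
    }


def apply(desired: dict, state: dict) -> dict:
    """Pretend to change the infrastructure, then return the new recorded state."""
    return {name: dict(attrs) for name, attrs in desired.items()}


# Hypothetical resources, named in Terraform's type.name style.
desired = {
    "aws_instance.web": {"type": "t3.small"},
    "aws_s3_bucket.logs": {"acl": "private"},
}
state = {
    "aws_instance.web": {"type": "t3.micro"},
    "aws_sqs_queue.jobs": {},
}

changeset = plan(desired, state)
new_state = apply(desired, state)
```

Running `plan` again against `new_state` yields an empty changeset, which is the property that makes declarative definitions safe to re-apply.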

Atlantis
Pull Requests for infrastructure
1. GitHub hook on each Pull Request to the Terraform repo
2. Additional layer of locking so no other PR can touch the same parts of infrastructure
3. Autoplan: show plan preview in PR comments
4. Review & Approve Pull Request
5. Apply changes
6. Remove locks and merge
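The locking step in the workflow above can be sketched as a small in-memory model, assuming locks are keyed by Terraform directory and workspace and held by a pull request number until that PR is merged or closed. The class and method names are hypothetical and do not mirror Atlantis internals.

```python
class ProjectLocks:
    """Toy model: one lock per (directory, workspace), held by a pull request."""

    def __init__(self) -> None:
        self._holders: dict[tuple[str, str], int] = {}

    def acquire(self, directory: str, workspace: str, pr: int) -> bool:
        """Take the lock for pr, or report False if another PR already holds it."""
        holder = self._holders.setdefault((directory, workspace), pr)
        return holder == pr

    def release_all(self, pr: int) -> None:
        """Drop every lock held by pr, e.g. after it is merged."""
        self._holders = {key: held for key, held in self._holders.items() if held != pr}


locks = ProjectLocks()
first = locks.acquire("network", "default", pr=101)    # PR 101 plans "network"
blocked = locks.acquire("network", "default", pr=102)  # PR 102 must wait
other = locks.acquire("dns", "default", pr=102)        # a different directory is free
locks.release_all(101)                                 # PR 101 applied and merged
retry = locks.acquire("network", "default", pr=102)    # now PR 102 can proceed
```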

Demo?
If time permits ;-)
If time won’t permit: shout out to my friend Szymon W., who wrote a nice blog post about introducing Terraform and Atlantis across a whole company: https://lab.getbase.com/terraform-base/

Does it work?

From my own experience
Cosmose:
• One “devops engineer”, seven contributors to the Terraform repo in a month, eleven now
• > 10 production deployments per day
• 3x more microservices since I joined (~6 months)
• Infrastructure autoscaled 10x one time, when a dev wanted to “speed up his processing task” ;-)
Base / Zendesk Sell:
• Around 8 Ops and 42 (!) contributors to the Terraform repo
• 30-50 deployments to prod daily
• High level of ownership in dev teams, including expertise in running databases (e.g. Elasticsearch, MySQL), building their own infrastructure stacks (QA Kubernetes)

It works!

Thanks!
Jakub P. Głazik
github.com/zytek
Questions are more than welcome

Jakub Paweł Głazik
