Around & After Kubernetes: The Principles and Ideas that Guide Us

DevOps Days Cape Town 2019
King’ori Maina “King”

Meet The Team
Together, we make the Infrastructure/DevOps/DevSecOps team ...
• Hadrian Valentine, Infrastructure Engineer, @hadrianvale
• King’ori Maina, Infrastructure Engineer, @itskingori
• Hati Chindove, Head of Information Security, @hatitye
• Zac Blazic, Infrastructure Engineer, @zacblazic
• Head of In-Security (glorified task manager)
• Product Owner (paid to worry)

What We Do
Business Value, Port Control, Supporting Services
• What clients pay us for: Provide insights to allow global brands to make better decisions.
• Predict effectiveness. Monitor performance. Validate ideas.
• Creating charts. Then creating more. Not a typo.
• Open-Sauce: Lots of integration with awesome tools to make developers’ lives easier.
• Our Workflow: Internal stuff that’s not available in any off-the-shelf tool ... at least in one place.
• Infrastructure: We use a bunch of technology from people way smarter than us.

It All Started In 2015 ...
Difficult Is Now Easy Easier

Our Goals
To know where we want to go, we needed some long term objectives …
• Immutability: Throw away any stateless component when we want. Because … immutability is a requirement for scaling up and scaling down.
• Extensibility: Build upon what we have when we want. Because … we want to stand on the shoulders of giants. No need to re-invent the wheel.
• Reviewability: Represent everything in source code when we want. Because … we want to be able to commit changes into source code and have a single source of truth.
• Observability: Debug it when we want i.e. logging + metrics. Because … it’s not a matter of if things will go wrong but when.
• Scalability: Add capacity when we want … ideally, automatically. Because … we want to be able to handle the thundering herd without always running at capacity.

Our Journey on Docker
Challenge
• We had a server provisioning problem.
• We had a packaging problem.
• We had a process isolation problem.
Approach
• We accepted the potential guaranteed risks.
• We committed to set up any new services in Docker going forward.
• We, in hindsight, did not know what we were setting ourselves up for!
Impact
• We now upgrade dependencies in isolation, reducing blast radius.
• We now have repeatable development, build, test and production environments.
• We now limit resources per application based on requirements.
• We spin up new servers in ~3 minutes.

Our Journey on Kubernetes
Challenge
• We had an orchestration problem.
• We had a serious peer-pressure problem.
• We had a hard-problem problem.
Approach
• We embraced potential guaranteed complexity but resolved to keep it at a minimum (opportunity-cost).
• We committed to migrating existing services to our new infrastructure one by one.
• We rebuilt tooling step by step.
Impact
• We do not regret our decision.
• We have an API to hard problems regarding infrastructure.
• We spend more of our time on developer enablement than infrastructure problems.
• We sleep better (declarative configuration, self-healing).

Our Journey on Port Control
Challenge
• We had an internal workflow problem.
• We had a retrofitting problem.
• We had a “one ring to rule them all” desire.
Approach
• We invested in building a tool from scratch, for us … by us.
• We build it with a one year horizon (more if possible).
• We prioritize extensibility to cover unknown use-cases.
Impact
• We do not wait for or hack tools to work how we work.
• We have an API for our internal workflows.
• We can on-board a new developer in less than 5 minutes (self-serve, immediately informed + productive).
• 50k deployments since April 2017, 3.9k last month.

In Retrospect
• What is it that we’re looking forward to? Then we can be more intentional at building for the future by laying the right foundation as we go.
• What is it that we’ve done right? So that we can keep doing it and guard against complacency.
• What is it that we could have done better? Then we can focus on those areas and see what more potential we can unlock.
It’s all a narrative fallacy.

Zappi Confidential & Proprietary Information
1. Reduce Cognitive Load
We want to exploit all of the advantages that come from having a small number of well-known tools. When you have a small number of well-known tools, you can then focus on the product.
— John Allspaw, Former Etsy CTO @allspaw

Halt The Proliferation of Tools
We’re living in amazing overwhelming times ... can we go back to LAMP stacks?

Keep The Main Thing, The Main Thing
We don’t want to be doing engineering for engineering’s sake …
• Optimise pushing code to production
• Simplify processes so that self-service unblocks most people
• Make deployments robust and atomic
Because …
• if people are confident about the deploy process they will deploy more!
• we want less work for ourselves so that we can focus on features not crisis!
• deployments are a unit of work and a representation of business value going out!
... of course, all of this has to be underpinned by … the system is stable and performant

The Cost of Abstractions
Post-mortem debriefings every day are littered with the artefacts of people insisting, the second before an outage, that “I don’t have to care about that.”
— John Allspaw, Former Etsy CTO @allspaw
Realities
• Knowledge of Kubernetes is not an operational requirement for a developer.
• Not all developers care about infrastructure.
• Not all developers can care (context switching is expensive).
• The right abstractions can have a multiplier effect on developer efficiency (consistency & predictability e.g. labels).

2. Getting Out of the Way
It doesn’t make sense to hire smart people and tell them what to do; we hire smart people so they can tell us what to do.
— Steve Jobs, Former Apple CEO

Automate As Much As You Can
• Need: Developer needs to figure out a way to do task-X
• Empowerment: DevOps team provides a tool to do task-X (albeit manually)
• Tomorrow: DevOps team teaches the system to do task-X (automagically)
Frequency: once / month, multiple times / week, multiple times / day
$ portctl redeploy team --team=supa-team \
    --exclude-app=someapp-1 --exclude-app=someapp-2 \
    --refresh

Delegate Responsibility Via Tooling
• Need: Developer needs to figure out a way to do task-X
• Empowerment: DevOps team provides a tool to do task-X (albeit manually)
• Tomorrow: DevOps team teaches the system to do task-X (automagically)
Frequency: once / month, multiple times / week, multiple times / day
$ portctl backup full --application=reports \
    --environment=production
$ portctl restore full --application=reports \
    --environment=sandbox --team=supa-team --backup-id=123

3. Shared Ownership & Responsibility
Engineering, as a discipline and as an activity, is multi-disciplinary. It’s just messy. And that’s actually the best part of engineering. It’s not about everyone knowing everything. It’s about paying attention to the shared, mutual understanding.
— John Allspaw, Former Etsy CTO @allspaw

Proactive Education
Approach
• We encourage questions and invest in detailed explanations.
• We train on tooling where it’s not obvious e.g. Kibana (for logs) and Grafana (for metrics).
• We view being viewed as wizards as proof of our failure to educate.
• We haven’t done a good job at high-level write-ups (documentation is code, for now).
As an engineer who starts day one, I am [not] the best one to know how network protocols at Etsy work, and I’m going to be encouraged to seek out the experts in those domains until I do. And maybe something will break, and then I’m going to learn something new.
— John Allspaw, Former Etsy CTO @allspaw

Open Participation
Approach
• We don’t own infrastructure, we just guide its vision & evolution.
• We view our relationship with developers as a partnership.
• We encourage developers to design their underlying systems (doors are open for consultation).
• We do not dictate what we run i.e. versions, programming languages etc.
• Everyone has access to our infrastructure (as code) … except secrets (work-in-progress).
• Everyone can participate in infrastructure i.e. send pull requests.

4. Security is an Endless Journey
When you decide to take on the [chief security officer] title, you decide that you’re going to run the risk of having decisions made above you or issues created by tens of thousands of people making decisions that will be stapled to your resume
— Alex Stamos, Former Facebook CSO @alexstamos

Security Is A Team Effort
We want to develop generative cultures, where risk is shared. It’s everyone’s concern. If you build security responsibility into every team, you can scale much more powerfully than if security is only the security staff’s responsibility.
— Dai Zovi, Cash App CTO at Square @dinodaizovi
Approach
• We generally have a high trust environment.
• We have trust scopes (varying degrees of trust).
• We have audit logs.
• We have a penguin team with 37 volunteers (43%).

Security Is Not A Destination
Realities
• It’s involving and continuously evolving work.
• We haven’t figured everything out (some security measures aren’t pragmatic).
• Fundamentally, we want to avoid the front-page news.
What Works For Us
• We use SSO everywhere.
• We pen-test as often as we can.
• We automate user management; provisioning & revocation.
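Automated provisioning and revocation can be pictured as a reconciliation between the users the SSO directory says should exist and the accounts a service currently has. The following is only a minimal sketch of that idea, not how the talk's tooling works: the function name `plan_user_changes` and both input sets are hypothetical, and a real system would call the directory's and the service's own APIs.

```python
from typing import Dict, List, Set


def plan_user_changes(directory_users: Set[str],
                      service_users: Set[str]) -> Dict[str, List[str]]:
    """Compare the source of truth with a service's current accounts.

    directory_users: usernames that should have access (hypothetical SSO export).
    service_users: usernames that currently have an account on the service.
    """
    return {
        # In the directory but missing from the service: create them.
        "provision": sorted(directory_users - service_users),
        # On the service but no longer in the directory: remove them.
        "revoke": sorted(service_users - directory_users),
    }


if __name__ == "__main__":
    plan = plan_user_changes({"amina", "bongani", "chen"}, {"bongani", "dana"})
    print(plan)
```

Running the plan on a schedule, rather than on request, is what turns revocation from a task someone must remember into a default.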

5. Work Processes That Work For Us
The way a team plays as a whole determines its success.
— Babe Ruth, Baseball Player

Empathy Underlies Our Processes
• Infrastructure as code: We use Terraform to plan and apply infrastructure changes which are reviewed in pull requests (trust but verify).
• Feedback loops: We view port-control as a product and developers as our clients … listen, fix, listen, improve, listen, adapt.
• Document everything: We memorialize what’s not code in Slack, Google Docs, wikis for posterity (if you’re not there can someone else do it without you?).
• Proactive support: We view ourselves as guides, not enforcers. Always having the bird’s eye view and jumping in to address an issue before it’s raised.
• Dog-fooding: We use port-control to deploy port-control (api/dashboard) and release portctl (cli).

Where Do We Go From Here?

Stuff We Need To Improve On
• Measure The Four Golden Signals (Better): Latency, traffic, errors and saturation are becoming increasingly important to track how well we’re doing.
• Implement More White-Box Monitoring: Get a closer look into our applications and supporting services (not just your standard system metrics).
• Improve Alerting: Avoid setting up alerts only as a reaction to a failure. Codify alerting.
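To make the four signals and "codify alerting" concrete, here is a minimal sketch that derives latency, traffic, errors and saturation from one window of request records and checks them against alert rules kept as data. Everything named here is an assumption for illustration: the `Request` record, the worker-based saturation measure, the `ALERT_RULES` thresholds and the choice of p95 are not from the talk, and in practice a monitoring system would do this work.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    duration_ms: float
    status: int  # HTTP status code


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile; returns 0.0 for an empty list."""
    if not values:
        return 0.0
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def golden_signals(requests: List[Request], window_seconds: float,
                   busy_workers: int, total_workers: int) -> Dict[str, float]:
    """Summarise one observation window into the four golden signals."""
    server_errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p95_ms": percentile([r.duration_ms for r in requests], 95),
        "traffic_rps": len(requests) / window_seconds,
        "error_rate": server_errors / len(requests) if requests else 0.0,
        # Saturation here is the fraction of workers in use (an assumption).
        "saturation": busy_workers / total_workers,
    }


# Alert rules live in source control as data: signal name -> upper threshold.
# The thresholds are illustrative only.
ALERT_RULES = {"latency_p95_ms": 500.0, "error_rate": 0.05, "saturation": 0.8}


def firing_alerts(signals: Dict[str, float],
                  rules: Dict[str, float]) -> List[str]:
    """Return the names of signals that exceed their threshold."""
    return sorted(name for name, limit in rules.items()
                  if signals[name] > limit)


if __name__ == "__main__":
    window = [Request(100, 200), Request(200, 200),
              Request(300, 500), Request(400, 200)]
    signals = golden_signals(window, window_seconds=2,
                             busy_workers=3, total_workers=4)
    print(signals, firing_alerts(signals, ALERT_RULES))
```

Keeping the rules as reviewable data means a new alert arrives through a pull request, before an incident, instead of being added by hand after one.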

In The Next Year
• What tools can we use to debug network calls across microservices?
• How can we simplify local development in a microservices world?
• What can we do to democratize the management of secrets?
• How can we implement different deployment strategies?
• Can we use machine learning to auto-suggest resolutions to developer issues?
Ideas: ??? Service meshes? Tracing? Training a model? Vault + Port Control?

In Summary ...
• Invest in your own internal-workflow tools. High initial cost, but returns are worth it.
• Keep the main thing, the main thing.
• Use empathy as your key driver and you’ll never go wrong.
• Automate, automate, automate. Delegate, delegate, delegate.
• Scale yourself through empowerment.
• Security is like a long road-trip with friends with no end.
• Figure out what works for you and get started. It’s a long road ahead, don’t get overwhelmed ... take a step at a time.
• It’s never been a better time than now to rethink your infrastructure.

Thank You!
That’s how we Dev + Sec + Ops
@kingori @itskingori
