Migrating a monolith to Kubernetes

Jesse Newland
November 14, 2017

Last year, a small team at GitHub set out to migrate a large portion of the application that serves GitHub.com to Kubernetes. This application fits the classic definition of a monolith: a large codebase, developed over many years, containing contributions from hundreds of engineers (many of whom have moved on to other things). In this presentation, we'll cover our motivations for this migration, the factors that led us to choose Kubernetes, and the strategies we used to empower a small team to make a change that affected a large engineering organization, then reflect on what we learned in the process.

Transcript

  1. Migrating a monolith
    to Kubernetes
    DevOps Enterprise Summit 2017
    Jesse Newland

  2. Hi!

  3. I’m
    Jesse Newland

  4. @jnewland

  5. 16 years in web
    operations

  6. 6 years at GitHub

  7. Engineering /
    Management

  10. Technical leadership
    from Austin, TX

  11. Why am I here?
    Kubernetes? Monoliths? DevOps? ENTERPRISE?

  12. My job is to affect
    change in a technical
    organization

  13. GitHub is
    growing,
    maturing,
    & evolving

  14. Our solutions often
    don’t scale to fit the
    needs of our growing
    organization

  15. On a journey of
    continuous
    improvement

  16. We are more alike,
    my friends,
    than we are unalike.
    Maya Angelou

  18. https://githubengineering.com/kubernetes-at-github/

  20. Kubernetes is an open-
    source system for
    automating deployment,
    scaling, and management of
    containerized applications
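
A minimal sketch of what "automating deployment" looks like in practice: you describe the desired state of a containerized application in a manifest and Kubernetes keeps reality matching it. The names, image, and replica count below are hypothetical, not GitHub's actual configuration.

```yaml
# Hypothetical Deployment: asks Kubernetes to keep three replicas of a web
# container running and to replace any that fail.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-web
  template:
    metadata:
      labels:
        app: example-web
    spec:
      containers:
        - name: web
          image: registry.example.com/example-web:latest  # placeholder image
          ports:
            - containerPort: 8080
```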

  21. Kubernetes builds upon 15 years
    of experience of running
    production workloads at Google,
    combined with best-of-breed
    ideas and practices from the
    community

  22. I’m not here to tell
    you that you should
    adopt Kubernetes

  23. Or even to go too deep
    into the technical
    details of our
    migration

  24. https://githubengineering.com/kubernetes-at-github/
    @jnewland

  25. Kubernetes is a
    technology

  26. Kubernetes is a
    super dope
    technology

  27. Not a panacea

  28. Use what’s right
    for you

  29. I’d like to share an anecdote
    from our ongoing journey

  30. The only slide with bullets, I promise!
    • Why we migrated our monolith to Kubernetes
    • How we approached a large cross-team project
    • Where we are today
    • What we learned in the process
    • Where we’re headed

  31. Why?

  32. Context

  33. The monolith

  34. Ruby on Rails

  35. github.com/
    github/
    github

  36. GitHub dot com
    the website

  37. 10 years old

  38. Extremely
    important to
    early velocity

  39. Increasing
    complexity

  40. Diffusion of
    responsibility

  42. Incredibly high
    performance
    hardware

  43. Incredibly reliable
    hardware

  44. Incredibly low
    latency
    networking

  45. Incredibly high
    throughput
    networking

  47. Unit of compute
    ==
    instance

  48. Instance setup tightly
    coupled with
    configuration
    management

  49. API-driven,
    testable, but brutal
    feedback loop

  50. Human-managed provisioning and
    load balancing config

  51. High level of effort
    required to get a
    service into
    production

  53. Our customer
    base is growing

  54. Our customers are
    growing

  55. Our ecosystem is
    growing

  56. Our organization is
    growing

  57. We’re shipping
    new products

  58. We’re improving
    existing products

  59. Our customers
    expect increasing
    speed and reliability

  60. We saw indications that our
    approach was struggling to
    deal with these forces

  61. The engineering culture at
    GitHub was attempting to
    evolve to encourage individual
    teams to act as maintainers of
    their own services

  62. SRE's tools and practices for running services
    had not yet evolved to match

  63. Easier to add functionality to an existing service

  64. Unsurprisingly, the
    monolith kept
    growing

  65. Increasing CI duration

  66. Increasing deploy duration

  67. Inflexible
    infrastructure

  68. Inefficient infrastructure

  69. Private
    cloud
    lock-in

  70. Developer and user experience trending downward

  71. The planets aligned in a way
    that made all of these
    problems visible all at once

  72. Hack week

  73. Given a week to ship
    something new and
    innovative, what might we
    expect engineers to do?

  74. 1) spend ~1 day on
    Puppet, provisioning,
    and load balancing
    config

  75. 2) reach out to SRE
    on Thursday and
    ask for our help?

  76. 3) build hack week
    features as a PR
    against the monolith

  77. Microcosm of the larger problems
    with our approach

  78. Incentives
    not aligned
    with the outcomes
    we desired

  80. Our on-ramp went
    in the
    wrong direction

  81. High
    effort
    required

  83. We decided to make
    an investment in
    our tools

  84. We decided to make
    an investment in
    our processes

  85. We decided to make
    an investment in
    our technology

  86. To support the other ongoing
    changes in our organization,
    we decided that we would work to
    level the playing field

  87. To support the decomposition
    of the monolith, we decided
    that we would work to
    provide a better experience
    for new services

  88. To enable SRE to spend more
    time on interesting services,
    we decided to work to reduce
    the amount of time we needed
    to spend on boring services

  89. To reduce the time we spent
    on boring services, we
    decided to work to make the
    service provisioning process
    entirely self-service

  90. To shorten the
    infrastructure-building
    feedback loop, we decided
    to base this new future on
    a container orchestration
    platform

  91. To leverage the experience
    of Google and the strength
    of the community, we
    decided to build this new
    approach with Kubernetes

  92. How?

  93. okay sorry, a few more bullets
    • Passion team
    • Prototype
    • Pick an impactful and visible target
    • Product vision and project plan
    • Pwork
    • Pause and regroup

  94. Passion team

  95. https://github.com/blog/2316-organize-your-experts-with-ad-hoc-teams

  96. Intentionally
    curate a diverse
    set of skills

  97. Intentionally
    curate a diverse
    set of experience

  98. Intentionally
    curate a diverse
    set of knowledge

  99. Intentionally
    curate a diverse
    set of perspectives

  100. SRE
    +
    Developer Experience
    +
    Platform Engineering

  101. Project scoped
    team

  102. @github/kubernetes
    github/kube
    #kube

  103. Prototype

  104. A strategy for not
    crying under the bed
    during hack week

  105. Prototype Goals

  106. Kubernetes cluster,
    load balancing,
    deployment strategy,
    docs
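
For the load-balancing piece, the usual Kubernetes building block is a Service: a stable virtual IP that spreads traffic across whichever pods match its label selector. A hedged sketch, with illustrative names and ports rather than the prototype's actual config:

```yaml
# Hypothetical Service: load-balances across pods labeled app: example-web,
# such as those created by a Deployment like the earlier sketch.
apiVersion: v1
kind: Service
metadata:
  name: example-web
spec:
  selector:
    app: example-web    # traffic goes to ready pods carrying this label
  ports:
    - port: 80          # port clients connect to
      targetPort: 8080  # port the container listens on
```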

  107. Leverage
    hack week
    standard of
    quality

  108. Validate our hypothesis
    that we could provide a
    new and better experience
    with minimal effort

  109. Validate our hypothesis
    that if provided with
    another option, engineers
    would flock to it

  110. Learn more about
    Kubernetes

  111. Seek feedback from
    engineers that used
    the new approach

  112. Internal marketing

  113. Wild success

  115. Handful of projects
    launched with very
    little SRE involvement

  116. Positive feedback

  117. Learned a ton
    about an engineer’s
    perspective

  118. Several of these
    projects still exist, and
    are maintained by
    their creating teams

  119. Pick a big target

  120. We decided to migrate the monolith

  121. Why?

  122. Pros

  123. We wanted to validate
    something larger following
    our positive experience with
    smaller scale apps during
    Hack Week

  124. A well worn path

  125. We were confident in
    the testing strategies
    available to us

  126. We had an overlapping
    need for dynamic lab
    environments

  127. And an overlapping
    need for more flexibility
    to handle peaks and
    valleys of demand

  128. Cons

  129. It might not work

  130. We might make
    things worse

  131. We decided to put
    together a project plan
    and see if it felt viable

  132. Vision and plan

  133. Tons of high-impact,
    visible work ahead

  134. Communication
    was crucial

  135. Key elements of
    communicating change at GitHub

  136. Know your goal

  137. …and lead with it

  138. Don’t mince words

  139. Write
    conversationally

  140. Include the
    alternatives you’ve
    considered

  141. Doing nothing is
    always an
    alternative

  142. Consider the
    production impact

  143. Give it a URL

  144. Pull request

  145. Repeat the
    message using
    different mediums

  146. Communication had
    the desired impact

  147. Executive support

  148. Additional
    engineering
    resources

  149. Project
    management
    resources

  150. Now all we had to do was
    not be wrong

  151. How’d it go?

  154. One
    big
    container

  155. 1.1 GB image

  156. 100-second image build

  157. it's fine

  158. Review lab

  159. 50 times per day

  160. Staff opt-in

  161. Controlled experiments
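
One common way to run this kind of controlled experiment on plain Kubernetes (a sketch of the general pattern, not necessarily the mechanism GitHub used): run a stable and an experimental Deployment whose pods share the label a Service selects on, and shift the traffic share by adjusting replica counts.

```yaml
# Hypothetical canary split: a Service selecting app: example-web (as in the
# earlier sketch) routes to pods from both Deployments, so traffic divides
# roughly 9:1 by replica count.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-web-stable
spec:
  replicas: 9
  selector:
    matchLabels: {app: example-web, track: stable}
  template:
    metadata:
      labels: {app: example-web, track: stable}
    spec:
      containers:
        - name: web
          image: registry.example.com/example-web:stable
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-web-canary
spec:
  replicas: 1
  selector:
    matchLabels: {app: example-web, track: canary}
  template:
    metadata:
      labels: {app: example-web, track: canary}
    spec:
      containers:
        - name: web
          image: registry.example.com/example-web:canary
```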

  164. ~100% of github.com
    web requests served by
    application processes
    running on Kubernetes

  165. Most of the functionality
    we built to support the
    monolith is available to
    other services

  166. ~20% of all services
    are running on
    Kubernetes clusters

  167. What'd we learn?

  168. Positive outcomes

  169. Reduced level of
    effort for new
    service setup

  170. New services regularly
    deployed with little-to-
    no SRE involvement

  171. APIs to query the
    running state of
    our system

  172. APIs to mutate the
    running state of
    our system

  173. Cloud-native
    platform to build
    against

  174. Open

  175. Emerging
    as a
    standard

  176. Reduce
    lock-in

  177. Commoditize
    compute
    providers

  178. More OSS friendly
    than configuration
    management and glue

  179. provider automation,
    config management,
    packages, &
    operating system

  180. container images,
    resources, &
    apis

  183. Challenges

  184. SRE

  185. Operationalizing a
    new platform

  186. Docker instability

  187. Changing the
    expectations of
    application engineers

  188. Application
    Engineering

  189. Change

  190. Learning curve

  191. Shorter
    process
    lifetimes

  192. What happens on
    process
    shutdown?

  193. What happens
    during ungraceful
    shutdown?
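
Context for both questions: when Kubernetes stops a pod it runs any preStop hook, sends SIGTERM to the containers, and sends SIGKILL once the grace period expires, so the application has to drain in-flight work and exit within that window. A hedged sketch of the relevant settings; the values are illustrative only.

```yaml
# Hypothetical pod spec fragment showing the shutdown-related knobs.
apiVersion: v1
kind: Pod
metadata:
  name: example-web
spec:
  terminationGracePeriodSeconds: 30   # total time allowed before SIGKILL
  containers:
    - name: web
      image: registry.example.com/example-web:latest
      lifecycle:
        preStop:
          exec:
            # runs before SIGTERM, e.g. to stop accepting new requests
            command: ["sh", "-c", "sleep 5"]
```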

  194. Things I’d do again

  195. Passion team

  196. Prioritize
    communication

  197. Network effect via
    highly visible work

  198. Gradual rollout

  199. Things I’d do
    differently

  200. More consciously
    consider the
    handoff phase

  201. Document this
    approach to help it
    feel more regular

  202. More open source

  203. What’s next?

  204. Seek feedback
    from engineers

  205. Seek feedback
    from SREs

  206. Seek feedback
    from leadership

  207. Relentlessly focus on
    automating work that
    scales with traffic or
    organizational size

  208. Build services that
    leverage the
    platform

  209. Focus SRE efforts on
    improvements that
    benefit all services

  210. Focus SRE efforts on
    improvements that
    benefit everyone

  211. Keep improving

  212. Thanks!

  213. @jnewland
