Slide 1

Slide 1 text

Containerizing your monolith
Jano González (@janogonzalez)

Slide 2

Slide 2 text

Introduction

Slide 3

Slide 3 text

What’s SoundCloud

Slide 4

Slide 4 text

What’s SoundCloud
● > 200M Tracks
● > 20M Creators
● Many SoundCloud Rappers

Slide 5

Slide 5 text

Motivation
● 2017-2018 Migration
● Keep moving

Slide 6

Slide 6 text

What’s the balance? Motivation

Slide 7

Slide 7 text

Contents
● Our migration to microservices
● The monolith
● Containerizing the monolith
● Conclusion

Slide 8

Slide 8 text

Our migration to microservices

Slide 9

Slide 9 text

How we started
$ rails new soundcloud

Slide 10

Slide 10 text

Around 2012

Slide 11

Slide 11 text

Around 2012

Slide 12

Slide 12 text

Around 2012

Slide 13

Slide 13 text

Around 2012

Slide 14

Slide 14 text

After

Slide 15

Slide 15 text

After

Slide 16

Slide 16 text

But what about deployment?

Slide 17

Slide 17 text

Our abstractions
● Component
● Environment
● Zone
● Deployment

Slide 18

Slide 18 text

Deployment The process

Slide 19

Slide 19 text

The monolith

Slide 20

Slide 20 text

The monolith: Core entities
● Track
● User
● Playlist

Slide 21

Slide 21 text

The monolith: The technology
● 360 Chef-provisioned bare-metal machines
● Rails 2.3
● Capistrano deployment

Slide 22

Slide 22 text

The monolith: The architecture
[Diagram: Public API, Internal API, Public API Strangler, Internal API caching/strangler, Another BFF, Many microservices, Public Web, Workers]

Slide 23

Slide 23 text

The monolith: The components
● Public API
● MoshiMoshi (Internal API)
● Public Web
● Assets
● MoshiMoshi Comments (Internal API)
● Workers
● Cron
● Shell
● Migration

Slide 24

Slide 24 text

The monolith: The hosts
● component
● statsd-exporter
● mtail
● statsd
● passenger-exporter
● …

Slide 25

Slide 25 text

The monolith: The issues
● Utilization
● Deployment
● Lack of confidence

Slide 26

Slide 26 text

Containerizing the monolith

Slide 27

Slide 27 text

Throw it into Kubernetes

Slide 28

Slide 28 text

Congrats, your monolith is a microservice now

Slide 29

Slide 29 text

Thank You!

Slide 30

Slide 30 text

The project
● 1.5 engineers
● 1 year until the last bit was cleaned up

Slide 31

Slide 31 text

The first milestone

Slide 32

Slide 32 text

The first milestone
● Docker development container
● Tests on GoCD

Slide 33

Slide 33 text

The proof of concept

Slide 34

Slide 34 text

The proof of concept: Does it even work?
● First staging component

Slide 35

Slide 35 text

The proof of concept
PROBLEM! Where are my env variables?
[Diagram: Init script, Nginx, Passenger, Passenger Process, Rails App ???]

Slide 36

Slide 36 text

The proof of concept
SOLUTION: Env variables with The Perl Hack™

Slide 37

Slide 37 text

Productionizing

Slide 38

Slide 38 text

Productionizing: Does it run where it matters?
● Deployment
● Monitoring
● Logs

Slide 39

Slide 39 text

Productionizing: Anatomy of a traffic serving pod
● component
● statsd
● statsd-exporter
● passenger-exporter
● mtail
● twemproxy
● twemproxy-cu
● twemproxy-exporter
● init

Slide 40

Slide 40 text

Productionizing: Sizing the pods
● Don’t choke service discovery
● Be allocatable

Slide 41

Slide 41 text

Productionizing: Sizing the pods
● CPU units for main container: 3
● Passenger processes: 16
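To make the two sizing figures concrete, here is a minimal capacity sketch. It assumes the slide's numbers mean 3 CPU units and 16 Passenger processes per pod. The target of 1000 processes in the usage lines is a made-up illustrative value, not a SoundCloud figure.

```python
import math

# Per-pod figures as read from the slide (assumed mapping of the two numbers).
CPU_UNITS_PER_POD = 3
PASSENGER_PROCESSES_PER_POD = 16


def pods_needed(total_processes: int) -> int:
    """Smallest number of pods that provides at least total_processes workers."""
    if total_processes <= 0:
        raise ValueError("total_processes must be positive")
    return math.ceil(total_processes / PASSENGER_PROCESSES_PER_POD)


def cpu_units_requested(pods: int) -> int:
    """CPU units requested by the main containers of that many pods."""
    return pods * CPU_UNITS_PER_POD


# Hypothetical target: 1000 Passenger processes in total.
pods = pods_needed(1000)
print(pods, cpu_units_requested(pods))
```

Smaller pods are easier for the scheduler to place ("be allocatable"), but every extra pod is one more endpoint for service discovery to track, which is the trade-off the previous slide points at.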

Slide 42

Slide 42 text

Productionizing
PROBLEM! Stdout vs. the log metrics exporter
[Diagram: Component, STDOUT, Log aggregator, File???, Mtail]

Slide 43

Slide 43 text

Productionizing
SOLUTION: Mtail with The Rotatelogs Hack™
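The slides name the fix but do not show it. As far as the problem slide suggests, one log stream has to reach two consumers: stdout for the log aggregator and a file that mtail can read. The sketch below only illustrates that fan-out idea in Python; it is not the Rotatelogs Hack itself, and the sample log lines are invented.

```python
import io
from typing import Iterable, Sequence, TextIO


def tee_lines(source: Iterable[str], sinks: Sequence[TextIO]) -> int:
    """Copy every line from source to each sink; return the number of lines copied."""
    count = 0
    for line in source:
        for sink in sinks:
            sink.write(line)
            sink.flush()  # keep both consumers up to date line by line
        count += 1
    return count


# In-memory stand-ins: one for stdout (log aggregator), one for the file mtail reads.
aggregator_stream = io.StringIO()
metrics_file = io.StringIO()
sample_lines = ["GET /tracks 200\n", "GET /users 500\n"]  # invented log lines
copied = tee_lines(sample_lines, [aggregator_stream, metrics_file])
print(copied)
```

A real setup would also need to rotate the file so it does not grow without bound inside the container, which is presumably where rotatelogs comes in.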

Slide 44

Slide 44 text

Productionizing Public API

Slide 45

Slide 45 text

Productionizing Public API: Orchestration

Slide 46

Slide 46 text

Productionizing Public API
PROBLEM! DNS latency and our excessive usage
[Diagram: Component, Service, Service, Service]

Slide 47

Slide 47 text

Productionizing Public API
SOLUTION: CoreDNS and the DNS Hack™
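The deck does not say what the DNS Hack was or what exactly drove the DNS load. One well-known source of extra lookups inside Kubernetes pods is the resolver's search list combined with `ndots:5`, so the sketch below shows how a single name expands into several queries under that rule. The namespace `default` and the host name `api.example.com` are illustrative assumptions, and whether this effect was the cause here is not stated in the slides.

```python
from typing import List

# Typical search list for a pod in the "default" namespace (illustrative).
K8S_SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]


def candidate_queries(name: str, search: List[str], ndots: int) -> List[str]:
    """Names a stub resolver would try, in order, following the ndots rule."""
    if name.endswith("."):
        return [name]  # already fully qualified: exactly one query
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute] + expanded  # enough dots: try the name as-is first
    return expanded + [absolute]  # too few dots: walk the search list first


queries = candidate_queries("api.example.com", K8S_SEARCH, ndots=5)
for query in queries:
    print(query)
```

With these inputs an external name is tried against every search domain before the absolute name is attempted, so a trailing dot or a lower `ndots` value reduces the number of lookups per request.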

Slide 48

Slide 48 text

Productionizing Internal API

Slide 49

Slide 49 text

Productionizing Internal API: Highest throughput

Slide 50

Slide 50 text

Productionizing Internal API
PROBLEM! Latency was too high

Slide 51

Slide 51 text

Productionizing Internal API
SOLUTION: Optimize GC and make cheaper SQL queries

Slide 52

Slide 52 text

Productionizing Internal API
PROBLEM! Error spikes during deployment

Slide 53

Slide 53 text

Productionizing Internal API
SOLUTION: The preStop Trick™
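The slide does not spell out what the preStop Trick did. A common pattern with that name is a `preStop` lifecycle hook that sleeps for a few seconds, so the pod keeps serving while it is removed from the service endpoints. The sketch below builds such a container spec fragment as a Python dict. The container name, image and the 10-second delay are hypothetical values, and this is an assumed pattern, not necessarily the one used in the talk.

```python
import json


def with_prestop_sleep(container: dict, seconds: int) -> dict:
    """Return a copy of a container spec with a preStop hook that sleeps."""
    patched = dict(container)  # shallow copy; the input dict is left untouched
    patched["lifecycle"] = {
        "preStop": {"exec": {"command": ["sleep", str(seconds)]}}
    }
    return patched


# Hypothetical container; name and image are placeholders.
container = {"name": "component", "image": "example/monolith:latest"}
patched = with_prestop_sleep(container, seconds=10)
print(json.dumps(patched, indent=2))
```

For the delay to help, the pod's `terminationGracePeriodSeconds` has to be longer than the sleep plus the time the application needs to finish in-flight requests.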

Slide 54

Slide 54 text

Productionizing Internal API
PROBLEM! Error spikes during deployment (still???)

Slide 55

Slide 55 text

Productionizing Internal API
SOLUTION: The Pre Start Trick™

Slide 56

Slide 56 text

The rest

Slide 57

Slide 57 text

The rest
● Workers
● Cron jobs
● Shell / Migration hosts
● Cleanup!

Slide 58

Slide 58 text

Finishing

Slide 59

Slide 59 text

Current status

Slide 60

Slide 60 text

Current status: Number of pods
● On-prem: ~1000
● Cloud: ~140

Slide 61

Slide 61 text

Current status: Traffic
● On-prem RPS: 25K
● Cloud RPS: 3K

Slide 62

Slide 62 text

Current status Many deploys

Slide 63

Slide 63 text

Conclusions

Slide 64

Slide 64 text

Conclusions: What we solved
● One Infrastructure
● One Delivery Process

Slide 65

Slide 65 text

Conclusions: How we did it
● Step by step
● Controlled rollouts
● Managing expectations

Slide 66

Slide 66 text

Conclusions: Benefits
● Improved utilization
● Increased confidence
● Enabling new initiatives

Slide 67

Slide 67 text

Conclusions: Should you do it?
● Assess current progress
● Evaluate costs and benefits

Slide 68

Slide 68 text

Thank You!

Slide 69

Slide 69 text

@janogonzalez https://soundcloud.com/janogonzalez