Slide 1

Slide 1 text

Growing Pains Cory Zue @czue

Slide 2

Slide 2 text

Who am I? 2007-2017 CTO, Dimagi Today Future “Chief Accelerator” Builder / Entrepreneur Freelance Consultant Lover of Emojis

Slide 3

Slide 3 text

Why this talk? Growing is hard. Especially from 1-100 people (or servers). It’s probably hard past 100 too, but I don’t know much about that.

Slide 4

Slide 4 text

Three Talk Parts
● Scaling Code
● Scaling your Stack
● Scaling Teams

Slide 5

Slide 5 text

Scaling Code: Fighting Complexity

Slide 6

Slide 6 text

A codebase

Slide 7

Slide 7 text

A codebase
● Interdependencies scale exponentially
● No longer fits in one person’s head

Slide 8

Slide 8 text

A codebase
● Takes longer to reason about
● Harder / slower to test things
● Takes longer to onboard new people

Slide 9

Slide 9 text

Enter: abstraction
[Diagram: a System made of three Subsystems joined by Glue]
● Reduce the size of any single component so it fits in one person’s head
● Allows for specialization, if necessary

Slide 10

Slide 10 text

A more sustainable codebase

Slide 11

Slide 11 text

A more sustainable codebase

Slide 12

Slide 12 text

A more sustainable codebase

Slide 13

Slide 13 text

A more sustainable codebase

Slide 14

Slide 14 text

How to abstract?
From least short-term work / least long-term scalable to most short-term work / most long-term scalable:
● No structure
● “Organized” monolith
● Modules define public APIs via __init__.py
● Shared modules using git submodules
● Shared libraries using PyPI
● Service Oriented Architecture
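The "public APIs via __init__.py" step can be sketched roughly as follows. This is a minimal illustration, not the speaker's code: the package name `demo_billing`, its `_internal` module, and the `charge` function are all hypothetical, and the package is written to a temporary directory only so the example runs as a single file.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package: callers are meant to use only what __init__.py exports.
PACKAGE_FILES = {
    "demo_billing/_internal.py": (
        "def _round_cents(amount):\n"
        "    return round(amount, 2)\n"
        "\n"
        "def charge(amount):\n"
        "    return {'charged': _round_cents(amount)}\n"
    ),
    "demo_billing/__init__.py": (
        "from ._internal import charge\n"
        "\n"
        "__all__ = ['charge']\n"
    ),
}


def build_demo_package(root: Path) -> None:
    """Write the demo package to disk so it can be imported like real code."""
    for relative_path, source in PACKAGE_FILES.items():
        target = root / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(source)


def load_billing():
    """Build the package in a temporary directory and import it."""
    root = Path(tempfile.mkdtemp())
    build_demo_package(root)
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    return importlib.import_module("demo_billing")


billing = load_billing()
print(billing.charge(19.999))                # goes through the public name only
print(billing.__all__)                       # the declared public surface
print(hasattr(billing, "_round_cents"))      # helper is not re-exported
```

Python does not enforce this boundary: `__all__` only governs `from demo_billing import *`, and the leading underscore is a convention. The value is that reviewers and linters have one obvious place to check what other modules are allowed to depend on.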

Slide 15

Slide 15 text

Rough Guidelines
● Execute for the short term (< 1 year)
● Plan for the long term (1-3 years)
● Design to be changed (always)
[Scale: least short-term work / least long-term scalable → most short-term work / most long-term scalable]

Slide 16

Slide 16 text

All the things! A few other quick tips

Slide 17

Slide 17 text

Clean all the things!
Rule: Always leave code cleaner than when you found it
● Flake CI
● Conduct refactorathons
● Resource tech debt

Slide 18

Slide 18 text

Review all the things!
● Code review is critical
● Conduct 10% design reviews for big features

Slide 19

Slide 19 text

(Automated) Test all the things!
● Test while coding
● Write failing tests before fixing bugs
● Build a CI pipeline
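"Write failing tests before fixing bugs" might look like the sketch below. The `average` function and its empty-list bug are invented for illustration; the point is only the order of work: the first test is written to reproduce the report, fails, and then the guard is added.

```python
import unittest


def average(values):
    """Return the mean of values; an empty list yields 0.0.

    Hypothetical bug: the earlier version divided by len(values) without
    checking for an empty list and raised ZeroDivisionError.
    """
    if not values:  # the fix, added only after the regression test failed
        return 0.0
    return sum(values) / len(values)


class AverageRegressionTest(unittest.TestCase):
    def test_empty_list_returns_zero(self):
        # Written first to reproduce the bug report; it failed before the fix.
        self.assertEqual(average([]), 0.0)

    def test_regular_values_still_work(self):
        # Guards against the fix breaking the normal path.
        self.assertEqual(average([2, 4, 6]), 4.0)


# Run the suite explicitly so the example also works outside a test runner.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AverageRegressionTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

Keeping the regression test in the suite means the CI pipeline catches the bug if it ever comes back.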

Slide 20

Slide 20 text

Growth and Infrastructure: Fighting Scale

Slide 21

Slide 21 text

[Architecture diagram: Web Server, Django, Database]
Legend: Python App Code, Web Server, Database

Slide 22

Slide 22 text

The site is getting slow!
[Architecture diagram: Django, Database; new: Cache]
Legend: Python App Code, Web Server, Database / 3rd Party
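One common way to use the cache on this slide is the cache-aside pattern: check the cache, fall back to the database on a miss, then store the result. The sketch below is an assumption-laden stand-in, not the talk's implementation: `TTLCache` is a tiny in-memory class written for this example (its `get`/`set` shape loosely mirrors Django's cache API), and `slow_report_query` is a hypothetical expensive query.

```python
import time


class TTLCache:
    """In-memory stand-in for a real cache backend (assumption for the demo)."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired entries count as a miss
            return None
        return value

    def set(self, key, value, timeout):
        self._store[key] = (value, self._clock() + timeout)


cache = TTLCache()
db_calls = []  # records how often the "database" is actually hit


def slow_report_query(user_id):
    """Hypothetical expensive database query."""
    db_calls.append(user_id)
    return {"user_id": user_id, "total": 42}


def get_report(user_id):
    """Cache-aside: try the cache first, query and store on a miss."""
    key = f"report:{user_id}"
    report = cache.get(key)
    if report is None:
        report = slow_report_query(user_id)
        cache.set(key, report, timeout=60)
    return report


get_report(7)
get_report(7)  # served from the cache
print(len(db_calls))
```

The timeout is the main design choice: a longer value saves more database work but serves staler data.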

Slide 23

Slide 23 text

We need background processing!
[Architecture diagram: Django, Database, Cache; new: Celery (background processor)]
Legend: Python App Code, Web Server, Database / 3rd Party, Background Processor
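The idea behind a background processor can be sketched with the standard library alone. This is not Celery and does not use its API: a `queue.Queue` and one worker thread stand in for the broker and worker process, and `send_welcome_email` and `handle_signup` are hypothetical names. What it shows is the shape of the pattern: the request handler enqueues work and returns immediately.

```python
import queue
import threading

task_queue = queue.Queue()  # stand-in for a message broker
sent_emails = []            # stand-in for the side effect of the slow task


def send_welcome_email(address):
    """Hypothetical slow task that should not block a web request."""
    sent_emails.append(address)


def worker():
    """Pull tasks off the queue until a None sentinel arrives."""
    while True:
        task = task_queue.get()
        if task is None:
            task_queue.task_done()
            break
        func, args = task
        try:
            func(*args)
        finally:
            task_queue.task_done()


def handle_signup(address):
    """Request handler: enqueue the slow work and respond right away."""
    task_queue.put((send_welcome_email, (address,)))
    return "signup accepted"


worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()

print(handle_signup("new.user@example.com"))
task_queue.join()      # demo only: wait until the background task has run
print(sent_emails)

task_queue.put(None)   # ask the worker to stop
worker_thread.join()
```

A real task queue adds what this sketch leaves out: a durable broker, retries, and workers running on separate machines.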

Slide 24

Slide 24 text

The machines can’t keep up with the load!
[Architecture diagram: several Django and Celery (background processor) machines, Database, Cache]
Legend: Python App Code, Web Server, Database / 3rd Party, Background Processor

Slide 25

Slide 25 text

The database can’t keep up with the load!
[Architecture diagram: several Django and Celery (background processor) machines, Database, Cache; new: Database (Read Replica(s))]
Legend: Python App Code, Web Server, Database / 3rd Party, Background Processor
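In Django, reads can be pointed at replicas with a database router. The sketch below uses the four standard router method names, but everything else is assumed: the aliases `default`, `replica1`, and `replica2` are hypothetical and would have to match keys in `settings.DATABASES`, and the class would need to be listed in `DATABASE_ROUTERS`. It is written as plain Python so it runs without a Django project.

```python
import random


class PrimaryReplicaRouter:
    """Send reads to a replica and writes to the primary.

    The database aliases below are assumptions for this sketch.
    """

    primary = "default"
    replicas = ("replica1", "replica2")

    def db_for_read(self, model, **hints):
        # Spread read queries across the replicas.
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        # All writes go to the single primary.
        return self.primary

    def allow_relation(self, obj1, obj2, **hints):
        # Replicas hold the same data, so relations are always fine.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Run migrations on the primary only; replication copies the schema.
        return db == self.primary


router = PrimaryReplicaRouter()
print(router.db_for_write(model=None))
print(router.db_for_read(model=None) in PrimaryReplicaRouter.replicas)
```

Replicas lag behind the primary, so a read issued right after a write may return stale data; code that needs read-your-own-writes behavior has to query the primary explicitly.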

Slide 26

Slide 26 text

Our reports are taking way too long!
[Architecture diagram: several Django and Celery (background processor) machines, Database, Database (Read Replica(s)), Cache; new: Streaming Platform, Stream Processing, Analytics Database]
Legend: Python App Code, Web Server, Database / 3rd Party, Background Processor

Slide 27

Slide 27 text

It’s the type of data in the database!
[Architecture diagram: several Django and Celery (background processor) machines, Database, Database (Read Replica(s)), Cache, Streaming Platform, Stream Processing, Analytics Database; new: “Blob” Database]
Legend: Python App Code, Web Server, Database / 3rd Party, Background Processor

Slide 28

Slide 28 text

The data is just too big!
[Architecture diagram: several Django and Celery (background processor) machines, Databases, Database (Read Replica(s)), Cache, Streaming Platform, Stream Processing, Analytics Database, “Blob” Database]
Legend: Python App Code, Web Server, Database / 3rd Party, Background Processor
https://www.youtube.com/watch?v=EmyVoIi6W1c

Slide 29

Slide 29 text

Rough Guidelines
● Execute for the short term (< 1 year)
● Plan for the long term (1-3 years)
● Design to be changed (always)
[Scale: least short-term work / least long-term scalable → most short-term work / most long-term scalable]

Slide 30

Slide 30 text

All the things! A few other quick tips

Slide 31

Slide 31 text

Learn all the things!
● Build DevOps capacity
● Get to know your tools well. Really well.

Slide 32

Slide 32 text

Monitor all the things!
● Uptime → Pingdom, StatusCake, others
● Servers → New Relic, Datadog, Munin, others

Slide 33

Slide 33 text

Record all the things!
Logs
● ELK (Elasticsearch, Logstash, Kibana)
● Datadog
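Log pipelines of this kind are usually easier to search when the application emits structured records. The sketch below shows one possible approach using only the standard `logging` module: a custom formatter that writes each record as a JSON line. The field names and the `demo.checkout` logger are assumptions for the example, not a format any of the listed tools requires.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


# An in-memory stream keeps the demo self-contained; a real setup would
# write to a file or stdout for a log shipper to collect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("demo.checkout")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info("order placed: %s", "A-1001")
print(stream.getvalue().strip())
```

One JSON object per line lets the log store index fields such as `level` or `logger` directly instead of parsing free text.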

Slide 34

Slide 34 text

Orchestrate all the things!
Ansible (Salt, Fabric, Puppet/Chef)

Slide 35

Slide 35 text

Growth and Teams: Fighting Chaos

Slide 36

Slide 36 text

Two Common Types of Organic, Small Team Growth
Superhero Model
A single person (or small group of people) owns something.
Common examples:
- Code review
- People management
- Support / firefighting

Slide 37

Slide 37 text

Two Common Types of Organic, Small Team Growth
Herd Model
The “herd” (aka everyone on the team) has distributed, self-organizing ownership.
Common examples:
- Knowledge management
- Bug fixes
- Who does what

Slide 38

Slide 38 text

How this works over time

Slide 39

Slide 39 text

Superheroes and Growth

Slide 40

Slide 40 text

Herds and Growth Source: https://twitter.com/RichRogersIoT/status/914577613300965377

Slide 41

Slide 41 text

Organic → Designed
Organic Systems
● Happen unintentionally over a long period of time
● Exist because “that’s how it’s always worked”
● Can often be hard to explain / justify
Designed Systems
● Are intentionally designed and revised over time
● Exist to fulfil a purpose / meet a need
● Can usually be explained / justified
Trying to explain a system to a new hire is a good litmus test of which type of system it is.

Slide 42

Slide 42 text

Designed Solutions
● Specialization: Replace heroes with roles
● Tools: Remove friction from unscalable systems / processes
● Processes: Bring order to the chaos of the herd

Slide 43

Slide 43 text

Example: Organic Support
● People email us with issues
● The email goes to the whole tech/product team (herd)
● Whoever looks first (usually superhero Alice) responds
● If it’s a bug, they assign it to the person responsible for the bug (herd knowledge)

Slide 44

Slide 44 text

Example: Designed Support
● Support tickets go into a queue in our support system (tool) that is triaged by our support team (specialization)
● The team triages and follows a defined escalation and response procedure (process)
● If the ticket is a bug, it is put in a different queue (tool/process) and is picked up according to priority by one of our rotating support engineers (specialization / process)
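The bug-queue step above (picked up by priority, handled by a rotating engineer) can be modeled in a few lines. This is only a toy sketch of the process, not a real support system: `SupportQueue`, the ticket texts, and the engineer names are all made up, and lower numbers are assumed to mean higher priority.

```python
import heapq
import itertools


class SupportQueue:
    """Toy bug queue: highest priority first, engineers take turns."""

    def __init__(self, engineers):
        self._heap = []
        self._counter = itertools.count()           # keeps FIFO order within a priority
        self._rotation = itertools.cycle(engineers)  # rotating support engineers

    def add(self, ticket, priority):
        # Lower number means more urgent (assumption for this sketch).
        heapq.heappush(self._heap, (priority, next(self._counter), ticket))

    def assign_next(self):
        """Return (ticket, engineer) for the most urgent ticket, or None."""
        if not self._heap:
            return None
        _, _, ticket = heapq.heappop(self._heap)
        return ticket, next(self._rotation)


bugs = SupportQueue(["engineer_a", "engineer_b"])
bugs.add("typo on settings page", priority=3)
bugs.add("login fails for all users", priority=1)
bugs.add("export times out", priority=2)

first = bugs.assign_next()
second = bugs.assign_next()
print(first)
print(second)
```

Writing the rule down this explicitly is what makes it a designed system: a new hire can read who picks up what, and in which order.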

Slide 45

Slide 45 text

Example: Organic Code Review
● Make a pull request and whoever (herd) gets to it (usually superhero Bob) will review and merge or make comments
● If no one reviews it after a few days, you can start pinging people (herd) on Slack

Slide 46

Slide 46 text

Example: Designed Code Review
● Every developer is assigned a code buddy who is responsible for reviewing their pull requests (process)
● Additionally, you should ping the defined area owner of that part of the codebase (specialization) for a secondary review
● Pull requests should only be merged after reviews have been approved (process) and all tests pass (tool)
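The merge rule on this slide is precise enough to write as a check. The sketch below is one reading of it, under stated assumptions: both the code buddy and the area owner must approve, and the `CODE_BUDDIES` and `AREA_OWNERS` tables, the names in them, and the `PullRequest` fields are all hypothetical. In practice a rule like this would usually be enforced by the code host's branch protection settings rather than custom code.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    author: str
    area: str
    approvals: set = field(default_factory=set)
    tests_passed: bool = False


# Hypothetical assignments: author -> code buddy, code area -> area owner.
CODE_BUDDIES = {"dana": "eli"}
AREA_OWNERS = {"billing": "fran"}


def can_merge(pr):
    """Allow a merge only if both reviewers approved and all tests pass."""
    required = {CODE_BUDDIES[pr.author], AREA_OWNERS[pr.area]}
    return pr.tests_passed and required <= pr.approvals


pr = PullRequest(author="dana", area="billing", tests_passed=True)
pr.approvals.add("eli")
print(can_merge(pr))   # area owner has not approved yet
pr.approvals.add("fran")
print(can_merge(pr))
```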

Slide 47

Slide 47 text

Culture: Organic → Designed
● Much like other things, your culture will also be determined organically, based on your founding team and leadership
● As you scale, you want to shift to a more designed culture as well; otherwise you will keep both the good and the bad qualities
● Culture will change, so your job is to make sure it changes in a positive way

Slide 48

Slide 48 text

Example: Organic Culture
● We hire people who are like us in background / perspective / race / gender / etc. because that’s what’s worked in the past
● We all socialize together outside of work because we always have
● We critique each other’s work openly, harshly, and regularly because that’s what the CEO and CTO did

Slide 49

Slide 49 text

Example: Designed Culture
● We aspire to be a diverse organization, so we work hard to remove bias from our interview process and attract candidates from all walks of life
● What happens at work is what’s important; what you choose to do outside of work is up to you
● We work hard to have supportive conversations that address the real problems without hurting people

Slide 50

Slide 50 text

Rough Guidelines
● Execute for the short term
● Plan for the long term
● Design to be changed
[Scale: least short-term work / least long-term scalable → most short-term work / most long-term scalable]

Slide 51

Slide 51 text

Questions?
www.coryzue.com, @czue
www.dimagi.com (we’re hiring!)