Who am I?
2007-2017: CTO, Dimagi
Today
Future “Chief Accelerator”
Builder / Entrepreneur
Freelance Consultant
Lover of Emojis
Slide 3
Slide 3 text
Why this talk?
Growing is hard.
Especially from 1-100 people (or servers).
It’s probably hard past 100 too, but I don’t know much about that.
Slide 4
Slide 4 text
Three Talk Parts
Scaling Code
Scaling your Stack
Scaling Teams
Slide 5
Slide 5 text
Scaling Code: Fighting Complexity
Slide 6
Slide 6 text
A codebase
Slide 7
Slide 7 text
A codebase
● Interdependencies scale exponentially
● No longer fits in one person’s head
Slide 8
Slide 8 text
A codebase
● Takes longer to reason about
● Harder / slower to test things
● Takes longer to onboard new people
Slide 9
Slide 9 text
Enter: abstraction
Diagram: System Glue, Subsystem, Subsystem, Subsystem
● Reduce the size of any single component so that it can fit in one person’s head
● Allows for specialization, if necessary
Slide 10
Slide 10 text
A more sustainable codebase
Slide 11
Slide 11 text
A more sustainable codebase
Slide 12
Slide 12 text
A more sustainable codebase
Slide 13
Slide 13 text
A more sustainable codebase
Slide 14
Slide 14 text
How to abstract?
Spectrum, from least short-term work / least long-term scalable to most short-term work / most long-term scalable:
No structure
“Organized” monolith
Modules define public APIs via __init__.py
Shared modules using git submodules
Shared libraries using PyPI
Service Oriented Architecture
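One step on this spectrum, modules defining a public API via __init__.py, can be sketched as follows. This is only an illustration: the package name `billing_demo`, the private module `_invoices`, and the functions inside it are hypothetical, and the script writes the package to a temporary directory purely so that the example runs on its own.

```python
import sys
import tempfile
import textwrap
from pathlib import Path

# Build a tiny throwaway package on disk (hypothetical names throughout).
root = Path(tempfile.mkdtemp())
package_dir = root / "billing_demo"
package_dir.mkdir()

# Private implementation module: the leading underscore signals "internal".
(package_dir / "_invoices.py").write_text(textwrap.dedent('''
    def create_invoice(customer, amount):
        return {"customer": customer, "amount": _round_cents(amount)}

    def _round_cents(amount):
        return round(amount, 2)
'''))

# __init__.py re-exports only what other subsystems are meant to use.
(package_dir / "__init__.py").write_text(textwrap.dedent('''
    """Public API of the billing subsystem."""
    from ._invoices import create_invoice

    __all__ = ["create_invoice"]
'''))

sys.path.insert(0, str(root))
import billing_demo

# Callers depend on the package's public surface, not on its internals.
invoice = billing_demo.create_invoice("acme", 10.004)
```

Python does not enforce this boundary, so the convention only helps if the team agrees to import from the package root and treats underscore-prefixed modules as off limits.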
Slide 15
Slide 15 text
Rough Guidelines
Execute for the short term (< 1 year)
Plan for the long term (1-3 years)
Design to be changed (always)
Spectrum: least short-term work / least long-term scalable → most short-term work / most long-term scalable
Slide 16
Slide 16 text
All the things!
A few other quick tips
Slide 17
Slide 17 text
Clean all the things!
Rule: Always leave code cleaner than when you found it
Flake CI
Conduct refactorathons
Resource tech debt
Slide 18
Slide 18 text
Review all the things!
Code review is critical
Conduct 10% design reviews for big features
Slide 19
Slide 19 text
(Automated) Test all the things!
Test while coding
Write failing tests before fixing bugs
Build a CI pipeline
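As a sketch of "write failing tests before fixing bugs": the function `average` below is hypothetical, and the bug it once had (a crash on an empty list) is invented for illustration. The idea is that the regression test is written first, is seen to fail against the buggy code, and then stays in the suite after the fix.

```python
import unittest


def average(values):
    """Hypothetical function under test.

    The assumed buggy version was `return sum(values) / len(values)`,
    which raised ZeroDivisionError for an empty list.
    """
    if not values:
        return 0.0
    return sum(values) / len(values)


class AverageRegressionTest(unittest.TestCase):
    def test_empty_list_returns_zero(self):
        # Written before the fix: it failed against the buggy version.
        self.assertEqual(average([]), 0.0)

    def test_regular_values_still_work(self):
        # Guards against the fix breaking the normal case.
        self.assertEqual(average([1, 2, 3]), 2.0)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(AverageRegressionTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Running the same suite in the CI pipeline keeps the bug from quietly returning.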
Slide 20
Slide 20 text
Growth and Infrastructure: Fighting Scale
Slide 21
Slide 21 text
Diagram: Web Server, Django, Database
Legend: Python App Code, Web Server, Database
Slide 22
Slide 22 text
The site is getting slow!
Diagram: Django, Database, Cache
Legend: Python App Code, Web Server, Database / 3rd Party
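The slide does not say how the cache is used, so the following is only one common option: a cache-aside lookup with a time-to-live. The class `TTLCache`, the function `load_profile_from_db`, and the 60-second TTL are all assumptions made for this sketch; in a real deployment the in-memory dict would be replaced by a shared cache service.

```python
import time


class TTLCache:
    """Minimal in-memory cache with per-entry expiry (illustrative only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: skip the expensive call
        value = compute()  # cache miss: do the slow work once
        self._store[key] = (now + self._ttl, value)
        return value


db_calls = []


def load_profile_from_db(user_id):
    # Stand-in for a slow database query; records each call for inspection.
    db_calls.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_compute(("profile", 42), lambda: load_profile_from_db(42))
second = cache.get_or_compute(("profile", 42), lambda: load_profile_from_db(42))
```

The second lookup is served from the cache, so the database stand-in is called only once. The trade-off is that a cached value can be stale for up to the TTL.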
Slide 23
Slide 23 text
We need background processing!
Diagram: Django, Database, Cache, Celery (background processor)
Legend: Python App Code, Web Server, Database / 3rd Party, Background Processor
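The pattern behind this step can be sketched with the standard library alone. This is not Celery: it is a single-process stand-in that uses a queue and a worker thread to show the idea of accepting a request quickly and doing the slow work elsewhere. The names `handle_signup` and `send_welcome_email` are hypothetical.

```python
import queue
import threading

task_queue = queue.Queue()
sent = []


def send_welcome_email(address):
    # Stand-in for slow work (sending mail, calling a third party, etc.).
    sent.append(address)


def worker():
    # Pull tasks off the queue until a None sentinel arrives.
    while True:
        task = task_queue.get()
        try:
            if task is None:
                break
            func, args = task
            func(*args)
        finally:
            task_queue.task_done()


worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()


def handle_signup(address):
    # The request handler only enqueues the work and returns immediately.
    task_queue.put((send_welcome_email, (address,)))
    return "202 Accepted"


response = handle_signup("new.user@example.com")

task_queue.join()      # wait for queued work (only needed in this demo)
task_queue.put(None)   # ask the worker to stop
worker_thread.join()
```

A dedicated background processor such as Celery adds what this sketch lacks: a broker that survives restarts, workers on separate machines, retries, and scheduling.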
Slide 24
Slide 24 text
The machines can’t keep up with the load!
Diagram: Django (multiple instances), Celery (background processor, multiple instances), Database, Cache
Legend: Python App Code, Web Server, Database / 3rd Party, Background Processor
Slide 25
Slide 25 text
The database can’t keep up with the load!
Diagram: Django (multiple instances), Celery (background processor, multiple instances), Database, Database (Read Replica(s)), Cache
Legend: Python App Code, Web Server, Database / 3rd Party, Background Processor
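One way to send reads to replicas in a Django project is a database router. The sketch below follows Django's router method names (`db_for_read`, `db_for_write`, `allow_relation`, `allow_migrate`) but is written as a plain class so it runs without Django installed. The aliases `default`, `replica1`, and `replica2` are assumptions; they would have to match entries in the project's `DATABASES` setting, and the class would be listed in `DATABASE_ROUTERS`.

```python
import random


class PrimaryReplicaRouter:
    """Route writes to the primary and reads to a random replica."""

    primary = "default"                  # assumed alias of the primary
    replicas = ["replica1", "replica2"]  # assumed aliases of the replicas

    def db_for_read(self, model, **hints):
        # Spread read queries across the replicas.
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        # All writes go to the primary.
        return self.primary

    def allow_relation(self, obj1, obj2, **hints):
        # Replicas hold the same data, so relations between objects are fine.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Run migrations on the primary only; replication carries them over.
        return db == self.primary


router = PrimaryReplicaRouter()
write_alias = router.db_for_write(model=None)
read_alias = router.db_for_read(model=None)
```

Replicas usually lag slightly behind the primary, so a read issued right after a write may not see it. Code paths that need read-your-own-writes behavior may have to query the primary explicitly.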
Slide 26
Slide 26 text
Our reports are taking way too long!
Diagram: Django (multiple instances), Celery (background processor, multiple instances), Database, Database (Read Replica(s)), Cache, Streaming Platform, Stream Processing (multiple instances), Analytics Database
Legend: Python App Code, Web Server, Database / 3rd Party, Background Processor
Slide 27
Slide 27 text
It’s the type of data in the database!
Diagram: Django (multiple instances), Celery (background processor, multiple instances), Database, Database (Read Replica(s)), Cache, Streaming Platform, Stream Processing (multiple instances), Analytics Database, “Blob” Database
Legend: Python App Code, Web Server, Database / 3rd Party, Background Processor
Slide 28
Slide 28 text
The data is just too big!
Diagram: Django (multiple instances), Celery (background processor, multiple instances), Databases, Database (Read Replica(s)), Cache, Streaming Platform, Stream Processing (multiple instances), Analytics Database, “Blob” Database
https://www.youtube.com/watch?v=EmyVoIi6W1c
Legend: Python App Code, Web Server, Database / 3rd Party, Background Processor
Slide 29
Slide 29 text
Rough Guidelines
Execute for the short term (< 1 year)
Plan for the long term (1-3 years)
Design to be changed (always)
Spectrum: least short-term work / least long-term scalable → most short-term work / most long-term scalable
Slide 30
Slide 30 text
All the things!
A few other quick tips
Slide 31
Slide 31 text
Learn all the things!
Build DevOps capacity
Get to know your tools well. Really well.
Slide 32
Slide 32 text
Monitor all the things!
Uptime → Pingdom, StatusCake, others
Servers → New Relic, Datadog, Munin, others
Slide 33
Slide 33 text
Record all the things!
Logs
ELK (Elasticsearch, Logstash, Kibana)
Datadog
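Log pipelines of this kind are generally easier to work with when the application emits structured records. The following is a minimal sketch, using only the standard library, of a formatter that writes one JSON object per log line. The field names and the `checkout` logger name are assumptions, not a schema required by any of the tools above, and the in-memory stream stands in for a file or stdout that a log shipper would read.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Include the traceback text when logging an exception.
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


stream = io.StringIO()  # stand-in for a log file or stdout
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("checkout")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

logger.info("order %s placed", 1234)
event = json.loads(stream.getvalue())
```

Each line can then be parsed and indexed by field instead of being matched with regular expressions.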
Slide 34
Slide 34 text
Orchestrate all the things!
Ansible (Salt, Fabric, Puppet/Chef)
Slide 35
Slide 35 text
Growth and Teams: Fighting Chaos
Slide 36
Slide 36 text
Two Common Types of Organic, Small Team Growth
Superhero Model
A single person (or small group of people) owns something.
Common examples:
- Code review
- People management
- Support / firefighting
Slide 37
Slide 37 text
Two Common Types of Organic, Small Team Growth
Herd Model
The “herd” (aka everyone on the team) has distributed, self-organizing ownership
Common examples:
- Knowledge management
- Bug fixes
- Who does what
Slide 38
Slide 38 text
How this works over time
Slide 39
Slide 39 text
Superheroes and Growth
Slide 40
Slide 40 text
Herds and Growth
Source: https://twitter.com/RichRogersIoT/status/914577613300965377
Slide 41
Slide 41 text
Organic → Designed
Organic Systems
- Happen unintentionally over a long period of time
- Exist because “that’s how it’s always worked”
- Can often be hard to explain / justify
Designed Systems
- Are intentionally designed and revised over time
- Exist to fulfil a purpose / meet a need
- Can usually be explained / justified
Trying to explain a system to a new hire is a good litmus test of which type of system it is
Slide 42
Slide 42 text
Designed Solutions
Specialization: Replace heroes with roles
Tools: Remove friction from unscalable systems / processes
Processes: Bring order to the chaos of the herd
Slide 43
Slide 43 text
Example: Organic Support
People email us with issues
The email goes to the whole tech/product team (herd)
Whoever looks first (usually superhero Alice) responds
If it’s a bug they assign it to the person responsible for the bug (herd knowledge)
Slide 44
Slide 44 text
Example: Designed Support
Support tickets go into a queue in our support system (tool) that is triaged by our support team (specialization)
The team triages and follows a defined escalation and response procedure (process)
If the ticket is a bug it is put in a different queue (tool/process) and is picked up according to priority by one of our rotating support engineers (specialization / process)
Slide 45
Slide 45 text
Example: Organic Code Review
Make a pull request and whoever (herd) gets to it (usually superhero Bob) will review and merge or make comments
If no one reviews it after a few days you can start pinging people (herd) on Slack
Slide 46
Slide 46 text
Example: Designed Code Review
Every developer will be assigned a code buddy who is responsible for reviewing their pull requests (process)
Additionally, you should ping the defined area owner of that part of the codebase (specialization) for a secondary review.
Pull requests should only be merged after the reviews have been approved (process) and all tests pass (tool)
Slide 47
Slide 47 text
Culture: Organic → Designed
Much like other things, your culture will also be determined organically, based on your founding team and leadership
As you scale you want to shift to a more designed culture as well; otherwise you will keep both the good and the bad qualities
Culture will change, so your job is to let it change in a positive way
Slide 48
Slide 48 text
Example: Organic Culture
We hire people who are like us in background / perspective /
race / gender / etc. because that’s what’s worked in the
past
We all socialize together outside of work because we always
have
We critique each other’s work openly, harshly, and regularly
because that’s what the CEO and CTO did
Slide 49
Slide 49 text
Example: Designed Culture
We aspire to be a diverse organization and so we work hard
to remove bias from our interview process and attract
candidates from all walks of life
What happens at work is what’s important; what you choose
to do outside of work is up to you
We work hard to have supportive conversations that address
the real problems without hurting people
Slide 50
Slide 50 text
Rough Guidelines
Execute for the short term
Plan for the long term
Design to be changed
Spectrum: least short-term work / least long-term scalable → most short-term work / most long-term scalable