Mastering Duct Tape (PyCon Balkan 2018)

Mastering Duct Tape

David Cramer
Founder, CEO at Sentry
@zeeg
github.com/dcramer
(I still write a lot of code)

Why should you trust me?
• Self-taught with 15 years of software engineering
  12 years in Python
• Scaled Disqus (RIP) to 1 billion page views
  One page view is roughly one rendered embed
• Over-engineered a multi-thousand node continuous integration platform at Dropbox
  It’s what you do when your test suite is Too Damn Slow™
• Scraped together Sentry on a budget
  sentry.io receives 2 billion exceptions/day as of Oct 2018
• I mash keys at 160 wpm
  An important strength of being able to duct tape, quickly

What do we mean by duct tape?

What can we duct tape in software?
• Databases
• Infrastructure
• Monitoring
• Code Quality
(spoiler: everything)
“If you can’t fix it with duct tape, you aren’t using enough duct tape”

• Act 1: Databases
• Act 2: Everything Else
• AMA

ACT 1: (SQL) Databases (we all have the same problems)

Standard Application Schema (every app looks like a blog)
Tables: Users, Posts, Comments

Standard Application Schema
Tables: Users, Posts, Comments
Comments Schema
  Column        Type
  id            INT
  post_id       INT
  display_name  TEXT
  body          TEXT
  date_posted   DATETIME

Retrieving Comments
  SELECT comments.id, comments.display_name, comments.body, comments.date_posted
  FROM comments
  WHERE comments.post_id = ?
  ORDER BY comments.date_posted DESC

What do we do when things are slow?

Quick Diagnostics
1. Are our queries well indexed?
   • Indexes reduce disk IO, which is a common bottleneck
2. How large is the table / relation?
   • Row count (500 million) and size on disk (10 TB)
3. Physical resources (CPU, memory)?
   • Memory is the usual concern

Basic Indexes
  SELECT comments.id, comments.display_name, comments.body, comments.date_posted
  FROM comments
  WHERE comments.post_id = ?
  ORDER BY comments.date_posted DESC

  CREATE INDEX my_index_name ON comments (post_id)

Optimizing Reads
  SELECT comments.id, comments.display_name, comments.body, comments.date_posted
  FROM comments
  WHERE comments.post_id = ?
  ORDER BY comments.date_posted DESC

  CREATE INDEX my_index_name ON comments (post_id, date_posted)
Note: date_posted has high cardinality, which means this index is more expensive to maintain
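
One way to see the difference between the two indexes is to ask the database for its query plan. The sketch below is an illustration only: it uses Python's built-in sqlite3 module rather than whatever database the talk had in mind, and the index names idx_post and idx_post_date and the helper query_plan are made up for this example. Other databases expose the same idea through their own EXPLAIN output.

```python
import sqlite3

QUERY = """
    SELECT id, display_name, body, date_posted
    FROM comments
    WHERE post_id = ?
    ORDER BY date_posted DESC
"""


def query_plan(index_sql):
    """Return SQLite's plan steps for QUERY when only `index_sql` exists."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE comments ("
        " id INTEGER PRIMARY KEY, post_id INTEGER, display_name TEXT,"
        " body TEXT, date_posted TEXT)"
    )
    conn.execute(index_sql)
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1,)).fetchall()
    conn.close()
    # The last column of each row is the human-readable plan step.
    return [row[-1] for row in rows]


# Index on the filter column only: rows are found by index, then sorted.
single = query_plan("CREATE INDEX idx_post ON comments (post_id)")

# Index on filter + sort columns: rows come out of the index already ordered.
composite = query_plan(
    "CREATE INDEX idx_post_date ON comments (post_id, date_posted)"
)

if __name__ == "__main__":
    print("single-column index:", single)
    print("composite index:    ", composite)
```

With the single-column index the plan should include a separate sort step for the ORDER BY; with the composite index that step should disappear, which is the read optimization the slide is after.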

Real life isn’t that easy..

Tables: Users, Posts, Comments, Votes, Revisions
• Revisions: whenever you edit a comment, we end up with a new revision
• Votes: help rank the best comments, which is how we sort

Tables: Users, Posts, Comments, Votes, Revisions
Comments Schema
  Column              Type
  id                  INT
  author_id           INT (FOREIGN KEY on users)
  post_id             INT (FOREIGN KEY on posts)
  latest_revision_id  INT (FOREIGN KEY on revisions)
  date_posted         DATETIME

  SELECT comments.id, comments.date_posted, revisions.body, users.display_name
  FROM comments
  JOIN revisions ON comments.latest_revision_id = revisions.id
  JOIN users ON comments.author_id = users.id
  WHERE comments.post_id = ?
  ORDER BY comments.date_posted DESC

What if the fully indexed query is slow?

Vertical Partitions
Main Database: Users, Posts
Comments Database: Comments, Revisions

Vertical Partitions
Main Database: Users, Posts
Comments Database: Comments, Revisions
Comments Schema
  Column              Type
  id                  INT
  author_id           INT (FOREIGN KEY on users)
  post_id             INT (FOREIGN KEY on posts)
  latest_revision_id  INT (FOREIGN KEY on revisions)
  date_posted         DATETIME

Vertical Partitions (cont.)
General process to split off relations
1. Remove newly invalid foreign key constraints
2. Replicate tables to new database server
3. Update application code to remove relations
   (common in frameworks like Django)
4. [some magic or downtime to cutover databases]

Vertical Partitions (cont.)
General process to split off relations
1. Remove newly invalid foreign key constraints
2. Replicate tables to new database server
3. Update application code to remove relations
   (common in frameworks like Django)
4. [some magic or downtime to cutover databases]
Some other things we almost certainly didn’t think about
• (Likely) Find code which is still referencing the relation incorrectly
• Write code to handle comments with missing posts
• Write code to delete comments when posts go away
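
Once comments live on their own server, the database can no longer enforce the foreign key or cascade deletes, so that work moves into application code. The following is a minimal sketch of the two "write code to..." items, assuming two in-memory SQLite connections stand in for the main and comments databases. The names main_db, comments_db, comments_with_posts and delete_post are hypothetical, and a real cutover would also need to deal with failures between the two deletes.

```python
import sqlite3

main_db = sqlite3.connect(":memory:")      # holds users and posts
comments_db = sqlite3.connect(":memory:")  # holds comments, no FK to posts

main_db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
comments_db.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT)"
)
main_db.execute("INSERT INTO posts VALUES (1, 'Hello')")
comments_db.executemany(
    "INSERT INTO comments VALUES (?, ?, ?)",
    [(10, 1, "first"), (11, 2, "orphan: post 2 no longer exists")],
)


def comments_with_posts():
    """Join in application code, skipping comments whose post is missing."""
    comments = comments_db.execute(
        "SELECT id, post_id, body FROM comments ORDER BY id"
    ).fetchall()
    post_ids = sorted({post_id for _, post_id, _ in comments})
    if not post_ids:
        return []
    placeholders = ",".join("?" * len(post_ids))
    titles = dict(
        main_db.execute(
            f"SELECT id, title FROM posts WHERE id IN ({placeholders})",
            post_ids,
        )
    )
    return [
        (comment_id, titles[post_id], body)
        for comment_id, post_id, body in comments
        if post_id in titles  # tolerate comments with missing posts
    ]


def delete_post(post_id):
    """Without ON DELETE CASCADE, clean up both databases by hand."""
    main_db.execute("DELETE FROM posts WHERE id = ?", (post_id,))
    main_db.commit()
    comments_db.execute("DELETE FROM comments WHERE post_id = ?", (post_id,))
    comments_db.commit()
```

The orphaned comment for post 2 is simply skipped on read, which is usually cheaper than guaranteeing that orphans can never exist.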

But what if we hit the same problem on the
new database?

Horizontal Partitions
Main Database: Users, Posts
Comments 2018: Comments, Revisions
Comments 2019: Comments, Revisions
Comments 2020: Comments, Revisions
Comments 2021: Comments, Revisions

Horizontal Partitions (cont.)
Main Database: Users, Posts
Comments 2018-12: Comments, Revisions
Comments 2019-01: Comments, Revisions
Comments 2019-02: Comments, Revisions
Comments 2019-03: Comments, Revisions
Comments 2019-04: Comments, Revisions
Comments 2019-05: Comments, Revisions

Horizontal Partitions (cont.)
Main Database: Users
Comments 2018-12: Comments, Revisions
Comments 2019-01: Comments, Revisions
Posts 2018-12: Posts
Posts 2019-01: Posts
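
With time-based partitions, the application (or a thin routing layer) has to decide which partition a row belongs to. Here is a minimal sketch of that routing, assuming monthly partitions named like comments_2018_12. The naming scheme and both function names are invented for illustration and do not come from the talk.

```python
from datetime import date


def comments_partition(date_posted: date) -> str:
    """Map a comment's date to the name of its monthly partition."""
    return f"comments_{date_posted:%Y_%m}"


def partitions_between(start: date, end: date) -> list[str]:
    """List every monthly partition a date-range query has to touch."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"comments_{year:04d}_{month:02d}")
        # Roll over to January of the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return names


if __name__ == "__main__":
    print(comments_partition(date(2018, 12, 5)))
    print(partitions_between(date(2018, 12, 1), date(2019, 2, 15)))
```

Writes hit exactly one partition, while reads that span a date range fan out to several, which is part of the cost this scheme trades for smaller tables.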

Maybe we should just use MongoDB? ClickHouse? Cassandra? Elasticsearch?
HBase? Our own database? Redis?

Just buy a bigger server! (or use your favorite cloud
provider to provision one)

ACT 2: Services and Things

Monolith: User Interface, API, Database

Monoliths aren’t cool anymore
Monoliths slow us down

Monolith: User Interface, API, Database
Service Oriented (SOA): User Interface, two Services, two Databases

SOA isn’t cool anymore
SOA is an outdated idea

Monolith: User Interface, API, Database
Microservices (a trendy way to describe a SOA): User Interface, five Services, three Databases

How do we get there?

Monolith: REPO 1, REPO 1, Database
Microservices: REPO 1, REPO 2, REPO 3, REPO 4, REPO 5, REPO 6, three Databases

Moving to a SOA
1. Build a framework
   You might as well use a new language while you’re at it
2. Rewrite the previous developer’s code
   It was old and nobody liked that developer anyway
3. Profit!

Moving to a SOA (cont.)
1. Build a framework
2. Set up an event stream (Kafka)
3. Break apart your monolithic MySQL database
4. Write a service which owns one set of your problems
5. Attempt [and fail] to set up automated testing
6. Create a new way to deploy code
7. …
8. Profit!

Let’s step back and look at our goals

Goal Setting
• Speed up the build/test/release process
• Better ownership and autonomy
• Improve reliability
  (through stronger API contracts, reduced complexity)
• Transition away from legacy hard-to-support systems

Do we really need a SOA?

Duct Tape Oriented Architecture (because we just need to get things done)
THE MICROLIFT
Diagram: three User Interfaces, five Services, one Database

Change Control (your most important lever)
Stages: Code, Build, Test, Review, Release
Diagram labels: largely automated; social processes

Remove Humans
Remove humans anywhere you can: rely on robots!
  Let everyone ship their own code
  ^ an office manager can deploy code at Sentry!

Optimize Your Tests
You must spend time on build/test suite performance
  People often overlook how slow database access is and it’s easy to fix!
• Profile your test suite!
• You don’t have to run every test, every time
• Use transactions to create quick database tests
  https://github.com/getsentry/zeus/blob/5004a6b7c538fada3e98c8943ea5385234a8220b/zeus/testutils/pytest.py#L89
• Replace production services with no-ops where possible
  https://docs.djangoproject.com/en/2.1/topics/cache/#dummy-caching-for-development
• Mock third party network calls
  https://github.com/getsentry/responses
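
The transaction trick is to wrap each test in a transaction and roll it back at the end, so the schema is created once and every test starts from a clean database without re-creating tables. This is a generic sketch of the idea using Python's built-in sqlite3 module, not the Zeus fixture linked above. The names rolled_back, count_comments and test_insert_comment are made up for the example.

```python
import sqlite3
from contextlib import contextmanager

# isolation_level=None puts sqlite3 in autocommit mode, so the BEGIN and
# ROLLBACK statements below are the only transaction control in play.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")


@contextmanager
def rolled_back(connection):
    """Run the enclosed block in a transaction that is always rolled back."""
    connection.execute("BEGIN")
    try:
        yield connection
    finally:
        connection.execute("ROLLBACK")


def count_comments(connection):
    return connection.execute("SELECT COUNT(*) FROM comments").fetchone()[0]


def test_insert_comment():
    with rolled_back(conn) as db:
        db.execute("INSERT INTO comments (body) VALUES ('hello')")
        assert count_comments(db) == 1  # visible inside the transaction


if __name__ == "__main__":
    test_insert_comment()
    print("rows left behind:", count_comments(conn))
```

In a pytest suite the same context manager would typically live in a fixture, so individual tests never have to think about cleanup.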

Test in Production
Let production be production
  Don’t try to mock your production environment: it won’t work
Diagram: LIVE TRAFFIC split 99% / 1% between v1 and v2
Utilize a tool like LaunchDarkly to scope features
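
Sending a small, stable share of live traffic to a new version can be approximated with a hash of the user ID. The sketch below shows the general technique only. It is not LaunchDarkly's API, and the flag name new-renderer and the functions in_rollout and handle_request are hypothetical.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percent: float) -> bool:
    """Deterministically place `percent` percent of users in the rollout."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100  # 1.0 percent -> buckets 0..99


def handle_request(user_id: str) -> str:
    """Route roughly 1% of users to v2 and everyone else to v1."""
    return "v2" if in_rollout("new-renderer", user_id, 1.0) else "v1"


if __name__ == "__main__":
    served = [handle_request(f"user-{n}") for n in range(20_000)]
    print("share on v2:", served.count("v2") / len(served))
```

Hashing the flag name together with the user ID keeps each user on the same version across requests while giving different flags independent 1% populations.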

Define Ownership
Ownership doesn’t mean you need a separate codebase
  You can deploy service-isolated copies of your monolith
• REPO 1: sentry.io
• REPO 1: api.sentry.io
• REPO 2: docs.sentry.io
• REPO 1: ingest.sentry.io
https://help.github.com/articles/about-codeowners/
Diagram labels: lead, Docs Team
(and put teams on-call for their services)
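
A service-isolated copy of a monolith can be as simple as the same codebase started with a different role, where each role only exposes its own routes. The following is a rough sketch of that idea. The SERVICE_ROLE environment variable, the ROUTES table and the handler names are all invented here and say nothing about how Sentry actually does it.

```python
import os

# One codebase, several deployable roles (all names are illustrative).
ROUTES = {
    "web": {"/": "dashboard"},
    "api": {"/api/comments": "list_comments"},
    "ingest": {"/ingest/event": "store_event"},
}


def enabled_routes(role: str) -> dict:
    """Return the routes a process should serve for the given role."""
    if role == "all":
        # Development default: behave like the plain monolith.
        merged = {}
        for routes in ROUTES.values():
            merged.update(routes)
        return merged
    return dict(ROUTES[role])


if __name__ == "__main__":
    role = os.environ.get("SERVICE_ROLE", "all")
    print(role, "serves", sorted(enabled_routes(role)))
```

Each role can then be deployed, scaled and put on-call separately, while code ownership inside the single repository is tracked with a CODEOWNERS file.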

What about legacy systems? Let them be legacy. You have
more important things to do.

FINALE: In Closing

If nothing else..
• Remove human roadblocks
  If it annoys you, it’s probably wasting time (and money)
• Enable people to do their best work
  Treat people as adults and give them the tools they need to succeed
• Take off your engineering hat
  Focus on the business goals, less on academics
• “Time to Ship” is your metric
  The faster you can change and react to your customers, the more fun and success you’re going to enjoy

AMA (ask me anything)
