Mastering Duct Tape (PyCon Balkan 2018)

David Cramer
November 18, 2018

There are a lot of hard problems in software, but how many of them are created by decisions we make day to day? Let's talk about pragmatism and a number of difficult but common situations where we often over-engineer solutions.

Transcript

  1. Why should you trust me?
     • Self-taught, with 15 years of software engineering
       12 years in Python
     • Scaled Disqus (RIP) to 1 billion page views
       One page view is roughly one rendered embed
     • Over-engineered a multi-thousand-node continuous integration platform at Dropbox
       It's what you do when your test suite is Too Damn Slow™
     • Scraped together Sentry on a budget
       sentry.io receives 2 billion exceptions/day as of Oct 2018
     • I mash keys at 160 wpm
       An important strength when you need to duct tape, quickly
  2. What can we duct tape in software?
     Databases, Infrastructure, Monitoring, Code Quality (spoiler: everything)
     “If you can’t fix it with duct tape, you aren’t using enough duct tape”
  3. Standard Application Schema
     Tables: Users, Posts, Comments

     Comments Schema
     Column         Type
     id             INT
     post_id        INT
     display_name   TEXT
     body           TEXT
     date_posted    DATETIME
  4. Quick Diagnostics
     1. Are our queries well indexed?
        • Indexes reduce disk IO, which is a common bottleneck
     2. How large is the table / relation?
        • Row count (500 million) and size on disk (10 TB)
     3. Physical resources (CPU, memory)?
        • Memory is the usual concern
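
A minimal sketch of the first two diagnostics, using Python's built-in `sqlite3` module as a stand-in for the production database (an assumption: on MySQL or PostgreSQL you would read their own `EXPLAIN` output and catalog tables instead). The toy data and the helper names `row_count` and `query_plan` are hypothetical. Without an index, the plan for the comments query is a full table scan followed by a separate sort.

```python
import sqlite3

# In-memory SQLite stand-in for the comments table from the slides.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        post_id INTEGER,
        display_name TEXT,
        body TEXT,
        date_posted TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO comments (post_id, display_name, body, date_posted) VALUES (?, ?, ?, ?)",
    [(i % 10, "user", "hello", f"2018-11-{(i % 28) + 1:02d}") for i in range(1000)],
)


def row_count(conn, table):
    # Question 2: how large is the relation? (table name must be trusted input)
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def query_plan(conn, sql, params=()):
    # Question 1: is the query indexed? EXPLAIN QUERY PLAN rows have four
    # columns; the last one is the human-readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


QUERY = (
    "SELECT id, display_name, body, date_posted FROM comments "
    "WHERE post_id = ? ORDER BY date_posted DESC"
)

print(row_count(conn, "comments"))  # → 1000
print(query_plan(conn, QUERY, (3,)))  # a SCAN of comments plus a temp B-tree sort
```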
  5. Basic Indexes

     SELECT comments.id, comments.display_name, comments.body, comments.date_posted
     FROM comments
     WHERE comments.post_id = ?
     ORDER BY comments.date_posted DESC;

     CREATE INDEX my_index_name ON comments (post_id);
  6. Optimizing Reads

     SELECT comments.id, comments.display_name, comments.body, comments.date_posted
     FROM comments
     WHERE comments.post_id = ?
     ORDER BY comments.date_posted DESC;

     CREATE INDEX my_index_name ON comments (post_id, date_posted);

     Note: date_posted has high cardinality, which means this index is more expensive to maintain.
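
A sketch comparing the two indexes from slides 5 and 6, again using SQLite as an assumed stand-in. With only `post_id` indexed, the matching rows still have to be sorted in a temporary B-tree; with `(post_id, date_posted)`, rows for one post are already stored in date order, so the database can walk the index backwards and skip the sort. Other engines report this differently (for example MySQL's "Using filesort").

```python
import sqlite3


def plan_for(index_columns):
    """Return EXPLAIN QUERY PLAN details for the slide's query, given one index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, "
        "display_name TEXT, body TEXT, date_posted TEXT)"
    )
    conn.execute(f"CREATE INDEX my_index_name ON comments ({index_columns})")
    sql = (
        "SELECT id, display_name, body, date_posted FROM comments "
        "WHERE post_id = ? ORDER BY date_posted DESC"
    )
    details = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (1,))]
    conn.close()
    return details


basic = plan_for("post_id")
composite = plan_for("post_id, date_posted")

# Single-column index: rows are found via the index, then sorted in a temp B-tree.
print(basic)
# Composite index: the index order already satisfies ORDER BY, so no sort step.
print(composite)
```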
  7. Tables: Users, Posts, Comments, Votes, Revisions
     • Revisions: whenever you edit a comment, we end up with a new revision
     • Votes: help rank the best comments, which is how we sort
  8. Comments Schema (tables: Users, Posts, Comments, Votes, Revisions)
     Column               Type
     id                   INT
     author_id            INT (FOREIGN KEY on users)
     post_id              INT (FOREIGN KEY on posts)
     latest_revision_id   INT (FOREIGN KEY on revisions)
     date_posted          DATETIME
  9. SELECT comments.id, comments.date_posted, revisions.body, users.display_name
     FROM comments
     JOIN revisions ON comments.latest_revision_id = revisions.id
     JOIN users ON comments.author_id = users.id
     WHERE comments.post_id = ?
     ORDER BY comments.date_posted DESC;
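
A small runnable version of the slide 9 query, using SQLite as an assumed stand-in. It shows why `latest_revision_id` is stored on the comment: the query joins straight to the current revision instead of searching all revisions for the newest one. The `revisions` columns (`comment_id`, `body`) and the sample rows are hypothetical, since the slides only show the comments table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, display_name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY);
    -- Hypothetical revisions layout: one row per edit of a comment.
    CREATE TABLE revisions (id INTEGER PRIMARY KEY, comment_id INTEGER, body TEXT);
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES users (id),
        post_id INTEGER REFERENCES posts (id),
        latest_revision_id INTEGER REFERENCES revisions (id),
        date_posted TEXT
    );
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO posts VALUES (10);
    -- Two revisions of comment 100; only the latest should be shown.
    INSERT INTO revisions VALUES (1000, 100, 'draft'), (1001, 100, 'edited');
    INSERT INTO comments VALUES (100, 1, 10, 1001, '2018-11-18 10:00:00');
    """
)

rows = conn.execute(
    """
    SELECT comments.id, comments.date_posted, revisions.body, users.display_name
    FROM comments
    JOIN revisions ON comments.latest_revision_id = revisions.id
    JOIN users ON comments.author_id = users.id
    WHERE comments.post_id = ?
    ORDER BY comments.date_posted DESC
    """,
    (10,),
).fetchall()
print(rows)  # → [(100, '2018-11-18 10:00:00', 'edited', 'alice')]
```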
  10. Vertical Partitions
      Main Database: Users, Posts
      Comments Database: Comments, Revisions

      Comments Schema
      Column               Type
      id                   INT
      author_id            INT (FOREIGN KEY on users)
      post_id              INT (FOREIGN KEY on posts)
      latest_revision_id   INT (FOREIGN KEY on revisions)
      date_posted          DATETIME
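
In Django, sending one app's tables to a separate database is usually done with a database router. Below is a minimal sketch that follows the shape of Django's router interface (`db_for_read`, `db_for_write`, `allow_relation`, `allow_migrate`) without importing Django, so it runs on its own. The app label `comments` and the database alias `comments` are hypothetical.

```python
from types import SimpleNamespace

# Hypothetical app label(s) and database alias; adjust to your project.
COMMENTS_APPS = {"comments"}
COMMENTS_DB = "comments"


class CommentsRouter:
    """Route the comments app to its own database; leave everything else alone.

    Returning None means "no opinion", letting the next router or the
    default database decide.
    """

    def _db_for(self, model):
        if model._meta.app_label in COMMENTS_APPS:
            return COMMENTS_DB
        return None

    def db_for_read(self, model, **hints):
        return self._db_for(model)

    def db_for_write(self, model, **hints):
        return self._db_for(model)

    def allow_relation(self, obj1, obj2, **hints):
        # Relations that cross the split can no longer be enforced by the
        # database, so refuse them instead of failing later.
        sides = {obj._meta.app_label in COMMENTS_APPS for obj in (obj1, obj2)}
        return False if len(sides) == 2 else None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Comments tables live only in the comments database, and nothing
        # else is created there.
        if app_label in COMMENTS_APPS:
            return db == COMMENTS_DB
        return db != COMMENTS_DB


def fake_model(app_label):
    # Stand-in for a Django model; only _meta.app_label is consulted.
    return SimpleNamespace(_meta=SimpleNamespace(app_label=app_label))


router = CommentsRouter()
print(router.db_for_read(fake_model("comments")))  # → comments
print(router.db_for_read(fake_model("posts")))  # → None
```

In a real project the router would be listed in the `DATABASE_ROUTERS` setting, for example `DATABASE_ROUTERS = ["myproject.routers.CommentsRouter"]` (module path hypothetical).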
  11. Vertical Partitions (cont.)
      General process to split off relations:
      1. Remove newly invalid foreign key constraints
      2. Replicate tables to new database server
      3. Update application code to remove relations
         (common in frameworks like Django)
      4. [some magic or downtime to cutover databases]
  12. Vertical Partitions (cont.)
      General process to split off relations: steps 1-4 above, plus some other things we almost certainly didn’t think about:
      • (Likely) Find code which is still referencing the relation incorrectly
      • Write code to handle comments with missing posts
      • Write code to delete comments when posts go away
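
Once posts and comments live on different servers, there is no `ON DELETE CASCADE` or join to lean on, so the application has to do that work. A minimal sketch, assuming two SQLite connections as stand-ins for the two database servers; the function names and sample rows are hypothetical.

```python
import sqlite3

# Two separate connections stand in for the main and comments databases.
main_db = sqlite3.connect(":memory:")
comments_db = sqlite3.connect(":memory:")
main_db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
comments_db.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT)"
)
main_db.executemany("INSERT INTO posts VALUES (?, ?)", [(1, "hello"), (2, "world")])
comments_db.executemany(
    "INSERT INTO comments VALUES (?, ?, ?)",
    [(10, 1, "a"), (11, 2, "b"), (12, 3, "c")],  # post 3 no longer exists
)


def comments_for_post(post_id):
    # Handle comments with missing posts: check the post ourselves and
    # treat its comments as invisible instead of crashing.
    if main_db.execute("SELECT 1 FROM posts WHERE id = ?", (post_id,)).fetchone() is None:
        return []
    rows = comments_db.execute(
        "SELECT body FROM comments WHERE post_id = ? ORDER BY id", (post_id,)
    )
    return [body for (body,) in rows]


def delete_post(post_id):
    # Delete comments when posts go away: two separate transactions, so a
    # crash in between can still leave orphans behind.
    with main_db:
        main_db.execute("DELETE FROM posts WHERE id = ?", (post_id,))
    with comments_db:
        comments_db.execute("DELETE FROM comments WHERE post_id = ?", (post_id,))


def find_orphaned_comments():
    # Periodic cleanup job; loads every id, so batch it for large tables.
    existing = {post_id for (post_id,) in main_db.execute("SELECT id FROM posts")}
    rows = comments_db.execute("SELECT id, post_id FROM comments").fetchall()
    return sorted(cid for cid, pid in rows if pid not in existing)


print(comments_for_post(3))  # → []
print(find_orphaned_comments())  # → [12]
delete_post(1)
print(find_orphaned_comments())  # → [12]  (comment 10 was removed with post 1)
```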
  13. Horizontal Partitions
      Main Database: Users, Posts
      Comments 2018: Comments, Revisions
      Comments 2019: Comments, Revisions
      Comments 2020: Comments, Revisions
      Comments 2021: Comments, Revisions
  14. Horizontal Partitions (cont.)
      Main Database: Users, Posts
      Comments 2018-12: Comments, Revisions
      Comments 2019-01: Comments, Revisions
      Comments 2019-02: Comments, Revisions
      Comments 2019-03: Comments, Revisions
      Comments 2019-04: Comments, Revisions
      Comments 2019-05: Comments, Revisions
  15. Horizontal Partitions (cont.)
      Main Database: Users
      Comments 2018-12: Comments, Revisions
      Comments 2019-01: Comments, Revisions
      Posts 2018-12: Posts
      Posts 2019-01: Posts
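
With time-based partitions, every read and write has to work out which partition(s) it touches. A minimal sketch of that bookkeeping, assuming a hypothetical `<table>_<YYYY>_<MM>` naming scheme for monthly partitions; any query over a date range has to fan out to every partition in that range.

```python
from datetime import date


def partition_for(table, when):
    """Return the monthly partition that holds a row dated `when`."""
    return f"{table}_{when.year:04d}_{when.month:02d}"


def partitions_between(table, start, end):
    """Every monthly partition a query over [start, end] must read."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(partition_for(table, date(year, month, 1)))
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return names


print(partition_for("comments", date(2018, 12, 24)))  # → comments_2018_12
print(partitions_between("comments", date(2018, 12, 1), date(2019, 2, 15)))
# → ['comments_2018_12', 'comments_2019_01', 'comments_2019_02']
```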
  16. No!

  17. Diagram: Monolith vs. Microservices
      Monolith: User Interface → API → Database
      Microservices (a trendy way to describe a SOA): a User Interface in front of many Services, backed by several Databases
  18. Diagram: Monolith vs. Microservices (repositories)
      Monolith: REPO 1 → Database
      Microservices: REPO 1 through REPO 6, spread across several Databases
  19. Moving to a SOA
      1. Build a framework
         You might as well use a new language while you’re at it
      2. Rewrite the previous developer’s code
         It was old and nobody liked that developer anyways
      3. Profit!
  20. Moving to a SOA (cont.)
      1. Build a framework
      2. Set up an event stream (Kafka)
      3. Break apart your monolithic MySQL database
      4. Write a service which owns one set of your problems
      5. Attempt [and fail] to set up automated testing
      6. Create a new way to deploy code
      7. …
      8. Profit!
  21. Goal Setting
      • Speed up the build/test/release process
      • Better ownership and autonomy
      • Improve reliability
        (through stronger API contracts, reduced complexity)
      • Transition away from legacy hard-to-support systems
  22. Duct Tape Oriented Architecture
      (because we just need to get things done)
      Diagram labelled “THE MICROLIFT”: a User Interface and Database, plus additional User Interfaces and several Services
  23. Remove Humans
      Remove humans anywhere you can: rely on robots!
      Let everyone ship their own code (an office manager can deploy code at Sentry!)
  24. Optimize Your Tests
      You must spend time on build/test suite performance.
      People often overlook how slow database access is, and it’s easy to fix!
      • Profile your test suite!
      • You don’t have to run every test, every time
      • Use transactions to create quick database tests
        https://github.com/getsentry/zeus/blob/5004a6b7c538fada3e98c8943ea5385234a8220b/zeus/testutils/pytest.py#L89
      • Replace production services with no-ops where possible
        https://docs.djangoproject.com/en/2.1/topics/cache/#dummy-caching-for-development
      • Mock third-party network calls
        https://github.com/getsentry/responses
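
The "use transactions" tip can be sketched with the standard library alone: create the schema once, open a transaction before each test, and roll it back afterwards, so tests never pay for truncating or recreating tables. This uses `unittest` and an in-memory SQLite database as assumptions; it is not the code in the linked zeus file, which is a pytest fixture. The class and table names are hypothetical.

```python
import sqlite3
import unittest

# One shared connection; the (normally slow) schema setup happens once.
# isolation_level=None puts sqlite3 in autocommit mode so we control BEGIN/ROLLBACK.
CONN = sqlite3.connect(":memory:", isolation_level=None)
CONN.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")


class TransactionalTestCase(unittest.TestCase):
    """Run every test inside a transaction and roll it back afterwards."""

    def setUp(self):
        CONN.execute("BEGIN")

    def tearDown(self):
        CONN.execute("ROLLBACK")


class CommentTests(TransactionalTestCase):
    def test_insert(self):
        CONN.execute("INSERT INTO comments (body) VALUES ('hi')")
        count = CONN.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
        self.assertEqual(count, 1)

    def test_starts_empty(self):
        # Passes in any order, because test_insert's row was rolled back.
        count = CONN.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
        self.assertEqual(count, 0)
```

Run it with `python -m unittest` or pytest (which also collects `unittest.TestCase` classes).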
  25. Test in Production
      Let production be production
        Don’t try to mock your production environment: it won’t work
      Diagram: LIVE TRAFFIC split 99% to v1 and 1% to v2
      Utilize a tool like LaunchDarkly to scope features
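
The 99%/1% split on the slide is typically done by hashing a stable identifier into a bucket, so each user consistently sees the same version. A minimal sketch of that idea using only `hashlib`; it is not LaunchDarkly's API, and the flag name and function names are hypothetical.

```python
import hashlib


def rollout_bucket(flag, user_id):
    """Map (flag, user) to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def use_v2(user_id, percent=1, flag="new-comments-backend"):
    # Hashing the flag name too means different features get different
    # 1% slices of users rather than always the same unlucky ones.
    return rollout_bucket(flag, user_id) < percent


enrolled = sum(use_v2(user_id) for user_id in range(10_000))
print(enrolled)  # about 1% of 10,000 users; the exact count depends on the hash
```

Raising `percent` gradually widens the rollout without reshuffling users who already have v2.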
  26. Define Ownership
      Ownership doesn’t mean you need a separate codebase.
      You can deploy service-isolated copies of your monolith:
        sentry.io          REPO 1
        api.sentry.io      REPO 1
        ingest.sentry.io   REPO 1
        docs.sentry.io     REPO 2
      Owners shown via CODEOWNERS (lead, Docs Team): https://help.github.com/articles/about-codeowners/
      (and put teams on-call for their services)
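
One way to run "service-isolated copies of your monolith" is to ship the same build everywhere and let configuration decide which parts each copy serves. A minimal sketch under that assumption: the hostnames come from the slide, but the `SERVICE_ROLE` environment variable, the role names, and the component names are hypothetical.

```python
import os

# Hostnames from the slide; roles and components are hypothetical.
HOST_ROLES = {
    "sentry.io": "web",
    "api.sentry.io": "api",
    "ingest.sentry.io": "ingest",
}
ROLE_COMPONENTS = {
    "web": frozenset({"ui", "api"}),
    "api": frozenset({"api"}),
    "ingest": frozenset({"event_ingestion"}),
}


def current_role(environ=None):
    """Read this deployment's role; every copy runs the same codebase."""
    environ = os.environ if environ is None else environ
    role = environ.get("SERVICE_ROLE", "web")
    if role not in ROLE_COMPONENTS:
        raise ValueError(f"unknown SERVICE_ROLE: {role!r}")
    return role


def is_enabled(component, environ=None):
    """Should this copy of the monolith serve `component`?"""
    return component in ROLE_COMPONENTS[current_role(environ)]


# Example: the copy deployed behind ingest.sentry.io.
ingest_env = {"SERVICE_ROLE": HOST_ROLES["ingest.sentry.io"]}
print(is_enabled("event_ingestion", ingest_env))  # → True
print(is_enabled("ui", ingest_env))  # → False
```

Each role can then have its own owning team and on-call rotation without splitting the repository.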
  27. If nothing else...
      • Remove human roadblocks
        If it annoys you, it’s probably wasting time (and money)
      • Enable people to do their best work
        Treat people as adults and give them the tools they need to succeed
      • Take off your engineering hat
        Focus on the business goals, less on academics
      • “Time to Ship” is your metric
        The faster you can change and react to your customers, the more fun and success you’re going to enjoy