Pycon ZA
October 06, 2017

Growing Pains: Scaling a Django Project and Team by Cory Zue

In this talk I’ll describe the evolution of a Django project as it goes from something small and simple to a full-blown multi-server, multi-datacenter behemoth. The talk will use examples from real-world applications I’ve either built or contributed to, and draw heavily from my experience leading the development of CommCare HQ (an 8-year-old, ~500,000 LoC codebase currently developed and maintained by about 20 people).

The goal of the talk is to demonstrate how and why complexity and process seep into projects over time, and why they are in fact necessary as a project matures. I'll base the talk around a series of examples that each highlight a problem and then discuss how the introduction of a new subsystem, architecture, or process helps to address that problem. These problems and solutions will be a mix of the technical (e.g. "the site is slow") and the human (e.g. "we can't ship features quickly anymore").

If you are part of a growing project or team, you should learn something or get some ideas about what might be in store for your future!


Transcript

  1. Who am I? 2007-2017 CTO, Dimagi Today Future “Chief Accelerator” Builder / Entrepreneur Freelance Consultant Lover of Emojis
  2. Why this talk?
    Growing is hard. Especially from 1-100 people (or servers). It’s probably hard past 100 too, but I don’t know much about that.
  3. A codebase
    • Takes longer to reason with
    • Harder / slower to test things
    • Longer to onboard new people
  4. Enter: abstraction
    [Diagram: System, Glue, Subsystem (3)]
    • Reduce the size of any single component to be able to fit in one person’s head
    • Allows for specialization, if necessary
  5. How to abstract?
    [Spectrum, from "Least Short Term Work / Least Long Term Scalable" to "Most Short Term Work / Most Long Term Scalable"]
    • No structure
    • “Organized” monolith
    • Modules define public APIs via __init__.py
    • Shared modules using git submodules
    • Shared libraries using PyPI
    • Service Oriented Architecture
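
The "modules define public APIs via __init__.py" step can be sketched as follows. This is a minimal illustration, not code from the talk: the package name `billing_demo_pkg`, the internal module `_tax.py`, and the function names are all hypothetical, and the package is written to a temporary directory only so the example runs as a single file.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Build a throwaway package on disk so the sketch is runnable as-is.
# All names here (billing_demo_pkg, _tax, invoice_total) are hypothetical.
pkg_root = Path(tempfile.mkdtemp())
pkg = pkg_root / "billing_demo_pkg"
pkg.mkdir()

# Internal module: implementation details live here and may change freely.
(pkg / "_tax.py").write_text(textwrap.dedent('''
    RATE_PERCENT = 15

    def add_tax(amount_cents):
        return amount_cents * (100 + RATE_PERCENT) // 100
'''))

# __init__.py re-exports only the names other parts of the codebase may use.
(pkg / "__init__.py").write_text(textwrap.dedent('''
    from ._tax import add_tax as invoice_total

    __all__ = ["invoice_total"]
'''))

sys.path.insert(0, str(pkg_root))
billing = importlib.import_module("billing_demo_pkg")

# Callers depend on the package-level name, never on billing_demo_pkg._tax.
total = billing.invoice_total(10000)
public_names = billing.__all__
```

Because callers import only from the package root, the internals can be reorganized without touching the rest of the codebase, which is what makes this a cheap first step along the spectrum.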
  6. Rough Guidelines
    • Execute for the short term (< 1 year)
    • Plan for the long term (1-3 years)
    • Design to be changed (always)
    [Spectrum labels: Most Short Term Work, Most Long Term Scalable, Least Short Term Work, Least Long Term Scalable]
  7. Clean all the things!
    • Rule: Always leave code cleaner than when you found it
    • Flake CI
    • Conduct refactorathons
    • Resource tech debt
  8. (Automated) Test all the things!
    • Test while coding
    • Write failing tests before fixing bugs
    • Build a CI pipeline
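
"Write failing tests before fixing bugs" might look like the sketch below. The `average` function and its empty-list bug are invented for illustration; the point is only the order of work: the regression test is written first, fails against the buggy code, and passes once the guard is added.

```python
import unittest


def average(values):
    """Return the mean of values; an empty list averages to 0.0."""
    # This guard is the fix. Before it existed, average([]) raised
    # ZeroDivisionError, and test_empty_list was written first to capture that.
    if not values:
        return 0.0
    return sum(values) / len(values)


class AverageTests(unittest.TestCase):
    def test_empty_list(self):
        # Regression test: written while the bug was present, so it failed first.
        self.assertEqual(average([]), 0.0)

    def test_typical_values(self):
        self.assertEqual(average([2, 4, 6]), 4.0)


# Run the suite programmatically so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AverageTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A CI pipeline would run the same suite on every push, so the bug cannot quietly return.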
  9. The site is getting slow!
    [Architecture diagram: Django, Database, Cache. Legend: Python App Code, Web Server, Database / 3rd Party]
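
The slide adds a cache between the application and the database. A minimal cache-aside sketch, assuming a hypothetical `load_report_from_db` query and using a tiny in-process `TTLCache` class as a stand-in for a real cache backend:

```python
import time


class TTLCache:
    """Tiny in-process stand-in for a real cache such as Redis or Memcached."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value, timeout):
        self._store[key] = (value, time.monotonic() + timeout)


cache = TTLCache()
db_calls = 0


def load_report_from_db(project_id):
    """Hypothetical slow query; counts calls so the cache effect is visible."""
    global db_calls
    db_calls += 1
    return {"project": project_id, "rows": 3}


def get_report(project_id):
    key = f"report:{project_id}"
    report = cache.get(key)
    if report is None:
        # Cache miss: do the expensive work once, then remember the result.
        report = load_report_from_db(project_id)
        cache.set(key, report, timeout=60)
    return report


first = get_report(42)
second = get_report(42)  # served from the cache, no second database call
```

In a Django project the same pattern would normally use the built-in cache framework (`django.core.cache.cache.get` and `cache.set`) rather than a hand-rolled class.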
  10. We need background processing!
    [Architecture diagram: Django, Celery (background processor), Database, Cache. Legend: Python App Code, Web Server, Database / 3rd Party, Background Processor]
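
The idea behind background processing is that the web request returns immediately while slow work runs elsewhere. The sketch below uses the standard library's `ThreadPoolExecutor` as a stand-in for a Celery worker so that it runs without a broker; `send_welcome_email` and `signup_view` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a pool of background workers (Celery in the talk's diagram).
executor = ThreadPoolExecutor(max_workers=2)


def send_welcome_email(user_id):
    """Hypothetical slow job that should not block the web request."""
    time.sleep(0.05)
    return f"sent to user {user_id}"


def signup_view(user_id):
    # Hand the slow work off and respond right away.
    future = executor.submit(send_welcome_email, user_id)
    return {"status": "ok"}, future


response, future = signup_view(7)
# A real view would not wait; waiting here only shows that the job completed.
email_result = future.result(timeout=5)
executor.shutdown()
```

With Celery, the job would instead be declared as a task and queued with `.delay()`, which also lets the workers run on separate machines from the web servers.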
  11. The machines can’t keep up with the load!
    [Architecture diagram: Django (3), Celery (background processor) (2), Database, Cache. Legend: Python App Code, Web Server, Database / 3rd Party, Background Processor]
  12. The database can’t keep up with the load!
    [Architecture diagram: Django (3), Celery (background processor) (2), Database, Database (Read Replica(s)), Cache. Legend: Python App Code, Web Server, Database / 3rd Party, Background Processor]
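
In Django, one way to send reads to replicas is a database router. The sketch below is an assumption about how this could be wired up, not the talk's actual configuration: the aliases `default`, `replica1`, and `replica2` are hypothetical and would need matching entries in `DATABASES`, and the class would be listed in `DATABASE_ROUTERS`. It is a plain Python class, so it runs without Django installed.

```python
import random


class PrimaryReplicaRouter:
    """Route reads to a replica and writes to the primary.

    The aliases "default", "replica1", and "replica2" are hypothetical.
    """

    replicas = ["replica1", "replica2"]

    def db_for_read(self, model, **hints):
        # Spread read queries across the replicas.
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        # All writes go to the primary.
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        # Every alias holds the same data, so relations are always allowed.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Migrate only the primary; replicas receive changes via replication.
        return db == "default"


router = PrimaryReplicaRouter()
read_db = router.db_for_read(model=None)
write_db = router.db_for_write(model=None)
```

One trade-off to plan for: replicas lag slightly behind the primary, so a read issued right after a write may return stale data unless that read is pinned to the primary.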
  13. Our reports are taking way too long!
    [Architecture diagram: Django (3), Celery (background processor) (2), Stream Processing (2), Database, Database (Read Replica(s)), Analytics Database, Cache. Legend: Python App Code, Web Server, Database / 3rd Party, Background Processor, Streaming Platform]
  14. It’s the type of data in the database!
    [Architecture diagram: Django (3), Celery (background processor) (2), Stream Processing (2), Database, Database (Read Replica(s)), Analytics Database, “Blob” Database, Cache. Legend: Python App Code, Web Server, Database / 3rd Party, Background Processor, Streaming Platform]
  15. The data is just too big!
    [Architecture diagram: Django (3), Celery (background processor) (2), Stream Processing (2), Databases, Database (Read Replica(s)), Analytics Database, “Blob” Database, Cache. Legend: Python App Code, Web Server, Database / 3rd Party, Background Processor, Streaming Platform]
    https://www.youtube.com/watch?v=EmyVoIi6W1c
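
The slide shows the single database becoming several but does not say how the data is divided. One common approach, shown here purely as an assumption, is hash-based sharding: a stable hash of a key picks which database holds the row. The shard aliases and the `case-123` key are hypothetical.

```python
import hashlib

# Hypothetical database aliases, one per shard.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def shard_for(key):
    """Map a string key to a shard using a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer and wrap it onto the shard list.
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


chosen = shard_for("case-123")
chosen_again = shard_for("case-123")
```

`hashlib` is used instead of the built-in `hash()` because the latter is randomized per process for strings. Note that changing the number of shards remaps most keys under this simple scheme, so the shard count is worth planning ahead.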
  16. Rough Guidelines
    • Execute for the short term (< 1 year)
    • Plan for the long term (1-3 years)
    • Design to be changed (always)
    [Spectrum labels: Most Short Term Work, Most Long Term Scalable, Least Short Term Work, Least Long Term Scalable]
  17. Monitor all the things!
    • Uptime → Pingdom, Status Cake, others
    • Servers → New Relic, Datadog, Munin, others
  18. Two Common Types of Organic, Small Team Growth: Superhero Model
    A single person (or small group of people) owns something.
    Common examples:
    - Code review
    - People management
    - Support / firefighting
  19. Two Common Types of Organic, Small Team Growth: Herd Model
    The “herd” (aka everyone on the team) has distributed, self-organizing ownership.
    Common examples:
    - Knowledge management
    - Bug fixes
    - Who does what
  20. Organic → Designed
    Organic Systems
    • Happen unintentionally over a long period of time
    • Exist because “that’s how it’s always worked”
    • Can often be hard to explain / justify
    Designed Systems
    • Are intentionally designed and revised over time
    • Exist to fulfil a purpose / meet a need
    • Can usually be explained / justified
    Trying to explain a system to a new hire is a good litmus test of which type of system it is.
  21. Designed Solutions
    • Specialization: Replace heroes with roles
    • Tools: Remove friction from unscalable systems / processes
    • Processes: Bring order to the chaos of the herd
  22. Example: Organic Support
    • People email us with issues
    • The email goes to the whole tech/product team (herd)
    • Whoever looks first (usually superhero Alice) responds
    • If it’s a bug they assign it to the person responsible for the bug (herd knowledge)
  23. Example: Designed Support
    • Support tickets go into a queue in our support system (tool) that is triaged by our support team (specialization)
    • The team triages and follows a defined escalation and response procedure (process)
    • If the ticket is a bug it is put in a different queue (tool/process) and is picked up according to priority by one of our rotating support engineers (specialization / process)
  24. Example: Organic Code Review
    • Make a pull request and whoever (herd) gets to it (usually superhero Bob) will review and merge or make comments
    • If no one reviews it after a few days you can start pinging people (herd) on Slack
  25. Example: Designed Code Review
    • Every developer will be assigned a code buddy who is responsible for reviewing your pull requests (process)
    • Additionally, you should ping the defined area owner of that part of the codebase (specialization) for a secondary review
    • Pull requests should only be merged after reviews have been approved (process) and all tests pass (tool)
  26. Culture: Organic → Designed
    • Much like other things, your culture will also be determined organically based on your founding team and leadership
    • As you scale you want to shift to a more designed culture as well, else you will maintain the good and bad qualities
    • Culture will change, so your job is to let it change in a positive way
  27. Example: Organic Culture
    • We hire people who are like us in background / perspective / race / gender / etc. because that’s what’s worked in the past
    • We all socialize together outside of work because we always have
    • We critique each other’s work openly, harshly, and regularly because that’s what the CEO and CTO did
  28. Example: Designed Culture
    • We aspire to be a diverse organization and so we work hard to remove bias from our interview process and attract candidates from all walks of life
    • What happens at work is what’s important, what you choose to do outside of work is up to you
    • We work hard to have supportive conversations that address the real problems without hurting people
  29. Rough Guidelines
    • Execute for the short term
    • Plan for the long term
    • Design to be changed
    [Spectrum labels: Most Short Term Work, Most Long Term Scalable, Least Short Term Work, Least Long Term Scalable]