Growing Pains: Scaling a Django Project and Team by Cory Zue

Pycon ZA
October 06, 2017

In this talk I’ll describe the evolution of a Django project as it goes from something small and simple to a full-blown multi-server, multi-datacenter behemoth. The talk will use examples from real-world applications I’ve either built or contributed to, and draw heavily on my experience leading the development of CommCare HQ (an 8-year-old, ~500,000 LoC codebase currently developed and maintained by about 20 people).

The goal of the talk is to demonstrate how and why complexity and process seep into projects over time and are in fact necessary as a project matures. I'll base the talk around a series of examples that each highlight a problem and then discuss how the introduction of a new subsystem, architecture, or process helps to address that problem. These problems / solutions will be a mix of the technical (e.g. "the site is slow") and human (e.g. "we can't ship features quickly anymore").

Anyone who is part of a growing project or team should learn something or get some ideas about what might be in store for their future!

Transcript

  1. Growing Pains Cory Zue @czue

  2. Who am I? 2007-2017 CTO, Dimagi Today Future “Chief Accelerator”

    Builder / Entrepreneur Freelance Consultant Lover of Emojis
  3. Why this talk? Growing is hard. Especially from 1-100 people (or servers). It’s probably hard past 100 too, but I don’t know much about that.
  4. Three Talk Parts: Scaling Code, Scaling your Stack, Scaling Teams

  5. Scaling Code: Fighting Complexity

  6. A codebase

  7. A codebase
    • Interdependencies scale exponentially
    • No longer fits in one person’s head

  8. A codebase
    • Takes longer to reason about
    • Harder / slower to test things
    • Longer to onboard new people
  9. Enter: abstraction (diagram labels: System, Glue, Subsystem, Subsystem, Subsystem)
    • Reduce the size of any single component to be able to fit in one person’s head
    • Allows for specialization, if necessary
  10. A more sustainable codebase

  11. A more sustainable codebase

  12. A more sustainable codebase

  13. A more sustainable codebase

  14. How to abstract?
    • No structure
    • “Organized” monolith
    • Modules define public APIs via __init__.py
    • Shared modules using git submodules
    • Shared libraries using PyPI
    • Service Oriented Architecture
    Axis: Least Short Term Work / Least Long Term Scalable → Most Short Term Work / Most Long Term Scalable
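Slide 14 mentions modules that define their public API via `__init__.py`. A minimal sketch of that idea follows. The package name `demo_billing`, the internal module `_invoices`, and the function `create_invoice` are all hypothetical, and the package is written to a temporary directory only so the example runs on its own.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout, created in a temp dir so the sketch is self-contained:
#   demo_billing/__init__.py   <- the public API other code may import
#   demo_billing/_invoices.py  <- internal details, free to change
root = Path(tempfile.mkdtemp())
pkg = root / "demo_billing"
pkg.mkdir()

(pkg / "_invoices.py").write_text(
    "def _round_cents(amount):\n"
    "    return round(amount, 2)\n"
    "\n"
    "def create_invoice(customer, amount):\n"
    "    return {'customer': customer, 'amount': _round_cents(amount)}\n"
)

# __init__.py re-exports only what callers are meant to use.
(pkg / "__init__.py").write_text(
    "from ._invoices import create_invoice\n"
    "\n"
    "__all__ = ['create_invoice']\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_billing = importlib.import_module("demo_billing")

# Callers depend on the package name, not on the internal module layout.
invoice = demo_billing.create_invoice("acme", 12.5)
```

Because callers import from `demo_billing` rather than `demo_billing._invoices`, the internal module can be split or renamed later without touching the rest of the codebase.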
  15. Rough Guidelines
    • Execute for the short term (< 1 year)
    • Plan for the long term (1-3 years)
    • Design to be changed (always)
    Axis: Least Short Term Work / Least Long Term Scalable → Most Short Term Work / Most Long Term Scalable
  16. All the things! A few other quick tips

  17. Clean all the things!
    • Rule: Always leave code cleaner than when you found it
    • Flake CI
    • Conduct refactorathons
    • Resource tech debt
  18. Review all the things!
    • Code review is critical
    • Conduct 10% design reviews for big features
  19. (Automated) Test all the things!
    • Test while coding
    • Write failing tests before fixing bugs
    • Build a CI pipeline
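Slide 19 suggests writing a failing test before fixing a bug. A small sketch of that workflow using the standard library's `unittest` is shown here. The `average` function and its bug report (a crash on an empty list) are made up for illustration.

```python
import unittest


def average(values):
    """Return the mean of values, or 0.0 for an empty list.

    Hypothetical bug report: average([]) raised ZeroDivisionError.
    """
    if not values:  # the fix, added only after the regression test failed
        return 0.0
    return sum(values) / len(values)


class AverageRegressionTest(unittest.TestCase):
    def test_empty_list_returns_zero(self):
        # Written first: this failed with ZeroDivisionError before the fix.
        self.assertEqual(average([]), 0.0)

    def test_non_empty_list(self):
        # Guards the existing behaviour while the fix is made.
        self.assertEqual(average([2, 4]), 3.0)


def run_tests():
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(AverageRegressionTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_tests()
```

The test stays in the suite after the fix, so a CI pipeline would catch the same bug if it ever came back.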
  20. Growth and Infrastructure: Fighting Scale

  21. (Architecture diagram. Labels: Django, Web Server, Database, Python App Code)

  22. The site is getting slow!
    Diagram: Django, Database, Cache
    Legend: Python App Code, Web Server, Database / 3rd Party
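Slide 22 adds a cache in front of the database when the site gets slow. The sketch below illustrates the general "check the cache, otherwise compute and store" idea with a tiny in-process cache. It is not the caching setup used in the talk, and `TTLCache` and `slow_report` are hypothetical names; a real deployment would more likely use a shared cache server.

```python
import time


class TTLCache:
    """Tiny in-process cache with per-entry expiry (illustrative only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: skip the slow call
        value = compute()  # miss or expired: do the slow work once
        self._entries[key] = (now + self.ttl_seconds, value)
        return value


calls = []


def slow_report():
    # Hypothetical stand-in for an expensive database query.
    calls.append(1)
    return {"active_users": 42}


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_compute("report", slow_report)
second = cache.get_or_compute("report", slow_report)  # served from the cache
```

The trade-off is staleness: for up to `ttl_seconds`, readers may see an older value than the database holds.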
  23. We need background processing!
    Diagram: Django, Database, Cache, Celery (background processor)
    Legend: Python App Code, Web Server, Database / 3rd Party, Background Processor
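Slide 23 introduces Celery for background processing. The sketch below does not use Celery; it uses the standard library's `queue` and `threading` as a stand-in to show the underlying pattern, in which the request handler enqueues slow work and returns while a separate worker runs it. The names `handle_signup` and `send_welcome_email` are hypothetical.

```python
import queue
import threading

task_queue = queue.Queue()
sent = []  # stands in for a slow side effect, e.g. sending email


def send_welcome_email(address):
    sent.append(address)


def worker():
    # Pull tasks off the queue until a None sentinel arrives.
    while True:
        task = task_queue.get()
        try:
            if task is None:
                return
            func, args = task
            func(*args)
        finally:
            task_queue.task_done()


def handle_signup(address):
    # The "web request": enqueue the slow work and return immediately.
    task_queue.put((send_welcome_email, (address,)))
    return "202 Accepted"


worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()

status = handle_signup("user@example.com")

task_queue.join()     # demo only: wait until the background task has run
task_queue.put(None)  # ask the worker to stop
worker_thread.join()
```

A dedicated task queue adds what this sketch lacks: tasks that survive a process restart, retries, and workers running on other machines.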
  24. The machines can’t keep up with the load!
    Diagram: Django (x3), Celery (background processor, x2), Database, Cache
    Legend: Python App Code, Web Server, Database / 3rd Party, Background Processor
  25. The database can’t keep up with the load!
    Diagram: Django (x3), Celery (background processor, x2), Database, Database (Read Replica(s)), Cache
    Legend: Python App Code, Web Server, Database / 3rd Party, Background Processor
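Slide 25 adds read replicas when the database cannot keep up. One way Django supports this is a database router, a plain class exposing `db_for_read`, `db_for_write`, `allow_relation`, and `allow_migrate`, listed in the `DATABASE_ROUTERS` setting. The sketch below is a minimal example under assumptions: the aliases `default`, `replica1`, and `replica2` are hypothetical and would have to match entries in `DATABASES`, and nothing here is taken from the CommCare HQ configuration.

```python
import random


class PrimaryReplicaRouter:
    """Send reads to a replica and writes to the primary (sketch only)."""

    # Hypothetical aliases; they must match keys in settings.DATABASES.
    primary = "default"
    replicas = ["replica1", "replica2"]

    def db_for_read(self, model, **hints):
        # Spread read queries across the replicas.
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        # All writes go to the primary.
        return self.primary

    def allow_relation(self, obj1, obj2, **hints):
        # Replicas hold the same data, so relations are always fine.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Run migrations on the primary only; replicas receive them
        # through replication.
        return db == self.primary


router = PrimaryReplicaRouter()
write_alias = router.db_for_write(model=None)
read_alias = router.db_for_read(model=None)
```

Replication lag is the main caveat: a read issued right after a write may not see that write yet, so some code paths may still need to read from the primary.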
  26. Our reports are taking way too long!
    Diagram: Django (x3), Celery (background processor, x2), Stream Processing (x2), Streaming Platform, Database, Database (Read Replica(s)), Analytics Database, Cache
    Legend: Python App Code, Web Server, Database / 3rd Party, Background Processor, Stream Processing
  27. It’s the type of data in the database!
    Diagram: Django (x3), Celery (background processor, x2), Stream Processing (x2), Streaming Platform, Database, Database (Read Replica(s)), Analytics Database, “Blob” Database, Cache
    Legend: Python App Code, Web Server, Database / 3rd Party, Background Processor, Stream Processing
  28. The data is just too big!
    Diagram: Django (x3), Celery (background processor, x2), Stream Processing (x2), Streaming Platform, Databases, Database (Read Replica(s)), Analytics Database, “Blob” Database, Cache
    Legend: Python App Code, Web Server, Database / 3rd Party, Background Processor, Stream Processing
    https://www.youtube.com/watch?v=EmyVoIi6W1c
  29. Rough Guidelines
    • Execute for the short term (< 1 year)
    • Plan for the long term (1-3 years)
    • Design to be changed (always)
    Axis: Least Short Term Work / Least Long Term Scalable → Most Short Term Work / Most Long Term Scalable
  30. All the things! A few other quick tips

  31. Learn all the things!
    • Build DevOps capacity
    • Get to know your tools well. Really well.
  32. Monitor all the things!
    • Uptime → Pingdom, Status Cake, others
    • Servers → New Relic, Datadog, Munin, others
  33. Record all the things! Logs: ELK (Elasticsearch, Logstash, Kibana), Datadog

  34. Orchestrate all the things! Ansible (Salt, Fabric, Puppet/Chef)

  35. Growth and Teams: Fighting Chaos

  36. Two Common Types of Organic, Small Team Growth: Superhero Model
    A single person (or small group of people) owns something.
    Common examples:
    - Code review
    - People management
    - Support / firefighting
  37. Two Common Types of Organic, Small Team Growth: Herd Model
    The “herd” (aka everyone on the team) has distributed, self-organizing ownership.
    Common examples:
    - Knowledge management
    - Bug fixes
    - Who does what
  38. How this works over time

  39. Superheroes and Growth

  40. Herds and Growth Source: https://twitter.com/RichRogersIoT/status/914577613300965377

  41. Organic → Designed
    Organic Systems
    • Happen unintentionally over a long period of time
    • Exist because “that’s how it’s always worked”
    • Can often be hard to explain / justify
    Designed Systems
    • Are intentionally designed and revised over time
    • Exist to fulfil a purpose / meet a need
    • Can usually be explained / justified
    Trying to explain a system to a new hire is a good litmus test of which type of system it is.
  42. Designed Solutions
    • Specialization: Replace heroes with roles
    • Tools: Remove friction from unscalable systems / processes
    • Processes: Bring order to the chaos of the herd
  43. Example: Organic Support
    • People email us with issues
    • The email goes to the whole tech/product team (herd)
    • Whoever looks first (usually superhero Alice) responds
    • If it’s a bug they assign it to the person responsible for the bug (herd knowledge)
  44. Example: Designed Support
    • Support tickets go into a queue in our support system (tool) that is triaged by our support team (specialization)
    • The team triages and follows a defined escalation and response procedure (process)
    • If the ticket is a bug it is put in a different queue (tool/process) and is picked up according to priority by one of our rotating support engineers (specialization / process)
  45. Example: Organic Code Review
    • Make a pull request and whoever (herd) gets to it (usually superhero Bob) will review and merge or make comments
    • If no one reviews it after a few days you can start pinging people (herd) on Slack
  46. Example: Designed Code Review
    • Every developer will be assigned a code buddy who is responsible for reviewing your pull requests (process)
    • Additionally, you should ping the defined area owner of that part of the codebase (specialization) for a secondary review
    • Pull requests should only be merged after reviews have been approved (process) and all tests pass (tool)
  47. Culture: Organic → Designed
    • Much like other things, your culture will also be determined organically based on your founding team and leadership
    • As you scale you want to shift to a more designed culture as well, else you will maintain the good and bad qualities
    • Culture will change, so your job is to let it change in a positive way
  48. Example: Organic Culture
    • We hire people who are like us in background / perspective / race / gender / etc. because that’s what’s worked in the past
    • We all socialize together outside of work because we always have
    • We critique each other’s work openly, harshly, and regularly because that’s what the CEO and CTO did
  49. Example: Designed Culture
    • We aspire to be a diverse organization and so we work hard to remove bias from our interview process and attract candidates from all walks of life
    • What happens at work is what’s important; what you choose to do outside of work is up to you
    • We work hard to have supportive conversations that address the real problems without hurting people
  50. Rough Guidelines
    • Execute for the short term
    • Plan for the long term
    • Design to be changed
    Axis: Least Short Term Work / Least Long Term Scalable → Most Short Term Work / Most Long Term Scalable
  51. Questions? www.coryzue.com, @czue www.dimagi.com (we’re hiring!)