
Horizontally Scaling Your Database with Django - PyCon.ca (2012)

When web apps reach a certain size, their data footprint often outgrows what can reasonably be stored on a single database server. Scaling your database horizontally by adding more servers is the dream, but it can be a daunting task. This talk outlines the process I used to add horizontal scaling to Wave Accounting's infrastructure.

Ash Christopher

November 11, 2012

Transcript

  1. Data, lots of it.
    There is a limited amount of data that can be hot (in memory).
    Sunday, 11 November, 12
  4. Multi-database applications have a cost
    • No more ForeignKeys
    • No more select_related()
    • No more prefetch_related()
    • No more cascading deletes
  5. Multiple databases with the same schema
    data_shard_01 data_shard_02 data_shard_03 data_shard_04 data_shard_05 data_shard_06 data_shard_07 data_shard_08 data_shard_10 data_shard_11 data_shard_12 data_shard_13 data_shard_14 data_shard_15 data_shard_16
  7. Good sharding key?
    • Usually the primary key of an important element in your database.
    • Often an entity that connects many subgraphs within your database.
  8. Bad sharding key?
    • Querying all the shards to read data.
    • Data is saved disproportionately across shards.
  10. Extending South
    • South support for multi-db is limited.
    • Need to fake migrations on the `default` database.
    • Need to run migrations on each shard.
  11. instance.get_shard()
    • Save to the same shard the instance was read from.
    • Save to the shard specified by the shard key.
  12. Table alterations are faster
    Adding an index to one hundred 1 GB shards is faster than adding an index to one 100 GB shard.
  14. Obtain globally unique IDs
    • Use an AUTO_INCREMENT column.
    • Write an ID generator.
    • Rely on an external counter to increment IDs (Redis, Memcached).
  15. ID Generator: 64 bits
    • Timestamp (41 bits) | Worker ID (11 bits) | Sequence ID (12 bits)
    • Up to 2,048 unique workers (worker IDs 0 to 2047)
    • Up to 4,096 unique IDs per worker per millisecond (sequence IDs 0 to 4095)
  18. In Summary
    • Scale up, Feature Partition, then Shard.
    • Pick an efficient key to shard on.
    • Don’t balance programmatically if you can help it.
    • External generation of globally unique IDs.
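Slide 4 lists what is lost once related rows can live in different databases. One way to recover the effect of select_related()/prefetch_related() is to join in application code: collect the foreign IDs, fetch the parents in one bulk query, and attach them. The sketch below is a minimal, framework-free illustration, not the talk's code; `Customer`, `Invoice`, `attach_parents`, and the `fetch_by_ids` callable are hypothetical. In a Django project, `fetch_by_ids` might wrap something like `Customer.objects.using(alias).in_bulk(ids)`, assuming the parents for a batch live on one database.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Customer:  # hypothetical parent model
    id: int
    name: str


@dataclass
class Invoice:  # hypothetical child model; customer_id is a plain integer, not a ForeignKey
    id: int
    customer_id: int
    customer: Optional[Customer] = None


def attach_parents(
    invoices: List[Invoice],
    fetch_by_ids: Callable[[Iterable[int]], Dict[int, Customer]],
) -> List[Invoice]:
    """Application-level stand-in for select_related(): one bulk fetch, then attach."""
    ids = {inv.customer_id for inv in invoices}
    customers = fetch_by_ids(ids)  # e.g. a single "WHERE id IN (...)" query
    for inv in invoices:
        # None if the parent row is missing; no database enforces the link any more
        inv.customer = customers.get(inv.customer_id)
    return invoices
```

Cascading deletes need the same treatment: the application has to find and delete child rows itself, shard by shard.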
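Slide 5 shows many databases sharing one schema. In Django these are simply extra aliases in `DATABASES`. A minimal settings sketch, assuming 16 MySQL shards named data_shard_01 through data_shard_16; the `SHARD_COUNT` constant, database names, and host names are placeholders, not values from the talk.

```python
# settings.py (sketch): one alias per shard, all sharing the same schema.
SHARD_COUNT = 16  # assumption: shards numbered 01 to 16

SHARD_ALIASES = ["data_shard_%02d" % n for n in range(1, SHARD_COUNT + 1)]

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "main",               # placeholder
        "HOST": "db-main.internal",   # placeholder
    },
}

for _alias in SHARD_ALIASES:
    DATABASES[_alias] = {
        "ENGINE": "django.db.backends.mysql",
        "NAME": _alias,
        # placeholder naming scheme: data_shard_01 -> data-shard-01.internal
        "HOST": "%s.internal" % _alias.replace("_", "-"),
    }
```

Django reads only upper-case names from settings, so the loop variable does not leak into configuration.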
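Slide 7 suggests keying on an entity that connects many subgraphs. A sketch of what that buys, assuming (hypothetically) that a business is that entity and every row belonging to it carries a `business_id`: any object can then find its shard from its own fields, with no cross-shard lookup. The modulo mapping is an illustrative assumption; the talk does not say how keys map to shards. `Business`, `Account`, `Transaction`, `shard_for_business`, and `shard_of` are invented names.

```python
from dataclasses import dataclass

SHARD_ALIASES = ["data_shard_%02d" % n for n in range(1, 17)]  # assumed 16 shards


def shard_for_business(business_id: int) -> str:
    """Map the shard key to a database alias (modulo is an illustrative choice)."""
    return SHARD_ALIASES[business_id % len(SHARD_ALIASES)]


@dataclass
class Business:  # hypothetical root entity; its primary key is the shard key
    id: int


@dataclass
class Account:  # hypothetical child; stores business_id instead of a ForeignKey
    id: int
    business_id: int


@dataclass
class Transaction:  # hypothetical grandchild; also carries business_id
    id: int
    account_id: int
    business_id: int


def shard_of(obj) -> str:
    """Every object in one business's subgraph resolves to the same shard."""
    key = obj.id if isinstance(obj, Business) else obj.business_id
    return shard_for_business(key)
```

Copying `business_id` onto grandchildren is what keeps "read everything for one business" on a single shard.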
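Slide 8's second symptom, data saved disproportionately across shards, can be checked before committing to a key by replaying existing rows through the candidate mapping and comparing the fullest shard with the average. A minimal sketch; `shard_skew` is a hypothetical helper, and the country-code example in the usage note is invented only to show what a lopsided key looks like.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable


def shard_skew(
    keys: Iterable[Hashable],
    shard_of: Callable[[Hashable], int],
    shard_count: int,
) -> float:
    """Ratio of the fullest shard's row count to the mean; 1.0 means perfectly even."""
    counts = Counter(shard_of(k) for k in keys)
    total = sum(counts.values())
    if total == 0:
        return 1.0
    mean = total / shard_count  # shards that received no rows still count toward the mean
    return max(counts.values()) / mean
```

For example, 1,000 sequential IDs spread with `k % 4` over 4 shards give a skew of 1.0, while sharding 900 "CA" rows and 100 "US" rows by country onto 4 shards gives 3.6: one shard holds 3.6 times its fair share.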
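Slide 10's workflow (fake the migrations on `default`, then run them on every shard) can be scripted. A sketch that builds and runs the `manage.py` invocations, assuming South's `migrate` command with its `--fake` and `--database` options (check these against the South or Django version in use); the app label and shard aliases are hypothetical.

```python
import subprocess
import sys
from typing import List, Sequence

SHARD_ALIASES = ["data_shard_%02d" % n for n in range(1, 17)]  # assumed aliases


def migration_commands(app_label: str, shards: Sequence[str]) -> List[List[str]]:
    """Fake the app's migrations on `default`, then migrate each shard for real."""
    commands = [[sys.executable, "manage.py", "migrate", app_label,
                 "--fake", "--database=default"]]
    for alias in shards:
        commands.append([sys.executable, "manage.py", "migrate", app_label,
                         "--database=%s" % alias])
    return commands


def run_all(app_label: str, shards: Sequence[str] = SHARD_ALIASES) -> None:
    for cmd in migration_commands(app_label, shards):
        # check=True stops at the first failure so it can be investigated
        # before the remaining shards are touched
        subprocess.run(cmd, check=True)
```

Usage would be something like `run_all("ledger")` from the project root, where `ledger` stands in for the sharded app.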
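Slide 11's `instance.get_shard()` rule maps onto two pieces of Django machinery: `instance._state.db`, which Django sets to the alias an instance was loaded from or last saved to (it is `None` for a new instance), and a database router, whose `db_for_write()` receives the instance in `hints` when `Model.save()` is called without `using`. The sketch reads the slide's two lines as "previous shard first, shard key otherwise", which is an interpretation. `ShardedModelMixin`, `ShardRouter`, `shard_key_field`, and the modulo mapping are hypothetical, and the code avoids importing Django so it can be exercised with stand-in objects.

```python
SHARD_ALIASES = ["data_shard_%02d" % n for n in range(1, 17)]  # assumed aliases


class ShardedModelMixin:
    """Hypothetical mixin for models whose rows live on a shard."""

    shard_key_field = "business_id"  # hypothetical shard-key column

    def get_shard(self) -> str:
        # 1. Save to the same shard the instance was read from.
        loaded_from = getattr(getattr(self, "_state", None), "db", None)
        if loaded_from is not None:
            return loaded_from
        # 2. Otherwise, derive the shard from the shard key (modulo is an assumption).
        key = getattr(self, self.shard_key_field)
        return SHARD_ALIASES[key % len(SHARD_ALIASES)]


class ShardRouter:
    """Database router: sharded instances go wherever get_shard() says."""

    def db_for_write(self, model, **hints):
        instance = hints.get("instance")
        if isinstance(instance, ShardedModelMixin):
            return instance.get_shard()
        return None  # no opinion: Django tries the next router, then `default`

    def db_for_read(self, model, **hints):
        instance = hints.get("instance")
        if isinstance(instance, ShardedModelMixin):
            return instance.get_shard()
        return None
```

In a real project the mixin would be combined with `models.Model`, and the router enabled through the `DATABASE_ROUTERS` setting (module path of your choosing).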
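Slide 12's claim can be made concrete with a rough cost model. This is an assumption for illustration, not a measurement from the talk: suppose building an index costs about c · n · log n for a table of n rows (dominated by sorting), and suppose the k shards sit on separate servers so they can be altered at the same time. For N rows in total, split evenly:

```latex
\begin{aligned}
T_{1} &\approx c\,N \log N && \text{(one large table)}\\
W_{k} &\approx k \cdot c\,\frac{N}{k}\log\frac{N}{k} = c\,N \log\frac{N}{k} < T_{1} && \text{(total work over } k > 1 \text{ shards)}\\
T_{k} &\approx c\,\frac{N}{k}\log\frac{N}{k} && \text{(wall clock, shards altered in parallel)}\\
\frac{T_{1}}{T_{k}} &= k\,\frac{\log N}{\log (N/k)} > k && \text{(for } N/k > 1\text{)}
\end{aligned}
```

With k = 100, as on the slide, the model predicts more than a hundredfold drop in wall-clock time. It ignores memory effects; a 1 GB shard is also far more likely than a 100 GB table to be sorted entirely in memory, which tends to widen the gap.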
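Slide 14's third option, an external counter, can be sketched with Redis's atomic `INCRBY`, which redis-py exposes as `Redis.incr(name, amount)` and which returns the new value, so two app servers can never receive the same number. The sketch accepts any client with that `incr` method so it runs without a Redis server; reserving IDs in blocks is an assumed refinement that trades gaps in the sequence for fewer round trips. `CounterClient`, `ExternalIdAllocator`, and the key name `global_id` are hypothetical.

```python
import threading
from typing import Protocol


class CounterClient(Protocol):
    def incr(self, name: str, amount: int = 1) -> int: ...


class ExternalIdAllocator:
    """Hands out globally unique IDs reserved in blocks from a shared counter."""

    def __init__(self, client: CounterClient, key: str = "global_id", block_size: int = 100):
        self._client = client
        self._key = key
        self._block_size = block_size
        self._lock = threading.Lock()
        self._next = 0
        self._ceiling = 0  # exclusive end of the current block

    def next_id(self) -> int:
        with self._lock:
            if self._next >= self._ceiling:
                # One atomic increment reserves the range (top - block_size, top].
                top = self._client.incr(self._key, self._block_size)
                self._next = top - self._block_size + 1
                self._ceiling = top + 1
            value = self._next
            self._next += 1
            return value
```

With a real server the client would be something like `redis.Redis(host=...)`; if that server's counter is lost, IDs could repeat, so its persistence settings matter.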
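Slide 15's layout (41-bit millisecond timestamp, 11-bit worker ID, 12-bit sequence) can be implemented in a few dozen lines. A minimal single-process sketch, assuming a custom epoch of 2012-01-01 UTC (the talk does not give one) and that each running generator is configured with a distinct worker ID; `IdGenerator`, `EPOCH_MS`, and `decompose` are hypothetical names. Because the timestamp occupies the high bits, IDs sort roughly by creation time.

```python
import threading
import time
from typing import Callable, Tuple

TIMESTAMP_BITS = 41
WORKER_BITS = 11
SEQUENCE_BITS = 12

MAX_TIMESTAMP = (1 << TIMESTAMP_BITS) - 1
MAX_WORKER_ID = (1 << WORKER_BITS) - 1   # 2047
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095

WORKER_SHIFT = SEQUENCE_BITS                   # 12
TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_BITS  # 23

EPOCH_MS = 1325376000000  # assumption: custom epoch 2012-01-01T00:00:00Z


def _now_ms() -> int:
    return int(time.time() * 1000)


class IdGenerator:
    """64-bit IDs: timestamp (41 bits) | worker ID (11 bits) | sequence (12 bits)."""

    def __init__(self, worker_id: int, clock: Callable[[], int] = _now_ms):
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError("worker_id must be in 0..%d" % MAX_WORKER_ID)
        self.worker_id = worker_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards; refusing to risk duplicate IDs")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # All 4,096 sequence values used this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            elapsed = now - EPOCH_MS
            if not 0 <= elapsed <= MAX_TIMESTAMP:
                raise OverflowError("timestamp outside the 41-bit range for this epoch")
            return (elapsed << TIMESTAMP_SHIFT) | (self.worker_id << WORKER_SHIFT) | self._sequence


def decompose(id_: int) -> Tuple[int, int, int]:
    """Split an ID back into (unix_ms, worker_id, sequence)."""
    return (
        (id_ >> TIMESTAMP_SHIFT) + EPOCH_MS,
        (id_ >> WORKER_SHIFT) & MAX_WORKER_ID,
        id_ & MAX_SEQUENCE,
    )
```

Note that 41 + 11 + 12 = 64 uses every bit, so in a signed 64-bit column (such as MySQL BIGINT) these IDs turn negative once the timestamp passes 2^40 ms, about 34.8 years after the epoch; an unsigned column avoids that.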
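The summary's middle step, feature partitioning, moves whole groups of tables (for example, one Django app) to their own database before any single table is sharded. A minimal router sketch for that step, using Django's router hooks `db_for_read`, `db_for_write`, and `allow_relation`; the app labels and database aliases are hypothetical, and migration gating (`allow_migrate` in current Django, `allow_syncdb` in the Django of 2012) is left out.

```python
# Hypothetical mapping: which app's tables live on which database alias.
APP_TO_DB = {
    "reports": "reports_db",
    "audit": "audit_db",
}


class FeaturePartitionRouter:
    """Route every model of a partitioned app to that app's own database."""

    def db_for_read(self, model, **hints):
        # None means "no opinion", so unlisted apps fall through to `default`.
        return APP_TO_DB.get(model._meta.app_label)

    def db_for_write(self, model, **hints):
        return APP_TO_DB.get(model._meta.app_label)

    def allow_relation(self, obj1, obj2, **hints):
        # A relation is only safe when both objects live on the same database.
        db1 = APP_TO_DB.get(obj1._meta.app_label, "default")
        db2 = APP_TO_DB.get(obj2._meta.app_label, "default")
        return db1 == db2
```

It would be enabled by listing its import path in the `DATABASE_ROUTERS` setting. Partitioning by feature keeps ForeignKeys working inside each app, which is why it is cheaper to try before sharding.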