Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Horizontally Scaling Your Database with Django - PyCon.ca (2012)

Horizontally Scaling Your Database with Django - PyCon.ca (2012)

When web apps reach a certain size, often their data footprint will outgrow what can reasonably be stored on a single database. Scaling your database horizontally by adding more servers is the dream but it can be a daunting task. This talk outlines the process I used to add horizontal scaling to Wave Accounting's infrastructure.

Ash Christopher

November 11, 2012
Tweet

More Decks by Ash Christopher

Other Decks in Programming

Transcript

  1. Horizontally
    Scaling Your Database
    with Django
    @ashchristopher
    Sunday, 11 November, 12

    View Slide

  2. Sunday, 11 November, 12

    View Slide

  3. Two
    reasons to scale a database.
    Sunday, 11 November, 12

    View Slide

  4. Write faster!
    A single database can only write data so fast.
    Sunday, 11 November, 12

    View Slide

  5. Data, lots of it.
    There is a limited amount of data that can be hot (in memory).
    Sunday, 11 November, 12

    View Slide

  6. 3 Strategies
    to deal with database
    scalability
    Sunday, 11 November, 12

    View Slide

  7. Scale Up
    Sunday, 11 November, 12

    View Slide

  8. Feature Partitioning
    Sunday, 11 November, 12

    View Slide

  9. Route data to
    separate databases
    Default (main)
    Accounting
    Transactions
    Sunday, 11 November, 12

    View Slide

  10. dj-database-url
    https://github.com/kennethreitz/dj-database-url
    “Use Database URLs in your
    Django Application.”
    Sunday, 11 November, 12

    View Slide

  11. Setup more databases
    DATABASES = {
    'default': dj_database_url.config(default='mysql://localhost/default'),
    'posts': dj_database_url.config(default='mysql://localhost/posts'),
    'comments': dj_database_url.config(default='mysql://localhost/comments'),
    ...
    }
    Sunday, 11 November, 12

    View Slide

  12. Database Router for
    Feature Partitioning
    Sunday, 11 November, 12

    View Slide

  13. Databases for
    Read and Write
    Sunday, 11 November, 12

    View Slide

  14. Relationships between
    different instances
    Sunday, 11 November, 12

    View Slide

  15. Syncing the Database
    Sunday, 11 November, 12

    View Slide

  16. Syncing the Database
    Always sync south!
    Sunday, 11 November, 12

    View Slide

  17. Multi-database
    applications have a cost
    Sunday, 11 November, 12

    View Slide

  18. Multi-database
    applications have a cost
    • No more ForeignKeys
    Sunday, 11 November, 12

    View Slide

  19. Multi-database
    applications have a cost
    • No more ForeignKeys
    • No more select_related()
    Sunday, 11 November, 12

    View Slide

  20. Multi-database
    applications have a cost
    • No more ForeignKeys
    • No more select_related()
    • No more prefetch_related()
    Sunday, 11 November, 12

    View Slide

  21. Multi-database
    applications have a cost
    • No more ForeignKeys
    • No more select_related()
    • No more prefetch_related()
    • No more cascading deletes
    Sunday, 11 November, 12

    View Slide

  22. Scaling Horizontally
    Sunday, 11 November, 12

    View Slide

  23. Scale horizontally by
    sharding the data
    Sunday, 11 November, 12

    View Slide

  24. Multiple databases
    with the same schema
    data_shard_01 data_shard_02 data_shard_03 data_shard_04 data_shard_05
    data_shard_06 data_shard_07 data_shard_08 data_shard_10 data_shard_11
    data_shard_12 data_shard_13 data_shard_14 data_shard_15 data_shard_16
    Sunday, 11 November, 12

    View Slide

  25. Pick a Sharding Key
    (or sharding keys)
    Sunday, 11 November, 12

    View Slide

  26. Analyze Data Models
    (and relationships between Models)
    Sunday, 11 November, 12

    View Slide

  27. Easier to Shard
    Sunday, 11 November, 12

    View Slide

  28. Easier to Shard
    Sunday, 11 November, 12

    View Slide

  29. Good sharding key?
    Sunday, 11 November, 12

    View Slide

  30. Good sharding key?
    • Usually the primary key of an important
    element in your database.
    Sunday, 11 November, 12

    View Slide

  31. Good sharding key?
    • Usually the primary key of an important
    element in your database.
    • Often an entity that connects many
    subgraphs within your database.
    Sunday, 11 November, 12

    View Slide

  32. Harder to Shard
    Sunday, 11 November, 12

    View Slide

  33. Harder to Shard
    ?
    Sunday, 11 November, 12

    View Slide

  34. Bad sharding key?
    Sunday, 11 November, 12

    View Slide

  35. Bad sharding key?
    Sunday, 11 November, 12

    View Slide

  36. Bad sharding key?
    • Querying all the shards to read data.
    Sunday, 11 November, 12

    View Slide

  37. Bad sharding key?
    • Querying all the shards to read data.
    • Data is saved disproportionally
    across shards.
    Sunday, 11 November, 12

    View Slide

  38. Extend dj-database-url
    Pass extra options to `DATABASES` dictionary
    Sunday, 11 November, 12

    View Slide

  39. Database Definitions
    Sunday, 11 November, 12

    View Slide

  40. Database Definitions
    Classify like shards
    Sunday, 11 November, 12

    View Slide

  41. Extending South
    Sunday, 11 November, 12

    View Slide

  42. Extending South
    • South support for multi-db is limited.
    Sunday, 11 November, 12

    View Slide

  43. Extending South
    • South support for multi-db is limited.
    • Need to fake migrations on `default`
    database.
    Sunday, 11 November, 12

    View Slide

  44. Extending South
    • South support for multi-db is limited.
    • Need to fake migrations on `default`
    database.
    • Need to run migrations on each shard.
    Sunday, 11 November, 12

    View Slide

  45. Database Router for
    Sharding
    Sunday, 11 November, 12

    View Slide

  46. db_for_read() method
    Sunday, 11 November, 12

    View Slide

  47. db_for_read() method
    Delegate shard selection to `instance`.
    Sunday, 11 November, 12

    View Slide

  48. instance.get_shard()
    Sunday, 11 November, 12

    View Slide

  49. instance.get_shard()
    Save to the
    same shard you
    were read from.
    Sunday, 11 November, 12

    View Slide

  50. instance.get_shard()
    Save to the
    same shard you
    were read from.
    Save to shard as
    specified by the
    shard key.
    Sunday, 11 November, 12

    View Slide

  51. allow_relation() method
    Not Implemented!
    (we don’t want to allow relations across shards)
    Sunday, 11 November, 12

    View Slide

  52. allow_syncdb() method
    Sunday, 11 November, 12

    View Slide

  53. Reading from a shard
    Sunday, 11 November, 12

    View Slide

  54. Writing to a shard
    -or-
    -or-
    Sunday, 11 November, 12

    View Slide

  55. Balancing shards
    Sunday, 11 November, 12

    View Slide

  56. Balancing is hard.
    Sunday, 11 November, 12

    View Slide

  57. Many Logical Shards
    per Physical Node
    Sunday, 11 November, 12

    View Slide

  58. Copy logical shards
    to other nodes
    Sunday, 11 November, 12

    View Slide

  59. Table Alterations
    are faster
    Adding an index to one-hundred 1GB shards is
    faster than adding an index to one 100GB shard.
    Sunday, 11 November, 12

    View Slide

  60. Start using
    globally unique IDs
    Sunday, 11 November, 12

    View Slide

  61. Obtain globally unique IDs
    Sunday, 11 November, 12

    View Slide

  62. Obtain globally unique IDs
    • Use an AUTO_INCREMENT column.
    Sunday, 11 November, 12

    View Slide

  63. Obtain globally unique IDs
    • Use an AUTO_INCREMENT column.
    • Write an ID generator.
    Sunday, 11 November, 12

    View Slide

  64. Obtain globally unique IDs
    • Use an AUTO_INCREMENT column.
    • Write an ID generator.
    • Rely on external ID incrementation
    (redis, memcache).
    Sunday, 11 November, 12

    View Slide

  65. Use an
    AUTO_INCREMENT column
    http://j.mp/SStrRc
    Sunday, 11 November, 12

    View Slide

  66. AUTO_INCREMENT in
    `default` database
    Sunday, 11 November, 12

    View Slide

  67. Create Model field
    that uses AutoID
    (remember South introspection rules)
    Sunday, 11 November, 12

    View Slide

  68. ID Generator
    Sunday, 11 November, 12

    View Slide

  69. ID Generator
    64 BITS
    Timestamp (41-bits) Worker ID (11-bits)
    Sequence ID (12-bits)
    •Up to 2047 unique workers
    •Up to 4095 unique keys/millisecond
    Sunday, 11 November, 12

    View Slide

  70. ID Generator
    (in Python)
    Sunday, 11 November, 12

    View Slide

  71. In Summary
    Sunday, 11 November, 12

    View Slide

  72. In Summary
    • Scale up, Feature Partition, then Shard.
    Sunday, 11 November, 12

    View Slide

  73. In Summary
    • Scale up, Feature Partition, then Shard.
    • Pick an efficient key to shard on.
    Sunday, 11 November, 12

    View Slide

  74. In Summary
    • Scale up, Feature Partition, then Shard.
    • Pick an efficient key to shard on.
    • Don’t balance programmatically if you can
    help it.
    Sunday, 11 November, 12

    View Slide

  75. In Summary
    • Scale up, Feature Partition, then Shard.
    • Pick an efficient key to shard on.
    • Don’t balance programmatically if you can
    help it.
    • External generation of globally unique IDs.
    Sunday, 11 November, 12

    View Slide

  76. Questions?
    @ashchristopher
    [email protected]
    Sunday, 11 November, 12

    View Slide

  77. Photo Credits
    http://www.flickr.com/photos/
    74964518@N00/130969487
    http://www.flickr.com/photos/parkerblohm/6190781865/
    http://www.flickr.com/photos/jpf/152611490/
    http://www.flickr.com/photos/thomashawk/8149135586
    http://www.flickr.com/photos/dansdata/3477700648/
    http://www.flickr.com/photos/andercismo/2349098787/
    http://www.flickr.com/photos/triller/2226679393/
    http://www.flickr.com/photos/lwr/4782026853/
    http://www.flickr.com/photos/aldon/3146743993/
    Sunday, 11 November, 12

    View Slide