Horizontally Scaling Your Database with Django - PyCon.ca (2012)

Horizontally Scaling Your Database with Django @ashchristopher Sunday, 11 November,
12

Sunday, 11 November, 12

Two reasons to scale a database. Sunday, 11 November, 12

Write faster! A single database can only write data so
fast. Sunday, 11 November, 12

Data, lots of it. There is a limited amount of
data that can be hot (in memory). Sunday, 11 November, 12

3 Strategies to deal with database scalability Sunday, 11 November,
12

Scale Up Sunday, 11 November, 12

Feature Partitioning Sunday, 11 November, 12

Route data to separate databases Default (main) Accounting Transactions Sunday,
11 November, 12

dj-database-url https://github.com/kennethreitz/dj-database-url “Use Database URLs in your Django Application.” Sunday,
11 November, 12

Setup more databases DATABASES = { 'default': dj_database_url.config(default='mysql://localhost/default'), 'posts': dj_database_url.config(default='mysql://localhost/posts'),
'comments': dj_database_url.config(default='mysql://localhost/comments'), ... } Sunday, 11 November, 12

Database Router for Feature Partitioning Sunday, 11 November, 12

Databases for Read and Write Sunday, 11 November, 12

Relationships between different instances Sunday, 11 November, 12

Syncing the Database Sunday, 11 November, 12

Syncing the Database Always sync south! Sunday, 11 November, 12

Multi-database applications have a cost Sunday, 11 November, 12

Multi-database applications have a cost • No more ForeignKeys Sunday,
11 November, 12

Multi-database applications have a cost • No more ForeignKeys •
No more select_related() Sunday, 11 November, 12

No more select_related() • No more prefetch_related() Sunday, 11 November, 12

No more select_related() • No more prefetch_related() • No more cascading deletes Sunday, 11 November, 12

Scaling Horizontally Sunday, 11 November, 12

Scale horizontally by sharding the data Sunday, 11 November, 12

Multiple databases with the same schema data_shard_01 data_shard_02 data_shard_03 data_shard_04
data_shard_05 data_shard_06 data_shard_07 data_shard_08 data_shard_10 data_shard_11 data_shard_12 data_shard_13 data_shard_14 data_shard_15 data_shard_16 Sunday, 11 November, 12

Pick a Sharding Key (or sharding keys) Sunday, 11 November,
12

Analyze Data Models (and relationships between Models) Sunday, 11 November,
12

Easier to Shard Sunday, 11 November, 12

Good sharding key? Sunday, 11 November, 12

Good sharding key? • Usually the primary key of an
important element in your database. Sunday, 11 November, 12

Good sharding key? • Usually the primary key of an
important element in your database. • Often an entity that connects many subgraphs within your database. Sunday, 11 November, 12

Harder to Shard Sunday, 11 November, 12

Harder to Shard ? Sunday, 11 November, 12

Bad sharding key? Sunday, 11 November, 12

Bad sharding key? • Querying all the shards to read
data. Sunday, 11 November, 12

Bad sharding key? • Querying all the shards to read
data. • Data is saved disproportionally across shards. Sunday, 11 November, 12

Extend dj-database-url Pass extra options to `DATABASES` dictionary Sunday, 11
November, 12

Database Definitions Sunday, 11 November, 12

Database Definitions Classify like shards Sunday, 11 November, 12

Extending South Sunday, 11 November, 12

Extending South • South support for multi-db is limited. Sunday,
11 November, 12

Extending South • South support for multi-db is limited. •
Need to fake migrations on `default` database. Sunday, 11 November, 12

Extending South • South support for multi-db is limited. •
Need to fake migrations on `default` database. • Need to run migrations on each shard. Sunday, 11 November, 12

Database Router for Sharding Sunday, 11 November, 12

db_for_read() method Sunday, 11 November, 12

db_for_read() method Delegate shard selection to `instance`. Sunday, 11 November,
12

instance.get_shard() Sunday, 11 November, 12

instance.get_shard() Save to the same shard you were read from.

instance.get_shard() Save to the same shard you were read from.
Save to shard as specified by the shard key. Sunday, 11 November, 12

allow_relation() method Not Implemented! (we don’t want to allow relations
across shards) Sunday, 11 November, 12

allow_syncdb() method Sunday, 11 November, 12

Reading from a shard Sunday, 11 November, 12

Writing to a shard -or- -or- Sunday, 11 November, 12

Balancing shards Sunday, 11 November, 12

Balancing is hard. Sunday, 11 November, 12

Many Logical Shards per Physical Node Sunday, 11 November, 12

Copy logical shards to other nodes Sunday, 11 November, 12

Table Alterations are faster Adding an index to one-hundred 1GB
shards is faster than adding an index to one 100GB shard. Sunday, 11 November, 12

Start using globally unique IDs Sunday, 11 November, 12

Obtain globally unique IDs Sunday, 11 November, 12

Obtain globally unique IDs • Use an AUTO_INCREMENT column. Sunday,
11 November, 12

Obtain globally unique IDs • Use an AUTO_INCREMENT column. •
Write an ID generator. Sunday, 11 November, 12

Obtain globally unique IDs • Use an AUTO_INCREMENT column. •
Write an ID generator. • Rely on external ID incrementation (redis, memcache). Sunday, 11 November, 12

Use an AUTO_INCREMENT column http://j.mp/SStrRc Sunday, 11 November, 12

AUTO_INCREMENT in `default` database Sunday, 11 November, 12

Create Model field that uses AutoID (remember South introspection rules)

ID Generator Sunday, 11 November, 12

ID Generator 64 BITS Timestamp (41-bits) Worker ID (11-bits) Sequence
ID (12-bits) •Up to 2047 unique workers •Up to 4095 unique keys/millisecond Sunday, 11 November, 12

ID Generator (in Python) Sunday, 11 November, 12

In Summary Sunday, 11 November, 12

In Summary • Scale up, Feature Partition, then Shard. Sunday,
11 November, 12

In Summary • Scale up, Feature Partition, then Shard. •
Pick an efficient key to shard on. Sunday, 11 November, 12

Pick an efficient key to shard on. • Don’t balance programmatically if you can help it. Sunday, 11 November, 12

Pick an efficient key to shard on. • Don’t balance programmatically if you can help it. • External generation of globally unique IDs. Sunday, 11 November, 12

Questions? @ashchristopher [email protected] Sunday, 11 November, 12

Photo Credits http://www.flickr.com/photos/ 74964518@N00/130969487 http://www.flickr.com/photos/parkerblohm/6190781865/ http://www.flickr.com/photos/jpf/152611490/ http://www.flickr.com/photos/thomashawk/8149135586 http://www.flickr.com/photos/dansdata/3477700648/ http://www.flickr.com/photos/andercismo/2349098787/ http://www.flickr.com/photos/triller/2226679393/
http://www.flickr.com/photos/lwr/4782026853/ http://www.flickr.com/photos/aldon/3146743993/ Sunday, 11 November, 12

Horizontally Scaling Your Database with Django ...

Horizontally Scaling Your Database with Django - PyCon.ca (2012)

More Decks by Ash Christopher

Other Decks in Programming

Featured

Transcript