Slide 1

Slide 1 text

Horizontally Scaling Your Database with Django @ashchristopher Sunday, 11 November, 12

Slide 3

Slide 3 text

Two reasons to scale a database.

Slide 4

Slide 4 text

Write faster!
A single database can only write data so fast.

Slide 5

Slide 5 text

Data, lots of it.
There is a limited amount of data that can be hot (in memory).

Slide 6

Slide 6 text

3 Strategies to deal with database scalability

Slide 7

Slide 7 text

Scale Up

Slide 8

Slide 8 text

Feature Partitioning

Slide 9

Slide 9 text

Route data to separate databases
• Default (main)
• Accounting
• Transactions

Slide 10

Slide 10 text

dj-database-url
https://github.com/kennethreitz/dj-database-url
“Use Database URLs in your Django Application.”

Slide 11

Slide 11 text

Set up more databases
DATABASES = {
    'default': dj_database_url.config(default='mysql://localhost/default'),
    'posts': dj_database_url.config(default='mysql://localhost/posts'),
    'comments': dj_database_url.config(default='mysql://localhost/comments'),
    ...
}

Slide 12

Slide 12 text

Database Router for Feature Partitioning

Slide 13

Slide 13 text

Databases for Read and Write

Slide 14

Slide 14 text

Relationships between different instances

Slide 15

Slide 15 text

Syncing the Database

Slide 16

Slide 16 text

Syncing the Database
Always sync South!
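
The router code for slides 12 to 16 does not survive in this transcript. The following is a minimal sketch of what a feature-partitioning router might look like, not the speaker's code. It assumes the `posts` and `comments` aliases from the settings slide correspond to Django apps with the same labels; the `APP_TO_DB` mapping and the class name are hypothetical. A Django router is a plain class, so the sketch needs no imports.

```python
# Hypothetical mapping from Django app label to database alias.
APP_TO_DB = {"posts": "posts", "comments": "comments"}


class FeatureRouter(object):
    """Sketch of a router that sends each app to its own database."""

    def _alias(self, model):
        # Apps without an entry fall through to 'default'.
        return APP_TO_DB.get(model._meta.app_label, "default")

    def db_for_read(self, model, **hints):
        return self._alias(model)

    def db_for_write(self, model, **hints):
        return self._alias(model)

    def allow_relation(self, obj1, obj2, **hints):
        # Only allow relations between objects routed to the same database.
        return self._alias(type(obj1)) == self._alias(type(obj2))

    def allow_syncdb(self, db, model):
        # South's own bookkeeping table is synced to every database.
        if model._meta.app_label == "south":
            return True
        # Other tables are created only where the model is routed.
        return db == self._alias(model)
```

A router like this is activated by listing its dotted path in the `DATABASE_ROUTERS` setting. In Django 1.7 and later, `allow_syncdb()` was replaced by `allow_migrate()`.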

Slide 17

Slide 17 text

Multi-database applications have a cost

Slide 18

Slide 18 text

Multi-database applications have a cost
• No more ForeignKeys

Slide 19

Slide 19 text

Multi-database applications have a cost
• No more ForeignKeys
• No more select_related()

Slide 20

Slide 20 text

Multi-database applications have a cost
• No more ForeignKeys
• No more select_related()
• No more prefetch_related()

Slide 21

Slide 21 text

Multi-database applications have a cost
• No more ForeignKeys
• No more select_related()
• No more prefetch_related()
• No more cascading deletes
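
In practice these costs mean that a cross-database relation becomes a plain integer column, and the join and the cascade move into application code. The sketch below illustrates the idea with in-memory stand-ins for two databases; the data, the `post_id` column, and both function names are hypothetical.

```python
# Two structures stand in for tables that live in different databases.
POSTS_DB = {1: {"id": 1, "title": "Hello"}, 2: {"id": 2, "title": "World"}}
COMMENTS_DB = [
    {"id": 10, "post_id": 1, "body": "First"},
    {"id": 11, "post_id": 2, "body": "Second"},
    {"id": 12, "post_id": 1, "body": "Third"},
]


def comments_with_posts(comments, posts_by_id):
    """Attach each comment's post by hand, in place of select_related()."""
    # Collect the IDs once so the other database is hit with one batch.
    wanted = {c["post_id"] for c in comments}
    posts = {pk: posts_by_id[pk] for pk in wanted if pk in posts_by_id}
    return [dict(c, post=posts.get(c["post_id"])) for c in comments]


def delete_post(post_id, posts_by_id, comments):
    """Emulate a cascading delete: drop dependents, then the post itself."""
    remaining = [c for c in comments if c["post_id"] != post_id]
    posts_by_id.pop(post_id, None)
    return remaining
```

Because the two steps of `delete_post` touch different databases, they are not covered by a single transaction, which is part of the cost the slide refers to.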

Slide 22

Slide 22 text

Scaling Horizontally

Slide 23

Slide 23 text

Scale horizontally by sharding the data

Slide 24

Slide 24 text

Multiple databases with the same schema
data_shard_01 data_shard_02 data_shard_03 data_shard_04
data_shard_05 data_shard_06 data_shard_07 data_shard_08
data_shard_10 data_shard_11 data_shard_12 data_shard_13
data_shard_14 data_shard_15 data_shard_16

Slide 25

Slide 25 text

Pick a Sharding Key (or sharding keys)

Slide 26

Slide 26 text

Analyze Data Models (and relationships between Models)

Slide 27

Slide 27 text

Easier to Shard

Slide 28

Slide 28 text

Easier to Shard

Slide 29

Slide 29 text

Good sharding key?

Slide 30

Slide 30 text

Good sharding key?
• Usually the primary key of an important element in your database.

Slide 31

Slide 31 text

Good sharding key?
• Usually the primary key of an important element in your database.
• Often an entity that connects many subgraphs within your database.
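
Once a key is chosen, every row that hangs off it must resolve to the same shard. A common way to do that is a modulo over the key, sketched here under the assumption that the key is an integer (for example a user's primary key) and that there are 16 shards named as on the earlier slide. The function name and the shard count are illustrative, not taken from the talk.

```python
NUM_SHARDS = 16  # assumed shard count


def shard_for(key, num_shards=NUM_SHARDS):
    """Return a database alias such as 'data_shard_03' for an integer key."""
    index = key % num_shards + 1  # shard names are numbered from 1
    return "data_shard_%02d" % index
```

With this scheme, `shard_for(34)` resolves to `data_shard_03`, and everything owned by that key can be read from a single database.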

Slide 32

Slide 32 text

Harder to Shard

Slide 33

Slide 33 text

Harder to Shard ?

Slide 34

Slide 34 text

Bad sharding key?

Slide 35

Slide 35 text

Bad sharding key?

Slide 36

Slide 36 text

Bad sharding key?
• Querying all the shards to read data.

Slide 37

Slide 37 text

Bad sharding key?
• Querying all the shards to read data.
• Data is saved disproportionately across shards.
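
The second symptom can be checked before committing to a key by counting how rows would land on each shard. The sketch below compares two candidate keys on made-up data; the rows, the four-shard count, and the key functions are all hypothetical.

```python
from collections import Counter


def shard_histogram(rows, num_shards, key_to_int):
    """Count how many rows each shard index would receive."""
    return Counter(key_to_int(row) % num_shards for row in rows)


# Hypothetical rows: (user_id, country) pairs, mostly from one country.
rows = [(i, "CA" if i % 10 else "US") for i in range(1000)]

# Sharding on the user ID spreads the rows evenly.
by_user = shard_histogram(rows, 4, lambda r: r[0])

# Sharding on the country piles most rows onto a single shard.
by_country = shard_histogram(rows, 4, lambda r: ord(r[1][0]))
```

Here `by_user` holds 250 rows per shard, while `by_country` puts 900 rows on one shard and 100 on another, leaving two shards empty.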

Slide 38

Slide 38 text

Extend dj-database-url
Pass extra options to the `DATABASES` dictionary

Slide 39

Slide 39 text

Database Definitions

Slide 40

Slide 40 text

Database Definitions
Classify like shards
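
The extension itself is not shown in this transcript. One way to carry extra options, sketched below with the standard library only, is to put them in the URL's query string and copy them into the settings dictionary, then group aliases by a shared option. This is not dj-database-url's API: `parse_db_url`, `aliases_in_group`, the `shard_group` query option, and the resulting `SHARD_GROUP` key are all hypothetical names.

```python
from urllib.parse import parse_qsl, urlparse


def parse_db_url(url):
    """Turn a database URL into a settings dict, keeping query options."""
    parts = urlparse(url)
    config = {
        "NAME": parts.path.lstrip("/"),
        "USER": parts.username or "",
        "PASSWORD": parts.password or "",
        "HOST": parts.hostname or "",
        "PORT": parts.port or "",
    }
    # Extra options ride along in the query string, e.g. ?shard_group=data
    for key, value in parse_qsl(parts.query):
        config[key.upper()] = value
    return config


def aliases_in_group(databases, group):
    """Return the sorted aliases whose SHARD_GROUP option equals `group`."""
    return sorted(
        alias for alias, conf in databases.items()
        if conf.get("SHARD_GROUP") == group
    )
```

Classifying shards this way lets routers and management commands ask for "all shards of a kind" instead of hard-coding alias names.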

Slide 41

Slide 41 text

Extending South

Slide 42

Slide 42 text

Extending South
• South support for multi-db is limited.

Slide 43

Slide 43 text

Extending South
• South support for multi-db is limited.
• Need to fake migrations on `default` database.

Slide 44

Slide 44 text

Extending South
• South support for multi-db is limited.
• Need to fake migrations on `default` database.
• Need to run migrations on each shard.
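
The last two points can be scripted as one `migrate` invocation per database. The sketch below only builds the command lines and does not run them. It assumes South's `migrate` command is called with its `--database` and `--fake` options, that shard aliases share a `data_shard_` prefix, and that the sharded models live in an app named `photos`; the function name, the prefix, and the app label are hypothetical.

```python
def migration_commands(app, aliases, shard_prefix="data_shard_"):
    """Build one `manage.py migrate` command per database alias."""
    commands = []
    for alias in sorted(aliases):
        cmd = ["python", "manage.py", "migrate", app, "--database=%s" % alias]
        if not alias.startswith(shard_prefix):
            # The sharded tables do not live here, so only record the
            # migration as applied without touching the schema.
            cmd.append("--fake")
        commands.append(cmd)
    return commands
```

Each returned list can be handed to `subprocess.check_call` once the output has been reviewed.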

Slide 45

Slide 45 text

Database Router for Sharding

Slide 46

Slide 46 text

db_for_read() method

Slide 47

Slide 47 text

db_for_read() method
Delegate shard selection to `instance`.

Slide 48

Slide 48 text

instance.get_shard()

Slide 49

Slide 49 text

instance.get_shard()
Save to the same shard you were read from.

Slide 50

Slide 50 text

instance.get_shard()
Save to the same shard you were read from.
Save to shard as specified by the shard key.

Slide 51

Slide 51 text

allow_relation() method
Not Implemented!
(we don’t want to allow relations across shards)

Slide 52

Slide 52 text

allow_syncdb() method
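
The router code for slides 45 to 52 is missing from this transcript. The sketch below shows one way the described pieces could fit together, not the speaker's implementation. It assumes sharded models share a mixin that provides `get_shard()`, that the shard key is an integer attribute, and that there are 16 shards named `data_shard_NN`. `SHARDED_APPS`, `ShardedMixin`, `shard_key_field`, and the `photos` app label are hypothetical.

```python
SHARDED_APPS = {"photos"}  # hypothetical app whose models are sharded


class ShardedMixin(object):
    """Sketch of the instance-side hook the router delegates to."""

    shard_key_field = "user_id"  # hypothetical shard key attribute
    num_shards = 16              # assumed shard count

    def get_shard(self):
        # An instance that was read from a shard is saved back to it.
        db = getattr(getattr(self, "_state", None), "db", None)
        if db:
            return db
        # A new instance goes to the shard selected by its shard key.
        key = getattr(self, self.shard_key_field)
        return "data_shard_%02d" % (key % self.num_shards + 1)


class ShardRouter(object):
    def _route(self, model, hints):
        instance = hints.get("instance")
        if model._meta.app_label in SHARDED_APPS and instance is not None:
            return instance.get_shard()
        return None  # no opinion: Django asks the next router or uses 'default'

    def db_for_read(self, model, **hints):
        return self._route(model, hints)

    def db_for_write(self, model, **hints):
        return self._route(model, hints)

    # allow_relation() is deliberately not defined.

    def allow_syncdb(self, db, model):
        if model._meta.app_label == "south":
            return True  # South's table is synced everywhere
        sharded = model._meta.app_label in SHARDED_APPS
        return db.startswith("data_shard_") == sharded
```

Queries that start without an instance carry no hint, so they still need an explicit alias, for example through the queryset's `using()` method or `save(using=...)`.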

Slide 53

Slide 53 text

Reading from a shard

Slide 54

Slide 54 text

Writing to a shard -or- -or-

Slide 55

Slide 55 text

Balancing shards

Slide 56

Slide 56 text

Balancing is hard.

Slide 57

Slide 57 text

Many Logical Shards per Physical Node

Slide 58

Slide 58 text

Copy logical shards to other nodes
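
The point of having many logical shards is that rebalancing moves whole shards between machines rather than individual rows. A minimal sketch of that bookkeeping follows, with eight logical shards on two nodes; the node names and helper functions are hypothetical.

```python
# 8 hypothetical logical shards spread over 2 physical nodes.
LOGICAL_TO_NODE = {
    "data_shard_%02d" % n: ("db-node-a" if n <= 4 else "db-node-b")
    for n in range(1, 9)
}


def move_shard(mapping, shard, new_node):
    """Repoint one logical shard after its data has been copied over."""
    updated = dict(mapping)  # leave the original mapping untouched
    updated[shard] = new_node
    return updated


def shards_on(mapping, node):
    """List the logical shards currently hosted on a physical node."""
    return sorted(s for s, n in mapping.items() if n == node)
```

The mapping from a row to its logical shard never changes; only the host behind the shard's alias does, so application code and shard keys stay as they are.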

Slide 59

Slide 59 text

Table Alterations are faster
Adding an index to one hundred 1 GB shards is faster than adding an index to one 100 GB shard.

Slide 60

Slide 60 text

Start using globally unique IDs

Slide 61

Slide 61 text

Obtain globally unique IDs

Slide 62

Slide 62 text

Obtain globally unique IDs
• Use an AUTO_INCREMENT column.

Slide 63

Slide 63 text

Obtain globally unique IDs
• Use an AUTO_INCREMENT column.
• Write an ID generator.

Slide 64

Slide 64 text

Obtain globally unique IDs
• Use an AUTO_INCREMENT column.
• Write an ID generator.
• Rely on external ID incrementation (redis, memcache).

Slide 65

Slide 65 text

Use an AUTO_INCREMENT column
http://j.mp/SStrRc

Slide 66

Slide 66 text

AUTO_INCREMENT in `default` database

Slide 67

Slide 67 text

Create Model field that uses AutoID
(remember South introspection rules)
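
The underlying idea is that one table in the unsharded database hands out IDs, and every shard uses those IDs instead of its own counter. The sketch below demonstrates it with SQLite from the standard library standing in for MySQL, so `AUTOINCREMENT` replaces `AUTO_INCREMENT`. The `auto_id` table name and both function names are hypothetical, and the model field from the slide is not reproduced.

```python
import sqlite3


def make_ticket_db():
    """Create an in-memory stand-in for the `default` database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE auto_id (id INTEGER PRIMARY KEY AUTOINCREMENT)")
    return conn


def next_id(conn):
    """Insert an empty row and return the ID the database assigned."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute("INSERT INTO auto_id DEFAULT VALUES")
    return cursor.lastrowid
```

Every new object costs one extra write to the `default` database, which keeps the scheme simple but leaves that database as a shared dependency for all shards.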

Slide 68

Slide 68 text

ID Generator

Slide 69

Slide 69 text

ID Generator
64 bits: Timestamp (41 bits), Worker ID (11 bits), Sequence ID (12 bits)
• Up to 2048 unique workers (IDs 0 to 2047)
• Up to 4096 unique keys per millisecond (sequence 0 to 4095)

Slide 70

Slide 70 text

ID Generator (in Python)
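
The Python code from this slide is not in the transcript. The following is a sketch of a generator with the 41/11/12 bit layout described on the previous slide, not the speaker's implementation. The custom epoch, the class name, and the injectable `clock` parameter are assumptions made for illustration.

```python
import threading
import time

EPOCH_MS = 1352592000000  # assumed custom epoch (11 November 2012, UTC)
WORKER_BITS = 11
SEQUENCE_BITS = 12
MAX_WORKER = (1 << WORKER_BITS) - 1      # 2047
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095


class IdGenerator(object):
    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id <= MAX_WORKER:
            raise ValueError("worker_id must fit in 11 bits")
        self.worker_id = worker_id
        self.clock = clock  # returns the current time in milliseconds
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence used up for this millisecond: wait for the next.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            timestamp = now - EPOCH_MS
            return ((timestamp << (WORKER_BITS + SEQUENCE_BITS))
                    | (self.worker_id << SEQUENCE_BITS)
                    | self.sequence)
```

Each process gets its own `worker_id`, so IDs are unique without any coordination at generation time. Because the layout uses all 64 bits, a value fits in a signed 64-bit column only while the timestamp part stays below 2^40 milliseconds, roughly 34 years past the chosen epoch.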

Slide 71

Slide 71 text

In Summary

Slide 72

Slide 72 text

In Summary
• Scale up, Feature Partition, then Shard.

Slide 73

Slide 73 text

In Summary
• Scale up, Feature Partition, then Shard.
• Pick an efficient key to shard on.

Slide 74

Slide 74 text

In Summary
• Scale up, Feature Partition, then Shard.
• Pick an efficient key to shard on.
• Don’t balance programmatically if you can help it.

Slide 75

Slide 75 text

In Summary
• Scale up, Feature Partition, then Shard.
• Pick an efficient key to shard on.
• Don’t balance programmatically if you can help it.
• External generation of globally unique IDs.

Slide 76

Slide 76 text

Questions?
@ashchristopher

Slide 77

Slide 77 text

Photo Credits
http://www.flickr.com/photos/74964518@N00/130969487
http://www.flickr.com/photos/parkerblohm/6190781865/
http://www.flickr.com/photos/jpf/152611490/
http://www.flickr.com/photos/thomashawk/8149135586
http://www.flickr.com/photos/dansdata/3477700648/
http://www.flickr.com/photos/andercismo/2349098787/
http://www.flickr.com/photos/triller/2226679393/
http://www.flickr.com/photos/lwr/4782026853/
http://www.flickr.com/photos/aldon/3146743993/