Slide 1

Slide 1 text

PyCon Ireland 2013 1 / 50 Table partitioning with Django Max Tepkeev 13 October 2013 Dublin, Ireland

Slide 2

Slide 2 text

PyCon Ireland 2013 2 / 50 Partitionin g Theory Djang o Packa ges Realizatio n

Slide 3

Slide 3 text

PyCon Ireland 2013 3 / 50 Partitioning Theory

Slide 4

Slide 4 text

PyCon Ireland 2013 4 / 50 Definition Table partitioning - division of one table into several tables, called partitions, which still represent original table.

Slide 5

Slide 5 text

PyCon Ireland 2013 5 / 50 Why • Performance • Manageability • Availability

Slide 6

Slide 6 text

PyCon Ireland 2013 6 / 50 When • Tables greater than 2GB • Tables with historical data • Table need to be distributed across different types of storage devices • Queries ALWAYS contain a filter on the partition field

Slide 7

Slide 7 text

PyCon Ireland 2013 7 / 50 Methods • Horizontal partitioning • Vertical partitioning

Slide 8

Slide 8 text

PyCon Ireland 2013 8 / 50 Strategies • Range partitioning • List partitioning • Hash partitioning • Composite partitioning

Slide 9

Slide 9 text

PyCon Ireland 2013 9 / 50 Strategies

Slide 10

Slide 10 text

PyCon Ireland 2013 10 / 50 Strategies

Slide 11

Slide 11 text

PyCon Ireland 2013 11 / 50 Example id user_id entry added 1 345 Login 2013-08-22 17:24:43 2 345 Went to Store section 2013-08-22 17:25:01 3 345 Ordered a book 2013-08-22 17:33:28 4 345 Payed for a book 2013-08-22 17:35:54 5 345 Logout 2013-08-22 17:38:32

Slide 12

Slide 12 text

PyCon Ireland 2013 12 / 50 Example INSERT INTO user_actions (user_id, entry, added) VALUES (237, 'Login', '2013-08-21 11:54:08') Goes to user_actions_y2013m08 INSERT INTO user_actions (user_id, entry, added) VALUES (198, 'Logout', '2013-09-01 08:43:42') Goes to user_actions_y2013m09

Slide 13

Slide 13 text

PyCon Ireland 2013 13 / 50 Example SELECT * FROM user_actions id user_id entry added 1 237 Login 2013-08-21 11:54:08 2 198 Logout 2013-09-01 08:43:42 Table partitioning is “transparent”. You don’t need to change your code to work with partitioned tables.

Slide 14

Slide 14 text

PyCon Ireland 2013 14 / 50 Realization

Slide 15

Slide 15 text

PyCon Ireland 2013 15 / 50 RDBMS • PostgreSQL • MySQL

Slide 16

Slide 16 text

PyCon Ireland 2013 16 / 50 PostgreSQL Methods: • Horizontal partitioning Strategies: • Range partitioning • List partitioning

Slide 17

Slide 17 text

PyCon Ireland 2013 17 / 50 PostgreSQL Implementation: • Inheritance Available: • >= 8.1

Slide 18

Slide 18 text

PyCon Ireland 2013 18 / 50 PostgreSQL Steps: • Master table • Child tables • Correct partition insertion function • Trigger that calls partition insertion function • Function to delete duplicate rows from master • Trigger that calls delete duplicate rows function

Slide 19

Slide 19 text

PyCon Ireland 2013 19 / 50 PostgreSQL CREATE TABLE logs ( id serial, entry text NOT NULL, added timestamp(6) NOT NULL, CONSTRAINT logs_pkey PRIMARY KEY (id) ) Master table:

Slide 20

Slide 20 text

PyCon Ireland 2013 20 / 50 PostgreSQL CREATE TABLE logs_y2013m08 ( CHECK ( added >= '2013-08-01 00:00:00'::timestamp AND added <= '2013-08-31 23:59:59'::timestamp ) ) INHERITS (logs); Child table:

Slide 21

Slide 21 text

PyCon Ireland 2013 21 / 50 PostgreSQL CREATE FUNCTION "logs_insert_child"() RETURNS "trigger" AS $BODY$ DECLARE tablename TEXT; BEGIN tablename := 'logs_' || to_char(NEW.added, '"y"YYYY"m"MM'); EXECUTE 'INSERT INTO ' || tablename || ' VALUES (($1).*);' USING NEW; RETURN NEW; END; $BODY$ LANGUAGE plpgsql; Correct partition insertion function:

Slide 22

Slide 22 text

PyCon Ireland 2013 22 / 50 PostgreSQL CREATE TRIGGER "before_insert_logs_trigger" BEFORE INSERT ON "logs" FOR EACH ROW EXECUTE PROCEDURE "logs_insert_child"(); Trigger that calls partition insertion function:

Slide 23

Slide 23 text

PyCon Ireland 2013 23 / 50 PostgreSQL CREATE FUNCTION "logs_delete_master"() RETURNS "trigger" AS $BODY$ BEGIN DELETE FROM ONLY logs WHERE id = NEW.id; RETURN NEW; END; $BODY$ LANGUAGE plpgsql; Function to delete duplicate rows from master:

Slide 24

Slide 24 text

PyCon Ireland 2013 24 / 50 PostgreSQL CREATE TRIGGER "after_insert_logs_trigger" AFTER INSERT ON "logs" FOR EACH ROW EXECUTE PROCEDURE "logs_delete_master"(); Trigger that calls delete duplicate rows function:

Slide 25

Slide 25 text

PyCon Ireland 2013 25 / 50 Code for automatic new partition creation PostgreSQL DECLARE start_date TIMESTAMP; start_date := date_trunc('month', NEW.added); IF NOT EXISTS( SELECT relname FROM pg_class WHERE relname=tablename) THEN EXECUTE 'CREATE TABLE ' || tablename || ' ( CHECK ( added >= ''' || start_date || ''' AND added <= ''' || start_date + '1 month'::interval || ''' ) ) INHERITS ('logs');'; END IF;

Slide 26

Slide 26 text

PyCon Ireland 2013 26 / 50 MySQL Methods: • Horizontal partitioning Strategies: • Range partitioning • List partitioning • Hash partitioning • Composite partitioning

Slide 27

Slide 27 text

PyCon Ireland 2013 27 / 50 MySQL Implementation: • Native (PARTITION BY) Available: • >= 5.1

Slide 28

Slide 28 text

PyCon Ireland 2013 28 / 50 How that works MySQL CREATE TABLE members ( username VARCHAR(16) NOT NULL, email VARCHAR(35), joined DATE NOT NULL ) PARTITION BY RANGE( YEAR(joined) ) ( PARTITION p0 VALUES LESS THAN (2012), PARTITION p1 VALUES LESS THAN (2013), PARTITION p2 VALUES LESS THAN MAXVALUE );

Slide 29

Slide 29 text

PyCon Ireland 2013 29 / 50 Django Packages

Slide 30

Slide 30 text

PyCon Ireland 2013 30 / 50 Packages • django-parting • django-db-parti

Slide 31

Slide 31 text

PyCon Ireland 2013 31 / 50 django-parting RDBMS: • PostgreSQL

Slide 32

Slide 32 text

PyCon Ireland 2013 32 / 50 django-parting Features: • Partition tables with Foreign Keys

Slide 33

Slide 33 text

PyCon Ireland 2013 33 / 50 django-parting From pypi: $ pip install django-parting or clone from github: $ git clone git://github.com/danfairs/django- parting.git

Slide 34

Slide 34 text

PyCon Ireland 2013 34 / 50 django-parting Add parting to PYTHONPATH and installed applications: INSTALLED_APPS = ( ... 'parting' )

Slide 35

Slide 35 text

PyCon Ireland 2013 35 / 50 django-parting from django.db import models from django.utils import timezone class Tweet(models.Model): json = models.TextField() user = models.TextField() created_at = models.DateTimeField(default=timezone.now()) class Star(models.Model): tweet = models.ForeignKey(Tweet) user = models.TextField()

Slide 36

Slide 36 text

PyCon Ireland 2013 36 / 50 django-parting from django.utils import timezone from parting import PartitionManager from dateutil.relativedelta import relativedelta def _key_for_date(dt): return dt.strftime('%Y%m') class TweetPartitionManager(PartitionManager): def current_partition(self): return _key_for_date(timezone.now()) def next_partition(self): one_months_time = timezone.now() + relativedelta(months=1) return _key_for_date(one_months_time)

Slide 37

Slide 37 text

PyCon Ireland 2013 37 / 50 django-parting class Tweet(models.Model): json = models.TextField() user = models.TextField() created_at = models.DateTimeField(default=timezone.now()) partitions = TweetPartitionManager() class Meta: abstract = True class Star(models.Model): tweet = models.PartitionForeignKey(Tweet) user = models.TextField() partitions = TweetPartitionManager() class Meta: abstract = True

Slide 38

Slide 38 text

PyCon Ireland 2013 38 / 50 django-parting import json from django.utils.timezone import make_aware, utc tweet_data = { 'created_at': make_aware( datetime.datetime(2012, 12, 6, 14, 23), utc) 'json': json.dumps({'key': 'value'}), 'user': 'Jimmy' } partition_key = _key_for_dt(tweet_data['created_at']) partition = Tweet.partitions.get_partition(partition_key) tweet = partition(**tweet_data) tweet.save()

Slide 39

Slide 39 text

PyCon Ireland 2013 39 / 50 django-parting Problems: • Not database-level partitioning • No django admin support • No active development

Slide 40

Slide 40 text

PyCon Ireland 2013 40 / 50 django-db-parti RDBMS: • MySQL • PostgreSQL

Slide 41

Slide 41 text

PyCon Ireland 2013 41 / 50 django-db-parti Features: • Real database-level partitioning • Automatic new partition creation in real-time • Django admin support

Slide 42

Slide 42 text

PyCon Ireland 2013 42 / 50 django-db-parti From pypi: $ pip install django-db-parti or clone from github: $ git clone git://github.com/maxtepkeev/django-db- parti.git

Slide 43

Slide 43 text

PyCon Ireland 2013 43 / 50 django-db-parti Add dbparti to PYTHONPATH and installed applications: INSTALLED_APPS = ( ... 'dbparti' )

Slide 44

Slide 44 text

PyCon Ireland 2013 44 / 50 django-db-parti In models.py add import statement: from dbparti.models import Partitionable Make your model to inherit from Partitionable: class YourModelName(Partitionable):

Slide 45

Slide 45 text

PyCon Ireland 2013 45 / 50 django-db-parti Add a Meta class to your model with a few settings: class Meta(Partitionable.Meta): partition_type = 'range' partition_subtype = 'date' partition_range = 'month' partition_column = 'added' Lastly initialize some database stuff with the command: $ python manage.py partition app_name

Slide 46

Slide 46 text

PyCon Ireland 2013 46 / 50 django-db-parti Customize how data will be displayed in the Django admin In admin.py add import statement: from dbparti.admin import PartitionableAdmin Make your admin to inherit from PartitionableAdmin: class

Slide 47

Slide 47 text

PyCon Ireland 2013 47 / 50 django-db-parti Possible model settings partition_type: • range partition_subtype: • date partition_range: • day • week • month • year

Slide 48

Slide 48 text

PyCon Ireland 2013 48 / 50 django-db-parti Possible model admin settings partition_show: • all (default) • current • previous

Slide 49

Slide 49 text

PyCon Ireland 2013 49 / 50 django-db-parti Problems: • Only range partitioning (datetime) • Database backend limitations

Slide 50

Slide 50 text

PyCon Ireland 2013 50 / 50 Question time https://www.github.com/maxtepkeev/django-db- parti email: [email protected] skype: max.tepkeev