Upgrade to Pro — share decks privately, control downloads, hide ads and more …

PyCon Ireland 2013: Table Partitioning with Django

PyCon Ireland 2013: Table Partitioning with Django

(via http://python.ie/pycon/2013/talks/table_partitioning_with_django/)
Speaker: Max Tepkeev

This talk will cover table partitioning theory in general, different partitioning methods: horizontal and vertical partitioning. We will talk about partitioning criteria, how partitions can be organized: range partitioning, list partitioning, hash partitioning and composite partitioning. We will also talk about partitioning implementations in different database servers, how they differ from each other, what problems can we face because of that difference. Django provides us with great database abstraction and ORM, but how can we use it with table partitioning ? We will talk about existing libraries for Django to work with table partitioning, how they differ, which is the best (if any) and why.

PyCon Ireland

October 13, 2013
Tweet

More Decks by PyCon Ireland

Other Decks in Technology

Transcript

  1. PyCon Ireland 2013 1 / 50 Table partitioning with Django

    Max Tepkeev 13 October 2013 Dublin, Ireland
  2. PyCon Ireland 2013 4 / 50 Definition Table partitioning -

    division of one table into several tables, called partitions, which still represent original table.
  3. PyCon Ireland 2013 5 / 50 Why • Performance •

    Manageability • Availability
  4. PyCon Ireland 2013 6 / 50 When • Tables greater

    than 2GB • Tables with historical data • Table need to be distributed across different types of storage devices • Queries ALWAYS contain a filter on the partition field
  5. PyCon Ireland 2013 8 / 50 Strategies • Range partitioning

    • List partitioning • Hash partitioning • Composite partitioning
  6. PyCon Ireland 2013 11 / 50 Example id user_id entry

    added 1 345 Login 2013-08-22 17:24:43 2 345 Went to Store section 2013-08-22 17:25:01 3 345 Ordered a book 2013-08-22 17:33:28 4 345 Payed for a book 2013-08-22 17:35:54 5 345 Logout 2013-08-22 17:38:32
  7. PyCon Ireland 2013 12 / 50 Example INSERT INTO user_actions

    (user_id, entry, added) VALUES (237, 'Login', '2013-08-21 11:54:08') Goes to user_actions_y2013m08 INSERT INTO user_actions (user_id, entry, added) VALUES (198, 'Logout', '2013-09-01 08:43:42') Goes to user_actions_y2013m09
  8. PyCon Ireland 2013 13 / 50 Example SELECT * FROM

    user_actions id user_id entry added 1 237 Login 2013-08-21 11:54:08 2 198 Logout 2013-09-01 08:43:42 Table partitioning is “transparent”. You don’t need to change your code to work with partitioned tables.
  9. PyCon Ireland 2013 16 / 50 PostgreSQL Methods: • Horizontal

    partitioning Strategies: • Range partitioning • List partitioning
  10. PyCon Ireland 2013 18 / 50 PostgreSQL Steps: • Master

    table • Child tables • Correct partition insertion function • Trigger that calls partition insertion function • Function to delete duplicate rows from master • Trigger that calls delete duplicate rows function
  11. PyCon Ireland 2013 19 / 50 PostgreSQL CREATE TABLE logs

    ( id serial, entry text NOT NULL, added timestamp(6) NOT NULL, CONSTRAINT logs_pkey PRIMARY KEY (id) ) Master table:
  12. PyCon Ireland 2013 20 / 50 PostgreSQL CREATE TABLE logs_y2013m08

    ( CHECK ( added >= '2013-08-01 00:00:00'::timestamp AND added <= '2013-08-31 23:59:59'::timestamp ) ) INHERITS (logs); Child table:
  13. PyCon Ireland 2013 21 / 50 PostgreSQL CREATE FUNCTION "logs_insert_child"()

    RETURNS "trigger" AS $BODY$ DECLARE tablename TEXT; BEGIN tablename := 'logs_' || to_char(NEW.added, '"y"YYYY"m"MM'); EXECUTE 'INSERT INTO ' || tablename || ' VALUES (($1).*);' USING NEW; RETURN NEW; END; $BODY$ LANGUAGE plpgsql; Correct partition insertion function:
  14. PyCon Ireland 2013 22 / 50 PostgreSQL CREATE TRIGGER "before_insert_logs_trigger"

    BEFORE INSERT ON "logs" FOR EACH ROW EXECUTE PROCEDURE "logs_insert_child"(); Trigger that calls partition insertion function:
  15. PyCon Ireland 2013 23 / 50 PostgreSQL CREATE FUNCTION "logs_delete_master"()

    RETURNS "trigger" AS $BODY$ BEGIN DELETE FROM ONLY logs WHERE id = NEW.id; RETURN NEW; END; $BODY$ LANGUAGE plpgsql; Function to delete duplicate rows from master:
  16. PyCon Ireland 2013 24 / 50 PostgreSQL CREATE TRIGGER "after_insert_logs_trigger"

    AFTER INSERT ON "logs" FOR EACH ROW EXECUTE PROCEDURE "logs_delete_master"(); Trigger that calls delete duplicate rows function:
  17. PyCon Ireland 2013 25 / 50 Code for automatic new

    partition creation PostgreSQL DECLARE start_date TIMESTAMP; start_date := date_trunc('month', NEW.added); IF NOT EXISTS( SELECT relname FROM pg_class WHERE relname=tablename) THEN EXECUTE 'CREATE TABLE ' || tablename || ' ( CHECK ( added >= ''' || start_date || ''' AND added <= ''' || start_date + '1 month'::interval || ''' ) ) INHERITS ('logs');'; END IF;
  18. PyCon Ireland 2013 26 / 50 MySQL Methods: • Horizontal

    partitioning Strategies: • Range partitioning • List partitioning • Hash partitioning • Composite partitioning
  19. PyCon Ireland 2013 28 / 50 How that works MySQL

    CREATE TABLE members ( username VARCHAR(16) NOT NULL, email VARCHAR(35), joined DATE NOT NULL ) PARTITION BY RANGE( YEAR(joined) ) ( PARTITION p0 VALUES LESS THAN (2012), PARTITION p1 VALUES LESS THAN (2013), PARTITION p2 VALUES LESS THAN MAXVALUE );
  20. PyCon Ireland 2013 33 / 50 django-parting From pypi: $

    pip install django-parting or clone from github: $ git clone git://github.com/danfairs/django- parting.git
  21. PyCon Ireland 2013 34 / 50 django-parting Add parting to

    PYTHONPATH and installed applications: INSTALLED_APPS = ( ... 'parting' )
  22. PyCon Ireland 2013 35 / 50 django-parting from django.db import

    models from django.utils import timezone class Tweet(models.Model): json = models.TextField() user = models.TextField() created_at = models.DateTimeField(default=timezone.now()) class Star(models.Model): tweet = models.ForeignKey(Tweet) user = models.TextField()
  23. PyCon Ireland 2013 36 / 50 django-parting from django.utils import

    timezone from parting import PartitionManager from dateutil.relativedelta import relativedelta def _key_for_date(dt): return dt.strftime('%Y%m') class TweetPartitionManager(PartitionManager): def current_partition(self): return _key_for_date(timezone.now()) def next_partition(self): one_months_time = timezone.now() + relativedelta(months=1) return _key_for_date(one_months_time)
  24. PyCon Ireland 2013 37 / 50 django-parting class Tweet(models.Model): json

    = models.TextField() user = models.TextField() created_at = models.DateTimeField(default=timezone.now()) partitions = TweetPartitionManager() class Meta: abstract = True class Star(models.Model): tweet = models.PartitionForeignKey(Tweet) user = models.TextField() partitions = TweetPartitionManager() class Meta: abstract = True
  25. PyCon Ireland 2013 38 / 50 django-parting import json from

    django.utils.timezone import make_aware, utc tweet_data = { 'created_at': make_aware( datetime.datetime(2012, 12, 6, 14, 23), utc) 'json': json.dumps({'key': 'value'}), 'user': 'Jimmy' } partition_key = _key_for_dt(tweet_data['created_at']) partition = Tweet.partitions.get_partition(partition_key) tweet = partition(**tweet_data) tweet.save()
  26. PyCon Ireland 2013 39 / 50 django-parting Problems: • Not

    database-level partitioning • No django admin support • No active development
  27. PyCon Ireland 2013 41 / 50 django-db-parti Features: • Real

    database-level partitioning • Automatic new partition creation in real-time • Django admin support
  28. PyCon Ireland 2013 42 / 50 django-db-parti From pypi: $

    pip install django-db-parti or clone from github: $ git clone git://github.com/maxtepkeev/django-db- parti.git
  29. PyCon Ireland 2013 43 / 50 django-db-parti Add dbparti to

    PYTHONPATH and installed applications: INSTALLED_APPS = ( ... 'dbparti' )
  30. PyCon Ireland 2013 44 / 50 django-db-parti In models.py add

    import statement: from dbparti.models import Partitionable Make your model to inherit from Partitionable: class YourModelName(Partitionable):
  31. PyCon Ireland 2013 45 / 50 django-db-parti Add a Meta

    class to your model with a few settings: class Meta(Partitionable.Meta): partition_type = 'range' partition_subtype = 'date' partition_range = 'month' partition_column = 'added' Lastly initialize some database stuff with the command: $ python manage.py partition app_name
  32. PyCon Ireland 2013 46 / 50 django-db-parti Customize how data

    will be displayed in the Django admin In admin.py add import statement: from dbparti.admin import PartitionableAdmin Make your admin to inherit from PartitionableAdmin: class
  33. PyCon Ireland 2013 47 / 50 django-db-parti Possible model settings

    partition_type: • range partition_subtype: • date partition_range: • day • week • month • year
  34. PyCon Ireland 2013 48 / 50 django-db-parti Possible model admin

    settings partition_show: • all (default) • current • previous
  35. PyCon Ireland 2013 49 / 50 django-db-parti Problems: • Only

    range partitioning (datetime) • Database backend limitations