Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Jacob Rief - Representing Hierarchies in Relati...

Jacob Rief - Representing Hierarchies in Relational Databases

In this talk, I’ll explain the fundamental problem representing deep hierarchies in relational databases. To address this problem, we can use a database design pattern, named Materialized Path Trees.

Many data structures require a representation, where one parent node can have any arbitrary number of children. Inside relational databases, this typically is represented by a foreign key onto its own table. In Django’s ORM, we use models.ForeignKey('self', ...), to create this kind of recursive relationship. The major problem with this kind of representation is, that it doesn’t scale for deep trees. Whenever we have to traverse the tree from a given starting node, our code has to perform one database query per hierarchy level. To circumvent this, some database vendors implemented SQL dialects, to fetch a whole subtree with one query. Long time ago, Oracle for instance implemented 'CONNECT BY', which is proprietary and not part of the SQL standard. Nowadays, newer releases of most major database vendors implemented the 'WITH RECURSIVE' clause, which has been added to the SQL-99 standard. This allows us to build recursive queries.

Fortunately there is a clever recipe to represent hierarchies in relational databases using standard SQL techniques, but without the mentioned scaling problem: Materialized Path Trees, discovered by Vadim Tropashko. Django’s ecosystem offers two libraries, which implement this design pattern: django-mptt and django-treebeard. I also would like to mention django-tree, which only works on Postgres, using their SQL extension mentioned before. In this talk I’ll explain the design patterns for Materialized Path Trees. Furthermore I’ll show the pros and cons of both libraries.

PyConWeb

July 11, 2018
Tweet

More Decks by PyConWeb

Other Decks in Technology

Transcript

  1. Hi, I’m Jacob ➔ Python user since 1997 ➔ Core

    maintainer of ◆ django-SHOP ◆ django-angular ◆ djangocms-cascade ◆ django-websocket-redis ◆ django-admin-sortable2 ◆ django-sass-processor @jacobrief https://github.com/jrief +JacobRief Contributor to • django-CMS • Django-filer • django-treebeard
  2. In the real world, we’re used to organize our commodities

    in categories. In our virtual world we do exactly the same.
  3. Store Food Gadgets Vegetal Meat Garden House Car Pork Beef

    Poultry Fruit Apples Oranges Angus Galloway Fowl Kitchen Knifes Dishes Bowls Vessels Plates Glasses Pans Chicken Goose Turkey Categories in a Store
  4. Other use cases for trees in databases ➔ Pages in

    a Content Management System ➔ Representing an abstraction of the DOM ➔ Organization charts in medium to large companies ➔ Discussion threads in blogs ➔ Emulating directory folders
  5. 1 2 3 5 4 6 7 8 9 10

    11 12 17 18 14 15 16 13 19 20 21 22 26 27 28 23 24 25 Adjacent List from django.db import models from django.db.models import fields class Node(models.Model): parent = models.ForeignKey( "self", on_delete=models.CASCADE, ) some_payload = ... CREATE TABLE "node" ( "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "parent_id" integer NULL REFERENCES "node" ("id"), "some_payload" varchar(1000) );
  6. 1 2 3 5 4 6 7 8 9 10

    11 12 17 18 14 15 16 13 19 20 21 22 26 27 28 23 24 25 Descending an AL tree from itertools import chain from typing import Iterable def get_descendants(node: Node) -> Iterable[Node]: queryset = Node.objects.filter(parent=node) results = chain(queryset) for child in queryset: results = chain(results, get_descendants(child)) return results list(get_descendants(Node.objects.get(pk=4))
  7. 1 2 3 5 4 6 7 8 9 10

    11 12 17 18 14 15 16 13 19 20 21 22 26 27 28 23 24 25 Descending an AL tree from itertools import chain from typing import Iterable def get_descendants(node: Node) -> Iterable[Node]: queryset = Node.objects.filter(parent=node) results = chain(queryset) for child in queryset: results = chain(results, get_descendants(child)) return results list(get_descendants(Node.objects.get(pk=4))
  8. 1 2 3 5 4 6 7 8 9 10

    11 12 17 18 14 15 16 13 19 20 21 22 26 27 28 23 24 25 Descending an AL tree SELECT n1.id, n2.id, n3.id, n4.id FROM node AS n1 LEFT JOIN node AS n2 ON n2.parent_id = n1.id LEFT JOIN node AS n3 ON n3.parent_id = n2.id LEFT JOIN node AS n4 ON n4.parent_id = n3.id WHERE n1.id = 4; 4 | 9 | | 4 | 10 | 14 | 4 | 10 | 15 | 4 | 11 | 16 | 23 4 | 11 | 16 | 24 4 | 11 | 16 | 25
  9. 1 2 3 5 4 6 7 8 9 10

    11 12 17 18 14 15 16 13 19 20 21 22 26 27 28 23 24 25 Ascending an AL tree from itertools import chain from typing import Iterable def get_ancestors(node: Node) -> Iterable[Node]: if node.parent: return chain([node.parent], get_ancestors(node.parent)) return chain() list(get_ancestors(Node.objects.get(pk=16))
  10. 1 2 3 5 4 6 7 8 9 10

    11 12 17 18 14 15 16 13 19 20 21 22 26 27 28 23 24 25 Ascending an AL tree from itertools import chain from typing import Iterable def get_ancestors(node: Node) -> Iterable[Node]: if node.parent: return chain([node.parent], get_ancestors(node.parent)) return chain() list(get_ancestors(Node.objects.get(pk=16))
  11. 1 2 3 5 4 6 7 8 9 10

    11 12 17 18 14 15 16 13 19 20 21 22 26 27 28 23 24 25 Attribute filtering from itertools import chain from typing import Iterable def get_descendants(node: Node) -> Iterable[Node]: queryset = Node.objects.filter(parent=node, color="red") results = chain(queryset) for child in queryset: results = chain(results, get_descendants(child)) return results list(get_descendants(Node.objects.get(pk=4)) ?
  12. A AA AB AAB AAA ABA ABB ABC AAAA AAAB

    AAAC AABA AABAA AABAB AAABA AAABB AAACA ABBA ABBAA ABBAB ABBAC ABBAD ABBABA ABBABB ABBADA AAACAA AAACAB AAACAC Materialized Path Tree
  13. from django.db import models from django.db.models import fields class Node(models.Model):

    steplen = 4 # in this presentation: 1 alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' path = fields.CharField( "Path", max_length=1000, db_index=True, ) some_payload = ... Materialized Path Tree as Django Model
  14. A AA AB AAB AAA 6 ABA 7 ABB 8

    ABC AAAA AAAB AAAC AABA AABAA AABAB AAABA AAABB AAACA 13 ABBA ABBAA ABBAB ABBAC ABBAD ABBABA ABBABB ABBADA AAACAA AAACAB AAACAC def get_descendants(node: Node) -> QuerySet: qs = Node.objects.filter(path__startswith=node.path) qs = qs.exclude(path=node.path) return qs list(get_descendants(Node.objects.get(pk=4)) Descending a MP tree
  15. A AA AB AAB AAA 6 ABA 7 ABB 8

    ABC AAAA AAAB AAAC AABA AABAA AABAB AAABA AAABB AAACA 13 ABBA ABBAA ABBAB ABBAC ABBAD ABBABA ABBABB ABBADA AAACAA AAACAB AAACAC Descending a MP tree SELECT * FROM node WHERE path LIKE 'AAA%' AND NOT id=4; def get_descendants(node: Node) -> QuerySet: qs = Node.objects.filter(path__startswith=node.path) qs = qs.exclude(path=node.path) return qs list(get_descendants(Node.objects.get(pk=4))
  16. A AA AB AAB AAA 6 ABA 7 ABB 8

    ABC AAAA AAAB AAAC AABA AABAA AABAB AAABA AAABB AAACA 13 ABBA 19 ABBAA 20 ABBAB 21 ABBAC 22 ABBAD ABBABA ABBABB ABBADA AAACAA AAACAB AAACAC Descending a MP tree from django.db.models.functions import Length def get_descendants(node: Node) -> QuerySet: qs = Node.objects.filter(path__startswith=node.path) qs = qs.exclude(path=node.path) qs = qs.annotate(depth=Length('path')) qs = qs.order_by('depth') return qs list(get_descendants(Node.objects.get(pk=4))
  17. A AA AB AAB AAA 6 ABA 7 ABB 8

    ABC AAAA AAAB AAAC AABA AABAA AABAB AAABA AAABB AAACA 13 ABBA 19 ABBAA 20 ABBAB 21 ABBAC 22 ABBAD ABBABA ABBABB ABBADA AAACAA AAACAB AAACAC Ascending a MP tree from django.db.models.functions import Length def get_ancestors(node: Node) -> QuerySet: paths = [node.path[0:k] for k in range(1, len(node.path))] qs = Node.objects.filter(path__in=paths) qs = qs.annotate(depth=Length('path')) qs = qs.order_by('-depth') return qs list(get_ancestors(Node.objects.get(pk=16))
  18. A AA AB AAB AAA 6 ABA 7 ABB 8

    ABC AAAA AAAB AAAC AABA AABAA AABAB AAABA AAABB AAACA 13 ABBA 19 ABBAA 20 ABBAB 21 ABBAC 22 ABBAD ABBABA ABBABB ABBADA AAACAA AAACAB AAACAC Ascending a MP tree from django.db.models.functions import Length def get_ancestors(node: Node) -> QuerySet: paths = [node.path[0:k] for k in range(1, len(node.path))] qs = Node.objects.filter(path__in=paths) qs = qs.annotate(depth=Length('path')) qs = qs.order_by('-depth') return qs list(get_ancestors(Node.objects.get(pk=16))
  19. 1 56 2 31 32 55 23 30 3 22

    33 34 35 52 53 54 4 5 6 11 12 21 24 29 25 26 27 28 7 8 9 10 13 20 36 51 37 38 39 44 45 46 47 50 40 41 42 43 48 49 14 15 16 17 18 19 Nested Sets Tree
  20. from django.db import models from django.db.models import fields class Node(models.Model):

    left = fields.PositiveIntegerField( db_index=True, ) right = fields.PositiveIntegerField( db_index=True, ) payload_field = ... Nested Sets Tree as Django Model
  21. 1 56 2 31 32 55 23 30 3 22

    6 33 34 7 35 52 8 53 54 4 5 6 11 12 21 24 29 25 26 27 28 7 8 9 10 13 20 13 36 51 37 38 39 44 45 46 47 50 40 41 42 43 48 49 14 15 16 17 18 19 Descending a NS tree def get_descendants(node: Node) -> QuerySet: between = node.left, node.right - 1 qs = Node.objects.filter(left__range=between) qs = qs.exclude(path=node.path) return qs list(get_descendants(Node.objects.get(pk=4))
  22. 1 56 2 31 32 55 23 30 3 22

    6 33 34 7 35 52 8 53 54 4 5 6 11 12 21 24 29 25 26 27 28 7 8 9 10 13 20 13 36 51 37 38 39 44 45 46 47 50 40 41 42 43 48 49 14 15 16 17 18 19 def get_descendants(node: Node) -> QuerySet: between = node.left, node.right - 1 qs = Node.objects.filter(left__range=between) qs = qs.exclude(path=node.path) return qs list(get_descendants(Node.objects.get(pk=4)) Descending a NS tree SELECT * FROM node WHERE `left` BETWEEN 3 AND 21 AND NOT id=4;
  23. 1 56 2 31 32 55 23 30 3 22

    6 33 34 7 35 52 8 53 54 4 5 6 11 12 21 24 29 25 26 27 28 7 8 9 10 13 20 13 36 51 37 38 39 44 45 46 47 50 40 41 42 43 48 49 14 15 16 17 18 19 Ascending a NS tree def get_ancestors(node: Node) -> QuerySet: qs = Node.objects.filter(left__lt=node.left, right__gt=node.right) return qs list(get_ancestors(Node.objects.get(pk=16))
  24. 1 56 2 31 32 55 23 30 3 22

    6 33 34 7 35 52 8 53 54 4 5 6 11 12 21 24 29 25 26 27 28 7 8 9 10 13 20 13 36 51 19 37 38 20 39 44 21 45 46 22 47 50 40 41 42 43 48 49 14 15 16 17 18 19 Ascending a NS Tree SELECT * FROM node WHERE `left` < 13 AND `right` > 20; def get_ancestors(node: Node) -> QuerySet: qs = Node.objects.filter(left__lt=node.left, right__gt=node.right) return qs list(get_ancestors(Node.objects.get(pk=16))
  25. 1 56 2 31 3 32 55 23 30 3

    22 6 33 34 7 35 52 8 53 54 4 5 6 11 12 21 24 29 25 26 18 27 28 7 8 9 10 13 20 13 36 51 19 37 38 20 39 44 21 45 46 22 47 50 26 40 41 27 42 43 28 48 49 14 15 16 17 18 19 Add node to a NS tree ? ? from django.db.models.fields import PositiveIntegerField from django.db.models.expressions import F, Case, When def add_child(parent: Node, child: Node): Node.objects.filter(right__gte=parent.right).update( left=Case( When(left__gte=node.left, then=F('left') + 2), default=F('left'), output_field=PositiveIntegerField(), ), right=F('right') + 2, ) child.update(left=parent.left + 1, right=parent.right - 1) parent = Node.objects.get(pk=14) new_child = Node.objects.create(...) # -> pk=100 add_child(parent, new_child)
  26. 1 58 2 33 3 34 57 25 32 3

    24 6 35 36 7 37 54 8 55 56 4 5 6 13 14 23 26 31 27 28 18 29 30 7 10 11 12 15 22 13 38 53 19 39 40 20 41 46 21 47 48 22 49 52 26 42 43 27 44 45 28 50 51 16 17 18 19 20 21 Add node to NS tree ? ? from django.db.models.fields import PositiveIntegerField from django.db.models.expressions import F, Case, When def add_child(parent: Node, child: Node): Node.objects.filter(right__gte=parent.right).update( left=Case( When(left__gte=node.left, then=F('left') + 2), default=F('left'), output_field=PositiveIntegerField(), ), right=F('right') + 2, ) child.update(left=parent.left + 1, right=parent.right - 1) parent = Node.objects.get(pk=14) new_child = Node.objects.create(...) # -> pk=100 add_child(parent, new_child) 8 9
  27. 1 2 3 5 4 6 7 8 9 10

    11 12 17 18 14 15 16 13 19 20 21 22 26 27 28 23 24 25 AL tree with Common Table Expressions WITH RECURSIVE tree AS ( SELECT * FROM node WHERE id = 4 UNION ALL SELECT node.* FROM node, tree WHERE node.parent_id = tree.id ) SELECT * FROM tree WHERE NOT id = 4; Common Table Expressions are used to simplify complex joins and subqueries, and to provide a means to query hierarchical data.
  28. 1 2 3 5 4 6 7 8 9 10

    11 12 17 18 14 15 16 13 19 20 21 22 26 27 28 23 24 25 Common Table Expressions (CTE) WITH RECURSIVE tree AS ( SELECT * FROM node WHERE id = 4 UNION ALL SELECT node.* FROM node, tree WHERE node.parent_id = tree.id ) SELECT * FROM tree WHERE NOT id = 4; Don’t put a semicolon here! node.id = tree.parent_id
  29. 1 2 3 5 4 6 7 8 9 10

    11 12 17 18 14 15 16 13 19 20 21 22 26 27 28 23 24 25 Common Table Expressions using django-CTE from django_cte import With cte = With.recursive( lambda cte: Node.objects.filter(id=4).union( cte.join(Node, parent_id=cte.col.id), all=True, ) ) nodes = cte.join(Node, parent_id=cte.col.id).with_cte(cte)
  30. Comparison of SQL Design Patterns • Traversing the tree •

    Multiple roots • Insert / Delete single node • Move subtree • Sibling ordering • Depth information • Size limitations • Extra storage requirements • Tree integrity Adjacent List w/o CTE Materialized Path Tree Nested Sets Tree = requires extra field and/or extra computation = possible by default Adjacent List with CTE
  31. Comparison of SQL Design Patterns • Traversing the tree •

    Multiple roots • Insert / Delete single node • Move subtree • Sibling ordering • Depth information • Size limitations • Extra storage requirements • Tree integrity Adjacent List w/o CTE Materialized Path Tree Nested Sets Tree = requires extra field and/or extra computation = possible by default Adjacent List with CTE
  32. Django Apps implementing trees • django-mptt 1.7k ⭐ on GitHub

    (Modified Preorder Tree Traversal: Nested Sets) • django-treebeard 455 ⭐ on GitHub (implements Adjacent List without CTE, Materialized Path and Nested Sets) • django-tree 47 ⭐ on GitHub (implements Adjacent List with CTE, but Postgres only)
  33. Django Apps implementing trees • django-mptt 1.7k ⭐ on GitHub

    (Modified Preorder Tree Traversal: Nested Sets) • django-treebeard 455 ⭐ on GitHub (implements Adjacent List without CTE, Materialized Path and Nested Sets) • django-tree 47 ⭐ on GitHub (implements Adjacent List with CTE, but Postgres only)
  34. Thanks and further plans • Add CTE support to Django

    (see #28919) • Create/adopt a tree library based on Adjacent Lists with CTE • Questions? @jacobrief https://github.com/jrief +JacobRief