Slide 1

Slide 1 text

Representing Hierarchies in Relational Databases

Slide 2

Slide 2 text

Hi, I’m Jacob ➔ Python user since 1997 ➔ Core maintainer of ◆ django-SHOP ◆ django-angular ◆ djangocms-cascade ◆ django-websocket-redis ◆ django-admin-sortable2 ◆ django-sass-processor @jacobrief https://github.com/jrief +JacobRief Contributor to ● django-CMS ● Django-filer ● django-treebeard

Slide 3

Slide 3 text

In the real world, we’re used to organize our commodities in categories. In our virtual world we do exactly the same.

Slide 4

Slide 4 text

Store Food Gadgets Vegetal Meat Garden House Car Pork Beef Poultry Fruit Apples Oranges Angus Galloway Fowl Kitchen Knifes Dishes Bowls Vessels Plates Glasses Pans Chicken Goose Turkey Categories in a Store

Slide 5

Slide 5 text

Other use cases for trees in databases ➔ Pages in a Content Management System ➔ Representing an abstraction of the DOM ➔ Organization charts in medium to large companies ➔ Discussion threads in blogs ➔ Emulating directory folders

Slide 6

Slide 6 text

1 2 3 5 4 6 7 8 9 10 11 12 17 18 14 15 16 13 19 20 21 22 26 27 28 23 24 25 Adjacent List from django.db import models from django.db.models import fields class Node(models.Model): parent = models.ForeignKey( "self", on_delete=models.CASCADE, ) some_payload = ... CREATE TABLE "node" ( "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "parent_id" integer NULL REFERENCES "node" ("id"), "some_payload" varchar(1000) );

Slide 7

Slide 7 text

1 2 3 5 4 6 7 8 9 10 11 12 17 18 14 15 16 13 19 20 21 22 26 27 28 23 24 25 Descending an AL tree from itertools import chain from typing import Iterable def get_descendants(node: Node) -> Iterable[Node]: queryset = Node.objects.filter(parent=node) results = chain(queryset) for child in queryset: results = chain(results, get_descendants(child)) return results list(get_descendants(Node.objects.get(pk=4))

Slide 8

Slide 8 text

1 2 3 5 4 6 7 8 9 10 11 12 17 18 14 15 16 13 19 20 21 22 26 27 28 23 24 25 Descending an AL tree from itertools import chain from typing import Iterable def get_descendants(node: Node) -> Iterable[Node]: queryset = Node.objects.filter(parent=node) results = chain(queryset) for child in queryset: results = chain(results, get_descendants(child)) return results list(get_descendants(Node.objects.get(pk=4))

Slide 9

Slide 9 text

1 2 3 5 4 6 7 8 9 10 11 12 17 18 14 15 16 13 19 20 21 22 26 27 28 23 24 25 Descending an AL tree SELECT n1.id, n2.id, n3.id, n4.id FROM node AS n1 LEFT JOIN node AS n2 ON n2.parent_id = n1.id LEFT JOIN node AS n3 ON n3.parent_id = n2.id LEFT JOIN node AS n4 ON n4.parent_id = n3.id WHERE n1.id = 4; 4 | 9 | | 4 | 10 | 14 | 4 | 10 | 15 | 4 | 11 | 16 | 23 4 | 11 | 16 | 24 4 | 11 | 16 | 25

Slide 10

Slide 10 text

1 2 3 5 4 6 7 8 9 10 11 12 17 18 14 15 16 13 19 20 21 22 26 27 28 23 24 25 Ascending an AL tree from itertools import chain from typing import Iterable def get_ancestors(node: Node) -> Iterable[Node]: if node.parent: return chain([node.parent], get_ancestors(node.parent)) return chain() list(get_ancestors(Node.objects.get(pk=16))

Slide 11

Slide 11 text

1 2 3 5 4 6 7 8 9 10 11 12 17 18 14 15 16 13 19 20 21 22 26 27 28 23 24 25 Ascending an AL tree from itertools import chain from typing import Iterable def get_ancestors(node: Node) -> Iterable[Node]: if node.parent: return chain([node.parent], get_ancestors(node.parent)) return chain() list(get_ancestors(Node.objects.get(pk=16))

Slide 12

Slide 12 text

1 2 3 5 4 6 7 8 9 10 11 12 17 18 14 15 16 13 19 20 21 22 26 27 28 23 24 25 Attribute filtering from itertools import chain from typing import Iterable def get_descendants(node: Node) -> Iterable[Node]: queryset = Node.objects.filter(parent=node, color="red") results = chain(queryset) for child in queryset: results = chain(results, get_descendants(child)) return results list(get_descendants(Node.objects.get(pk=4)) ?

Slide 13

Slide 13 text

Materialized Path Tree

Slide 14

Slide 14 text

A AA AB AAB AAA ABA ABB ABC AAAA AAAB AAAC AABA AABAA AABAB AAABA AAABB AAACA ABBA ABBAA ABBAB ABBAC ABBAD ABBABA ABBABB ABBADA AAACAA AAACAB AAACAC Materialized Path Tree

Slide 15

Slide 15 text

from django.db import models from django.db.models import fields class Node(models.Model): steplen = 4 # in this presentation: 1 alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' path = fields.CharField( "Path", max_length=1000, db_index=True, ) some_payload = ... Materialized Path Tree as Django Model

Slide 16

Slide 16 text

A AA AB AAB AAA 6 ABA 7 ABB 8 ABC AAAA AAAB AAAC AABA AABAA AABAB AAABA AAABB AAACA 13 ABBA ABBAA ABBAB ABBAC ABBAD ABBABA ABBABB ABBADA AAACAA AAACAB AAACAC def get_descendants(node: Node) -> QuerySet: qs = Node.objects.filter(path__startswith=node.path) qs = qs.exclude(path=node.path) return qs list(get_descendants(Node.objects.get(pk=4)) Descending a MP tree

Slide 17

Slide 17 text

A AA AB AAB AAA 6 ABA 7 ABB 8 ABC AAAA AAAB AAAC AABA AABAA AABAB AAABA AAABB AAACA 13 ABBA ABBAA ABBAB ABBAC ABBAD ABBABA ABBABB ABBADA AAACAA AAACAB AAACAC Descending a MP tree SELECT * FROM node WHERE path LIKE 'AAA%' AND NOT id=4; def get_descendants(node: Node) -> QuerySet: qs = Node.objects.filter(path__startswith=node.path) qs = qs.exclude(path=node.path) return qs list(get_descendants(Node.objects.get(pk=4))

Slide 18

Slide 18 text

A AA AB AAB AAA 6 ABA 7 ABB 8 ABC AAAA AAAB AAAC AABA AABAA AABAB AAABA AAABB AAACA 13 ABBA 19 ABBAA 20 ABBAB 21 ABBAC 22 ABBAD ABBABA ABBABB ABBADA AAACAA AAACAB AAACAC Descending a MP tree from django.db.models.functions import Length def get_descendants(node: Node) -> QuerySet: qs = Node.objects.filter(path__startswith=node.path) qs = qs.exclude(path=node.path) qs = qs.annotate(depth=Length('path')) qs = qs.order_by('depth') return qs list(get_descendants(Node.objects.get(pk=4))

Slide 19

Slide 19 text

A AA AB AAB AAA 6 ABA 7 ABB 8 ABC AAAA AAAB AAAC AABA AABAA AABAB AAABA AAABB AAACA 13 ABBA 19 ABBAA 20 ABBAB 21 ABBAC 22 ABBAD ABBABA ABBABB ABBADA AAACAA AAACAB AAACAC Ascending a MP tree from django.db.models.functions import Length def get_ancestors(node: Node) -> QuerySet: paths = [node.path[0:k] for k in range(1, len(node.path))] qs = Node.objects.filter(path__in=paths) qs = qs.annotate(depth=Length('path')) qs = qs.order_by('-depth') return qs list(get_ancestors(Node.objects.get(pk=16))

Slide 20

Slide 20 text

A AA AB AAB AAA 6 ABA 7 ABB 8 ABC AAAA AAAB AAAC AABA AABAA AABAB AAABA AAABB AAACA 13 ABBA 19 ABBAA 20 ABBAB 21 ABBAC 22 ABBAD ABBABA ABBABB ABBADA AAACAA AAACAB AAACAC Ascending a MP tree from django.db.models.functions import Length def get_ancestors(node: Node) -> QuerySet: paths = [node.path[0:k] for k in range(1, len(node.path))] qs = Node.objects.filter(path__in=paths) qs = qs.annotate(depth=Length('path')) qs = qs.order_by('-depth') return qs list(get_ancestors(Node.objects.get(pk=16))

Slide 21

Slide 21 text

Nested Sets Trees

Slide 22

Slide 22 text

1 56 2 31 32 55 23 30 3 22 33 34 35 52 53 54 4 5 6 11 12 21 24 29 25 26 27 28 7 8 9 10 13 20 36 51 37 38 39 44 45 46 47 50 40 41 42 43 48 49 14 15 16 17 18 19 Nested Sets Tree

Slide 23

Slide 23 text

from django.db import models from django.db.models import fields class Node(models.Model): left = fields.PositiveIntegerField( db_index=True, ) right = fields.PositiveIntegerField( db_index=True, ) payload_field = ... Nested Sets Tree as Django Model

Slide 24

Slide 24 text

1 56 2 31 32 55 23 30 3 22 6 33 34 7 35 52 8 53 54 4 5 6 11 12 21 24 29 25 26 27 28 7 8 9 10 13 20 13 36 51 37 38 39 44 45 46 47 50 40 41 42 43 48 49 14 15 16 17 18 19 Descending a NS tree def get_descendants(node: Node) -> QuerySet: between = node.left, node.right - 1 qs = Node.objects.filter(left__range=between) qs = qs.exclude(path=node.path) return qs list(get_descendants(Node.objects.get(pk=4))

Slide 25

Slide 25 text

1 56 2 31 32 55 23 30 3 22 6 33 34 7 35 52 8 53 54 4 5 6 11 12 21 24 29 25 26 27 28 7 8 9 10 13 20 13 36 51 37 38 39 44 45 46 47 50 40 41 42 43 48 49 14 15 16 17 18 19 def get_descendants(node: Node) -> QuerySet: between = node.left, node.right - 1 qs = Node.objects.filter(left__range=between) qs = qs.exclude(path=node.path) return qs list(get_descendants(Node.objects.get(pk=4)) Descending a NS tree SELECT * FROM node WHERE `left` BETWEEN 3 AND 21 AND NOT id=4;

Slide 26

Slide 26 text

1 56 2 31 32 55 23 30 3 22 6 33 34 7 35 52 8 53 54 4 5 6 11 12 21 24 29 25 26 27 28 7 8 9 10 13 20 13 36 51 37 38 39 44 45 46 47 50 40 41 42 43 48 49 14 15 16 17 18 19 Ascending a NS tree def get_ancestors(node: Node) -> QuerySet: qs = Node.objects.filter(left__lt=node.left, right__gt=node.right) return qs list(get_ancestors(Node.objects.get(pk=16))

Slide 27

Slide 27 text

1 56 2 31 32 55 23 30 3 22 6 33 34 7 35 52 8 53 54 4 5 6 11 12 21 24 29 25 26 27 28 7 8 9 10 13 20 13 36 51 19 37 38 20 39 44 21 45 46 22 47 50 40 41 42 43 48 49 14 15 16 17 18 19 Ascending a NS Tree SELECT * FROM node WHERE `left` < 13 AND `right` > 20; def get_ancestors(node: Node) -> QuerySet: qs = Node.objects.filter(left__lt=node.left, right__gt=node.right) return qs list(get_ancestors(Node.objects.get(pk=16))

Slide 28

Slide 28 text

1 56 2 31 3 32 55 23 30 3 22 6 33 34 7 35 52 8 53 54 4 5 6 11 12 21 24 29 25 26 18 27 28 7 8 9 10 13 20 13 36 51 19 37 38 20 39 44 21 45 46 22 47 50 26 40 41 27 42 43 28 48 49 14 15 16 17 18 19 Add node to a NS tree ? ? from django.db.models.fields import PositiveIntegerField from django.db.models.expressions import F, Case, When def add_child(parent: Node, child: Node): Node.objects.filter(right__gte=parent.right).update( left=Case( When(left__gte=node.left, then=F('left') + 2), default=F('left'), output_field=PositiveIntegerField(), ), right=F('right') + 2, ) child.update(left=parent.left + 1, right=parent.right - 1) parent = Node.objects.get(pk=14) new_child = Node.objects.create(...) # -> pk=100 add_child(parent, new_child)

Slide 29

Slide 29 text

1 58 2 33 3 34 57 25 32 3 24 6 35 36 7 37 54 8 55 56 4 5 6 13 14 23 26 31 27 28 18 29 30 7 10 11 12 15 22 13 38 53 19 39 40 20 41 46 21 47 48 22 49 52 26 42 43 27 44 45 28 50 51 16 17 18 19 20 21 Add node to NS tree ? ? from django.db.models.fields import PositiveIntegerField from django.db.models.expressions import F, Case, When def add_child(parent: Node, child: Node): Node.objects.filter(right__gte=parent.right).update( left=Case( When(left__gte=node.left, then=F('left') + 2), default=F('left'), output_field=PositiveIntegerField(), ), right=F('right') + 2, ) child.update(left=parent.left + 1, right=parent.right - 1) parent = Node.objects.get(pk=14) new_child = Node.objects.create(...) # -> pk=100 add_child(parent, new_child) 8 9

Slide 30

Slide 30 text

1 2 3 5 4 6 7 8 9 10 11 12 17 18 14 15 16 13 19 20 21 22 26 27 28 23 24 25 AL tree with Common Table Expressions WITH RECURSIVE tree AS ( SELECT * FROM node WHERE id = 4 UNION ALL SELECT node.* FROM node, tree WHERE node.parent_id = tree.id ) SELECT * FROM tree WHERE NOT id = 4; Common Table Expressions are used to simplify complex joins and subqueries, and to provide a means to query hierarchical data.

Slide 31

Slide 31 text

1 2 3 5 4 6 7 8 9 10 11 12 17 18 14 15 16 13 19 20 21 22 26 27 28 23 24 25 Common Table Expressions (CTE) WITH RECURSIVE tree AS ( SELECT * FROM node WHERE id = 4 UNION ALL SELECT node.* FROM node, tree WHERE node.parent_id = tree.id ) SELECT * FROM tree WHERE NOT id = 4; Don’t put a semicolon here! node.id = tree.parent_id

Slide 32

Slide 32 text

1 2 3 5 4 6 7 8 9 10 11 12 17 18 14 15 16 13 19 20 21 22 26 27 28 23 24 25 Common Table Expressions using django-CTE from django_cte import With cte = With.recursive( lambda cte: Node.objects.filter(id=4).union( cte.join(Node, parent_id=cte.col.id), all=True, ) ) nodes = cte.join(Node, parent_id=cte.col.id).with_cte(cte)

Slide 33

Slide 33 text

Comparison of SQL Design Patterns ● Traversing the tree ● Multiple roots ● Insert / Delete single node ● Move subtree ● Sibling ordering ● Depth information ● Size limitations ● Extra storage requirements ● Tree integrity Adjacent List w/o CTE Materialized Path Tree Nested Sets Tree = requires extra field and/or extra computation = possible by default Adjacent List with CTE

Slide 34

Slide 34 text

Comparison of SQL Design Patterns ● Traversing the tree ● Multiple roots ● Insert / Delete single node ● Move subtree ● Sibling ordering ● Depth information ● Size limitations ● Extra storage requirements ● Tree integrity Adjacent List w/o CTE Materialized Path Tree Nested Sets Tree = requires extra field and/or extra computation = possible by default Adjacent List with CTE

Slide 35

Slide 35 text

Django Apps implementing trees ● django-mptt 1.7k ⭐ on GitHub (Modified Preorder Tree Traversal: Nested Sets) ● django-treebeard 455 ⭐ on GitHub (implements Adjacent List without CTE, Materialized Path and Nested Sets) ● django-tree 47 ⭐ on GitHub (implements Adjacent List with CTE, but Postgres only)

Slide 36

Slide 36 text

Django Apps implementing trees ● django-mptt 1.7k ⭐ on GitHub (Modified Preorder Tree Traversal: Nested Sets) ● django-treebeard 455 ⭐ on GitHub (implements Adjacent List without CTE, Materialized Path and Nested Sets) ● django-tree 47 ⭐ on GitHub (implements Adjacent List with CTE, but Postgres only)

Slide 37

Slide 37 text

Thanks and further plans ● Add CTE support to Django (see #28919) ● Create/adopt a tree library based on Adjacent Lists with CTE ● Questions? @jacobrief https://github.com/jrief +JacobRief