Slide 1

Slide 1 text

DJANGO WITH POSTGRESQL SUPERPOWERS PAOLO MELCHIORRE ~ @pauloxnet

Slide 2

Slide 2 text

Paolo Melchiorre ~ @pauloxnet 2 Django database backends “Django attempts to support as many features as possible on all database backends. However, not all database backends are alike …” — “Databases”, Django Documentation

Slide 3

Slide 3 text

Paolo Melchiorre ~ @pauloxnet Django 3 William Gottlieb (Public domain) Django Reinhardt at the Aquarium jazz club in New York, NY 1946

Slide 4

Slide 4 text

Paolo Melchiorre ~ @pauloxnet PostgreSQL 4 Photo by Nam Anh on Unsplash Elephant walking during daytime 2019

Slide 5

Slide 5 text

I.A.N.A.D.A. ¿! ? ¡? !

Slide 6

Slide 6 text

Paolo Melchiorre ~ @pauloxnet @pauloxnet • CTO @ 20tab • Software engineer • Python developer • Django contributor • PostgreSQL user Paolo Melchiorre 6 DjangoCon Europe 2019 - Bartek Pawlik (CC BY-NC-SA)

Slide 7

Slide 7 text

Paolo Melchiorre ~ @pauloxnet 7 Django no-code $ python3 --version Python 3.10.0 $ python3 -m venv ~/.mysite $ source ~/.mysite/bin/activate $ python3 -m pip install django~=4.0 $ python3 -m django --version 4.0.3 $ cd ~/projects $ python3 -m django startproject mysite

Slide 8

Slide 8 text

Paolo Melchiorre ~ @pauloxnet 8 https:/ /twitter.com/psycopg

Slide 9

Slide 9 text

Paolo Melchiorre ~ @pauloxnet 9 Psycopg psycopg (v0.01 ~2001) PostgreSQL driver, Python DB API 2.0, LGPL psycopg1 (v1.0 ~2001) psycopg2 (v2.0 ~2006) Python3 (v2.4 ~2011) JSON (v2.5 ~2013) psycopg3 (v3.0 ~2021)

Slide 10

Slide 10 text

Paolo Melchiorre ~ @pauloxnet 10 Installing PostgreSQL C client library $ sudo apt install libpq-dev ... Setting up libpq-dev (13.6-0) ...

Slide 11

Slide 11 text

Paolo Melchiorre ~ @pauloxnet 11 Installing psycopg2 $ python3 -m pip install psycopg2 ... Successfully installed psycopg2-2.9.3

Slide 12

Slide 12 text

Paolo Melchiorre ~ @pauloxnet 12 Activating PostgreSQL backend DATABASES = { "default": { "ENGINE": "django.db.backends.postgresql", "HOST": "127.0.0.1", "NAME": "mydatabase", "PASSWORD": "mypassword", "PORT": "5432", "USER": "mydatabaseuser", } }

Slide 13

Slide 13 text

Paolo Melchiorre ~ @pauloxnet 13 PostgreSQL Connection URIs # environment DATABASE_URL=postgres://user:password@host:port/name # python3 -m pip install dj-database-url import dj_database_url DATABASES = {"default": dj_database_url.config()} # python3 -m pip install django-configurations import configurations DATABASES = configurations.values.DatabaseURLValue()

Slide 14

Slide 14 text

Paolo Melchiorre ~ @pauloxnet Making queries 14 https:/ /docs.djangoproject.com/en/stable/topics/db/queries/

Slide 15

Slide 15 text

Paolo Melchiorre ~ @pauloxnet 15 Blog models from django.db import models class Author(models.Model): name = models.CharField(max_length=200) class Entry(models.Model): headline = models.CharField(max_length=255) body_text = models.TextField() authors = models.ManyToManyField("Author")

Slide 16

Slide 16 text

Paolo Melchiorre ~ @pauloxnet 16 Making queries >>> from blog.models import Author >>> Author.objects.filter(name__contains="Terry") SELECT "name" FROM "blog_author" WHERE "name" LIKE '%Terry%'

Slide 17

Slide 17 text

Paolo Melchiorre ~ @pauloxnet Kickstarter 17 https:/ /www.kickstarter.com/projects/mjtamlyn/improved-postgresql-support-in-django

Slide 18

Slide 18 text

Paolo Melchiorre ~ @pauloxnet 18 PostgreSQL contrib django.contrib.postgres (v1.8 ~2015) Fields, indexes, functions, extensions, … Crowdfound Marc Tamlyn (~2014) JSONfield* (v1.9 ~2015) Full-text search (v1.10 ~2016) RandomUUID (v2.0 ~2017) Operator Classes (v3.2 ~2021)

Slide 19

Slide 19 text

Paolo Melchiorre ~ @pauloxnet 19 Activating PostgreSQL module INSTALLED_APPS = [ "django.contrib.admin", "django.contrib.auth", "django.contrib.contenttypes", "django.contrib.sessions", "django.contrib.messages", "django.contrib.staticfiles", "django.contrib.postgres", "blog", ]

Slide 20

Slide 20 text

Paolo Melchiorre ~ @pauloxnet DjangoCon Europe 2017 20 DjangoCon Europe Sprint 2017 Paolo Melchiorre (CC BY-SA)

Slide 21

Slide 21 text

Paolo Melchiorre ~ @pauloxnet • Version 4 (random) UUID • pgcrypto module • cryptographic functions • CryptoExtension migration • in-core from PG 13 RandomUUID 21

Slide 22

Slide 22 text

Paolo Melchiorre ~ @pauloxnet 22 UUID field import uuid from django.db import models class Entry(models.Model): headline = models.CharField(max_length=255) body_text = models.TextField() authors = models.ManyToManyField("Author") uuid = models.UUIDField(default=uuid.uuid4)

Slide 23

Slide 23 text

Paolo Melchiorre ~ @pauloxnet 23 RandomUUID update >>> from django.contrib.postgres import functions >>> from blog.models import Entry >>> Entry.objects.update(uuid=functions.RandomUUID()) UPDATE "blog_entry" SET "uuid" = GEN_RANDOM_UUID() -- ~1 million UUIDs in ~30 s

Slide 24

Slide 24 text

Paolo Melchiorre ~ @pauloxnet 24 EuroPython Sprints 2017 Paolo Melchiorre (CC BY-SA)

Slide 25

Slide 25 text

Paolo Melchiorre ~ @pauloxnet • Fields, expressions, functions • GIN index • GiST index • Phrase search • Web search Full-text search 25

Slide 26

Slide 26 text

Paolo Melchiorre ~ @pauloxnet 26 Search lookup >>> from blog.models import Entry >>> Entry.objects.filter(headline__search="tomato") SELECT * FROM "blog_entry" WHERE to_tsvector("headline") @@ plainto_tsquery('tomato')

Slide 27

Slide 27 text

Paolo Melchiorre ~ @pauloxnet 27 SearchVector from django.contrib.postgres import indexes, search from django.db import models s = search.SearchVector("headline", config="english") class Entry(models.Model): headline = models.CharField(max_length=255) body_text = models.TextField() authors = models.ManyToManyField("Author") class Meta: indexes = [indexes.GinIndex(s, name="s_idx")]

Slide 28

Slide 28 text

Paolo Melchiorre ~ @pauloxnet 28 SearchQuery >>> from django.contrib.postgres import search as fts >>> from blog.models import Entry, s >>> q = "ham OR tuna -pineapple" >>> doc = fts.SearchQuery(q, search_type='websearch') >>> Entry.objects.annotate(doc=s).filter(doc=doc) SELECT * FROM "blog_entry" WHERE to_tsvector('english', "headline") @@ websearch_to_tsquery('ham OR tuna -pineapple')

Slide 29

Slide 29 text

Paolo Melchiorre ~ @pauloxnet 29 Paolo Melchiorre (CC BY-SA) Trondheim (Norway) 2018

Slide 30

Slide 30 text

Paolo Melchiorre ~ @pauloxnet • PG array • Array of data • Base field • Size • Multi-dimensional Array Field 30

Slide 31

Slide 31 text

Paolo Melchiorre ~ @pauloxnet 31 Array Email Field from django.contrib.postgres.fields import ArrayField from django.db import models class Author(models.Model): name = models.CharField(max_length=200) emails = ArrayField(models.EmailField())

Slide 32

Slide 32 text

Paolo Melchiorre ~ @pauloxnet 32 Making array queries >>> from django.db.models import Q >>> from blog.models import Author >>> q = Q(emails__contains=["[email protected]"]) >>> Author.objects.filter(q).values_list("emails") [['[email protected]', '[email protected]']] SELECT "emails" FROM "blog_author" WHERE "emails" @> ARRAY['[email protected]']

Slide 33

Slide 33 text

Paolo Melchiorre ~ @pauloxnet 33 django-better-admin-arrayfield Before After

Slide 34

Slide 34 text

Paolo Melchiorre ~ @pauloxnet 34 San Francisco (California) 2013 Paolo Melchiorre (CC BY-SA)

Slide 35

Slide 35 text

Paolo Melchiorre ~ @pauloxnet • California Civic Data Coalition • PostgreSQL COPY support • COPY TO • COPY FROM • Temporary tables Django Postgres Copy 35

Slide 36

Slide 36 text

Paolo Melchiorre ~ @pauloxnet 36 Django Postgres Copy - Model from django.db import models from postgres_copy import CopyManager class Feature(models.Model): name = models.CharField(max_length=255) alternative_names = models.TextField() osm_type = models.CharField(max_length=8) osm_id = models.PositiveBigIntegerField() ... objects = CopyManager()

Slide 37

Slide 37 text

Paolo Melchiorre ~ @pauloxnet 37 Django Postgres Copy - CSV >>> import pathlib >>> pathlib.Path("/tmp/italy.csv").stat().st_size 217879987 (~208 MB) >>> from blog.models import Feature >>> Feature.objects.from_csv("/tmp/italy.csv") 918904 (~1 M) records loaded CPU times: user 354 ms, sys: 101 ms, total: 455 ms Wall time: 3.46 s

Slide 38

Slide 38 text

Paolo Melchiorre ~ @pauloxnet 38 Django Postgres Copy - SQL CREATE TEMPORARY TABLE "temp_blog_feature" ("..."); COPY "temp_blog_feature" FROM STDIN WITH CSV HEADER; -- COPY 918904 -- Time: 2233,077 ms (00:02,233) INSERT INTO "blog_feature" ("...") ( SELECT ... FROM "temp_blog_feature" ); DROP TABLE IF EXISTS "temp_blog_feature";

Slide 39

Slide 39 text

Paolo Melchiorre ~ @pauloxnet 39 Django Postgres Copy - GZip >>> import pathlib >>> pathlib.Path("/tmp/italy.csv.gz").stat().st_size 42876187 (~41 MB) >>> from geonames.models import Feature >>> import gzip >>> with gzip.open("/tmp/italy.csv.gz", "rb") as f: Feature.objects.from_csv(f) 918904 (~1M) records loaded CPU times: user 1.02 s, sys: 42.8 ms, total: 1.06 s Wall time: 3.45 s

Slide 40

Slide 40 text

Paolo Melchiorre ~ @pauloxnet Tree 40 Paolo Melchiorre (CC BY-SA) Abruzzo (Italy) 2020

Slide 41

Slide 41 text

Paolo Melchiorre ~ @pauloxnet • ltree module • Hierarchical tree-like data • Materialized path • Dedicated PathField • Abstract TreeModel Django ltree 41

Slide 42

Slide 42 text

Paolo Melchiorre ~ @pauloxnet 42 TreeModel from django.contrib.postgres import indexes as idx from django_ltree.models import TreeModel class Entry(TreeModel): headline = models.CharField(max_length=255) body_text = models.TextField() authors = models.ManyToManyField("Author") class Meta: indexes = [idx.GistIndex(fields=['path'])]

Slide 43

Slide 43 text

Paolo Melchiorre ~ @pauloxnet 43 Tree - Example top pictures science astronomy astronomy astrophysics cosmology galaxies stars astronauts

Slide 44

Slide 44 text

Paolo Melchiorre ~ @pauloxnet 44 Tree - Queries >>> from blog.models import Entry >>> path = "top.science" >>> q = Entry.objects.filter(path__descendants=path) >>> q.order_by("path").values_list("path", flat=True) [top.science, top.science.astronomy, top.science.astronomy.astrophysics, top.science.astronomy.cosmology] SELECT "path" FROM "blog_entry" WHERE "path" <@ 'top.science' ORDER BY "path" ASC

Slide 45

Slide 45 text

Paolo Melchiorre ~ @pauloxnet 45 Appennines (Italy) 2020 Paolo Melchiorre (CC BY-SA)

Slide 46

Slide 46 text

Paolo Melchiorre ~ @pauloxnet • PostgreSQL extension • Best* GeoDjango backend • Spatial data types • Spatial indexing • Spatial functions PostGIS 46

Slide 47

Slide 47 text

Paolo Melchiorre ~ @pauloxnet 47 GeoDjango Compatibility Table ● Spatial Lookups (18/31) ● Database functions (15/32) ● Aggregate Functions (0/5) MariaDB 49% ● Spatial Lookups (18/31) ● Database functions (21/32) ● Aggregate Functions (2/5) Oracle 60% ● Spatial Lookups (31/31) ● Database functions (32/32) ● Aggregate Functions (5/5) PostGIS 100% ● Spatial Lookups (22/31) ● Database functions (30/32) ● Aggregate Functions (4/5) Spatialite 82% ● Spatial Lookups (18/31) ● Database functions (16/32) ● Aggregate Functions (0/5) MySQL 50%

Slide 48

Slide 48 text

Paolo Melchiorre ~ @pauloxnet 48 What’s next • Indexes • Aggregation functions • Trigram extension • Range fields • Case-insensitive text fields • ...

Slide 49

Slide 49 text

Paolo Melchiorre ~ @pauloxnet 49 Tips • docs in djangoproject.com • details in postgresql.org • source code in github.com • questions in stackoverflow.com

Slide 50

Slide 50 text

Paolo Melchiorre ~ @pauloxnet 50 License CC BY-SA 4.0 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Slide 51

Slide 51 text

Paolo Melchiorre ~ @pauloxnet @psycopg danielevarrazzo psycopg lists.postgresql.org psycopg.org 51

Slide 52

Slide 52 text

Paolo Melchiorre ~ @pauloxnet @20tab 20tab 20tab [email protected] 20tab.com 52

Slide 53

Slide 53 text

Paolo Melchiorre ~ @pauloxnet @pauloxnet paolomelchiorre pauloxnet [email protected] paulox.net 53