Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Django with PostgreSQL superpowers

Django with PostgreSQL superpowers

-- A talk I gave at DjangoCon Europe 2021

Django per se is a database-agnostic web framework, but real-world projects based on it are not. PostgreSQL has the richest feature set of any supported databases and we’ll see how to use these superpowers in Django.

More info on my blog: https://www.paulox.net/2021/06/03/djangocon-europe-2021/

Paolo Melchiorre

June 03, 2021
Tweet

More Decks by Paolo Melchiorre

Other Decks in Technology

Transcript

  1. Paolo Melchiorre ~ @pauloxnet 2 Django database backends “Django attempts

    to support as many features as possible on all database backends. However, not all database backends are alike …” — “Databases”, Django Documentation
  2. Paolo Melchiorre ~ @pauloxnet @pauloxnet • CTO @ 20tab •

    Software engineer • Python developer • Django contributor • PostgreSQL user Paolo Melchiorre 6
  3. Paolo Melchiorre ~ @pauloxnet 7 Django no-code $ python3 --version

    Python 3.9.5 $ python3 -m venv ~/.virtualenvs/mysite $ . ~/.virtualenvs/mysite/bin/activate $ python3 -m pip install django $ python3 -m django --version 3.2.4 $ python3 manage.py startproject mysite
  4. Paolo Melchiorre ~ @pauloxnet 9 Psycopg psycopg (v0.01 ~2001) PostgreSQL

    driver, Python DB API 2.0, LGPL psycopg1 (v1.0 ~2001) psycopg2 (v2.0 ~2006) Python3 (v2.4 ~2011) JSON (v2.5 ~2013) psycopg3 (v3.0 ~2021?)
  5. Paolo Melchiorre ~ @pauloxnet 10 Installing PostgreSQL C client library

    $ sudo apt install libpq-dev … Setting up libpq-dev (13.3-1) ...
  6. Paolo Melchiorre ~ @pauloxnet 11 Installing psycopg2 $ python3 -m

    pip install psycopg2 … Successfully installed psycopg2-2.8.6
  7. Paolo Melchiorre ~ @pauloxnet 12 Activating PostgreSQL backend DATABASES =

    { "default": { "ENGINE": "django.db.backends.postgresql", "HOST": "127.0.0.1", "NAME": "mydatabase", "PASSWORD": "mypassword", "PORT": "5432", "USER": "mydatabaseuser", } }
  8. Paolo Melchiorre ~ @pauloxnet 13 PostgreSQL Connection URIs # environment

    DATABASE_URL=postgres://usr:pwd@host:5432/db # python3 -m pip install dj-database-url import dj_database_url DATABASES = {"default": dj_database_url.config()} # python3 -m pip install django-configurations import configurations DATABASES = configurations.values.DatabaseURLValue()
  9. Paolo Melchiorre ~ @pauloxnet 15 Blog models from django.db import

    models class Author(models.Model): name = models.CharField(max_length=200) class Entry(models.Model): headline = models.CharField(max_length=255) body_text = models.TextField() authors = models.ManyToManyField("Author")
  10. Paolo Melchiorre ~ @pauloxnet 16 Making queries >>> from blog.models

    import Author >>> Author.objects.filter(name__contains="Terry") SELECT "name" FROM "blog_author" WHERE "name" LIKE '%Terry%'
  11. Paolo Melchiorre ~ @pauloxnet 18 PostgreSQL contrib django.contrib.postgres (v1.8 ~2015)

    Fields, indexes, functions, extensions, … Crowdfound Marc Tamlyn (~2014) JSONfield* (v1.9 ~2015) Full-text search (v1.10 ~2016) RandomUUID (v2.0 ~2017) Operator Classes (v3.2 ~2021)
  12. Paolo Melchiorre ~ @pauloxnet 19 Activating PostgreSQL module INSTALLED_APPS =

    [ "django.contrib.admin", "django.contrib.auth", "django.contrib.contenttypes", "django.contrib.sessions", "django.contrib.messages", "django.contrib.staticfiles", "django.contrib.postgres", "blog", ]
  13. Paolo Melchiorre ~ @pauloxnet • Version 4 (random) UUID •

    pgcrypto module • cryptographic functions • CryptoExtension migration • in-core from PG 13 RandomUUID 21
  14. Paolo Melchiorre ~ @pauloxnet 22 UUID field import uuid from

    django.db import models class Entry(models.Model): headline = models.CharField(max_length=255) body_text = models.TextField() authors = models.ManyToManyField("Author") uuid = models.UUIDField(default=uuid.uuid4)
  15. Paolo Melchiorre ~ @pauloxnet 23 RandomUUID update >>> from django.contrib.postgres

    import functions >>> from blog.models import Entry >>> Entry.objects.update(uuid=functions.RandomUUID()) UPDATE "blog_entry" SET "uuid" = GEN_RANDOM_UUID() -- ~1 million UUIDs in ~30 s
  16. Paolo Melchiorre ~ @pauloxnet • Fields, expressions, functions • GIN

    index • GiST index • Phrase search • Web search Full-text search 25
  17. Paolo Melchiorre ~ @pauloxnet 26 Search lookup >>> from blog.models

    import Entry >>> Entry.objects.filter(headline__search="tomato") SELECT * FROM "blog_entry" WHERE to_tsvector("headline") @@ plainto_tsquery('tomato')
  18. Paolo Melchiorre ~ @pauloxnet 27 SearchVector from django.contrib.postgres import indexes,

    search from django.db import models s = search.SearchVector("headline", config="english") class Entry(models.Model): headline = models.CharField(max_length=255) body_text = models.TextField() authors = models.ManyToManyField("Author") class Meta: indexes = [indexes.GinIndex(s, name="s_idx")]
  19. Paolo Melchiorre ~ @pauloxnet 28 SearchQuery >>> from django.contrib.postgres import

    search as fts >>> from blog.models import Entry, s >>> q = "ham OR tuna -pineapple" >>> doc = fts.SearchQuery(q, search_type='websearch') >>> Entry.objects.annotate(doc=s).filter(doc=doc) SELECT * FROM "blog_entry" WHERE to_tsvector('english', "headline") @@ websearch_to_tsquery('ham OR tuna -pineapple')
  20. Paolo Melchiorre ~ @pauloxnet • PG array • Array of

    data • Base field • Size • Multi-dimensional Array Field 30
  21. Paolo Melchiorre ~ @pauloxnet 31 Array Email Field from django.contrib.postgres.fields

    import ArrayField from django.db import models class Author(models.Model): name = models.CharField(max_length=200) emails = ArrayField(models.EmailField())
  22. Paolo Melchiorre ~ @pauloxnet 32 Making array queries >>> from

    django.db.models import Q >>> from blog.models import Author >>> q = Q(emails__contains=["[email protected]"]) >>> Author.objects.filter(q).values_list("emails") [['[email protected]', '[email protected]']] SELECT "emails" FROM "blog_author" WHERE "emails" @> ARRAY['[email protected]']
  23. Paolo Melchiorre ~ @pauloxnet • California Civic Data Coalition •

    PostgreSQL COPY support • COPY TO • COPY FROM • Temporary tables Dajngo Postgres Copy 35
  24. Paolo Melchiorre ~ @pauloxnet 36 Django Postgres Copy - Model

    from django.db import models from postgres_copy import CopyManager class Feature(models.Model): name = models.CharField(max_length=255) alternative_names = models.TextField() osm_type = models.CharField(max_length=8) osm_id = models.PositiveBigIntegerField() … objects = CopyManager()
  25. Paolo Melchiorre ~ @pauloxnet 37 Django Postgres Copy - CSV

    >>> import pathlib >>> pathlib.Path("/tmp/italy.csv").stat().st_size 217879987 (~208 MB) >>> from blog.models import Feature >>> Feature.objects.from_csv("/tmp/italy.csv") 918904 (~1 M) records loaded CPU times: user 354 ms, sys: 101 ms, total: 455 ms Wall time: 3.46 s
  26. Paolo Melchiorre ~ @pauloxnet 38 Django Postgres Copy - SQL

    CREATE TEMPORARY TABLE "temp_blog_feature" ("…"); COPY "temp_blog_feature" FROM STDIN WITH CSV HEADER; -- COPY 918904 -- Time: 2233,077 ms (00:02,233) INSERT INTO "blog_feature" ("…") ( SELECT … FROM "temp_blog_feature" ); DROP TABLE IF EXISTS "temp_blog_feature";
  27. Paolo Melchiorre ~ @pauloxnet 39 Django Postgres Copy - GZip

    >>> import pathlib >>> pathlib.Path("/tmp/italy.csv.gz").stat().st_size 42876187 (~41 MB) >>> from geonames.models import Feature >>> import gzip >>> with gzip.open("/tmp/italy.csv.gz", "rb") as f: Feature.objects.from_csv(f) 918904 (~1M) records loaded CPU times: user 1.02 s, sys: 42.8 ms, total: 1.06 s Wall time: 3.45 s
  28. Paolo Melchiorre ~ @pauloxnet • ltree module • Hierarchical tree-like

    data • Materialized path • Dedicated PathField • Abstract TreeModel Django ltree 41
  29. Paolo Melchiorre ~ @pauloxnet 42 TreeModel from django.contrib.postgres import indexes

    as idx from django_ltree.models import TreeModel class Entry(TreeModel): headline = models.CharField(max_length=255) body_text = models.TextField() authors = models.ManyToManyField("Author") class Meta: indexes = [idx.GistIndex(fields=['path'])]
  30. Paolo Melchiorre ~ @pauloxnet 43 Tree - Example top pictures

    science astronomy astronomy astrophysics cosmology galaxies stars astronauts
  31. Paolo Melchiorre ~ @pauloxnet 44 Tree - Queries >>> from

    blog.models import Entry >>> path = "top.science" >>> q = Entry.objects.filter(path__descendants=path) >>> q.order_by("path").values_list("path", flat=True) [top.science, top.science.astronomy, top.science.astronomy.astrophysics, top.science.astronomy.cosmology] SELECT "path" FROM "blog_entry" WHERE "path" <@ 'top.science' ORDER BY "path" ASC
  32. Paolo Melchiorre ~ @pauloxnet • PostgreSQL extension • Best* GeoDjango

    backend • Spatial data types • Spatial indexing • Spatial functions PostGIS 46
  33. Paolo Melchiorre ~ @pauloxnet 47 GeoDjango Compatibility Table • Spatial

    Lookups (18/31) • Database functions (15/32) • Aggregate Functions (0/5) MariaDB 49% • Spatial Lookups (18/31) • Database functions (21/32) • Aggregate Functions (2/5) Oracle 60% • Spatial Lookups (31/31) • Database functions (32/32) • Aggregate Functions (5/5) PostGIS 100% • Spatial Lookups (22/31) • Database functions (30/32) • Aggregate Functions (4/5) Spatialite 82% • Spatial Lookups (18/31) • Database functions (16/32) • Aggregate Functions (0/5) MySQL 50%
  34. Paolo Melchiorre ~ @pauloxnet 48 What’s next • Indexes •

    Aggregation functions • Trigram extension • Range fields • Case-insensitive text fields • ...
  35. Paolo Melchiorre ~ @pauloxnet 49 Tips • docs in djangoproject.com

    • details in postgresql.org • source code in github.com • questions in stackoverflow.com
  36. Paolo Melchiorre ~ @pauloxnet 50 License CC BY-SA 4.0 This

    work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.