Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Django with PostgreSQL superpowers

Django with PostgreSQL superpowers

-- A talk I gave at DjangoCon Europe 2021

Django per se is a database-agnostic web framework, but real-world projects based on it are not. PostgreSQL has the richest feature set of any supported databases and we’ll see how to use these superpowers in Django.

More info on my blog: https://www.paulox.net/2021/06/03/djangocon-europe-2021/

Paolo Melchiorre

June 03, 2021
Tweet

More Decks by Paolo Melchiorre

Other Decks in Technology

Transcript

  1. DJANGO WITH POSTGRESQL
    SUPERPOWERS
    PAOLO MELCHIORRE ~ @pauloxnet

    View Slide

  2. Paolo Melchiorre ~ @pauloxnet
    2
    Django database backends
    “Django attempts to support
    as many features as possible
    on all database backends.
    However,
    not all database backends are alike …”
    — “Databases”, Django Documentation

    View Slide

  3. Paolo Melchiorre ~ @pauloxnet
    Django
    3

    View Slide

  4. Paolo Melchiorre ~ @pauloxnet
    PostgreSQL
    4

    View Slide

  5. IANADA
    ¿!
    ? ¡?
    !

    View Slide

  6. Paolo Melchiorre ~ @pauloxnet
    @pauloxnet
    • CTO @ 20tab
    • Software engineer
    • Python developer
    • Django contributor
    • PostgreSQL user
    Paolo Melchiorre
    6

    View Slide

  7. Paolo Melchiorre ~ @pauloxnet
    7
    Django no-code
    $ python3 --version
    Python 3.9.5
    $ python3 -m venv ~/.virtualenvs/mysite
    $ . ~/.virtualenvs/mysite/bin/activate
    $ python3 -m pip install django
    $ python3 -m django --version
    3.2.4
    $ python3 manage.py startproject mysite

    View Slide

  8. Paolo Melchiorre ~ @pauloxnet
    8

    View Slide

  9. Paolo Melchiorre ~ @pauloxnet
    9
    Psycopg
    psycopg (v0.01 ~2001)
    PostgreSQL driver, Python DB API 2.0, LGPL
    psycopg1 (v1.0 ~2001)
    psycopg2 (v2.0 ~2006)
    Python3 (v2.4 ~2011)
    JSON (v2.5 ~2013)
    psycopg3 (v3.0 ~2021?)

    View Slide

  10. Paolo Melchiorre ~ @pauloxnet
    10
    Installing PostgreSQL C client library
    $ sudo apt install libpq-dev

    Setting up libpq-dev (13.3-1) ...

    View Slide

  11. Paolo Melchiorre ~ @pauloxnet
    11
    Installing psycopg2
    $ python3 -m pip install psycopg2

    Successfully installed psycopg2-2.8.6

    View Slide

  12. Paolo Melchiorre ~ @pauloxnet
    12
    Activating PostgreSQL backend
    DATABASES = {
    "default": {
    "ENGINE": "django.db.backends.postgresql",
    "HOST": "127.0.0.1",
    "NAME": "mydatabase",
    "PASSWORD": "mypassword",
    "PORT": "5432",
    "USER": "mydatabaseuser",
    }
    }

    View Slide

  13. Paolo Melchiorre ~ @pauloxnet
    13
    PostgreSQL Connection URIs
    # environment
    DATABASE_URL=postgres://usr:[email protected]:5432/db
    # python3 -m pip install dj-database-url
    import dj_database_url
    DATABASES = {"default": dj_database_url.config()}
    # python3 -m pip install django-configurations
    import configurations
    DATABASES = configurations.values.DatabaseURLValue()

    View Slide

  14. Paolo Melchiorre ~ @pauloxnet
    Making queries
    14

    View Slide

  15. Paolo Melchiorre ~ @pauloxnet
    15
    Blog models
    from django.db import models
    class Author(models.Model):
    name = models.CharField(max_length=200)
    class Entry(models.Model):
    headline = models.CharField(max_length=255)
    body_text = models.TextField()
    authors = models.ManyToManyField("Author")

    View Slide

  16. Paolo Melchiorre ~ @pauloxnet
    16
    Making queries
    >>> from blog.models import Author
    >>> Author.objects.filter(name__contains="Terry")
    SELECT "name"
    FROM "blog_author"
    WHERE "name" LIKE '%Terry%'

    View Slide

  17. Paolo Melchiorre ~ @pauloxnet
    Kickstarter
    17

    View Slide

  18. Paolo Melchiorre ~ @pauloxnet
    18
    PostgreSQL contrib
    django.contrib.postgres (v1.8 ~2015)
    Fields, indexes, functions, extensions, …
    Crowdfound Marc Tamlyn (~2014)
    JSONfield* (v1.9 ~2015)
    Full-text search (v1.10 ~2016)
    RandomUUID (v2.0 ~2017)
    Operator Classes (v3.2 ~2021)

    View Slide

  19. Paolo Melchiorre ~ @pauloxnet
    19
    Activating PostgreSQL module
    INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "django.contrib.postgres",
    "blog",
    ]

    View Slide

  20. Paolo Melchiorre ~ @pauloxnet
    DjangoCon Europe 2017
    20

    View Slide

  21. Paolo Melchiorre ~ @pauloxnet
    • Version 4 (random) UUID
    • pgcrypto module
    • cryptographic functions
    • CryptoExtension migration
    • in-core from PG 13
    RandomUUID
    21

    View Slide

  22. Paolo Melchiorre ~ @pauloxnet
    22
    UUID field
    import uuid
    from django.db import models
    class Entry(models.Model):
    headline = models.CharField(max_length=255)
    body_text = models.TextField()
    authors = models.ManyToManyField("Author")
    uuid = models.UUIDField(default=uuid.uuid4)

    View Slide

  23. Paolo Melchiorre ~ @pauloxnet
    23
    RandomUUID update
    >>> from django.contrib.postgres import functions
    >>> from blog.models import Entry
    >>> Entry.objects.update(uuid=functions.RandomUUID())
    UPDATE "blog_entry"
    SET "uuid" = GEN_RANDOM_UUID()
    -- ~1 million UUIDs in ~30 s

    View Slide

  24. Paolo Melchiorre ~ @pauloxnet
    EuroPython 2017
    24

    View Slide

  25. Paolo Melchiorre ~ @pauloxnet
    • Fields, expressions, functions
    • GIN index
    • GiST index
    • Phrase search
    • Web search
    Full-text search
    25

    View Slide

  26. Paolo Melchiorre ~ @pauloxnet
    26
    Search lookup
    >>> from blog.models import Entry
    >>> Entry.objects.filter(headline__search="tomato")
    SELECT *
    FROM "blog_entry"
    WHERE to_tsvector("headline") @@
    plainto_tsquery('tomato')

    View Slide

  27. Paolo Melchiorre ~ @pauloxnet
    27
    SearchVector
    from django.contrib.postgres import indexes, search
    from django.db import models
    s = search.SearchVector("headline", config="english")
    class Entry(models.Model):
    headline = models.CharField(max_length=255)
    body_text = models.TextField()
    authors = models.ManyToManyField("Author")
    class Meta:
    indexes = [indexes.GinIndex(s, name="s_idx")]

    View Slide

  28. Paolo Melchiorre ~ @pauloxnet
    28
    SearchQuery
    >>> from django.contrib.postgres import search as fts
    >>> from blog.models import Entry, s
    >>> q = "ham OR tuna -pineapple"
    >>> doc = fts.SearchQuery(q, search_type='websearch')
    >>> Entry.objects.annotate(doc=s).filter(doc=doc)
    SELECT *
    FROM "blog_entry"
    WHERE to_tsvector('english', "headline") @@
    websearch_to_tsquery('ham OR tuna -pineapple')

    View Slide

  29. Paolo Melchiorre ~ @pauloxnet
    29

    View Slide

  30. Paolo Melchiorre ~ @pauloxnet
    • PG array
    • Array of data
    • Base field
    • Size
    • Multi-dimensional
    Array Field
    30

    View Slide

  31. Paolo Melchiorre ~ @pauloxnet
    31
    Array Email Field
    from django.contrib.postgres.fields import ArrayField
    from django.db import models
    class Author(models.Model):
    name = models.CharField(max_length=200)
    emails = ArrayField(models.EmailField())

    View Slide

  32. Paolo Melchiorre ~ @pauloxnet
    32
    Making array queries
    >>> from django.db.models import Q
    >>> from blog.models import Author
    >>> q = Q(emails__contains=["[email protected]"])
    >>> Author.objects.filter(q).values_list("emails")
    [['[email protected]', '[email protected]']]
    SELECT "emails"
    FROM "blog_author"
    WHERE "emails" @> ARRAY['[email protected]']

    View Slide

  33. Paolo Melchiorre ~ @pauloxnet
    33
    django-better-admin-arrayfield
    Before After

    View Slide

  34. Paolo Melchiorre ~ @pauloxnet
    California
    34

    View Slide

  35. Paolo Melchiorre ~ @pauloxnet
    • California Civic Data Coalition
    • PostgreSQL COPY support
    • COPY TO
    • COPY FROM
    • Temporary tables
    Dajngo Postgres Copy
    35

    View Slide

  36. Paolo Melchiorre ~ @pauloxnet
    36
    Django Postgres Copy - Model
    from django.db import models
    from postgres_copy import CopyManager
    class Feature(models.Model):
    name = models.CharField(max_length=255)
    alternative_names = models.TextField()
    osm_type = models.CharField(max_length=8)
    osm_id = models.PositiveBigIntegerField()

    objects = CopyManager()

    View Slide

  37. Paolo Melchiorre ~ @pauloxnet
    37
    Django Postgres Copy - CSV
    >>> import pathlib
    >>> pathlib.Path("/tmp/italy.csv").stat().st_size
    217879987 (~208 MB)
    >>> from blog.models import Feature
    >>> Feature.objects.from_csv("/tmp/italy.csv")
    918904 (~1 M) records loaded
    CPU times: user 354 ms, sys: 101 ms, total: 455 ms
    Wall time: 3.46 s

    View Slide

  38. Paolo Melchiorre ~ @pauloxnet
    38
    Django Postgres Copy - SQL
    CREATE TEMPORARY TABLE "temp_blog_feature" ("…");
    COPY "temp_blog_feature" FROM STDIN WITH CSV HEADER;
    -- COPY 918904
    -- Time: 2233,077 ms (00:02,233)
    INSERT INTO "blog_feature" ("…") (
    SELECT … FROM "temp_blog_feature"
    );
    DROP TABLE IF EXISTS "temp_blog_feature";

    View Slide

  39. Paolo Melchiorre ~ @pauloxnet
    39
    Django Postgres Copy - GZip
    >>> import pathlib
    >>> pathlib.Path("/tmp/italy.csv.gz").stat().st_size
    42876187 (~41 MB)
    >>> from geonames.models import Feature
    >>> import gzip
    >>> with gzip.open("/tmp/italy.csv.gz", "rb") as f:
    Feature.objects.from_csv(f)
    918904 (~1M) records loaded
    CPU times: user 1.02 s, sys: 42.8 ms, total: 1.06 s
    Wall time: 3.45 s

    View Slide

  40. Paolo Melchiorre ~ @pauloxnet
    Tree
    40

    View Slide

  41. Paolo Melchiorre ~ @pauloxnet
    • ltree module
    • Hierarchical tree-like data
    • Materialized path
    • Dedicated PathField
    • Abstract TreeModel
    Django ltree
    41

    View Slide

  42. Paolo Melchiorre ~ @pauloxnet
    42
    TreeModel
    from django.contrib.postgres import indexes as idx
    from django_ltree.models import TreeModel
    class Entry(TreeModel):
    headline = models.CharField(max_length=255)
    body_text = models.TextField()
    authors = models.ManyToManyField("Author")
    class Meta:
    indexes = [idx.GistIndex(fields=['path'])]

    View Slide

  43. Paolo Melchiorre ~ @pauloxnet
    43
    Tree - Example
    top
    pictures
    science
    astronomy astronomy
    astrophysics cosmology galaxies stars astronauts

    View Slide

  44. Paolo Melchiorre ~ @pauloxnet
    44
    Tree - Queries
    >>> from blog.models import Entry
    >>> path = "top.science"
    >>> q = Entry.objects.filter(path__descendants=path)
    >>> q.order_by("path").values_list("path", flat=True)
    [top.science,
    top.science.astronomy,
    top.science.astronomy.astrophysics,
    top.science.astronomy.cosmology]
    SELECT "path" FROM "blog_entry"
    WHERE "path"

    View Slide

  45. Paolo Melchiorre ~ @pauloxnet
    GeoDjango
    45

    View Slide

  46. Paolo Melchiorre ~ @pauloxnet
    • PostgreSQL extension
    • Best* GeoDjango backend
    • Spatial data types
    • Spatial indexing
    • Spatial functions
    PostGIS
    46

    View Slide

  47. Paolo Melchiorre ~ @pauloxnet
    47
    GeoDjango Compatibility Table
    ● Spatial Lookups (18/31)
    ● Database functions (15/32)
    ● Aggregate Functions (0/5)
    MariaDB
    49%
    ● Spatial Lookups (18/31)
    ● Database functions (21/32)
    ● Aggregate Functions (2/5)
    Oracle
    60%
    ● Spatial Lookups (31/31)
    ● Database functions (32/32)
    ● Aggregate Functions (5/5)
    PostGIS
    100%
    ● Spatial Lookups (22/31)
    ● Database functions (30/32)
    ● Aggregate Functions (4/5)
    Spatialite
    82%
    ● Spatial Lookups (18/31)
    ● Database functions (16/32)
    ● Aggregate Functions (0/5)
    MySQL
    50%

    View Slide

  48. Paolo Melchiorre ~ @pauloxnet
    48
    What’s next
    • Indexes
    • Aggregation functions
    • Trigram extension
    • Range fields
    • Case-insensitive text fields
    • ...

    View Slide

  49. Paolo Melchiorre ~ @pauloxnet
    49
    Tips
    • docs in djangoproject.com
    • details in postgresql.org
    • source code in github.com
    • questions in stackoverflow.com

    View Slide

  50. Paolo Melchiorre ~ @pauloxnet
    50
    License
    CC BY-SA 4.0
    This work is licensed under
    a Creative Commons
    Attribution-ShareAlike 4.0
    International License.

    View Slide

  51. Paolo Melchiorre ~ @pauloxnet
    @psycopg
    danielevarrazzo
    psycopg
    lists.postgresql.org
    psycopg.org
    51

    View Slide

  52. Paolo Melchiorre ~ @pauloxnet
    @20tab
    20tab
    20tab
    [email protected]
    20tab.com
    52

    View Slide

  53. Paolo Melchiorre ~ @pauloxnet
    @pauloxnet
    paolomelchiorre
    pauloxnet
    [email protected]
    paulox.net
    53

    View Slide