Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Database replication with Django and Postgres

Marcin Gębala
October 24, 2024
20

Database replication with Django and Postgres

Database replication is a concept of storing the same data in multiple databases, which is useful in applications with high traffic, to maintain high availability and performance. Once the database becomes a bottleneck, a common solution is to create a setup with a separate database for write operations (a writer) and one or more databases for read operations (readers, replicas).

In this talk, we will look into a setup of a web application based on Django and Postgres, that is configured to use two databases - a writer and a reader. We will look at different aspects of this solution:

- theory of how replication works in different databases
- how to configure multiple databases with Django and Postgres
- practical examples of patterns to use in code to handle reading and writing using separate databases
- challenges and edge cases that occur with this setup, such as handling the replication lag.

We will use an example web app to show all the concepts and look at real-life examples based on a large production app written in Django.

The structure of the talk is as follows:

1. Introduction to database replication - how it works, when and why it is useful.
2. Database replication in practice with Django and Postgres - configuration and best practices to use in code.
3. Challenges and edge cases and how to deal with them.

Marcin Gębala

October 24, 2024
Tweet

Transcript

  1. About me » Principal Developer at Saleor Commerce (saleor.io) »

    Specialize in web development using Python, Django and GraphQL » Based in Wrocław, Poland ! Marcin Gębala - PyCon APAC 2024
  2. Database replication Database replication is the process of creating copies

    of a database and keeping them in sync. » Leader-follower model: » One DB designated as a primary (leader); used for write operations. » When data is written to the primary DB, the leader sends the changes to the replicas (followers). Marcin Gębala - PyCon APAC 2024
  3. Benefits » Scalability - distribute load between multiple DB servers;

    offload traffic from the main DB. » Availability - if one server fails, the others can take over. » Latency - keep data geographically closer to the users and reduce latency. Marcin Gębala - PyCon APAC 2024
  4. Replication types Logical - Works on row/table level; fine-grained control

    over replication and security. - Use-case: sending incremental changes to a replica DB; replicating between different versions of Postgres. - Uses internal data structures to replicate changes. default_db=# CREATE PUBLICATION pub1 FOR TABLE t1; CREATE PUBLICATION replica_db=# CREATE SUBSCRIPTION sub1 replica_db-# CONNECTION 'host=localhost dbname=default_db application_name=sub1' replica_db-# PUBLICATION pub1; Physical - Byte-level replication - instead of sending table/row changes, stream entire disk blocks. - Use-case: keep standby servers in sync with the primary DB to handle failover. - Also: shared disk or file system replications. Marcin Gębala - PyCon APAC 2024
  5. Replication lag Temporary inconsistency between the master and replica DBs.

    » Clients reading from the replica may see outdated data. » Milliseconds, but can grow to seconds or minutes under heavy load. » When data consistency is crucial, use the primary DB for reads. » financial transactions, updating statuses of orders etc. Marcin Gębala - PyCon APAC 2024
  6. Settings DB_DEFAULT = "default" DB_REPLICA = "replica" DATABASES = {

    DB_DEFAULT: dj_database_url.config( default="postgres://username:pass@localhost:5432/default_db", conn_max_age=600 ), DB_REPLICA: dj_database_url.config( default="postgres://username:pass@localhost:5432/replica_db", conn_max_age=600, ), } Marcin Gębala - PyCon APAC 2024
  7. DB router Global database routing configuration for the Django application.

    from django.conf import settings class DbRouter: def db_for_read(self, model, **hints): return settings.DB_REPLICA def db_for_write(self, model, **hints): return settings.DB_DEFAULT # `SELECT` query will use the replica DB automatically product = Product.objects.filter(name="Apple juice").first() # `UPDATE` query will use the primary DB product.category = "Drinks" product.save() Marcin Gębala - PyCon APAC 2024
  8. Manually selecting DB for a query Manually select a database

    for a query with the using() method. Example: in views that always read data, always use the replica DB: def get_product(request, product_id): product = ( Product.objects.using(settings.DB_REPLICA) .filter(id=product_id) .first() ) return TemplateResponse(request, "product.html", {"product": product}) Marcin Gębala - PyCon APAC 2024
  9. Manually selecting DB for a query Creating a new order

    from a checkout needs to operate on the newest version of data. Marcin Gębala - PyCon APAC 2024
  10. Other optimization methods Other methods to optimize the database performance:

    » Add connection pooling solution e.g. pgBouncer to manage connections to the DB. » Monitor DB queries performance using Postgres's EXPLAIN command and optimize slow queries. Marcin Gębala - PyCon APAC 2024
  11. Resources » Designing Data-Intensive Applications by Martin Kleppmann » Postgres

    documentation: postgresql.org/docs » Query plan visualization tool: explain.dalibo.com » pgcli - Postgres CLI with auto-completion: github.com/dbcli/pgcli » Saleor Commerce: github.com/saleor/saleor Marcin Gębala - PyCon APAC 2024