Database replication with Django and Postgres

Database replication with Django and Postgres Marcin Gębala - PyCon
APAC 2024

About me » Principal Developer at Saleor Commerce (saleor.io) »
Specialize in web development using Python, Django and GraphQL » Based in Wrocław, Poland ! Marcin Gębala - PyCon APAC 2024

What is database replication? Marcin Gębala - PyCon APAC 2024

Database replication Database replication is the process of creating copies
of a database and keeping them in sync. » Leader-follower model: » One DB designated as a primary (leader); used for write operations. » When data is written to the primary DB, the leader sends the changes to the replicas (followers). Marcin Gębala - PyCon APAC 2024

Read vs. write API calls Marcin Gębala - PyCon APAC
2024

Benefits » Scalability - distribute load between multiple DB servers;
offload traffic from the main DB. » Availability - if one server fails, the others can take over. » Latency - keep data geographically closer to the users and reduce latency. Marcin Gębala - PyCon APAC 2024

Replication types Logical - Works on row/table level; ﬁne-grained control
over replication and security. - Use-case: sending incremental changes to a replica DB; replicating between different versions of Postgres. - Uses internal data structures to replicate changes. default_db=# CREATE PUBLICATION pub1 FOR TABLE t1; CREATE PUBLICATION replica_db=# CREATE SUBSCRIPTION sub1 replica_db-# CONNECTION 'host=localhost dbname=default_db application_name=sub1' replica_db-# PUBLICATION pub1; Physical - Byte-level replication - instead of sending table/row changes, stream entire disk blocks. - Use-case: keep standby servers in sync with the primary DB to handle failover. - Also: shared disk or ﬁle system replications. Marcin Gębala - PyCon APAC 2024

Replication lag Temporary inconsistency between the master and replica DBs.
» Clients reading from the replica may see outdated data. » Milliseconds, but can grow to seconds or minutes under heavy load. » When data consistency is crucial, use the primary DB for reads. » ﬁnancial transactions, updating statuses of orders etc. Marcin Gębala - PyCon APAC 2024

Using Django with multiple databases Marcin Gębala - PyCon APAC
2024

Settings DB_DEFAULT = "default" DB_REPLICA = "replica" DATABASES = {
DB_DEFAULT: dj_database_url.config( default="postgres://username:pass@localhost:5432/default_db", conn_max_age=600 ), DB_REPLICA: dj_database_url.config( default="postgres://username:pass@localhost:5432/replica_db", conn_max_age=600, ), } Marcin Gębala - PyCon APAC 2024

DB router Global database routing conﬁguration for the Django application.
from django.conf import settings class DbRouter: def db_for_read(self, model, **hints): return settings.DB_REPLICA def db_for_write(self, model, **hints): return settings.DB_DEFAULT # `SELECT` query will use the replica DB automatically product = Product.objects.filter(name="Apple juice").first() # `UPDATE` query will use the primary DB product.category = "Drinks" product.save() Marcin Gębala - PyCon APAC 2024

Manually selecting DB for a query Manually select a database
for a query with the using() method. Example: in views that always read data, always use the replica DB: def get_product(request, product_id): product = ( Product.objects.using(settings.DB_REPLICA) .filter(id=product_id) .first() ) return TemplateResponse(request, "product.html", {"product": product}) Marcin Gębala - PyCon APAC 2024

Manually selecting DB for a query Creating a new order
from a checkout needs to operate on the newest version of data. Marcin Gębala - PyCon APAC 2024

Other optimization methods Other methods to optimize the database performance:
» Add connection pooling solution e.g. pgBouncer to manage connections to the DB. » Monitor DB queries performance using Postgres's EXPLAIN command and optimize slow queries. Marcin Gębala - PyCon APAC 2024

Query plan visualization Marcin Gębala - PyCon APAC 2024

Resources » Designing Data-Intensive Applications by Martin Kleppmann » Postgres
documentation: postgresql.org/docs » Query plan visualization tool: explain.dalibo.com » pgcli - Postgres CLI with auto-completion: github.com/dbcli/pgcli » Saleor Commerce: github.com/saleor/saleor Marcin Gębala - PyCon APAC 2024

Thank you! Contact: github.com/maarcingebala Scan the QR code to access
the slides. Marcin Gębala - PyCon APAC 2024

Database replication with Django and Postgres

Database replication with Django and Postgres

Marcin Gębala

More Decks by Marcin Gębala

Featured

Transcript

Database replication with Django and Postgres Marcin Gębala - PyCon

About me » Principal Developer at Saleor Commerce (saleor.io) »

What is database replication? Marcin Gębala - PyCon APAC 2024

Database replication Database replication is the process of creating copies

Read vs. write API calls Marcin Gębala - PyCon APAC

Beneﬁts » Scalability - distribute load between multiple DB servers;

Replication types Logical - Works on row/table level; ﬁne-grained control

Replication lag Temporary inconsistency between the master and replica DBs.

Using Django with multiple databases Marcin Gębala - PyCon APAC

Settings DB_DEFAULT = "default" DB_REPLICA = "replica" DATABASES = {

DB router Global database routing conﬁguration for the Django application.

Manually selecting DB for a query Manually select a database

Manually selecting DB for a query Creating a new order

Other optimization methods Other methods to optimize the database performance:

Query plan visualization Marcin Gębala - PyCon APAC 2024

Resources » Designing Data-Intensive Applications by Martin Kleppmann » Postgres

Thank you! Contact: github.com/maarcingebala Scan the QR code to access