Slide 1

Slide 1 text

Database replication with Django and Postgres Marcin Gębala - PyCon APAC 2024

Slide 2

Slide 2 text

About me » Principal Developer at Saleor Commerce (saleor.io) » Specialize in web development using Python, Django and GraphQL » Based in Wrocław, Poland ! Marcin Gębala - PyCon APAC 2024

Slide 3

Slide 3 text

What is database replication? Marcin Gębala - PyCon APAC 2024

Slide 4

Slide 4 text

Database replication Database replication is the process of creating copies of a database and keeping them in sync. » Leader-follower model: » One DB designated as a primary (leader); used for write operations. » When data is written to the primary DB, the leader sends the changes to the replicas (followers). Marcin Gębala - PyCon APAC 2024

Slide 5

Slide 5 text

Read vs. write API calls Marcin Gębala - PyCon APAC 2024

Slide 6

Slide 6 text

Benefits » Scalability - distribute load between multiple DB servers; offload traffic from the main DB. » Availability - if one server fails, the others can take over. » Latency - keep data geographically closer to the users and reduce latency. Marcin Gębala - PyCon APAC 2024

Slide 7

Slide 7 text

Replication types Logical - Works on row/table level; fine-grained control over replication and security. - Use-case: sending incremental changes to a replica DB; replicating between different versions of Postgres. - Uses internal data structures to replicate changes. default_db=# CREATE PUBLICATION pub1 FOR TABLE t1; CREATE PUBLICATION replica_db=# CREATE SUBSCRIPTION sub1 replica_db-# CONNECTION 'host=localhost dbname=default_db application_name=sub1' replica_db-# PUBLICATION pub1; Physical - Byte-level replication - instead of sending table/row changes, stream entire disk blocks. - Use-case: keep standby servers in sync with the primary DB to handle failover. - Also: shared disk or file system replications. Marcin Gębala - PyCon APAC 2024

Slide 8

Slide 8 text

Replication lag Temporary inconsistency between the master and replica DBs. » Clients reading from the replica may see outdated data. » Milliseconds, but can grow to seconds or minutes under heavy load. » When data consistency is crucial, use the primary DB for reads. » financial transactions, updating statuses of orders etc. Marcin Gębala - PyCon APAC 2024

Slide 9

Slide 9 text

Using Django with multiple databases Marcin Gębala - PyCon APAC 2024

Slide 10

Slide 10 text

Settings DB_DEFAULT = "default" DB_REPLICA = "replica" DATABASES = { DB_DEFAULT: dj_database_url.config( default="postgres://username:pass@localhost:5432/default_db", conn_max_age=600 ), DB_REPLICA: dj_database_url.config( default="postgres://username:pass@localhost:5432/replica_db", conn_max_age=600, ), } Marcin Gębala - PyCon APAC 2024

Slide 11

Slide 11 text

DB router Global database routing configuration for the Django application. from django.conf import settings class DbRouter: def db_for_read(self, model, **hints): return settings.DB_REPLICA def db_for_write(self, model, **hints): return settings.DB_DEFAULT # `SELECT` query will use the replica DB automatically product = Product.objects.filter(name="Apple juice").first() # `UPDATE` query will use the primary DB product.category = "Drinks" product.save() Marcin Gębala - PyCon APAC 2024

Slide 12

Slide 12 text

Manually selecting DB for a query Manually select a database for a query with the using() method. Example: in views that always read data, always use the replica DB: def get_product(request, product_id): product = ( Product.objects.using(settings.DB_REPLICA) .filter(id=product_id) .first() ) return TemplateResponse(request, "product.html", {"product": product}) Marcin Gębala - PyCon APAC 2024

Slide 13

Slide 13 text

Manually selecting DB for a query Creating a new order from a checkout needs to operate on the newest version of data. Marcin Gębala - PyCon APAC 2024

Slide 14

Slide 14 text

Other optimization methods Other methods to optimize the database performance: » Add connection pooling solution e.g. pgBouncer to manage connections to the DB. » Monitor DB queries performance using Postgres's EXPLAIN command and optimize slow queries. Marcin Gębala - PyCon APAC 2024

Slide 15

Slide 15 text

Query plan visualization Marcin Gębala - PyCon APAC 2024

Slide 16

Slide 16 text

Resources » Designing Data-Intensive Applications by Martin Kleppmann » Postgres documentation: postgresql.org/docs » Query plan visualization tool: explain.dalibo.com » pgcli - Postgres CLI with auto-completion: github.com/dbcli/pgcli » Saleor Commerce: github.com/saleor/saleor Marcin Gębala - PyCon APAC 2024

Slide 17

Slide 17 text

Thank you! Contact: github.com/maarcingebala Scan the QR code to access the slides. Marcin Gębala - PyCon APAC 2024