
Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbean 2019 | Louise Grandjonc

Citus Data
February 17, 2019

There are a number of data architectures you could use when building a multi-tenant app, such as one database per customer or one schema per customer. These two options scale to an extent when you have, say, tens of tenants. However, as you start scaling to hundreds and thousands of tenants, you run into challenges from both a performance and a tenant-maintenance perspective. You can solve this problem by adding the notion of tenancy directly into the logic of your SaaS application. How to implement and automate this in the Django ORM is the challenge. We will talk about how to make a Django app tenant-aware and, at a broader level, explain how to scale out applications that are built on top of the Django ORM and follow a multi-tenant data model. We take PostgreSQL as our database of choice, and the logic and implementation can be extended to other relational databases as well.


Transcript

  1. Scaling Multi-Tenant Applications
    Using the Django ORM & Postgres
    Louise Grandjonc - PyCaribbean 2019
    @louisemeta


  2. About me
    Software Engineer at Citus Data
    Postgres enthusiast
    @louisemeta and @citusdata on Twitter
    www.louisemeta.com

  3. Today’s agenda
    1. What do we mean by “multi-tenancy”?
    2. Three ways to scale a multi-tenant app
    3. Shared tables in your Django apps
    4. Postgres, citus and shared tables

  4. What do we mean by
    “multi-tenancy”

  5. What do we mean by “multi-tenancy”
    - Multiple customers (tenants)
    - Each with their own data
    - SaaS
    - Examples: Shopify, Salesforce

  6. What do we mean by “multi-tenancy”
    A very realistic example
    Owner examples:
    - Hogwarts
    - Ministry of Magic
    - Harry Potter
    - Post office

  7. What do we mean by “multi-tenancy”
    The problem of scaling multi tenant apps

  8. 3 ways to scale
    a multi-tenant app

  9. Solution 1:
    One database per tenant

  10. One database per tenant
    - Organized collection of interrelated data
    - Don’t share resources:
      - Username and password
      - Connections
      - Memory

  11. One database per tenant

  12. One database per tenant
    1. Changing the settings
    DATABASES = {
        # One entry per tenant; the key is the alias passed to --database
        # and returned by the database router.
        'tenant_id1': {
            'ENGINE': 'django.db.backends.postgresql',
            'NAME': 'hogwarts',
            'USER': 'louise',
            'PASSWORD': 'abc',
            'HOST': '…',
            'PORT': '5432',
        },
        'tenant_id2': {
            'ENGINE': 'django.db.backends.postgresql',
            'NAME': 'ministry',
            'USER': 'louise',
            'PASSWORD': 'abc',
            'HOST': '…',
            'PORT': '5432',
        },

    }
    Warnings!
    - You need to have each tenant in the settings.
    - When you have a new customer, you need to create a database and change the settings.
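
Since every tenant entry in the settings has the same shape, the dictionary can be generated rather than written by hand. The following is a minimal sketch, not something shown in the talk: the helper name `tenant_database_settings`, the `localhost` host, and the alias-to-name mapping are assumptions made for illustration. The empty `default` entry reflects Django's requirement that a `default` key exist in `DATABASES`, even if it is left blank.

```python
def tenant_database_settings(tenants, host, user, password, port="5432"):
    """Build a Django-style DATABASES dict with one entry per tenant.

    `tenants` maps a database alias (e.g. 'tenant_id1') to a Postgres
    database name (e.g. 'hogwarts'). Hypothetical helper, for illustration.
    """
    # Django expects a 'default' key; it may be an empty dict when unused.
    databases = {"default": {}}
    for alias, db_name in tenants.items():
        databases[alias] = {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": db_name,
            "USER": user,
            "PASSWORD": password,
            "HOST": host,
            "PORT": port,
        }
    return databases


# Assumed values: the host is a placeholder, not the one used in the talk.
DATABASES = tenant_database_settings(
    {"tenant_id1": "hogwarts", "tenant_id2": "ministry"},
    host="localhost",
    user="louise",
    password="abc",
)
print(sorted(DATABASES))
```

Generating the entries removes the copy-and-paste step, but the second warning still applies: the database for a new customer has to be created before its entry is useful.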

  13. One database per tenant
    2. Handling migrations
    python manage.py migrate --database=tenant_id1
    Run this for each tenant whenever you have a new migration.
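
Running the migrate command once per tenant is easy to script. The sketch below only builds the command lines; the function name `migrate_commands` and the alias list are assumptions, and the actual `subprocess.run` call is left commented out because it needs a real Django project.

```python
import sys


def migrate_commands(database_aliases, python=sys.executable):
    """Return one `manage.py migrate` argument list per database alias."""
    return [
        [python, "manage.py", "migrate", f"--database={alias}"]
        for alias in database_aliases
    ]


commands = migrate_commands(["tenant_id1", "tenant_id2"], python="python")
for argv in commands:
    print(" ".join(argv))
    # Inside a real project, run each command and stop on the first failure:
    # subprocess.run(argv, check=True)
```

The loop runs the migrations one database after another, which is why migrations take longer as the number of tenants grows.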

  14. One database per tenant
    Changes needed to handle it with Django ORM
    3. Creating your own database router
    class ExampleDatabaseRouter:
        """
        Determines on which tenant database to read/write
        """

        def db_for_read(self, model, **hints):
            """Returns the name of the right database depending on the query"""
            return 'tenant_idx'

        def db_for_write(self, model, **hints):
            """Returns the name of the right database depending on the query"""
            return 'tenant_idx'

        def allow_relation(self, obj1, obj2, **hints):
            """Determine if relationship is allowed between two objects.
            The two objects have to be on the same database ;)"""
            # Allow the relation only when both objects live in the same database.
            return obj1._state.db == obj2._state.db
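
The router on the slide returns a fixed placeholder alias. One way to make the choice depend on the request is to keep the active alias in thread-local storage and read it from the router. This is a sketch under assumptions: `set_active_tenant_db` and `get_active_tenant_db` are hypothetical helpers, and the demo uses stand-in objects instead of Django model instances so that it runs without a project. In a real project the router class would be listed in the `DATABASE_ROUTERS` setting.

```python
import threading
from types import SimpleNamespace

_active = threading.local()


def set_active_tenant_db(alias):
    """Remember which tenant database the current thread should use."""
    _active.alias = alias


def get_active_tenant_db():
    alias = getattr(_active, "alias", None)
    if alias is None:
        raise RuntimeError("no tenant database selected for this thread")
    return alias


class TenantDatabaseRouter:
    """Routes reads and writes to the alias selected for the current thread."""

    def db_for_read(self, model, **hints):
        return get_active_tenant_db()

    def db_for_write(self, model, **hints):
        return get_active_tenant_db()

    def allow_relation(self, obj1, obj2, **hints):
        # Django stores the database an instance came from in `_state.db`.
        return obj1._state.db == obj2._state.db


# Stand-ins that mimic the `_state.db` attribute of model instances.
owl = SimpleNamespace(_state=SimpleNamespace(db="tenant_id1"))
letter = SimpleNamespace(_state=SimpleNamespace(db="tenant_id2"))

router = TenantDatabaseRouter()
set_active_tenant_db("tenant_id1")
print(router.db_for_read(model=None))       # tenant_id1
print(router.allow_relation(owl, letter))   # False
```

Raising an error when no alias is set is a deliberate choice in this sketch: silently falling back to some database would risk reading another tenant's data.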

  15. One database per tenant
    PROS
    - Start quickly
    - Isolate customer (tenant) data
    - Compliance is a bit easier
    - If one customer is queried a lot, performance degrade will be low
    CONS
    - Time for DBA/developer to manage
    - Hard to handle with ORMs
    - Maintain consistency (ex: create index across all databases)
    - Longer running migrations
    - Performance degrades as # customers (tenants) goes up

  16. Solution 2:
    One schema per tenant

  17. One schema per tenant
    - Logical namespaces to hold a set of tables
    - Share resources:
      - Username and password
      - Connections
      - Memory

  18. One schema per tenant

  19. One schema per tenant
    PROS
    - Better resource utilization vs. one database per tenant
    - Start quickly
    - Logical isolation
    CONS
    - Hard to manage (ex: add column across all schemas)
    - Longer running migrations
    - Performance degrades as # customers (tenants) goes up

  20. Solution 3:
    Shared tables architecture

  21. Shared tables architecture

  22. Shared tables architecture

  23. Shared tables architecture
    PROS
    - Easy maintenance
    - Faster running migrations
    - Best resource utilization
    - Faster performance
    - Scales to 1k-100k tenants
    CONS
    - Application code to guarantee isolation
    - Make sure ORM calls are always scoped to a single tenant
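
To make the shared tables idea concrete, here is a small sketch in which all tenants live in the same tables and every row carries a tenant column. It uses SQLite from the standard library only so that it runs anywhere; the talk targets Postgres. The table layout follows the `app_owl` / `owner_id` naming used later in the deck, while the sample rows and the helper `owls_for_tenant` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_owner (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE app_owl (
        id INTEGER PRIMARY KEY,
        owner_id INTEGER NOT NULL REFERENCES app_owner(id),  -- tenant column
        name TEXT
    );
    INSERT INTO app_owner (id, name) VALUES (1, 'Hogwarts'), (2, 'Harry Potter');
    INSERT INTO app_owl (owner_id, name) VALUES
        (1, 'Errol'), (2, 'Hedwige'), (1, 'Pigwidgeon');
""")


def owls_for_tenant(connection, owner_id):
    """Return owl names for one tenant; the tenant filter is never optional."""
    rows = connection.execute(
        "SELECT name FROM app_owl WHERE owner_id = ? ORDER BY name",
        (owner_id,),
    )
    return [name for (name,) in rows]


print(owls_for_tenant(conn, 1))  # ['Errol', 'Pigwidgeon']
print(owls_for_tenant(conn, 2))  # ['Hedwige']
```

One migration updates every tenant at once, which is the maintenance gain listed under PROS. The cost is the one listed under CONS: isolation now depends on every query carrying the `owner_id` condition.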

  24. 3 ways to scale multi-tenant apps

  25. Shared tables in your Django apps

  26. Main problems to solve
    - Make sure ORM calls are always scoped to a single tenant
    - Include the tenant column in joins

  27. django-multitenant
    Automates all ORM calls
    to be scoped to a single tenant

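  28. django-multitenant
    Without django-multitenant:
    Owl.objects.filter(name='Hedwige')
    <=>
    SELECT * from app_owl where name='Hedwige'
    With django-multitenant, the same call is also scoped to the current tenant:
    Owl.objects.filter(name='Hedwige')
    <=>
    SELECT * from app_owl
    WHERE name='Hedwige'
    AND owner_id =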
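
The mechanism behind this automatic scoping can be sketched without Django. The toy manager below is not django-multitenant's implementation; it is a stand-in that filters an in-memory list and adds the tenant condition itself whenever a current tenant has been set. All names (`ScopedManager`, `set_current_tenant_id`, `get_current_tenant_id`) are hypothetical.

```python
import threading

_current = threading.local()


def set_current_tenant_id(tenant_id):
    _current.tenant_id = tenant_id


def get_current_tenant_id():
    return getattr(_current, "tenant_id", None)


class ScopedManager:
    """Toy stand-in for a model manager over an in-memory list of rows."""

    def __init__(self, rows, tenant_column):
        self._rows = rows
        self._tenant_column = tenant_column

    def filter(self, **conditions):
        tenant_id = get_current_tenant_id()
        if tenant_id is not None:
            # The manager, not the caller, adds the tenant condition.
            conditions = {**conditions, self._tenant_column: tenant_id}
        return [
            row for row in self._rows
            if all(row.get(key) == value for key, value in conditions.items())
        ]


owls = ScopedManager(
    [
        {"id": 1, "name": "Hedwige", "owner_id": 1},
        {"id": 2, "name": "Hedwige", "owner_id": 2},
    ],
    tenant_column="owner_id",
)

unscoped = owls.filter(name="Hedwige")   # no tenant set: both rows match
set_current_tenant_id(2)
scoped = owls.filter(name="Hedwige")     # same call, now limited to tenant 2
print(len(unscoped), scoped)
```

The calling code is identical in both cases, which is the point: scoping moves out of each query and into one place.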

  29. django-multitenant
    Without django-multitenant:
    Letter.objects.filter(id=1).select_related('deliverer')
    <=>
    SELECT * from app_letter
    INNER JOIN app_owl ON (app_owl.id=app_letter.deliverer_id)
    WHERE app_letter.id=1
    With django-multitenant, the join and the filter both include the tenant column:
    Letter.objects.filter(id=1).select_related('deliverer')
    <=>
    SELECT * from app_letter
    INNER JOIN app_owl ON (app_owl.id=app_letter.deliverer_id
    AND app_owl.owner_id=app_letter.owner_id)
    WHERE app_letter.id=1 AND app_owl.owner_id =
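
The effect of adding the tenant column to the join condition can be checked on a tiny dataset. This sketch again uses SQLite as a stand-in for Postgres, with made-up rows; the query mirrors the tenant-aware SQL above, with the tenant id passed as a parameter. The helper `letter_with_owl` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_owl (id INTEGER, owner_id INTEGER, name TEXT);
    CREATE TABLE app_letter (
        id INTEGER, owner_id INTEGER, deliverer_id INTEGER, content TEXT
    );
    INSERT INTO app_owl VALUES (1, 1, 'Hedwige');
    INSERT INTO app_letter VALUES (1, 1, 1, 'Dear Harry');
    -- Letter 2 belongs to tenant 2 but points at an owl owned by tenant 1.
    INSERT INTO app_letter VALUES (2, 2, 1, 'Wrong tenant');
""")

TENANT_JOIN = """
    SELECT app_letter.id, app_owl.name
    FROM app_letter
    INNER JOIN app_owl
        ON (app_owl.id = app_letter.deliverer_id
            AND app_owl.owner_id = app_letter.owner_id)
    WHERE app_letter.id = ? AND app_owl.owner_id = ?
"""


def letter_with_owl(connection, letter_id, tenant_id):
    """Fetch a letter and its owl, joining on both the key and the tenant."""
    return connection.execute(TENANT_JOIN, (letter_id, tenant_id)).fetchall()


print(letter_with_owl(conn, 1, 1))  # [(1, 'Hedwige')]
print(letter_with_owl(conn, 2, 2))  # []
```

The second query returns nothing because the owl and the letter disagree on `owner_id`, so a cross-tenant reference cannot leak through the join.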

  30. django-multitenant
    3 steps
    1. Change models to use TenantModelMixin and TenantManagerMixin
    2. Change ForeignKey to TenantForeignKey
    3. Define tenant scoping: set_current_tenant(t)

  31. django-multitenant
    3 steps
    Models before using django-multitenant
    from django.db import models

    class Owner(models.Model):
        type = models.CharField(max_length=10)  # add choice
        name = models.CharField(max_length=255)

    class Owl(models.Model):
        name = models.CharField(max_length=255)
        # on_delete is a required argument since Django 2.0
        owner = models.ForeignKey(Owner, on_delete=models.CASCADE)
        feather_color = models.CharField(max_length=255)
        favorite_food = models.CharField(max_length=255)

    class Letter(models.Model):
        content = models.TextField()
        deliverer = models.ForeignKey(Owl, on_delete=models.CASCADE)

  32. django-multitenant
    3 steps
    Models with django-multitenant
    from django.db import models
    from django_multitenant.fields import TenantForeignKey
    from django_multitenant.mixins import TenantManagerMixin, TenantModelMixin

    class TenantManager(TenantManagerMixin, models.Manager):
        pass

    class Owner(TenantModelMixin, models.Model):
        type = models.CharField(max_length=10)  # add choice
        name = models.CharField(max_length=255)
        tenant_id = 'id'  # the owner is the tenant itself
        objects = TenantManager()

    class Owl(TenantModelMixin, models.Model):
        name = models.CharField(max_length=255)
        owner = TenantForeignKey(Owner, on_delete=models.CASCADE)
        feather_color = models.CharField(max_length=255)
        favorite_food = models.CharField(max_length=255)
        tenant_id = 'owner_id'
        objects = TenantManager()

    class Letter(TenantModelMixin, models.Model):
        content = models.TextField()
        # TenantForeignKey so that joins to Owl also match on owner_id
        deliverer = TenantForeignKey(Owl, on_delete=models.CASCADE)
        owner = TenantForeignKey(Owner, on_delete=models.CASCADE)
        tenant_id = 'owner_id'
        objects = TenantManager()

  33. django-multitenant
    3 steps
    set_current_tenant(t)
    - Specifies which tenant the APIs should be scoped to
    - Set at authentication logic via middleware
    - Set explicitly at top of function (ex. view, external tasks/jobs)

  34. django-multitenant
    3 steps
    set_current_tenant(t) in Middleware
    from django_multitenant.utils import set_current_tenant

    class TenantMiddleware:
        def __init__(self, get_response):
            self.get_response = get_response
            # One-time configuration and initialization.

        def __call__(self, request):
            # Assuming your app has a function to get the tenant associated with a user
            current_tenant = get_tenant_for_user(request.user)
            set_current_tenant(current_tenant)
            response = self.get_response(request)
            return response
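
Middleware covers web requests, but background tasks and jobs have no request, so the tenant has to be set explicitly at the top of the function. One possible pattern is a context manager that sets the tenant and restores the previous value afterwards. The sketch below uses its own stand-in functions (`set_tenant`, `get_tenant`, `tenant_scope`, `send_pending_letters`), all hypothetical; it does not call django-multitenant, so it runs on its own.

```python
import threading
from contextlib import contextmanager

_local = threading.local()


def set_tenant(tenant):
    """Stand-in for a library call that records the current tenant."""
    _local.tenant = tenant


def get_tenant():
    return getattr(_local, "tenant", None)


@contextmanager
def tenant_scope(tenant):
    """Scope a block of code to one tenant, then restore the previous one."""
    previous = get_tenant()
    set_tenant(tenant)
    try:
        yield tenant
    finally:
        set_tenant(previous)


def send_pending_letters():
    # A real job would run ORM queries here; they would be tenant-scoped.
    return f"sending letters for {get_tenant()}"


with tenant_scope("Hogwarts"):
    result = send_pending_letters()

print(result)        # sending letters for Hogwarts
print(get_tenant())  # None
```

Restoring the previous tenant in `finally` matters for worker processes that handle jobs for many tenants in a row: a job that fails should not leave its tenant set for the next one.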

  35. django-multitenant
    Benefits of django-multitenant
    - Drop-in implementation of shared tables architecture
    - Guarantees isolation
    - Ready to scale with distributed Postgres (Citus)

  36. Postgres, citus
    and shared tables

  37. Why Postgres
    - Open source
    - Constraints
    - Rich SQL support
    - Extensions
      - PostGIS / Geospatial
      - HLL
      - TopN
      - Citus
    - Foreign data wrappers
    - Fun indexes (GIN, GiST, BRIN…)
    - CTEs
    - Window functions
    - Full text search
    - Datatypes
      - JSONB

  38. Why citus
    - Citus is an open source extension for PostgreSQL
    - Implements a distributed architecture for Postgres
    - Allows you to scale out CPU, memory, etc.
    - Compatible with modern Postgres (up to 11)

  39. Distributed Postgres with citus

  40. Distributed Postgres with citus
    Foreign key colocation

  41. Distributed Postgres with citus
    Foreign key colocation
    - Full SQL support for queries on a single set of co-located shards
    - Multi-statement transaction support for modifications on a single set of co-located shards
    - Foreign keys
    - …
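
Co-location can be pictured as follows: if every table is distributed on the tenant column, all rows for one tenant hash to the same shard, so joins, transactions and foreign keys for that tenant stay on a single node. The sketch below illustrates only this idea. The shard count, the use of `zlib.crc32` as the hash, and the helper names are assumptions; they are not how Citus computes shard placement.

```python
import zlib

SHARD_COUNT = 4  # hypothetical number of shards


def shard_for_tenant(tenant_id, shard_count=SHARD_COUNT):
    """Map a tenant id to a shard index (crc32 is only a stand-in hash)."""
    return zlib.crc32(str(tenant_id).encode("utf-8")) % shard_count


def place_rows(rows, tenant_column="owner_id"):
    """Group rows by the shard that their tenant column hashes to."""
    shards = {}
    for row in rows:
        shards.setdefault(shard_for_tenant(row[tenant_column]), []).append(row)
    return shards


# Rows from two different tables that belong to the same tenant.
owls = [{"table": "app_owl", "id": 1, "owner_id": 7}]
letters = [
    {"table": "app_letter", "id": 10, "owner_id": 7},
    {"table": "app_letter", "id": 11, "owner_id": 7},
]

placement = place_rows(owls + letters)
for shard, rows in placement.items():
    print(shard, [row["table"] for row in rows])
```

Because the owl and both letters share `owner_id` 7, they end up in one group regardless of which table they come from. A query that omits the tenant column gives the database no way to pick that single shard.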

  42. Distributed Postgres with citus
    Scope your queries!

  43. Why citus

  44. Scale out Django!
    github.com/citusdata/django-multitenant
    citusdata.com/newsletter
    @louisemeta
    @citusdata