Multitenant Applications: How and Why

Applications often need multitenancy at some level. The most common scenario is keeping data isolated between clients. One way to achieve this is to run multiple database instances and connect to the right one according to the user accessing the system. Another is to use a single database and model relationships so that each client's data can be queried separately. A third common approach also uses a single database instance, but with a separate schema per client. I'll go over each of these approaches. For each, you will learn about the architecture, understand how to build it using Django, see examples of how to make queries and learn what tools can help on the job. By the end, you will understand the key differences and be able to choose the approach that best suits your next application.


Filipe Ximenes

August 04, 2017


  1. Multitenant applications: how and why @xima

  3. Who am I?
    • Filipe Ximenes
    • Recife / Brazil
    • Aussie for 1 year (2008 - 2009)
  5. vinta.com.br/playbook

  6. FLOSS
    • Django React boilerplate: https://github.com/vintasoftware/django-react-boilerplate
    • Django Role Permissions: https://github.com/vintasoftware/django-role-permissions
    • Tapioca

  7. Context

  8. Corporate Fidget Spinner Tracking

  12. "How do you protect our data?"

  13. What is Multitenancy

  14. "... refers to a software architecture in which a single instance of software runs on a server and serves multiple tenants." - Wikipedia
  15. What do we want to achieve?
    • Reduce infrastructure costs by sharing hardware resources
    • Simplify software maintenance by keeping a single code base
    • Simplify infrastructure maintenance by having fewer nodes
  16. Single Shared Schema [or how the big guys do it]

  18. "Talk is cheap..."

  19. Routing - ibm.spinnertracking.com

        # Customer is the tenant model; its import path depends on the project.
        def tenant_middleware(get_response):
            def middleware(request):
                # Drop the port, then use the leftmost label as the tenant name.
                host = request.get_host().split(':')[0]
                subdomain = host.split('.')[0]
                try:
                    customer = Customer.objects.get(name=subdomain)
                except Customer.DoesNotExist:
                    customer = None
                request.customer = customer
                response = get_response(request)
                return response
            return middleware
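The subdomain lookup in the routing middleware can be tried without Django. This is a minimal sketch, assuming two hypothetical helpers (`subdomain_from_host` and `resolve_customer`, neither of which appears in the talk) and an in-memory dict standing in for the `Customer` table:

```python
def subdomain_from_host(host):
    """Return the leftmost DNS label of a Host header value, ignoring any port."""
    hostname = host.split(':')[0]
    return hostname.split('.')[0]


# Hypothetical stand-in for Customer.objects.get(name=...).
CUSTOMERS = {'ibm': 'IBM', 'vinta': 'Vinta'}


def resolve_customer(host):
    """Map a request host to a customer, or None when the tenant is unknown."""
    return CUSTOMERS.get(subdomain_from_host(host))
```

Note that a bare domain such as `spinnertracking.com` yields `spinnertracking` as the "subdomain", so an unknown tenant has to be handled explicitly, as the middleware does by setting `request.customer = None`.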
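  20. Querying

        from django.db.models import Avg

        # Filter through the relation chain: spin -> user_spinner -> user -> customer.
        avg_duration = (
            Spin.objects
            .filter(user_spinner__user__customer=request.customer)
            .aggregate(avg=Avg('duration')))['avg']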

  22. Simpler querying

        avg_duration = (
            Spin.objects
            .filter(customer=request.customer)
            .aggregate(avg=Avg('duration')))['avg']
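The simpler query works when every tenant-owned row carries its own reference to the customer. Below is a rough sketch of that idea using the standard-library `sqlite3` module instead of the Django ORM. The table and column names are assumptions made for illustration, not the schema used in the talk:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    -- Every tenant-owned row carries its own customer_id.
    CREATE TABLE spin (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        duration REAL NOT NULL
    );
    INSERT INTO customer (id, name) VALUES (1, 'ibm'), (2, 'vinta');
    INSERT INTO spin (customer_id, duration) VALUES
        (1, 100.0), (1, 200.0), (2, 30.0);
""")


def avg_duration(conn, customer_name):
    """Average spin duration for one tenant; other tenants' rows are excluded."""
    row = conn.execute(
        """
        SELECT AVG(spin.duration)
        FROM spin
        JOIN customer ON customer.id = spin.customer_id
        WHERE customer.name = ?
        """,
        (customer_name,),
    ).fetchone()
    return row[0]
```

Calling `avg_duration(conn, 'ibm')` only averages IBM's rows. Isolation here depends entirely on every query including the tenant filter, which is the point behind the "guaranteeing isolation is hard" drawback.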

  23. Case study: Salesforce
    • 1:5000 ratio
    • Double checking
    • Transparent to developers
  24. Drawbacks
    • Guaranteeing isolation is hard
    • Might add complexity to the codebase
    • 3rd party library integration
  25. Multiple databases

  27. Routing

        DATABASES = {
            'default': {
                'ENGINE': ...,
                'NAME': ...,
            },
            'ibm': {
                'ENGINE': ...,
                'NAME': ...,
            }
        }
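With one database per tenant, the settings need one alias per customer. A small sketch of generating those entries from a list of tenant names follows. The helper `build_databases`, the `spinnertracking_<tenant>` naming scheme, and the choice of the `django.db.backends.postgresql` engine are all assumptions for illustration:

```python
def build_databases(tenant_names, engine='django.db.backends.postgresql'):
    """Return a DATABASES-style dict: a 'default' alias plus one alias per tenant."""
    databases = {'default': {'ENGINE': engine, 'NAME': 'spinnertracking'}}
    for name in tenant_names:
        # Hypothetical naming scheme: one database per tenant.
        databases[name] = {'ENGINE': engine, 'NAME': 'spinnertracking_' + name}
    return databases


DATABASES = build_databases(['ibm', 'vinta'])
```

Because settings are normally read when the process starts, adding a tenant this way typically means a configuration change and a restart.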
  28. The `.using()` approach

        from django.db.models import Avg

        spinners = (
            Spinner.objects
            .using(request.customer.name)
            .annotate(
                avg_duration=Avg('owned_spinners__spins__duration')))

  29. The threadlocal middleware approach

        def multidb_middleware(get_response):
            def middleware(request):
                subdomain = get_subdomain(request)
                customer = get_customer(subdomain)
                request.customer = customer

                @thread_local(using_db=customer.name)
                def execute_request(request):
                    return get_response(request)

                response = execute_request(request)
                return response
            return middleware
  30. The router

        class TenantRouter(object):
            def db_for_read(self, model, **hints):
                return get_thread_local('using_db', 'default')

            def db_for_write(self, model, **hints):
                return get_thread_local('using_db', 'default')

            # …

        # settings.py
        DATABASE_ROUTERS = ['multitenancy.routers.TenantRouter']
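The slides use `thread_local` and `get_thread_local` without showing how they are defined. One possible way to write them with the standard library is sketched below. This is an assumption about their behavior based on how they are called, not the implementation from the talk:

```python
import functools
import threading

_state = threading.local()


def get_thread_local(name, default=None):
    """Read a value set for the current thread, falling back to `default`."""
    return getattr(_state, name, default)


def thread_local(**values):
    """Decorator: set the given values for the duration of the call, then restore."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            missing = object()
            previous = {key: getattr(_state, key, missing) for key in values}
            for key, value in values.items():
                setattr(_state, key, value)
            try:
                return func(*args, **kwargs)
            finally:
                # Restore the prior state so a reused worker thread does not
                # leak one tenant's database alias into the next request.
                for key, old in previous.items():
                    if old is missing:
                        delattr(_state, key)
                    else:
                        setattr(_state, key, old)
        return wrapper
    return decorator
```

A function decorated with `@thread_local(using_db='ibm')` sees `get_thread_local('using_db', 'default')` return `'ibm'` while it runs, and the router falls back to `'default'` once it returns.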
  31. Querying

        spinners = (
            Spinner.objects
            .using(request.customer.name)
            .annotate(
                avg_duration=Avg('owned_spinners__spins__duration'))
            .order_by('-avg_duration'))

  32. Database Multitenancy vs. Application Multitenancy

  33. Single Database Multiple Schemas

  35. What are schemas in the first place?

        SELECT id, name
        FROM user
        WHERE user.name LIKE 'F%';
  36. What are schemas in the first place?

        CREATE SCHEMA ibm;

        SELECT id, name
        FROM ibm.user
        WHERE ibm.user.name LIKE 'F%';
  37. The `search_path`

        SET search_path TO ibm;

        SELECT id, name
        FROM user
        WHERE user.name LIKE 'F%';
  38. Django-tenant-schemas

  39. Routing - middleware

        # ...
        connection.set_schema_to_public()
        hostname = self.hostname_from_request(request)

        TenantModel = get_tenant_model()
        try:
            tenant = self.get_tenant(TenantModel, hostname, request)
            assert isinstance(tenant, TenantModel)
        except TenantModel.DoesNotExist:
            # ...

        request.tenant = tenant
        connection.set_tenant(request.tenant)
        # ...
  40. Routing - settings

        MIDDLEWARE_CLASSES = [
            'tenant_schemas.middleware.TenantMiddleware',
            # …
        ]

        DATABASES = {
            'default': {
                'ENGINE': 'tenant_schemas.postgresql_backend',
                'NAME': 'mydb',
            }
        }
  41. Routing - db backend

        # ...
        try:
            cursor_for_search_path.execute(
                'SET search_path = {0}'.format(','.join(search_paths)))
        except (django.db.utils.DatabaseError, psycopg2.InternalError):
            self.search_path_set = False
        else:
            self.search_path_set = True
        if name:
            cursor_for_search_path.close()
        # ...
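The backend excerpt formats schema names straight into the `SET search_path` statement, so those names need to be trustworthy. A minimal sketch of validating them before building the statement is shown here. The helper `search_path_sql` and the deliberately strict name pattern are assumptions for this example and are not part of django-tenant-schemas:

```python
import re

# Conservative pattern for unquoted PostgreSQL identifiers (an assumption made
# for this sketch; PostgreSQL itself accepts more than this).
_SCHEMA_NAME = re.compile(r'[a-z_][a-z0-9_]{0,62}')


def search_path_sql(schema_names):
    """Build a SET search_path statement from validated schema names."""
    if not schema_names:
        raise ValueError('at least one schema is required')
    for name in schema_names:
        if not _SCHEMA_NAME.fullmatch(name):
            raise ValueError('invalid schema name: %r' % (name,))
    return 'SET search_path = {0}'.format(','.join(schema_names))
```

For a tenant named `ibm` with a shared `public` schema, `search_path_sql(['ibm', 'public'])` produces the same kind of statement the backend executes, while a name containing spaces or punctuation is rejected.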
  42. The Command Line

        ./manage.py tenant_command shell
        ./manage.py createsuperuser
        ./manage.py migrate_schemas

  43. Querying

        spinners = (
            Spinner.objects
            .annotate(
                avg_duration=Avg('owned_spinners__spins__duration'))
            .order_by('-avg_duration'))

  44. Querying across schemas

        SELECT id, duration FROM ibm.spinner_spin WHERE duration > 120
        UNION
        SELECT id, duration FROM vinta.spinner_spin WHERE duration > 120;
  45. Querying across schemas

        SELECT uuid, duration FROM ibm.spinner_spin WHERE duration > 120
        UNION
        SELECT uuid, duration FROM vinta.spinner_spin WHERE duration > 120;
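Cross-schema queries repeat the same `SELECT` once per tenant schema, which gets tedious as tenants are added. A small sketch that generates the statement from a list of schema names follows. The helper `union_across_schemas` is hypothetical, and it assumes the schema, table, and column names come from trusted code rather than user input:

```python
def union_across_schemas(schemas, table, columns, where):
    """Build one UNION query that runs the same SELECT in every schema."""
    column_list = ', '.join(columns)
    selects = [
        # Names are interpolated directly, so they must come from trusted code.
        'SELECT {0} FROM {1}.{2} WHERE {3}'.format(column_list, schema, table, where)
        for schema in schemas
    ]
    return '\nUNION\n'.join(selects) + ';'


query = union_across_schemas(
    ['ibm', 'vinta'], 'spinner_spin', ['uuid', 'duration'], 'duration > 120')
```

The switch from `id` to `uuid` between the two slides is presumably because integer ids are generated independently in each schema and can collide. Since a plain `UNION` removes duplicate rows, two tenants' rows with the same `id` and `duration` would be merged into one.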
  46. Upsides
    • Querying looks the same as in a standard application
    • New schemas are created automatically
    • Knows how to handle migrations
    • Simpler infrastructure
  47. Drawbacks
    • Be careful with too many schemas (maybe no more than a few hundred clients?)
    • Tests need some setup and might get slower
    • Harder to query across schemas
  48. multitenancy is not discrete, it is a continuous spectrum

  49. bit.ly/django-multitenancy github.com/filipeximenes/multitenancy

  50. Thank you!
    • http://bit.ly/vinta2017
    • Newsletter: vinta.com.br/blog/
    • twitter.com/@xima
    • github.com/filipeximenes
    • ximenes@vinta.com.br