Multitenant Applications: How and Why

Multitenant applications: how and why @xima

Who am I?
• Filipe Ximenes
• Recife / Brazil
• Aussie for 1 year (2008 - 2009)

vinta.com.br/playbook

FLOSS
• Django React boilerplate: https://github.com/vintasoftware/django-react-boilerplate
• Django Role Permissions: https://github.com/vintasoftware/django-role-permissions
• Tapioca: https://github.com/vintasoftware/tapioca-wrapper

Context

Corporate Fidget Spinner Tracking

"How do you protect our data?"

What is Multitenancy

"... refers to a software architecture in which a single instance of software runs on a server and serves multiple tenants." - Wikipedia

What do we want to achieve?
• Reduce infrastructure costs by sharing hardware resources
• Simplify software maintenance by keeping a single code base
• Simplify infrastructure maintenance by having fewer nodes

Single Shared Schema [or how the big guys do it]

"Talk is cheap..."

Routing - ibm.spinnertracking.com
    def tenant_middleware(get_response):
        def middleware(request):
            # Drop the port, then take the first label as the tenant name
            host = request.get_host().split(':')[0]
            subdomain = host.split('.')[0]
            try:
                customer = Customer.objects.get(name=subdomain)
            except Customer.DoesNotExist:
                customer = None
            request.customer = customer
            response = get_response(request)
            return response
        return middleware
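The host parsing in the middleware above can be pulled into a small helper that is easy to test on its own. This is a minimal sketch, not code from the talk: `subdomain_from_host` is a hypothetical name, and it assumes a tenant host always has the shape `<tenant>.<domain>.<tld>`, so multi-part suffixes such as `.co.uk` would need different handling.

```python
def subdomain_from_host(host):
    """Return the tenant label of a host such as 'ibm.spinnertracking.com:8000'.

    Assumption of this sketch: tenant hosts have at least three labels.
    Returns None when no tenant label is present.
    """
    hostname = host.split(":")[0]  # drop an optional port
    labels = hostname.split(".")
    if len(labels) < 3:
        return None  # bare domain, no tenant subdomain
    return labels[0]


if __name__ == "__main__":
    print(subdomain_from_host("ibm.spinnertracking.com:8000"))
    print(subdomain_from_host("spinnertracking.com"))
```

Returning `None` for a bare domain mirrors how the middleware sets `request.customer = None` when no customer matches.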

Querying
    avg_duration = (
        Spin.objects
        .filter(user_spinner__user__customer=request.customer)
        .aggregate(avg=Avg('duration')))['avg']

Simpler querying
    avg_duration = (
        Spin.objects
        .filter(customer=request.customer)
        .aggregate(avg=Avg('duration')))['avg']

Case study: Salesforce
• 1:5000 ratio
• Double checking
• Transparent to developers

Drawbacks
• Guaranteeing isolation is hard
• Might add complexity to the codebase
• 3rd party library integration

Multiple databases

Routing
    DATABASES = {
        'default': {
            'ENGINE': ...,
            'NAME': ...,
        },
        'ibm': {
            'ENGINE': ...,
            'NAME': ...,
        }
    }
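With one database per tenant, the `DATABASES` setting grows with the customer list, so it can be generated instead of written by hand. The following is only a sketch under stated assumptions: `build_databases` is a hypothetical helper, the `<tenant>_db` naming scheme and the `default_db` name are made up for illustration, and `django.db.backends.postgresql` is used merely as an example engine.

```python
def build_databases(tenant_names, engine="django.db.backends.postgresql"):
    """Build a Django-style DATABASES dict with one alias per tenant.

    Hypothetical naming scheme: each tenant alias points at '<tenant>_db'.
    """
    databases = {"default": {"ENGINE": engine, "NAME": "default_db"}}
    for name in tenant_names:
        databases[name] = {"ENGINE": engine, "NAME": name + "_db"}
    return databases


# In settings.py this could be used as:
DATABASES = build_databases(["ibm", "vinta"])
```

The alias of each entry matches the customer name, which is what lets the later slides pass `request.customer.name` as the database alias.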

The `.using()` approach
    spinners = (
        Spinner.objects
        .using(request.customer.name)
        .annotate(
            avg_duration=Avg('owned_spinners__spins__duration'))
        .order_by('-avg_duration'))

The threadlocal middleware approach
    def multidb_middleware(get_response):
        def middleware(request):
            subdomain = get_subdomain(request)
            customer = get_customer(subdomain)
            request.customer = customer

            # Pin the tenant database for the duration of this request
            @thread_local(using_db=customer.name)
            def execute_request(request):
                return get_response(request)

            response = execute_request(request)
            return response
        return middleware

The router
    class TenantRouter(object):
        def db_for_read(self, model, **hints):
            return get_thread_local('using_db', 'default')

        def db_for_write(self, model, **hints):
            return get_thread_local('using_db', 'default')

        # …

    # settings.py
    DATABASE_ROUTERS = ['multitenancy.routers.TenantRouter']
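The slides use `thread_local` and `get_thread_local` without showing how they are built. One possible implementation on top of the standard library's `threading.local` is sketched below; it is an assumption about how such helpers could work, not the talk's actual code, and `_state` is a name introduced here.

```python
import threading
from functools import wraps

# One storage object; each thread sees its own attribute values.
_state = threading.local()
_MISSING = object()


def get_thread_local(name, default=None):
    """Read a value set for the current thread, or return the default."""
    return getattr(_state, name, default)


def thread_local(**values):
    """Decorator: set thread-local values while the wrapped function runs."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Remember previous values so nested uses restore correctly.
            previous = {k: getattr(_state, k, _MISSING) for k in values}
            for key, value in values.items():
                setattr(_state, key, value)
            try:
                return func(*args, **kwargs)
            finally:
                for key, old in previous.items():
                    if old is _MISSING:
                        delattr(_state, key)
                    else:
                        setattr(_state, key, old)
        return wrapper
    return decorator


if __name__ == "__main__":
    @thread_local(using_db="ibm")
    def current_db():
        return get_thread_local("using_db", "default")

    print(current_db())                            # inside the decorator
    print(get_thread_local("using_db", "default"))  # outside again
```

Restoring the previous value in `finally` keeps a failed request from leaking its tenant into the next request served by the same thread. Because the state is per thread, this sketch does not isolate coroutines that share one thread.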

Querying
    spinners = (
        Spinner.objects
        .using(request.customer.name)
        .annotate(
            avg_duration=Avg('owned_spinners__spins__duration'))
        .order_by('-avg_duration'))

Database Multitenancy vs. Application Multitenancy

Single Database Multiple Schemas

What are schemas in the first place?
    SELECT id, name
    FROM user
    WHERE user.name LIKE 'F%';

What are schemas in the first place?
    CREATE SCHEMA ibm;

    SELECT id, name
    FROM ibm.user
    WHERE ibm.user.name LIKE 'F%';

The `search_path`
    SET search_path TO ibm;

    SELECT id, name
    FROM user
    WHERE user.name LIKE 'F%';

Django-tenant-schemas

Routing - middleware
    # ...
    connection.set_schema_to_public()
    hostname = self.hostname_from_request(request)
    TenantModel = get_tenant_model()
    try:
        tenant = self.get_tenant(TenantModel, hostname, request)
        assert isinstance(tenant, TenantModel)
    except TenantModel.DoesNotExist:
        # ...
    request.tenant = tenant
    connection.set_tenant(request.tenant)
    # ...

Routing - settings
    MIDDLEWARE_CLASSES = [
        'tenant_schemas.middleware.TenantMiddleware',
        # …
    ]

    DATABASES = {
        'default': {
            'ENGINE': 'tenant_schemas.postgresql_backend',
            'NAME': 'mydb',
        }
    }

Routing - db backend
    # ...
    try:
        cursor_for_search_path.execute(
            'SET search_path = {0}'.format(','.join(search_paths)))
    except (django.db.utils.DatabaseError, psycopg2.InternalError):
        self.search_path_set = False
    else:
        self.search_path_set = True
    if name:
        cursor_for_search_path.close()
    # ...
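The backend snippet above formats schema names straight into the `SET search_path` statement, so the names have to be trustworthy. A standalone sketch of building that statement with a basic name check follows. `build_search_path_sql` is a hypothetical helper, not part of django-tenant-schemas, and both the allowed-character rule and the choice to append `public` are assumptions of this example.

```python
import re

# Assumed rule for this sketch: lowercase letters, digits and underscores,
# not starting with a digit.
_SCHEMA_NAME = re.compile(r"^[a-z_][a-z0-9_]*$")


def build_search_path_sql(schema, include_public=True):
    """Return a 'SET search_path' statement for one tenant schema."""
    search_paths = [schema]
    if include_public:
        search_paths.append("public")  # shared tables live here
    for name in search_paths:
        if not _SCHEMA_NAME.match(name):
            raise ValueError("unsafe schema name: %r" % name)
    return "SET search_path = {0}".format(",".join(search_paths))


if __name__ == "__main__":
    print(build_search_path_sql("ibm"))
```

Rejecting unexpected characters up front matters because schema names are identifiers and cannot be passed as ordinary query parameters.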

The Command Line
    ./manage.py tenant_command shell
    ./manage.py createsuperuser
    ./manage.py migrate_schemas

Querying
    spinners = (
        Spinner.objects
        .annotate(
            avg_duration=Avg('owned_spinners__spins__duration'))
        .order_by('-avg_duration'))

Querying across schemas
    SELECT id, duration FROM ibm.spinner_spin WHERE duration > 120
    UNION
    SELECT id, duration FROM vinta.spinner_spin WHERE duration > 120;

Querying across schemas
    SELECT uuid, duration FROM ibm.spinner_spin WHERE duration > 120
    UNION
    SELECT uuid, duration FROM vinta.spinner_spin WHERE duration > 120;
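Writing the cross-schema query by hand stops scaling once there are more than a few tenants. A rough sketch of generating the same statement for any list of schemas is shown here. `union_across_schemas` is a hypothetical helper, and it assumes that every argument comes from trusted code, since the values are formatted directly into the SQL string.

```python
def union_across_schemas(schemas, table, columns, where):
    """Build one UNION query that selects the same columns from every schema.

    Assumption: all arguments are trusted, application-defined strings.
    """
    selects = [
        "SELECT {cols} FROM {schema}.{table} WHERE {where}".format(
            cols=", ".join(columns), schema=schema, table=table, where=where)
        for schema in schemas
    ]
    return " UNION ".join(selects) + ";"


if __name__ == "__main__":
    print(union_across_schemas(
        ["ibm", "vinta"], "spinner_spin", ["uuid", "duration"], "duration > 120"))
```

Note that `UNION` removes duplicate rows, so two tenants with the same `(id, duration)` pair would collapse into one row; selecting a globally unique column such as `uuid`, or using `UNION ALL`, avoids that.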

Upsides
• Querying looks the same as in a standard application
• New schemas are created automatically
• Knows how to handle migrations
• Simpler infrastructure

Drawbacks
• Be careful with too many schemas (maybe not more than hundreds of clients?)
• Tests need some setup and might get slower
• Harder to query across schemas

Multitenancy is not discrete; it is a continuous spectrum

bit.ly/django-multitenancy github.com/filipeximenes/multitenancy

Thank you!
http://bit.ly/vinta2017
Newsletter: vinta.com.br/blog/
twitter.com/@xima
github.com/filipeximenes
