How denormalizing our Postgres
turned great
Quentin de Metz, Pennylane
Slide 2
Slide 2 text
Pennylane
Accounting and financial management SaaS
Founded in 2020
~2000 customers
~40 engineers, of which ~30 ruby devs
Ruby on rails, React, PostGRES, Heroku, AWS
Slide 3
Slide 3 text
Our data model
99% of data is partitioned by customer (“company”)
We dogfood our own product - all of our own financial data is in our database
Our customers’ financial data is very sensitive - we need to be extra-careful that the
partitioning is perfect
Slide 4
Slide 4 text
Our tooling
Access control is managed via CanCanCan in a single Ability file
We leverage PostGRES row-level security (RLS) to restrict access to our own
company’s data
We filter and store PostGRES changelog data with AWS for auditing purposes
Slide 5
Slide 5 text
Best practice: normalize the data in PostGRES
Advantages:
- Avoid data redundancy
- Avoid incoherent data
Pitfalls
- ???
Slide 6
Slide 6 text
What this looks like in practice
Database schema:
Slide 7
Slide 7 text
Ability
class Ability
def initialize(user)
can :manage, Company, id: user.company_id
can :manage, Invoice, company_id: user.company_id
can :manage, InvoiceLine, invoice: { company_id:
user.company_id }
end
end
Slide 8
Slide 8 text
sync_from_prod
Devtool to export production data from a specific customer to development
environment
> COPY TO STDOUT(SELECT * FROM companies WHERE id = xxx);
> COPY TO STDOUT(SELECT * FROM invoices WHERE company_id = xxx);
> COPY TO STDOUT(SELECT * FROM invoice_lines WHERE invoice_id IN (
SELECT id FROM invoices WHERE company_id = xxx
));
Slide 9
Slide 9 text
Data pipeline
Plugged into PostGRES replication stream, stores row-level change events
We want to filter out data linked to Pennylane
Complex logic when invoice_line changes
Slide 10
Slide 10 text
What if the dependency tree gets deeper?
Slide 11
Slide 11 text
Denormalizing for fun and profit
- Add company_id column to (almost) every single table
- Yes it’s redundant
- Don’t panic
Slide 12
Slide 12 text
Denormalizing
Advantages:
- Very easy to find out who can access what (just look at company_id)
- Consistency checking: it takes one JOIN to check that an invoice_line is
attached to the correct company
Pitfalls:
- Data redundancy
- Data consistency
- PostGRES column correlations may hinder query planner performance
Slide 13
Slide 13 text
Ability
class Ability
def initialize(user)
can :manage, Company, id: user.company_id
can :manage, Invoice, company_id: user.company_id
can :manage, InvoiceLine, company_id: user.company_id
end
end
Slide 14
Slide 14 text
Rails addons
Slide 15
Slide 15 text
Permanent attributes
define_method(:"#{attribute}_not_changed") do
if persisted? && has_attribute?(attribute) \
&& !attribute_was(attribute).nil? && attribute_changed?(attribute)
errors.add(attribute, :permanent)
end
end
validate :"#{attribute}_not_changed"
Slide 16
Slide 16 text
Permanent attributes (2)
class ApplicationRecord
attr_permanent :company_id
end
class InvoiceLine < ApplicationRecord
attr_permanent :invoice_id
end
Slide 17
Slide 17 text
Insecure Direct Object Reference
PUT /invoices/4, { invoice: { supplier_id: 99 } }
GET /invoices/4
It’s possible to attach invoices to another company’s suppliers 💣
Slide 18
Slide 18 text
Loading associations
def last_chain_scope(scope, reflection, owner)
return super unless owner.class.has_attribute?(:company_id) \
&& reflection.klass.has_attribute?(:company_id)
apply_scope(super, reflection.aliased_table,
:company_id, transform_value(owner[:company_id]))
end
(patch applied on ActiveRecord::Associations::AssociationScope)
Slide 19
Slide 19 text
Loading associations (2)
> invoice.invoice_lines
SELECT "invoice_lines".*
FROM "invoice_lines"
WHERE "invoice_lines"."invoice_id" = 232650824
AND "invoice_lines"."company_id" = 21847875
Slide 20
Slide 20 text
Joining associations (2)
> Invoice.joins(:invoice_lines)
SELECT "invoices".* FROM "invoices"
INNER JOIN "invoice_lines"
ON "invoice_lines"."invoice_id" = "invoices"."id"
AND "invoice_lines"."company_id" = "invoices"."company_id"
Slide 21
Slide 21 text
Preloading associations
def build_scope
super.tap do |orig_scope|
orig_scope.where!(company_id: @__company_id) \
if klass.has_attribute?(:company_id) &&
defined?(@__company_id)
end
end
patch applied on ActiveRecord::Associations::Preloader::Association
Slide 22
Slide 22 text
Preloading associations (2)
> Invoice.preload(:invoice_lines).take
SELECT "invoices".* FROM "invoices" LIMIT 1
SELECT "invoice_lines".*
FROM "invoice_lines"
WHERE "invoice_lines"."company_id" = 21847875
AND "invoice_lines"."document_id" = 232650824
Slide 23
Slide 23 text
Where_exists (https://github.com/EugZol/where_exists)
> Invoice.where_exists(:invoice_lines)
SELECT "invoices".* FROM "invoices" WHERE ((EXISTS (
SELECT 1 FROM "invoice_lines"
WHERE ("invoice_lines"."invoice_id" = "invoices"."id")
AND "invoice_lines"."company_id" = "invoices"."company_id"
)))
Slide 24
Slide 24 text
Building new entities
> invoice.invoice_lines.build(amount: 1)
=> #
Slide 25
Slide 25 text
Up next...
Checking integrity before assigning relationships (can be done in one simple query
if data is denormalized)
invoice.supplier_id = 42
Slide 26
Slide 26 text
Denormalizing
Advantages:
- Very easy to find out who can access what (just look at company_id)
- Consistency checking: it takes one JOIN to check that an invoice_line is attached to the
correct company
- “Belt and suspenders” approach to IDOR
Pitfalls:
- Data redundancy isn’t such a big problem ✔
- Data consistency seems solved ✔
- PostGRES column correlations may hinder query planner performance (for another time)