How denormalizing our postgres turned great

How denormalizing our Postgres turned great Quentin de Metz, Pennylane

Pennylane Accounting and ﬁnancial management SaaS Founded in 2020 ~2000
customers ~40 engineers, of which ~30 ruby devs Ruby on rails, React, PostGRES, Heroku, AWS

Our data model 99% of data is partitioned by customer
(“company”) We dogfood our own product - all of our own ﬁnancial data is in our database Our customers’ ﬁnancial data is very sensitive - we need to be extra-careful that the partitioning is perfect

Our tooling Access control is managed via CanCanCan in a
single Ability ﬁle We leverage PostGRES row-level security (RLS) to restrict access to our own company’s data We ﬁlter and store PostGRES changelog data with AWS for auditing purposes

Best practice: normalize the data in PostGRES Advantages: - Avoid
data redundancy - Avoid incoherent data Pitfalls - ???

What this looks like in practice Database schema:

Ability class Ability def initialize(user) can :manage, Company, id: user.company_id
can :manage, Invoice, company_id: user.company_id can :manage, InvoiceLine, invoice: { company_id: user.company_id } end end

sync_from_prod Devtool to export production data from a speciﬁc customer
to development environment > COPY TO STDOUT(SELECT * FROM companies WHERE id = xxx); > COPY TO STDOUT(SELECT * FROM invoices WHERE company_id = xxx); > COPY TO STDOUT(SELECT * FROM invoice_lines WHERE invoice_id IN ( SELECT id FROM invoices WHERE company_id = xxx ));

Data pipeline Plugged into PostGRES replication stream, stores row-level change
events We want to ﬁlter out data linked to Pennylane Complex logic when invoice_line changes

What if the dependency tree gets deeper?

Denormalizing for fun and proﬁt - Add company_id column to
(almost) every single table - Yes it’s redundant - Don’t panic

Denormalizing Advantages: - Very easy to ﬁnd out who can
access what (just look at company_id) - Consistency checking: it takes one JOIN to check that an invoice_line is attached to the correct company Pitfalls: - Data redundancy - Data consistency - PostGRES column correlations may hinder query planner performance

Ability class Ability def initialize(user) can :manage, Company, id: user.company_id
can :manage, Invoice, company_id: user.company_id can :manage, InvoiceLine, company_id: user.company_id end end

Rails addons

Permanent attributes define_method(:"#{attribute}_not_changed") do if persisted? && has_attribute?(attribute) \ &&
!attribute_was(attribute).nil? && attribute_changed?(attribute) errors.add(attribute, :permanent) end end validate :"#{attribute}_not_changed"

Permanent attributes (2) class ApplicationRecord attr_permanent :company_id end class InvoiceLine
< ApplicationRecord attr_permanent :invoice_id end

Insecure Direct Object Reference PUT /invoices/4, { invoice: { supplier_id:
99 } } GET /invoices/4 It’s possible to attach invoices to another company’s suppliers 💣

Loading associations def last_chain_scope(scope, reflection, owner) return super unless owner.class.has_attribute?(:company_id)
\ && reflection.klass.has_attribute?(:company_id) apply_scope(super, reflection.aliased_table, :company_id, transform_value(owner[:company_id])) end (patch applied on ActiveRecord::Associations::AssociationScope)

Loading associations (2) > invoice.invoice_lines SELECT "invoice_lines".* FROM "invoice_lines" WHERE
"invoice_lines"."invoice_id" = 232650824 AND "invoice_lines"."company_id" = 21847875

Joining associations (2) > Invoice.joins(:invoice_lines) SELECT "invoices".* FROM "invoices" INNER
JOIN "invoice_lines" ON "invoice_lines"."invoice_id" = "invoices"."id" AND "invoice_lines"."company_id" = "invoices"."company_id"

Preloading associations def build_scope super.tap do |orig_scope| orig_scope.where!(company_id: @__company_id) \
if klass.has_attribute?(:company_id) && defined?(@__company_id) end end patch applied on ActiveRecord::Associations::Preloader::Association

Preloading associations (2) > Invoice.preload(:invoice_lines).take SELECT "invoices".* FROM "invoices" LIMIT
1 SELECT "invoice_lines".* FROM "invoice_lines" WHERE "invoice_lines"."company_id" = 21847875 AND "invoice_lines"."document_id" = 232650824

Where_exists (https://github.com/EugZol/where_exists) > Invoice.where_exists(:invoice_lines) SELECT "invoices".* FROM "invoices" WHERE ((EXISTS
( SELECT 1 FROM "invoice_lines" WHERE ("invoice_lines"."invoice_id" = "invoices"."id") AND "invoice_lines"."company_id" = "invoices"."company_id" )))

Building new entities > invoice.invoice_lines.build(amount: 1) => #<InvoiceLine:0x0000558668152518 id: nil,
invoice_id: 232650824, company_id: 21847875, ...>

Up next... Checking integrity before assigning relationships (can be done
in one simple query if data is denormalized) invoice.supplier_id = 42

Denormalizing Advantages: - Very easy to ﬁnd out who can
access what (just look at company_id) - Consistency checking: it takes one JOIN to check that an invoice_line is attached to the correct company - “Belt and suspenders” approach to IDOR Pitfalls: - Data redundancy isn’t such a big problem ✔ - Data consistency seems solved ✔ - PostGRES column correlations may hinder query planner performance (for another time)

How denormalizing our postgres turned great

How denormalizing our postgres turned great

Quentin de Metz

Other Decks in Technology

Featured

Transcript