How denormalizing our postgres turned great

Slide 1

Slide 1 text

How denormalizing our Postgres turned great Quentin de Metz, Pennylane

Slide 2

Slide 2 text

Pennylane Accounting and ﬁnancial management SaaS Founded in 2020 ~2000 customers ~40 engineers, of which ~30 ruby devs Ruby on rails, React, PostGRES, Heroku, AWS

Slide 3

Slide 3 text

Our data model 99% of data is partitioned by customer (“company”) We dogfood our own product - all of our own ﬁnancial data is in our database Our customers’ ﬁnancial data is very sensitive - we need to be extra-careful that the partitioning is perfect

Slide 4

Slide 4 text

Our tooling Access control is managed via CanCanCan in a single Ability ﬁle We leverage PostGRES row-level security (RLS) to restrict access to our own company’s data We ﬁlter and store PostGRES changelog data with AWS for auditing purposes

Slide 5

Slide 5 text

Best practice: normalize the data in PostGRES Advantages: - Avoid data redundancy - Avoid incoherent data Pitfalls - ???

Slide 6

Slide 6 text

What this looks like in practice Database schema:

Slide 7

Slide 7 text

Ability class Ability def initialize(user) can :manage, Company, id: user.company_id can :manage, Invoice, company_id: user.company_id can :manage, InvoiceLine, invoice: { company_id: user.company_id } end end

Slide 8

Slide 8 text

sync_from_prod Devtool to export production data from a speciﬁc customer to development environment > COPY TO STDOUT(SELECT * FROM companies WHERE id = xxx); > COPY TO STDOUT(SELECT * FROM invoices WHERE company_id = xxx); > COPY TO STDOUT(SELECT * FROM invoice_lines WHERE invoice_id IN ( SELECT id FROM invoices WHERE company_id = xxx ));

Slide 9

Slide 9 text

Data pipeline Plugged into PostGRES replication stream, stores row-level change events We want to ﬁlter out data linked to Pennylane Complex logic when invoice_line changes

Slide 10

Slide 10 text

What if the dependency tree gets deeper?

Slide 11

Slide 11 text

Denormalizing for fun and proﬁt - Add company_id column to (almost) every single table - Yes it’s redundant - Don’t panic

Slide 12

Slide 12 text

Denormalizing Advantages: - Very easy to ﬁnd out who can access what (just look at company_id) - Consistency checking: it takes one JOIN to check that an invoice_line is attached to the correct company Pitfalls: - Data redundancy - Data consistency - PostGRES column correlations may hinder query planner performance

Slide 13

Slide 13 text

Ability class Ability def initialize(user) can :manage, Company, id: user.company_id can :manage, Invoice, company_id: user.company_id can :manage, InvoiceLine, company_id: user.company_id end end

Slide 14

Slide 14 text

Rails addons

Slide 15

Slide 15 text

Permanent attributes define_method(:"#{attribute}_not_changed") do if persisted? && has_attribute?(attribute) \ && !attribute_was(attribute).nil? && attribute_changed?(attribute) errors.add(attribute, :permanent) end end validate :"#{attribute}_not_changed"

Slide 16

Slide 16 text

Permanent attributes (2) class ApplicationRecord attr_permanent :company_id end class InvoiceLine < ApplicationRecord attr_permanent :invoice_id end

Slide 17

Slide 17 text

Insecure Direct Object Reference PUT /invoices/4, { invoice: { supplier_id: 99 } } GET /invoices/4 It’s possible to attach invoices to another company’s suppliers 💣

Slide 18

Slide 18 text

Loading associations def last_chain_scope(scope, reflection, owner) return super unless owner.class.has_attribute?(:company_id) \ && reflection.klass.has_attribute?(:company_id) apply_scope(super, reflection.aliased_table, :company_id, transform_value(owner[:company_id])) end (patch applied on ActiveRecord::Associations::AssociationScope)

Slide 19

Slide 19 text

Loading associations (2) > invoice.invoice_lines SELECT "invoice_lines".* FROM "invoice_lines" WHERE "invoice_lines"."invoice_id" = 232650824 AND "invoice_lines"."company_id" = 21847875

Slide 20

Slide 20 text

Joining associations (2) > Invoice.joins(:invoice_lines) SELECT "invoices".* FROM "invoices" INNER JOIN "invoice_lines" ON "invoice_lines"."invoice_id" = "invoices"."id" AND "invoice_lines"."company_id" = "invoices"."company_id"

Slide 21

Slide 21 text

Preloading associations def build_scope super.tap do |orig_scope| orig_scope.where!(company_id: @__company_id) \ if klass.has_attribute?(:company_id) && defined?(@__company_id) end end patch applied on ActiveRecord::Associations::Preloader::Association

Slide 22

Slide 22 text

Preloading associations (2) > Invoice.preload(:invoice_lines).take SELECT "invoices".* FROM "invoices" LIMIT 1 SELECT "invoice_lines".* FROM "invoice_lines" WHERE "invoice_lines"."company_id" = 21847875 AND "invoice_lines"."document_id" = 232650824

Slide 23

Slide 23 text

Where_exists (https://github.com/EugZol/where_exists) > Invoice.where_exists(:invoice_lines) SELECT "invoices".* FROM "invoices" WHERE ((EXISTS ( SELECT 1 FROM "invoice_lines" WHERE ("invoice_lines"."invoice_id" = "invoices"."id") AND "invoice_lines"."company_id" = "invoices"."company_id" )))

Slide 24

Slide 24 text

Building new entities > invoice.invoice_lines.build(amount: 1) => #

Slide 25

Slide 25 text

Up next... Checking integrity before assigning relationships (can be done in one simple query if data is denormalized) invoice.supplier_id = 42

Slide 26

Slide 26 text

Denormalizing Advantages: - Very easy to ﬁnd out who can access what (just look at company_id) - Consistency checking: it takes one JOIN to check that an invoice_line is attached to the correct company - “Belt and suspenders” approach to IDOR Pitfalls: - Data redundancy isn’t such a big problem ✔ - Data consistency seems solved ✔ - PostGRES column correlations may hinder query planner performance (for another time)

Slide 27

Slide 27 text

No content