Upgrade to Pro — share decks privately, control downloads, hide ads and more …

How denormalizing our postgres turned great

How denormalizing our postgres turned great

B20b6d03e682202dc5d1307f921e7a79?s=128

Quentin de Metz

November 12, 2021
Tweet

Transcript

  1. How denormalizing our Postgres turned great Quentin de Metz, Pennylane

  2. Pennylane Accounting and financial management SaaS Founded in 2020 ~2000

    customers ~40 engineers, of which ~30 ruby devs Ruby on rails, React, PostGRES, Heroku, AWS
  3. Our data model 99% of data is partitioned by customer

    (“company”) We dogfood our own product - all of our own financial data is in our database Our customers’ financial data is very sensitive - we need to be extra-careful that the partitioning is perfect
  4. Our tooling Access control is managed via CanCanCan in a

    single Ability file We leverage PostGRES row-level security (RLS) to restrict access to our own company’s data We filter and store PostGRES changelog data with AWS for auditing purposes
  5. Best practice: normalize the data in PostGRES Advantages: - Avoid

    data redundancy - Avoid incoherent data Pitfalls - ???
  6. What this looks like in practice Database schema:

  7. Ability class Ability def initialize(user) can :manage, Company, id: user.company_id

    can :manage, Invoice, company_id: user.company_id can :manage, InvoiceLine, invoice: { company_id: user.company_id } end end
  8. sync_from_prod Devtool to export production data from a specific customer

    to development environment > COPY TO STDOUT(SELECT * FROM companies WHERE id = xxx); > COPY TO STDOUT(SELECT * FROM invoices WHERE company_id = xxx); > COPY TO STDOUT(SELECT * FROM invoice_lines WHERE invoice_id IN ( SELECT id FROM invoices WHERE company_id = xxx ));
  9. Data pipeline Plugged into PostGRES replication stream, stores row-level change

    events We want to filter out data linked to Pennylane Complex logic when invoice_line changes
  10. What if the dependency tree gets deeper?

  11. Denormalizing for fun and profit - Add company_id column to

    (almost) every single table - Yes it’s redundant - Don’t panic
  12. Denormalizing Advantages: - Very easy to find out who can

    access what (just look at company_id) - Consistency checking: it takes one JOIN to check that an invoice_line is attached to the correct company Pitfalls: - Data redundancy - Data consistency - PostGRES column correlations may hinder query planner performance
  13. Ability class Ability def initialize(user) can :manage, Company, id: user.company_id

    can :manage, Invoice, company_id: user.company_id can :manage, InvoiceLine, company_id: user.company_id end end
  14. Rails addons

  15. Permanent attributes define_method(:"#{attribute}_not_changed") do if persisted? && has_attribute?(attribute) \ &&

    !attribute_was(attribute).nil? && attribute_changed?(attribute) errors.add(attribute, :permanent) end end validate :"#{attribute}_not_changed"
  16. Permanent attributes (2) class ApplicationRecord attr_permanent :company_id end class InvoiceLine

    < ApplicationRecord attr_permanent :invoice_id end
  17. Insecure Direct Object Reference PUT /invoices/4, { invoice: { supplier_id:

    99 } } GET /invoices/4 It’s possible to attach invoices to another company’s suppliers 💣
  18. Loading associations def last_chain_scope(scope, reflection, owner) return super unless owner.class.has_attribute?(:company_id)

    \ && reflection.klass.has_attribute?(:company_id) apply_scope(super, reflection.aliased_table, :company_id, transform_value(owner[:company_id])) end (patch applied on ActiveRecord::Associations::AssociationScope)
  19. Loading associations (2) > invoice.invoice_lines SELECT "invoice_lines".* FROM "invoice_lines" WHERE

    "invoice_lines"."invoice_id" = 232650824 AND "invoice_lines"."company_id" = 21847875
  20. Joining associations (2) > Invoice.joins(:invoice_lines) SELECT "invoices".* FROM "invoices" INNER

    JOIN "invoice_lines" ON "invoice_lines"."invoice_id" = "invoices"."id" AND "invoice_lines"."company_id" = "invoices"."company_id"
  21. Preloading associations def build_scope super.tap do |orig_scope| orig_scope.where!(company_id: @__company_id) \

    if klass.has_attribute?(:company_id) && defined?(@__company_id) end end patch applied on ActiveRecord::Associations::Preloader::Association
  22. Preloading associations (2) > Invoice.preload(:invoice_lines).take SELECT "invoices".* FROM "invoices" LIMIT

    1 SELECT "invoice_lines".* FROM "invoice_lines" WHERE "invoice_lines"."company_id" = 21847875 AND "invoice_lines"."document_id" = 232650824
  23. Where_exists (https://github.com/EugZol/where_exists) > Invoice.where_exists(:invoice_lines) SELECT "invoices".* FROM "invoices" WHERE ((EXISTS

    ( SELECT 1 FROM "invoice_lines" WHERE ("invoice_lines"."invoice_id" = "invoices"."id") AND "invoice_lines"."company_id" = "invoices"."company_id" )))
  24. Building new entities > invoice.invoice_lines.build(amount: 1) => #<InvoiceLine:0x0000558668152518 id: nil,

    invoice_id: 232650824, company_id: 21847875, ...>
  25. Up next... Checking integrity before assigning relationships (can be done

    in one simple query if data is denormalized) invoice.supplier_id = 42
  26. Denormalizing Advantages: - Very easy to find out who can

    access what (just look at company_id) - Consistency checking: it takes one JOIN to check that an invoice_line is attached to the correct company - “Belt and suspenders” approach to IDOR Pitfalls: - Data redundancy isn’t such a big problem ✔ - Data consistency seems solved ✔ - PostGRES column correlations may hinder query planner performance (for another time)
  27. None