Upgrade to Pro — share decks privately, control downloads, hide ads and more …

How denormalizing our postgres turned great

How denormalizing our postgres turned great

Quentin de Metz

November 12, 2021
Tweet

Other Decks in Technology

Transcript

  1. Pennylane Accounting and financial management SaaS Founded in 2020 ~2000

    customers ~40 engineers, of which ~30 ruby devs Ruby on rails, React, PostGRES, Heroku, AWS
  2. Our data model 99% of data is partitioned by customer

    (“company”) We dogfood our own product - all of our own financial data is in our database Our customers’ financial data is very sensitive - we need to be extra-careful that the partitioning is perfect
  3. Our tooling Access control is managed via CanCanCan in a

    single Ability file We leverage PostGRES row-level security (RLS) to restrict access to our own company’s data We filter and store PostGRES changelog data with AWS for auditing purposes
  4. Best practice: normalize the data in PostGRES Advantages: - Avoid

    data redundancy - Avoid incoherent data Pitfalls - ???
  5. Ability class Ability def initialize(user) can :manage, Company, id: user.company_id

    can :manage, Invoice, company_id: user.company_id can :manage, InvoiceLine, invoice: { company_id: user.company_id } end end
  6. sync_from_prod Devtool to export production data from a specific customer

    to development environment > COPY TO STDOUT(SELECT * FROM companies WHERE id = xxx); > COPY TO STDOUT(SELECT * FROM invoices WHERE company_id = xxx); > COPY TO STDOUT(SELECT * FROM invoice_lines WHERE invoice_id IN ( SELECT id FROM invoices WHERE company_id = xxx ));
  7. Data pipeline Plugged into PostGRES replication stream, stores row-level change

    events We want to filter out data linked to Pennylane Complex logic when invoice_line changes
  8. Denormalizing for fun and profit - Add company_id column to

    (almost) every single table - Yes it’s redundant - Don’t panic
  9. Denormalizing Advantages: - Very easy to find out who can

    access what (just look at company_id) - Consistency checking: it takes one JOIN to check that an invoice_line is attached to the correct company Pitfalls: - Data redundancy - Data consistency - PostGRES column correlations may hinder query planner performance
  10. Ability class Ability def initialize(user) can :manage, Company, id: user.company_id

    can :manage, Invoice, company_id: user.company_id can :manage, InvoiceLine, company_id: user.company_id end end
  11. Permanent attributes define_method(:"#{attribute}_not_changed") do if persisted? && has_attribute?(attribute) \ &&

    !attribute_was(attribute).nil? && attribute_changed?(attribute) errors.add(attribute, :permanent) end end validate :"#{attribute}_not_changed"
  12. Insecure Direct Object Reference PUT /invoices/4, { invoice: { supplier_id:

    99 } } GET /invoices/4 It’s possible to attach invoices to another company’s suppliers 💣
  13. Loading associations def last_chain_scope(scope, reflection, owner) return super unless owner.class.has_attribute?(:company_id)

    \ && reflection.klass.has_attribute?(:company_id) apply_scope(super, reflection.aliased_table, :company_id, transform_value(owner[:company_id])) end (patch applied on ActiveRecord::Associations::AssociationScope)
  14. Loading associations (2) > invoice.invoice_lines SELECT "invoice_lines".* FROM "invoice_lines" WHERE

    "invoice_lines"."invoice_id" = 232650824 AND "invoice_lines"."company_id" = 21847875
  15. Joining associations (2) > Invoice.joins(:invoice_lines) SELECT "invoices".* FROM "invoices" INNER

    JOIN "invoice_lines" ON "invoice_lines"."invoice_id" = "invoices"."id" AND "invoice_lines"."company_id" = "invoices"."company_id"
  16. Preloading associations def build_scope super.tap do |orig_scope| orig_scope.where!(company_id: @__company_id) \

    if klass.has_attribute?(:company_id) && defined?(@__company_id) end end patch applied on ActiveRecord::Associations::Preloader::Association
  17. Preloading associations (2) > Invoice.preload(:invoice_lines).take SELECT "invoices".* FROM "invoices" LIMIT

    1 SELECT "invoice_lines".* FROM "invoice_lines" WHERE "invoice_lines"."company_id" = 21847875 AND "invoice_lines"."document_id" = 232650824
  18. Where_exists (https://github.com/EugZol/where_exists) > Invoice.where_exists(:invoice_lines) SELECT "invoices".* FROM "invoices" WHERE ((EXISTS

    ( SELECT 1 FROM "invoice_lines" WHERE ("invoice_lines"."invoice_id" = "invoices"."id") AND "invoice_lines"."company_id" = "invoices"."company_id" )))
  19. Up next... Checking integrity before assigning relationships (can be done

    in one simple query if data is denormalized) invoice.supplier_id = 42
  20. Denormalizing Advantages: - Very easy to find out who can

    access what (just look at company_id) - Consistency checking: it takes one JOIN to check that an invoice_line is attached to the correct company - “Belt and suspenders” approach to IDOR Pitfalls: - Data redundancy isn’t such a big problem ✔ - Data consistency seems solved ✔ - PostGRES column correlations may hinder query planner performance (for another time)