Upgrade to Pro — share decks privately, control downloads, hide ads and more …

How denormalizing our postgres turned great

How denormalizing our postgres turned great

Quentin de Metz

November 12, 2021
Tweet

Other Decks in Technology

Transcript

  1. How denormalizing our Postgres
    turned great
    Quentin de Metz, Pennylane

    View Slide

  2. Pennylane
    Accounting and financial management SaaS
    Founded in 2020
    ~2000 customers
    ~40 engineers, of which ~30 ruby devs
    Ruby on rails, React, PostGRES, Heroku, AWS

    View Slide

  3. Our data model
    99% of data is partitioned by customer (“company”)
    We dogfood our own product - all of our own financial data is in our database
    Our customers’ financial data is very sensitive - we need to be extra-careful that the
    partitioning is perfect

    View Slide

  4. Our tooling
    Access control is managed via CanCanCan in a single Ability file
    We leverage PostGRES row-level security (RLS) to restrict access to our own
    company’s data
    We filter and store PostGRES changelog data with AWS for auditing purposes

    View Slide

  5. Best practice: normalize the data in PostGRES
    Advantages:
    - Avoid data redundancy
    - Avoid incoherent data
    Pitfalls
    - ???

    View Slide

  6. What this looks like in practice
    Database schema:

    View Slide

  7. Ability
    class Ability
    def initialize(user)
    can :manage, Company, id: user.company_id
    can :manage, Invoice, company_id: user.company_id
    can :manage, InvoiceLine, invoice: { company_id:
    user.company_id }
    end
    end

    View Slide

  8. sync_from_prod
    Devtool to export production data from a specific customer to development
    environment
    > COPY TO STDOUT(SELECT * FROM companies WHERE id = xxx);
    > COPY TO STDOUT(SELECT * FROM invoices WHERE company_id = xxx);
    > COPY TO STDOUT(SELECT * FROM invoice_lines WHERE invoice_id IN (
    SELECT id FROM invoices WHERE company_id = xxx
    ));

    View Slide

  9. Data pipeline
    Plugged into PostGRES replication stream, stores row-level change events
    We want to filter out data linked to Pennylane
    Complex logic when invoice_line changes

    View Slide

  10. What if the dependency tree gets deeper?

    View Slide

  11. Denormalizing for fun and profit
    - Add company_id column to (almost) every single table
    - Yes it’s redundant
    - Don’t panic

    View Slide

  12. Denormalizing
    Advantages:
    - Very easy to find out who can access what (just look at company_id)
    - Consistency checking: it takes one JOIN to check that an invoice_line is
    attached to the correct company
    Pitfalls:
    - Data redundancy
    - Data consistency
    - PostGRES column correlations may hinder query planner performance

    View Slide

  13. Ability
    class Ability
    def initialize(user)
    can :manage, Company, id: user.company_id
    can :manage, Invoice, company_id: user.company_id
    can :manage, InvoiceLine, company_id: user.company_id
    end
    end

    View Slide

  14. Rails addons

    View Slide

  15. Permanent attributes
    define_method(:"#{attribute}_not_changed") do
    if persisted? && has_attribute?(attribute) \
    && !attribute_was(attribute).nil? && attribute_changed?(attribute)
    errors.add(attribute, :permanent)
    end
    end
    validate :"#{attribute}_not_changed"

    View Slide

  16. Permanent attributes (2)
    class ApplicationRecord
    attr_permanent :company_id
    end
    class InvoiceLine < ApplicationRecord
    attr_permanent :invoice_id
    end

    View Slide

  17. Insecure Direct Object Reference
    PUT /invoices/4, { invoice: { supplier_id: 99 } }
    GET /invoices/4
    It’s possible to attach invoices to another company’s suppliers 💣

    View Slide

  18. Loading associations
    def last_chain_scope(scope, reflection, owner)
    return super unless owner.class.has_attribute?(:company_id) \
    && reflection.klass.has_attribute?(:company_id)
    apply_scope(super, reflection.aliased_table,
    :company_id, transform_value(owner[:company_id]))
    end
    (patch applied on ActiveRecord::Associations::AssociationScope)

    View Slide

  19. Loading associations (2)
    > invoice.invoice_lines
    SELECT "invoice_lines".*
    FROM "invoice_lines"
    WHERE "invoice_lines"."invoice_id" = 232650824
    AND "invoice_lines"."company_id" = 21847875

    View Slide

  20. Joining associations (2)
    > Invoice.joins(:invoice_lines)
    SELECT "invoices".* FROM "invoices"
    INNER JOIN "invoice_lines"
    ON "invoice_lines"."invoice_id" = "invoices"."id"
    AND "invoice_lines"."company_id" = "invoices"."company_id"

    View Slide

  21. Preloading associations
    def build_scope
    super.tap do |orig_scope|
    orig_scope.where!(company_id: @__company_id) \
    if klass.has_attribute?(:company_id) &&
    defined?(@__company_id)
    end
    end
    patch applied on ActiveRecord::Associations::Preloader::Association

    View Slide

  22. Preloading associations (2)
    > Invoice.preload(:invoice_lines).take
    SELECT "invoices".* FROM "invoices" LIMIT 1
    SELECT "invoice_lines".*
    FROM "invoice_lines"
    WHERE "invoice_lines"."company_id" = 21847875
    AND "invoice_lines"."document_id" = 232650824

    View Slide

  23. Where_exists (https://github.com/EugZol/where_exists)
    > Invoice.where_exists(:invoice_lines)
    SELECT "invoices".* FROM "invoices" WHERE ((EXISTS (
    SELECT 1 FROM "invoice_lines"
    WHERE ("invoice_lines"."invoice_id" = "invoices"."id")
    AND "invoice_lines"."company_id" = "invoices"."company_id"
    )))

    View Slide

  24. Building new entities
    > invoice.invoice_lines.build(amount: 1)
    => #id: nil,
    invoice_id: 232650824,
    company_id: 21847875,
    ...>

    View Slide

  25. Up next...
    Checking integrity before assigning relationships (can be done in one simple query
    if data is denormalized)
    invoice.supplier_id = 42

    View Slide

  26. Denormalizing
    Advantages:
    - Very easy to find out who can access what (just look at company_id)
    - Consistency checking: it takes one JOIN to check that an invoice_line is attached to the
    correct company
    - “Belt and suspenders” approach to IDOR
    Pitfalls:
    - Data redundancy isn’t such a big problem ✔
    - Data consistency seems solved ✔
    - PostGRES column correlations may hinder query planner performance (for another time)

    View Slide

  27. View Slide