Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Denormalize All the Things!

Denormalize All the Things!

What do you do when you need to cache a model but also need to query the cached data? You denormalize all the things! In this talk (presented at DjangoCon US 2013) we will see how to create and maintain denormalized versions of some of your Django models in order to enjoy the benefits of caching while maintaining the benefits of relational data.

Nathan Duthoit

September 05, 2013
Tweet

More Decks by Nathan Duthoit

Other Decks in Programming

Transcript

  1. @nduthoit 9 Invoice Invoice Item - customer - date -

    number .... - description - product - quantity - price - tax(es) .... - total_amount() - total_amount() data models Thursday, 5 September, 13
  2. @nduthoit 10 1 class Invoice(models.Model): 2 ... 3 def total_amount(self):

    4 total_amount = Decimal('0.0') 5 for invoice_item in self.invoiceitem_set.all(): 6 total_amount += invoice_item.amount() 7 return total_amount @nduthoit computed fields Thursday, 5 September, 13
  3. @nduthoit 13 1 class Invoice(models.Model): 2 ... 3 def total_amount(self):

    4 total_amount = cache.get(self.total_amount_cache_key(), 5 'NOT_IN_CACHE') 6 7 if total_amount == 'NOT_IN_CACHE': 8 total_amount = self.compute_total_amount() # O(n * m) 9 cache.set(self.total_amount_cache_key(), total_amount) 10 11 return total_amount cache it!? Thursday, 5 September, 13
  4. 1 class Invoice(models.Model): 2 ... 3 total_amount = models.DecimalField(max_digits=12, 4

    decimal_places=2) 5 ... 6 7 # total_amount update magic goes here @nduthoit 16 1 qset = Invoice.objects.filter( 2 total_amount__gte=Decimal('2000.00') 3 ) denormalize! @nduthoit Thursday, 5 September, 13
  5. @nduthoit 20 Accountant Invoice Payments Bill Income Expense Transfer Business

    owner Journal Entry Debit Credit Thursday, 5 September, 13
  6. @nduthoit 21 generate on the fly? 1 class Invoice(...): 2

    ... 3 def generate_account_transaction(self): 4 ... 5 # Accounting magic to generate CREDITs and DEBITs 6 ... 7 return account_transactions Thursday, 5 September, 13
  7. @nduthoit 22 generate on the fly? • doesn’t  scale •

    can’t  query  a  small  set  of  the  data  without  re-­‐ calculaJng  everything Thursday, 5 September, 13
  8. @nduthoit 23 denormalize! 1 class JournalEntry(...): 2 ... 3 entry_date

    = models.DateField() 4 description = models.TextField() 5 ... 6 # Link back to invoice, payment, income, ... 7 source = models.CharField(...) 8 source_pk = models.CharField(...) 9 ... 10 11 12 class AccountTransaction(...): 13 ... 14 journal_entry = models.ForeignKey('JournalEntry') 15 account = models.ForeignKey('Account') 16 debit_amount = models.DecimalField(...) 17 credit_amount = models.DecimalField(...) 18 description = models.TextField(...) 19 ... Thursday, 5 September, 13
  9. @nduthoit 24 denormalize! 1 class Invoice(...): 2 ... 3 def

    save(self, *args, **kwargs): 4 super(Invoice, self).save(*args, **kwargs) 5 # Create or update journal entries, debits and credits 6 self.update_bookkeeping() 7 ... Thursday, 5 September, 13
  10. @nduthoit 27 combine models on the fly? ... incomes =

    Income.objects.filter(**income_filters).sort_by('-date') expenses = Expense.objects.filter(**expense_filters).sort_by('-date') incomes_and_expenses = list(itertools.chain(incomes, expenses)) sort_incomes_and_expense(incomes_and_expense, [‘-date’]) ... Thursday, 5 September, 13
  11. @nduthoit 28 combine models on the fly? ... incomes =

    Income.objects.filter(**income_filters).sort_by('-date') expenses = Expense.objects.filter(**expense_filters).sort_by('-date') incomes_and_expenses = list(itertools.chain(incomes, expenses)) sort_incomes_and_expense(incomes_and_expense, [‘-date’]) ... Thursday, 5 September, 13
  12. @nduthoit 29 combine models on the fly? ... incomes =

    Income.objects.filter(**income_filters).sort_by('-date') expenses = Expense.objects.filter(**expense_filters).sort_by('-date') incomes_and_expenses = list(itertools.chain(incomes, expenses)) sort_incomes_and_expense(incomes_and_expense, [‘-date’]) ... Thursday, 5 September, 13
  13. @nduthoit 32 class FinancialTransaction(models.Model): ... payment_account_pk = models.ForeignKey(...) income_expense_account =

    models.ForeignKey(...) amount = models.DecimalField(...) currency_code = models.CharField(...) transaction_date = models.DateField(...) description = models.CharField(...) ... denormalize! Thursday, 5 September, 13
  14. @nduthoit 34 canonical data • define  the  authoritaJve  source(s)  of

     data • maintain  your  denormalized  data  in  sync  (one-­‐way) • never  modify  the  denormalized  instances  manually Thursday, 5 September, 13
  15. @nduthoit 36 canonical data (know thy parent) • have  a

     way  to  uniquely  idenJfy  and  get  the  canonical   data  source  (kind  of  like  a  generic  foreign  key): • canonical_source • canonical_pk • canonical_shard_key • makes  it  easy  to: • detect  data  consistency  issues • re-­‐create  as  required Thursday, 5 September, 13
  16. @nduthoit 37 class FinancialTransaction(models.Model): ... # Canonical Info business_pk =

    models.PositiveIntegerField(db_index=True) canonical_pk = models.PositiveIntegerField(db_index=True) canonical_type = models.CharField( max_length=20, choices=FinancialTransactionCanonicalTypes.CHOICES, db_index=True ) ... canonical data (know thy parent) Thursday, 5 September, 13
  17. @nduthoit 39 Propagating changes: manually class Income(...): ... def save(self,

    *args, **kwargs): super(Income, self).save(*args, **kwargs) ft_data = self.get_financial_transaction_data() create_or_update_financial_transaction(ft_data) ... Thursday, 5 September, 13
  18. @nduthoit 40 Propagating changes: “use the signals Luke” 1 @receiver(post_save,

    dispatch_uid="ft_producer_post_save") 2 def handle_ft_producer_post_save(sender, instance, created, 3 **kwargs): 4 if issubclass(sender, FinancialTransactionProducerMixin): 5 ft_data = instance.get_financial_transaction_data() 6 create_or_update_financial_transaction(ft_data) 7 8 ... 9 class FinancialTransactionProducerMixin(models.Model): 10 class Meta: 11 abstract = True 12 ... Thursday, 5 September, 13
  19. @nduthoit 41 class Income(FinancialTransactionProducerMixin): ... def get_financial_transaction_data(self): ... class Expense(FinancialTransactionProducerMixin):

    ... def get_financial_transaction_data(self): ... class InvoicePayment(FinancialTransactionProducerMixin): ... def get_financial_transaction_data(self): ... Propagating changes: “use the signals Luke” Thursday, 5 September, 13
  20. @nduthoit 42 Propagating changes: “use the signals Luke” • can’t

     use:  qset.update(),  qset.create_bulk(),   qset.delete(),  ... • error  handling  and  transacJon  management  is   tricky Thursday, 5 September, 13
  21. @nduthoit 44 Payment Income Expense Transfer Financial Transaction views, browser,

    user Propagating changes: backwards? Thursday, 5 September, 13
  22. @nduthoit 45 Financial Transaction views, browser, user Propagating changes: backwards?

    interface Payment Income Expense Transfer Thursday, 5 September, 13
  23. @nduthoit 47 change  price   to  $10 update bookkeeing change

     price   to  $10.5 update bookkeeing concurrency issues Thursday, 5 September, 13
  24. @nduthoit 48 change  price   to  $10 change  price  

    to  $10.5 concurrency issues update bookkeeping update bookkeeping Thursday, 5 September, 13
  25. @nduthoit 49 one at a time mutexes,  locks from retools.lock

    import Lock with Lock('a_key'): # do something that should only be done one at a time Thursday, 5 September, 13
  26. @nduthoit 50 1 class JournalEntry(...): 2 ... 3 entry_date =

    models.DateField() 4 description = models.TextField() 5 ... 6 # Link back to invoice, payment, income, ... 7 source = models.CharField(...) 8 source_pk = models.CharField(...) 9 ... 10 11 12 class AccountTransaction(...): 13 ... 14 journal_entry = models.ForeignKey('JournalEntry') 15 account = models.ForeignKey('Account') 16 debit_amount = models.DecimalField(...) 17 credit_amount = models.DecimalField(...) 18 description = models.TextField(...) 19 ... one at a time Thursday, 5 September, 13
  27. @nduthoit 51 one at a time pylock h#ps://github.com/waveaccounJng/pylock h#ps://pypi.python.org/pypi/pylock 1

    def delete_journal_entry(cls, journal_entry): 2 ... 3 with JournalEntryLock(journal_entry.source, 4 journal_entry.source_pk): 5 # remove existing AccountTransactions 6 AccountTransaction.objects.using(shard).filter( 7 journal_entry=journal_entry 8 ).delete() 9 journal_entry.delete() 10 ... Thursday, 5 September, 13
  28. @nduthoit 52 measure! •you  can’t  improve  what  you  can’t  measure

    •recipe: 1. detect/measure 2. fix  one  thing 3. repeat Thursday, 5 September, 13
  29. @nduthoit 53 1 class NonLockingRedisLock(RedisLock): 2 def __init__(self, key, expires,

    timeout, client): 3 self.locked = False 4 timeout = 0 # no timeout 5 super(NonLockingRedisLock, self).__init__(key, expires, 6 timeout, client) 7 8 def acquire(self): 9 try: 10 super(NonLockingRedisLock, self).acquire() 11 self.locked = True 12 except LockTimeout: 13 self.locked = False 14 statsd.incr('interfaces.lock.timeout') 15 logger.warning("LockTimeout", exc_info=full_exc_info()) 16 except RedisError: 17 self.enabled = False 18 logger.warning("Lock.acquire RedisError", 19 exc_info=full_exc_info()) 20 statsd.incr('interfaces.lock.RedisError') 21 ... measure! pylock Thursday, 5 September, 13
  30. @nduthoit 55 change  price   to  $10 update bookkeeing change

     price   to  $10.5 update bookkeeing concurrency issues ordering Thursday, 5 September, 13
  31. @nduthoit 56 special thanks • Zuzel  Vera • Michael  WarkenJn

    • Tommy  Guy • Ash  Christopher • Mike  Ivanov Thursday, 5 September, 13