Slide 1

Slide 1 text

@nduthoit Denormalize All The Things! Nathan Duthoit Thursday, 5 September, 13

Slide 2

Slide 2 text

@nduthoit 2 About me Nathan Duthoit h#p://www.nick.com/shows/penguins-­‐of-­‐madagascar/characters/kowalski.html nduthoit.com github.com/nduthoit [email protected] h1ps://speakerdeck.com/nduthoit/ denormalize-­‐all-­‐the-­‐things Thursday, 5 September, 13

Slide 3

Slide 3 text

@nduthoit Denormalize All The Things? Nathan Duthoit Thursday, 5 September, 13

Slide 4

Slide 4 text

@nduthoit 4 be pragmatic Thursday, 5 September, 13

Slide 5

Slide 5 text

@nduthoit 5 Denormalized Models Use Cases Thursday, 5 September, 13

Slide 6

Slide 6 text

@nduthoit 6 waveapps.com Thursday, 5 September, 13

Slide 7

Slide 7 text

@nduthoit 7 1. Pre-Computed Data @nduthoit Thursday, 5 September, 13

Slide 8

Slide 8 text

@nduthoit 8 Invoice Item Invoice Item Invoice Item Invoice Payment data models Thursday, 5 September, 13

Slide 9

Slide 9 text

@nduthoit 9 Invoice Invoice Item - customer - date - number .... - description - product - quantity - price - tax(es) .... - total_amount() - total_amount() data models Thursday, 5 September, 13

Slide 10

Slide 10 text

@nduthoit 10 1 class Invoice(models.Model): 2 ... 3 def total_amount(self): 4 total_amount = Decimal('0.0') 5 for invoice_item in self.invoiceitem_set.all(): 6 total_amount += invoice_item.amount() 7 return total_amount @nduthoit computed fields Thursday, 5 September, 13

Slide 11

Slide 11 text

@nduthoit 11 computed fields O(n) queries O(n*m) queries Thursday, 5 September, 13

Slide 12

Slide 12 text

@nduthoit 12 @nduthoit h#p://fc08.deviantart.net/fs70/f/2010/149/a/7/Sad_Panda_Chibi_by_mongrelssister.png Thursday, 5 September, 13

Slide 13

Slide 13 text

@nduthoit 13 1 class Invoice(models.Model): 2 ... 3 def total_amount(self): 4 total_amount = cache.get(self.total_amount_cache_key(), 5 'NOT_IN_CACHE') 6 7 if total_amount == 'NOT_IN_CACHE': 8 total_amount = self.compute_total_amount() # O(n * m) 9 cache.set(self.total_amount_cache_key(), total_amount) 10 11 return total_amount cache it!? Thursday, 5 September, 13

Slide 14

Slide 14 text

@nduthoit 14 cache it!? Thursday, 5 September, 13

Slide 15

Slide 15 text

@nduthoit 15 can’t query the cached data :-( @nduthoit Thursday, 5 September, 13

Slide 16

Slide 16 text

1 class Invoice(models.Model): 2 ... 3 total_amount = models.DecimalField(max_digits=12, 4 decimal_places=2) 5 ... 6 7 # total_amount update magic goes here @nduthoit 16 1 qset = Invoice.objects.filter( 2 total_amount__gte=Decimal('2000.00') 3 ) denormalize! @nduthoit Thursday, 5 September, 13

Slide 17

Slide 17 text

@nduthoit 17 2. A different view of your data Thursday, 5 September, 13

Slide 18

Slide 18 text

@nduthoit 18 Accountant Business owner Invoice Payments Bill Income Expense Transfer Thursday, 5 September, 13

Slide 19

Slide 19 text

@nduthoit 19 h#p://www.principlesofaccounJng.com/chapter2/cashgenledger2.png Accountant Thursday, 5 September, 13

Slide 20

Slide 20 text

@nduthoit 20 Accountant Invoice Payments Bill Income Expense Transfer Business owner Journal Entry Debit Credit Thursday, 5 September, 13

Slide 21

Slide 21 text

@nduthoit 21 generate on the fly? 1 class Invoice(...): 2 ... 3 def generate_account_transaction(self): 4 ... 5 # Accounting magic to generate CREDITs and DEBITs 6 ... 7 return account_transactions Thursday, 5 September, 13

Slide 22

Slide 22 text

@nduthoit 22 generate on the fly? • doesn’t  scale • can’t  query  a  small  set  of  the  data  without  re-­‐ calculaJng  everything Thursday, 5 September, 13

Slide 23

Slide 23 text

@nduthoit 23 denormalize! 1 class JournalEntry(...): 2 ... 3 entry_date = models.DateField() 4 description = models.TextField() 5 ... 6 # Link back to invoice, payment, income, ... 7 source = models.CharField(...) 8 source_pk = models.CharField(...) 9 ... 10 11 12 class AccountTransaction(...): 13 ... 14 journal_entry = models.ForeignKey('JournalEntry') 15 account = models.ForeignKey('Account') 16 debit_amount = models.DecimalField(...) 17 credit_amount = models.DecimalField(...) 18 description = models.TextField(...) 19 ... Thursday, 5 September, 13

Slide 24

Slide 24 text

@nduthoit 24 denormalize! 1 class Invoice(...): 2 ... 3 def save(self, *args, **kwargs): 4 super(Invoice, self).save(*args, **kwargs) 5 # Create or update journal entries, debits and credits 6 self.update_bookkeeping() 7 ... Thursday, 5 September, 13

Slide 25

Slide 25 text

@nduthoit 25 3. Combining multiple models Thursday, 5 September, 13

Slide 26

Slide 26 text

@nduthoit 26 Expense Transfer Income Invoice Payment data models Thursday, 5 September, 13

Slide 27

Slide 27 text

@nduthoit 27 combine models on the fly? ... incomes = Income.objects.filter(**income_filters).sort_by('-date') expenses = Expense.objects.filter(**expense_filters).sort_by('-date') incomes_and_expenses = list(itertools.chain(incomes, expenses)) sort_incomes_and_expense(incomes_and_expense, [‘-date’]) ... Thursday, 5 September, 13

Slide 28

Slide 28 text

@nduthoit 28 combine models on the fly? ... incomes = Income.objects.filter(**income_filters).sort_by('-date') expenses = Expense.objects.filter(**expense_filters).sort_by('-date') incomes_and_expenses = list(itertools.chain(incomes, expenses)) sort_incomes_and_expense(incomes_and_expense, [‘-date’]) ... Thursday, 5 September, 13

Slide 29

Slide 29 text

@nduthoit 29 combine models on the fly? ... incomes = Income.objects.filter(**income_filters).sort_by('-date') expenses = Expense.objects.filter(**expense_filters).sort_by('-date') incomes_and_expenses = list(itertools.chain(incomes, expenses)) sort_incomes_and_expense(incomes_and_expense, [‘-date’]) ... Thursday, 5 September, 13

Slide 30

Slide 30 text

@nduthoit 30 @nduthoit Thursday, 5 September, 13

Slide 31

Slide 31 text

@nduthoit 31 Invoice Payment Bill Payment Income Expense Transfer denormalize! Financial Transaction Thursday, 5 September, 13

Slide 32

Slide 32 text

@nduthoit 32 class FinancialTransaction(models.Model): ... payment_account_pk = models.ForeignKey(...) income_expense_account = models.ForeignKey(...) amount = models.DecimalField(...) currency_code = models.CharField(...) transaction_date = models.DateField(...) description = models.CharField(...) ... denormalize! Thursday, 5 September, 13

Slide 33

Slide 33 text

@nduthoit 33 Keeping your data in sync Thursday, 5 September, 13

Slide 34

Slide 34 text

@nduthoit 34 canonical data • define  the  authoritaJve  source(s)  of  data • maintain  your  denormalized  data  in  sync  (one-­‐way) • never  modify  the  denormalized  instances  manually Thursday, 5 September, 13

Slide 35

Slide 35 text

@nduthoit 35 Payments Income Expense Transfer Financial Transaction canonical data Thursday, 5 September, 13

Slide 36

Slide 36 text

@nduthoit 36 canonical data (know thy parent) • have  a  way  to  uniquely  idenJfy  and  get  the  canonical   data  source  (kind  of  like  a  generic  foreign  key): • canonical_source • canonical_pk • canonical_shard_key • makes  it  easy  to: • detect  data  consistency  issues • re-­‐create  as  required Thursday, 5 September, 13

Slide 37

Slide 37 text

@nduthoit 37 class FinancialTransaction(models.Model): ... # Canonical Info business_pk = models.PositiveIntegerField(db_index=True) canonical_pk = models.PositiveIntegerField(db_index=True) canonical_type = models.CharField( max_length=20, choices=FinancialTransactionCanonicalTypes.CHOICES, db_index=True ) ... canonical data (know thy parent) Thursday, 5 September, 13

Slide 38

Slide 38 text

@nduthoit 38 Propagating changes Thursday, 5 September, 13

Slide 39

Slide 39 text

@nduthoit 39 Propagating changes: manually class Income(...): ... def save(self, *args, **kwargs): super(Income, self).save(*args, **kwargs) ft_data = self.get_financial_transaction_data() create_or_update_financial_transaction(ft_data) ... Thursday, 5 September, 13

Slide 40

Slide 40 text

@nduthoit 40 Propagating changes: “use the signals Luke” 1 @receiver(post_save, dispatch_uid="ft_producer_post_save") 2 def handle_ft_producer_post_save(sender, instance, created, 3 **kwargs): 4 if issubclass(sender, FinancialTransactionProducerMixin): 5 ft_data = instance.get_financial_transaction_data() 6 create_or_update_financial_transaction(ft_data) 7 8 ... 9 class FinancialTransactionProducerMixin(models.Model): 10 class Meta: 11 abstract = True 12 ... Thursday, 5 September, 13

Slide 41

Slide 41 text

@nduthoit 41 class Income(FinancialTransactionProducerMixin): ... def get_financial_transaction_data(self): ... class Expense(FinancialTransactionProducerMixin): ... def get_financial_transaction_data(self): ... class InvoicePayment(FinancialTransactionProducerMixin): ... def get_financial_transaction_data(self): ... Propagating changes: “use the signals Luke” Thursday, 5 September, 13

Slide 42

Slide 42 text

@nduthoit 42 Propagating changes: “use the signals Luke” • can’t  use:  qset.update(),  qset.create_bulk(),   qset.delete(),  ... • error  handling  and  transacJon  management  is   tricky Thursday, 5 September, 13

Slide 43

Slide 43 text

@nduthoit 43 Financial Transaction Propagating changes views, browser, user Payment Income Expense Transfer Thursday, 5 September, 13

Slide 44

Slide 44 text

@nduthoit 44 Payment Income Expense Transfer Financial Transaction views, browser, user Propagating changes: backwards? Thursday, 5 September, 13

Slide 45

Slide 45 text

@nduthoit 45 Financial Transaction views, browser, user Propagating changes: backwards? interface Payment Income Expense Transfer Thursday, 5 September, 13

Slide 46

Slide 46 text

@nduthoit 46 Dealing with concurrency Thursday, 5 September, 13

Slide 47

Slide 47 text

@nduthoit 47 change  price   to  $10 update bookkeeing change  price   to  $10.5 update bookkeeing concurrency issues Thursday, 5 September, 13

Slide 48

Slide 48 text

@nduthoit 48 change  price   to  $10 change  price   to  $10.5 concurrency issues update bookkeeping update bookkeeping Thursday, 5 September, 13

Slide 49

Slide 49 text

@nduthoit 49 one at a time mutexes,  locks from retools.lock import Lock with Lock('a_key'): # do something that should only be done one at a time Thursday, 5 September, 13

Slide 50

Slide 50 text

@nduthoit 50 1 class JournalEntry(...): 2 ... 3 entry_date = models.DateField() 4 description = models.TextField() 5 ... 6 # Link back to invoice, payment, income, ... 7 source = models.CharField(...) 8 source_pk = models.CharField(...) 9 ... 10 11 12 class AccountTransaction(...): 13 ... 14 journal_entry = models.ForeignKey('JournalEntry') 15 account = models.ForeignKey('Account') 16 debit_amount = models.DecimalField(...) 17 credit_amount = models.DecimalField(...) 18 description = models.TextField(...) 19 ... one at a time Thursday, 5 September, 13

Slide 51

Slide 51 text

@nduthoit 51 one at a time pylock h#ps://github.com/waveaccounJng/pylock h#ps://pypi.python.org/pypi/pylock 1 def delete_journal_entry(cls, journal_entry): 2 ... 3 with JournalEntryLock(journal_entry.source, 4 journal_entry.source_pk): 5 # remove existing AccountTransactions 6 AccountTransaction.objects.using(shard).filter( 7 journal_entry=journal_entry 8 ).delete() 9 journal_entry.delete() 10 ... Thursday, 5 September, 13

Slide 52

Slide 52 text

@nduthoit 52 measure! •you  can’t  improve  what  you  can’t  measure •recipe: 1. detect/measure 2. fix  one  thing 3. repeat Thursday, 5 September, 13

Slide 53

Slide 53 text

@nduthoit 53 1 class NonLockingRedisLock(RedisLock): 2 def __init__(self, key, expires, timeout, client): 3 self.locked = False 4 timeout = 0 # no timeout 5 super(NonLockingRedisLock, self).__init__(key, expires, 6 timeout, client) 7 8 def acquire(self): 9 try: 10 super(NonLockingRedisLock, self).acquire() 11 self.locked = True 12 except LockTimeout: 13 self.locked = False 14 statsd.incr('interfaces.lock.timeout') 15 logger.warning("LockTimeout", exc_info=full_exc_info()) 16 except RedisError: 17 self.enabled = False 18 logger.warning("Lock.acquire RedisError", 19 exc_info=full_exc_info()) 20 statsd.incr('interfaces.lock.RedisError') 21 ... measure! pylock Thursday, 5 September, 13

Slide 54

Slide 54 text

@nduthoit 54 measure! Thursday, 5 September, 13

Slide 55

Slide 55 text

@nduthoit 55 change  price   to  $10 update bookkeeing change  price   to  $10.5 update bookkeeing concurrency issues ordering Thursday, 5 September, 13

Slide 56

Slide 56 text

@nduthoit 56 special thanks • Zuzel  Vera • Michael  WarkenJn • Tommy  Guy • Ash  Christopher • Mike  Ivanov Thursday, 5 September, 13

Slide 57

Slide 57 text

@nduthoit Thank you : ) Questions? nduthoit.com github.com/nduthoit [email protected] h1ps://speakerdeck.com/nduthoit/ denormalize-­‐all-­‐the-­‐things Thursday, 5 September, 13