Slide 1

Slide 1 text

Adaptive Application Architecture
Reza Spagnolo
@rmspagnolo

Slide 2

Slide 2 text

Hey there! Who am I?
• A student
• An engineer, for 9 years now
• Interested in building systems
• Dev & Ops since the beginning

Slide 3

Slide 3 text

#monitoringsocks but never sucked for real

Slide 4

Slide 4 text

Monitoring is an architecture component

Slide 5

Slide 5 text

Infrastructure is code

Slide 6

Slide 6 text

Monitoring is code
• Development process
• Testing
• Deployment

Slide 7

Slide 7 text

Monitoring is a service
• Metrics
• Alerts

Slide 8

Slide 8 text

Namespaces
There are only two hard things in Computer Science: cache invalida

Slide 9

Slide 9 text

#softwaresucks without namespaces

Slide 10

Slide 10 text

Metrics namespaces
• Help your mental model
• Help identify things
• Dimensions: location, versions, etc.
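A minimal sketch of what a namespaced metric name could look like. The dotted layout, the ordering of dimensions, and the `metric_name` helper are assumptions made for illustration, not a scheme taken from the talk.

```python
def metric_name(service: str, metric: str, *, location: str, version: str) -> str:
    """Build a dotted metric name whose prefix encodes the dimensions."""
    parts = [location, service, version, metric]
    # Dots separate namespace levels, so dots inside a part (e.g. "1.4.2")
    # are replaced to keep the hierarchy unambiguous.
    return ".".join(part.replace(".", "_") for part in parts)


name = metric_name("checkout", "request_latency_ms", location="eu-west", version="1.4.2")
print(name)  # eu-west.checkout.1_4_2.request_latency_ms
```

With a fixed layout like this, the same metric can be compared across locations or versions by varying a single level of the name.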

Slide 11

Slide 11 text

Monitoring-based promotion
Acceptance, Development, Production
• Production configuration
• Comparison
• Log analysis
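The "Comparison" bullet could be read as a gate that compares a candidate's metrics against production before promoting it. The sketch below assumes that every metric is "lower is better" and that a 10% tolerance is acceptable; the function names and metric names are hypothetical.

```python
def regressions(candidate: dict, production: dict, tolerance: float = 0.10) -> list:
    """Return the metrics where the candidate is worse than production.

    Assumes lower values are better. A metric missing from the candidate
    counts as a regression, since it cannot be compared.
    """
    bad = []
    for name, prod_value in production.items():
        cand_value = candidate.get(name)
        if cand_value is None or cand_value > prod_value * (1 + tolerance):
            bad.append(name)
    return sorted(bad)


def can_promote(candidate: dict, production: dict, tolerance: float = 0.10) -> bool:
    """Promote only when no compared metric regressed beyond the tolerance."""
    return not regressions(candidate, production, tolerance)


production = {"error_rate": 0.01, "p95_latency_ms": 200}
acceptance = {"error_rate": 0.01, "p95_latency_ms": 260}
print(regressions(acceptance, production))  # ['p95_latency_ms']
```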

Slide 12

Slide 12 text

Monitoring deployment
• Push changes
• Keep correspondence
• Automate
• Namespaces

Slide 13

Slide 13 text

Synthetic traffic

Slide 14

Slide 14 text

Canaries  

Slide 15

Slide 15 text

Miner’s canary
• If a customer lets you know about a problem, then you have already failed at least twice
• The right quantity
• Filtering – see the right picture
• Document changes to your baselines

Slide 16

Slide 16 text

Other types of birds

Slide 17

Slide 17 text

The pretty ones we just saw

Slide 18

Slide 18 text

The Angry ones

Slide 19

Slide 19 text

And monkeys!

Slide 20

Slide 20 text

Auditing
Events timeline
• Changes
• Deployments
• Rollbacks
• Alarms
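One way to picture an events timeline is as a single ordered log that every change, deployment, rollback, and alarm is written to, so an alarm can be read next to whatever happened just before it. The `Event` and `Timeline` classes below are an illustrative assumption, not a tool named in the slides.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Event:
    timestamp: datetime
    kind: str    # e.g. "change", "deployment", "rollback", "alarm"
    detail: str


class Timeline:
    def __init__(self):
        self._events = []

    def record(self, kind, detail, timestamp=None):
        """Append an event; defaults to the current UTC time."""
        event = Event(timestamp or datetime.now(timezone.utc), kind, detail)
        self._events.append(event)
        return event

    def before(self, moment, window):
        """Events within `window` leading up to `moment`, oldest first."""
        start = moment - window
        hits = [e for e in self._events if start <= e.timestamp <= moment]
        return sorted(hits, key=lambda e: e.timestamp)
```

Asking "what happened in the 30 minutes before this alarm?" then becomes a single `before(alarm.timestamp, timedelta(minutes=30))` call.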

Slide 21

Slide 21 text

Architecture
• Single responsibility principle
• Orchestration or Choreography
• Dynamic configuration
• Failover and feedback cycles
• Rate limiting
• Integration patterns

Slide 22

Slide 22 text

Single responsibility principle
• (Micro-)Services
• Components
• Small number of dependencies
• Predictable failure modes
• Easier adaptation
• Expectations on metrics

Slide 23

Slide 23 text

Orchestration or Choreography
• Orchestration
  – May be simpler to reason about
  – Coupling with the director
• Choreography
  – Possibly more flexible
  – Beware of corruption of state

Slide 24

Slide 24 text

Dynamic configuration
• Reconfigurable at runtime
• Fast reaction
• Beware of snowflakes
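A minimal sketch of runtime-reconfigurable settings, assuming an in-process holder guarded by a lock. The `DynamicConfig` class and its version counter are hypothetical; the counter is one possible way to spot "snowflake" instances whose configuration has drifted from the rest of the fleet.

```python
import threading


class DynamicConfig:
    def __init__(self, initial: dict):
        self._lock = threading.Lock()
        self._values = dict(initial)
        self._version = 0

    def get(self, key, default=None):
        with self._lock:
            return self._values.get(key, default)

    def update(self, changes: dict) -> int:
        """Apply changes atomically and return the new version number."""
        with self._lock:
            self._values = {**self._values, **changes}
            self._version += 1
            return self._version

    def snapshot(self):
        """Return (version, copy of values), e.g. to export as a metric."""
        with self._lock:
            return self._version, dict(self._values)


config = DynamicConfig({"rate_limit": 100})
config.update({"rate_limit": 50})  # takes effect without a restart
```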

Slide 25

Slide 25 text

Failover and feedback cycles
• Automated failover
• Failover stress
• Beware of amplifying effects
• Break cycles
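One common source of amplification is clients retrying in lockstep after a failover. The slides do not name a specific remedy; the sketch below shows exponential backoff with full jitter as one assumed way to dampen such a cycle. The `backoff_delays` helper and its defaults are illustrative.

```python
import random


def backoff_delays(attempts: int, base: float = 0.1, cap: float = 5.0, rng=random.random):
    """Return one randomized delay (in seconds) per retry attempt.

    The upper bound doubles with each attempt up to `cap`; each delay is
    drawn uniformly below that bound so clients do not retry in sync.
    """
    return [rng() * min(cap, base * 2 ** i) for i in range(attempts)]


delays = backoff_delays(5)
```

Bounding the number of attempts also matters: unbounded retries keep load on a struggling component and keep the feedback cycle alive.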

Slide 26

Slide 26 text

Rate limiting
• Degraded is better than nothing
• Not only at the top level
• Component rate limiting
• Rate limiting should be dynamic
• Rate limiting can be partitioned
• Clients should be part of the contract
• Rate limiting is, after all, handshaking
• Handshaking: within the protocol or out of band
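The "dynamic" and "partitioned" bullets can be sketched with a token bucket per partition. The token-bucket algorithm is an assumption chosen for illustration (the slides do not name one), and the class names and parameters are hypothetical.

```python
import time


class TokenBucket:
    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def _refill(self):
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now

    def set_rate(self, rate: float):
        """Change the rate at runtime; tokens earned so far are kept."""
        self._refill()
        self.rate = rate

    def allow(self, cost: float = 1.0) -> bool:
        self._refill()
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False  # caller should degrade or reject, not queue forever


class PartitionedLimiter:
    """One bucket per partition (e.g. per client), created on first use."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self._rate, self._capacity, self._clock = rate, capacity, clock
        self._buckets = {}

    def allow(self, partition: str) -> bool:
        bucket = self._buckets.get(partition)
        if bucket is None:
            bucket = self._buckets[partition] = TokenBucket(self._rate, self._capacity, self._clock)
        return bucket.allow()
```

Partitioning keeps one noisy client from consuming the budget of the others, and a rejected call gives the client an explicit signal to slow down.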

Slide 27

Slide 27 text

Integration and component patterns
• Timeouts
• Circuit breakers
• Resource pools
• Fail fast
• Queue and retry
• Application pings and sanity checks
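A minimal circuit breaker sketch that also illustrates fail fast: after a number of consecutive failures it rejects calls immediately, then lets a trial call through once a cool-down has passed. The thresholds, the class name, and the single-threaded design are assumptions made for illustration.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: half-open, let this call through as a trial.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

The wrapped call would still need its own timeout; the breaker only decides whether the call is attempted at all.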

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

Additional practices
• Quarantine
• Regenerative infrastructure
• Rollback and monitoring
• Automation of SOP – Runbook

Slide 30

Slide 30 text

Automated runbooks and checklists
• Automate your SOP
• Respond to failure with a checklist
• Automate checklists too
• Helps to avoid cognitive biases and other nasty stuff your brain does
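An automated checklist can be as small as a list of named checks that are always run in full, so the responder sees the whole picture rather than stopping at the first surprise. The `Step` type, `run_checklist` function, and the example checks below are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    description: str
    check: Callable[[], bool]


def run_checklist(steps):
    """Run every step and return (description, passed) pairs in order.

    A check that raises is recorded as failed instead of aborting the run.
    """
    results = []
    for step in steps:
        try:
            passed = bool(step.check())
        except Exception:
            passed = False
        results.append((step.description, passed))
    return results


checklist = [
    Step("disk usage below threshold", lambda: True),   # placeholder check
    Step("database reachable", lambda: False),          # placeholder check
]
for description, passed in run_checklist(checklist):
    print(("PASS" if passed else "FAIL"), description)
```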

Slide 31

Slide 31 text

Discipline!

Slide 32

Slide 32 text

Sources
• Recovery Oriented Computing papers
• James Hamilton LISA paper
• Release It!
• Scalable Internet Architectures
• A ton of other great books and papers

Slide 33

Slide 33 text

The value
Among the kinds of overhead:
• The operational one
• The customers' one
No matter how sophisticated our monitoring infrastructure is, issues reported by customers are in the end the most important ones, as they impact the customers' experience directly and often uncover unknown bugs.
Freeing the team as much as possible from overhead of the first type gives more time to focus on the issues of the product itself.

Slide 34

Slide 34 text

Thank you!