Berlin 2013 - Session - Reza Spagnolo

Adaptive Application Architecture Reza Spagnolo @rmspagnolo

Hey there ! Who am I ? • 
A student •  An engineer, for 9 years now •  Interested in building systems •  Dev & Ops since the beginning

#monitoringsocks but never sucked for real

Monitoring is an architecture component

Infrastructure is code

Monitoring is code •  Development process •  Testing
•  Deployment

Monitoring is service •  Metrics •  Alerts

Namespaces There are only two hard things in Computer
Science: cache invalidation and naming things. -- Phil Karlton

#softwaresucks without namespaces

Metrics namespaces •  Help your mental model •  Help identify things •  Dimensions: location, versions, etc.
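
As a rough illustration of the namespace idea, the sketch below builds a dotted metric name from hierarchical parts plus dimensions such as location and version. The function name `metric_name`, the dotted layout, and the example values are assumptions made for illustration; the slides do not prescribe a naming scheme.

```python
def metric_name(*parts, **dimensions):
    """Join hierarchical parts and dimensions into one dotted metric name.

    Hypothetical helper: the layout is an assumption, not taken from the talk.
    """
    segments = [str(part) for part in parts]
    # Sort dimensions so the same inputs always produce the same name.
    segments += [f"{key}_{dimensions[key]}" for key in sorted(dimensions)]
    # Dots separate namespace levels, so strip them from inside a segment.
    return ".".join(s.replace(".", "_").replace(" ", "_") for s in segments)


name = metric_name("shop", "checkout", "latency_ms",
                   location="eu-west", version="1.4.2")
print(name)
```

Sorting the dimensions keeps names stable, so the same component reports under the same name in every environment.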

Monitoring based promotion Acceptance Development Production
•  Production configuration •  Comparison •  Log analysis

Monitoring deployment •  Push changes •  Keep correspondence
•  Automate •  Namespaces

Synthetic traffic

Canaries

Miner’s canary •  If a customer lets you know about a problem then you have already failed at least twice •  The right quantity •  Filtering – see the right picture •  Document changes to your baselines

Other types of birds

The pretty ones we just saw

The Angry ones

And monkeys !

Auditing Events timeline •  Changes •  Deployments
•  Rollbacks •  Alarms
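
One possible shape for such an events timeline is sketched below: every change, deployment, rollback and alarm is recorded with a timestamp so they can be read side by side for a given window. The `Event` and `Timeline` names, the in-memory storage, and the example date are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# The four event kinds listed on the slide.
KINDS = {"change", "deployment", "rollback", "alarm"}


@dataclass(order=True)
class Event:
    at: datetime                       # only the timestamp takes part in ordering
    kind: str = field(compare=False)
    detail: str = field(compare=False)


class Timeline:
    """Hypothetical in-memory audit timeline."""

    def __init__(self):
        self._events = []

    def record(self, kind, detail, at=None):
        if kind not in KINDS:
            raise ValueError(f"unknown event kind: {kind}")
        event = Event(at or datetime.now(timezone.utc), kind, detail)
        self._events.append(event)
        return event

    def between(self, start, end):
        """Return the events inside [start, end], oldest first."""
        return sorted(e for e in self._events if start <= e.at <= end)


# Arbitrary example date, not taken from the talk.
t0 = datetime(2013, 1, 1, 10, 0, tzinfo=timezone.utc)
timeline = Timeline()
timeline.record("alarm", "latency above baseline", at=t0 + timedelta(minutes=5))
timeline.record("deployment", "release 1.4.2", at=t0)
for event in timeline.between(t0, t0 + timedelta(minutes=10)):
    print(event.at.isoformat(), event.kind, event.detail)
```

Reading the window in time order makes it easy to see that the alarm followed the deployment.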

Architecture •  Single responsibility principle •  Orchestration or Choreography •  Dynamic configuration •  Failover and feedback cycles •  Rate limiting •  Integration patterns

Single responsibility principle •  (Micro-)Services •  Components
•  Small number of dependencies •  Predictable failure modes •  Easier adaptation •  Expectation on metrics

Orchestration or Choreography •  Orchestration – May be simpler to reason about – Coupling with the director •  Choreography – Possibly more flexible – Beware of corruption of state

Dynamic configuration •  Reconfigurable at runtime •  Fast reaction •  Beware of snowflakes

Failover and feedback cycles •  Automated failover •  Failover stress •  Beware of amplifying effects •  Break cycles

Rate limiting •  Degraded is better than nothing
•  Not only at the top level •  Component rate limiting •  Rate limiting should be dynamic •  Rate limiting can be partitioned •  Clients should be part of the contract •  Rate limiting is after all handshaking •  Handshaking: within the protocol or out of band
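
The points about dynamic and partitioned rate limiting can be sketched with a token bucket per client whose rate is adjustable at runtime. The token bucket is a technique chosen here for illustration; the slides do not name an algorithm. `TokenBucket`, `PartitionedLimiter` and the injectable `clock` parameter are hypothetical names.

```python
import time


class TokenBucket:
    """Token bucket whose refill rate can be changed while running."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def _refill(self):
        now = self._clock()
        earned = (now - self._last) * self.rate
        self._tokens = min(self.capacity, self._tokens + earned)
        self._last = now

    def set_rate(self, rate):
        # Settle tokens earned at the old rate before switching.
        self._refill()
        self.rate = rate

    def allow(self, cost=1.0):
        self._refill()
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class PartitionedLimiter:
    """One bucket per client, so a noisy client cannot starve the others."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self._rate, self._capacity, self._clock = rate, capacity, clock
        self._buckets = {}

    def allow(self, client):
        bucket = self._buckets.get(client)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._capacity, self._clock)
            self._buckets[client] = bucket
        return bucket.allow()


limiter = PartitionedLimiter(rate=5.0, capacity=10)
print(limiter.allow("client-a"))
```

A rejected call is the point where the handshake with the client would take place, for example by returning a "retry later" response. That part is left out of this sketch.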

Integration and component Patterns •  Timeouts •  Circuit breakers •  Resource pools •  Fail fast •  Queue and retry •  Application pings and sanity checks
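
Of the patterns listed, the circuit breaker is perhaps the easiest to show in a few lines. The sketch below is a minimal, single-threaded version that fails fast while open and lets one trial call through after a cool-down. The class name, the thresholds and the `clock` parameter are assumptions, not something the slides specify.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures   # consecutive failures before opening
        self.reset_after = reset_after     # cool-down in seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, failing fast")
            # Cool-down elapsed: half-open, let this one call through as a trial.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call re-opens the circuit immediately.
            if self._opened_at is not None or self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


breaker = CircuitBreaker(max_failures=2, reset_after=10.0)
print(breaker.call(lambda: "ok"))
```

The breaker only counts failures. A timeout on the wrapped call still has to come from the caller, for example through the timeout argument of the client library in use.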

Additional practices •  Quarantine •  Regenerative infrastructure
•  Rollback and monitoring •  Automation of SOP – Runbook

Automated runbooks and checklists •  Automate your SOP
•  Respond to failure with a checklist •  Automate checklists too •  Helps to avoid cognitive bias and the other nasty stuff your brain does
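
An automated checklist can be as small as an ordered list of named checks that are run and reported without relying on anyone's memory. The sketch below assumes each step is a callable returning a truthy value on success, and that later steps are skipped once one fails. The step names and the stop-on-failure policy are illustrative assumptions.

```python
def run_checklist(steps):
    """Run (name, action) steps in order and return (name, status) pairs.

    Status is "ok", "failed" or "skipped"; steps after the first failure
    are skipped rather than executed.
    """
    report = []
    blocked = False
    for name, action in steps:
        if blocked:
            report.append((name, "skipped"))
            continue
        try:
            ok = bool(action())
        except Exception:
            # An exception counts as a failed step, not a crashed checklist.
            ok = False
        report.append((name, "ok" if ok else "failed"))
        blocked = not ok
    return report


# Hypothetical steps; real ones would query the systems involved.
steps = [
    ("health endpoint answers", lambda: True),
    ("error rate below baseline", lambda: False),
    ("drain traffic from node", lambda: True),
]
for name, status in run_checklist(steps):
    print(f"{status:8} {name}")
```

Recording skipped steps explicitly keeps the report complete, so nobody has to guess whether a step was forgotten or deliberately not run.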

Discipline !

Sources •  Recovery Oriented Computing Papers •  James Hamilton LISA paper •  Release It ! •  Scalable Internet Architectures •  A ton of other great books and papers

The value Among the kinds of overhead: •  The operational one •  The customers' one
No matter how sophisticated our monitoring infrastructure is, issues reported by customers are in the end the most important ones, as they impact the customers' experience directly and often uncover unknown bugs. Freeing the team as much as possible from the first kind of overhead gives it more time to focus on the issues of the product itself.

Thank you !
