Release It!
Design and Deploy
Production-Ready Software
Jake Trent
18 Aug 2009
by Michael T. Nygard
Slide 2
The Book
Is thoughtful reading
●Prompts evaluation of:
○Self
○Project
○Company
●Pragprog.com
Slide 3
Motivation
For quality, production-ready software
●Go home at night w/o calls to your cell phone
●Avoid unneeded costs:
○Down time costs
○Opportunity costs
○Operational costs
○Legal costs
●Respectable work
Slide 4
The Need
For periodic reminder of these issues
●Code that passes QA:
○Can still fail miserably
○Can still give users bad impressions
○Can still inflict avoidable costs
●Problems will happen
Slide 5
Stability
Antipatterns
Slide 6
Integration Points
Antipattern
●Modern display of coupling: Systems talking to
systems
●Every contact point is a possible failure point
●Things not under your control
○Network reliability
○Data availability
○External system correctness
Slide 7
Chain Reactions
Antipattern
●One failure often triggers another
●Resource availability often the catalyst
●Can turn into the system attacking itself
Slide 8
Cascading Failures
Antipattern
●Failure in one layer causes problems in callers
●Insufficiently paranoid integration points
Slide 9
Users
Antipattern
●"Users of a system have this knack for creative
destruction."
●Each user consumes more memory
●Some are a burden, others plain malicious
Slide 10
Blocked Threads
Antipattern
●"Adding complexity to solve one problem creates the
risk of entirely new failure modes."
●Resource pool contention
●Beware third-party APIs
●Timeouts
Slide 11
Attacks of Self-Denial
Antipattern
●Plan for your own success
●"Good marketing can kill you at any time."
Slide 12
Scaling Effects
Antipattern
●Point-to-point communication grows quickly with horizontal scale
●Shared resource bottleneck
Slide 13
Unbalanced Capacities
Antipattern
●Performance will depend on your most constrained
resource
●Not often discovered by QA
●Consider proportions of types of transactions
Slide 14
Slow Responses
Antipattern
●Better to fail fast than to hog resources only to
eventually fail
Slide 15
SLA Inversion
Antipattern
●"When calling third parties, service levels only
decrease."
●Consider real need and real cost
●Service level can only be as high as the lowest
subsystem
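As a rough worked example of the last bullet: if every dependency must be up for a request to succeed, and failures are independent (both simplifying assumptions, not claims from the book), the best-case availability is the product of the parts. The figures below are invented for illustration.

```python
from math import prod


def composite_availability(availabilities):
    """Best-case availability when every listed component is required.

    Assumes independent failures; real outages are often correlated.
    """
    return prod(availabilities)


# Hypothetical figures: our own service plus two third-party dependencies.
own_service = 0.9999
dependencies = [0.999, 0.995]

overall = composite_availability([own_service, *dependencies])
print(f"{overall:.4f}")  # lower than the weakest dependency on its own
```

Under these assumptions the composite figure lands below even the weakest dependency, which is the point of the slide: adding a third party can only pull the achievable service level down.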
Slide 16
Unbounded Result Sets
Antipattern
●Tests often use unrealistically small data sets
●Use limits on all queries
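One way to apply "limits on all queries" is to put a hard ceiling in the data-access function itself. This is a minimal sketch using Python's built-in sqlite3 module; the `orders` table, the `fetch_recent_orders` helper, and the cap of 100 are all hypothetical.

```python
import sqlite3

MAX_ROWS = 100  # hypothetical cap; choose one that matches the caller's real need


def fetch_recent_orders(conn, customer_id, limit=MAX_ROWS):
    """Return at most MAX_ROWS rows, never the whole table."""
    # Clamp the limit: callers cannot raise the ceiling, and a negative
    # value (which SQLite treats as "no limit") is forced to zero.
    limit = max(0, min(limit, MAX_ROWS))
    cur = conn.execute(
        "SELECT id, total FROM orders WHERE customer_id = ? "
        "ORDER BY id DESC LIMIT ?",
        (customer_id, limit),
    )
    return cur.fetchall()


# Demo: an in-memory database holding far more rows than the cap.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(1, float(i)) for i in range(10_000)],
)
rows = fetch_recent_orders(conn, customer_id=1)
print(len(rows))
```

Loading the test table with production-sized data, as the demo does, is what exposes a missing limit before users do.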
Slide 17
Stability
Patterns
Slide 18
Use Timeouts
Pattern
●Prevent integration points from becoming blocked
threads
●Retry for potential transient timeouts
●Ability to move on without return (fail fast)
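The retry and move-on bullets can be sketched together. This assumes the wrapped operation enforces its own timeout and raises `TimeoutError`; `call_with_retry` and `flaky_lookup` are hypothetical names, not part of any library.

```python
import time


def call_with_retry(operation, attempts=3, backoff_seconds=0.0):
    """Call `operation`, retrying only when it raises TimeoutError.

    `operation` must enforce its own timeout (for example, a socket or
    HTTP client configured with one). After the final attempt the error
    propagates so the caller can move on without a result.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TimeoutError:
            if attempt == attempts:
                raise  # give up rather than wait forever
            time.sleep(backoff_seconds * attempt)  # simple linear backoff


# Hypothetical flaky dependency: times out twice, then answers.
calls = {"count": 0}


def flaky_lookup():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("remote system did not answer in time")
    return "ok"


result = call_with_retry(flaky_lookup, attempts=3)
print(result, calls["count"])
```

Retries are bounded on purpose: unbounded retries against a struggling system turn a timeout back into a blocked thread.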
Slide 19
Circuit Breaker
Pattern
●Prevent operations rather than re-execute them
●Note each failure until switch is flipped
●Use with timeouts - try again eventually
●Visible to operations
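A minimal sketch of the breaker described above, not the book's implementation. The class name, thresholds, and the injectable `clock` parameter are assumptions made for illustration; the `state` property is one simple way to make the breaker visible to operations.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses a call without attempting it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds before a trial call is allowed
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        """Current state, suitable for exposing on a status page."""
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_after:
            return "half-open"
        return "open"

    def call(self, operation):
        state = self.state
        if state == "open":
            # Prevent the operation instead of re-executing it.
            raise CircuitOpenError("call prevented; dependency marked unhealthy")
        try:
            result = operation()
        except Exception:
            self.failures += 1  # note each failure
            if state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller wraps each integration-point call, for example `breaker.call(lambda: fetch_inventory())` where `fetch_inventory` is whatever remote call needs protecting, and treats `CircuitOpenError` as an immediate failure.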
Slide 20
Bulkheads
Pattern
●Find natural partitions
○Thread groups
○Resource pools
○Hardware
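For the thread-group partition, one possible sketch is a separate fixed-size pool per workload, so that one saturated workload cannot starve another. The partition names and pool sizes are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: each workload gets its own fixed-size pool.
pools = {
    "checkout": ThreadPoolExecutor(max_workers=4, thread_name_prefix="checkout"),
    "reporting": ThreadPoolExecutor(max_workers=2, thread_name_prefix="reporting"),
}


def submit(partition, fn, *args):
    """Run `fn` on the pool reserved for `partition`."""
    return pools[partition].submit(fn, *args)


# Saturate the reporting pool with tasks that hang until released.
release = threading.Event()
stuck = [submit("reporting", release.wait) for _ in range(2)]

# Checkout has its own threads, so this still completes promptly.
answer = submit("checkout", lambda: "order placed").result(timeout=5)
print(answer)

release.set()  # let the hung tasks finish
for pool in pools.values():
    pool.shutdown()  # waits for in-flight tasks
```

With a single shared pool, the two hung reporting tasks could have occupied the threads that checkout needed.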
Slide 21
Steady State
Pattern
●System should run w/o manual intervention
●Human fiddling leads to error
●Purge data
●Roll logs
●At least move out of production environment
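"Roll logs" can often be handled in configuration rather than by a person. A small sketch using the standard library's `RotatingFileHandler`; the sizes, file count, and temporary directory are placeholders, not recommendations.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

log_dir = tempfile.mkdtemp()  # stand-in for a real log directory
log_path = os.path.join(log_dir, "app.log")

# Keep the active file plus at most 3 rotated files of about 1 KB each.
# Anything older is deleted automatically, so disk use stays bounded.
handler = RotatingFileHandler(log_path, maxBytes=1024, backupCount=3)
logger = logging.getLogger("steady_state_demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep demo output out of the root logger
logger.addHandler(handler)

# Write far more than the retained files can hold.
for i in range(1000):
    logger.info("request %d handled", i)

handler.close()
files = sorted(os.listdir(log_dir))
print(files)
```

However many messages are written, the directory never holds more than the active file and its three backups, which is the steady state the slide asks for.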
Slide 22
Fail Fast
Pattern
●Check availability before attempted use
●Basic parameter checking before loading expensive
objects
●"Don't do useless work"
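A minimal sketch of checking parameters before doing expensive work. `load_catalog` stands in for any costly step (a database load, a remote call); all names here are hypothetical.

```python
class InvalidRequest(ValueError):
    """Raised for requests that can be rejected without doing real work."""


expensive_loads = {"count": 0}  # lets the demo show when the costly step ran


def load_catalog():
    """Stand-in for an expensive step such as a database load."""
    expensive_loads["count"] += 1
    return {"sku-1": 9.99}


def price_order(sku, quantity):
    # Cheap checks first: reject bad input before any expensive work.
    if not isinstance(sku, str) or not sku:
        raise InvalidRequest("sku must be a non-empty string")
    if not isinstance(quantity, int) or quantity <= 0:
        raise InvalidRequest("quantity must be a positive integer")

    catalog = load_catalog()  # only reached for plausible requests
    if sku not in catalog:
        raise InvalidRequest(f"unknown sku: {sku}")
    return catalog[sku] * quantity


try:
    price_order("sku-1", 0)
except InvalidRequest as err:
    print("rejected:", err, "| expensive loads so far:", expensive_loads["count"])
```

The invalid request is rejected with the expensive-load counter still at zero, which is "don't do useless work" in practice.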
Slide 23
Handshaking
Pattern
●Allow integration points to throttle themselves
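One way to picture handshaking at the application level: the server exposes a cheap "can you take more work?" check, and the caller asks before sending. The `Worker` class and `dispatch` function are hypothetical illustrations, not a real protocol.

```python
class Worker:
    """Hypothetical server that tracks its own in-flight work."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.in_flight = 0

    def ready(self):
        # The handshake: a cheap question asked before sending real work.
        return self.in_flight < self.capacity

    def accept(self, job):
        if not self.ready():
            raise RuntimeError("over capacity")
        self.in_flight += 1
        return f"accepted {job}"

    def finish(self):
        self.in_flight -= 1


def dispatch(workers, job):
    """Send the job to the first worker that reports ready, else refuse."""
    for worker in workers:
        if worker.ready():
            return worker.accept(job)
    return None  # caller can shed load or queue instead of piling on


workers = [Worker(capacity=1), Worker(capacity=1)]
print(dispatch(workers, "job-1"))
print(dispatch(workers, "job-2"))
print(dispatch(workers, "job-3"))  # both workers busy, so the job is refused
```

Because the server decides its own readiness, it can throttle callers before it is overwhelmed rather than after.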
Slide 24
Test Harness
Pattern
●Box independent of the "norms" of the environment
●As devious as possible, especially at the network level
●Out-of-spec
●Stress
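A small example of a "devious" harness at the network level: a local server that accepts connections and then never replies, used to check that a client gives up instead of hanging. This is a sketch only; `start_silent_server` is a hypothetical helper and the 0.5 second timeout is arbitrary.

```python
import socket
import threading


def start_silent_server():
    """Listen on a free local port, accept connections, and never reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
    listener.listen()
    held = []  # keep accepted sockets open so clients simply hang

    def accept_forever():
        while True:
            try:
                conn, _ = listener.accept()
            except OSError:
                return  # listener was closed
            held.append(conn)

    threading.Thread(target=accept_forever, daemon=True).start()
    return listener, held


listener, held = start_silent_server()
host, port = listener.getsockname()

# Code under test: a client that must not wait forever for a reply.
client = socket.create_connection((host, port), timeout=0.5)
try:
    client.recv(1024)
    outcome = "got data"
except socket.timeout:
    outcome = "timed out"
finally:
    client.close()
    listener.close()
print(outcome)
```

Other misbehaviours (refusing connections, sending one byte every few seconds, returning garbage) follow the same shape and tend to find bugs that a well-behaved test environment never will.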
Slide 25
Decoupling Middleware
Pattern
●Decide on the plumbing at the "last responsible
moment"
●Hardest to change later
Slide 26
"Paranoia is just good thinking."
"It's unlikely that anyone will notice
your system's lack of downtime."
Michael T. Nygard
Slide 27
The End
aprilandjake.com/content/release-it-stability-review/