Think about design choices in terms of trade-offs
Slide 10
Chosen trade-offs form the foundation of your system
Slide 11
Common Mistakes
Slide 12
De-prioritizing Testing
Cutting corners on testing carries a hidden cost
Test the full system: client, code, & provisioning code (see the smoke-test sketch below)
Code reviews != tests. Have both
Continuous Integration (CI) is critical to velocity, quality, & transparency
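For example, a full-system smoke test can run in CI right next to the unit tests. A minimal Python sketch, assuming a provisioned test instance whose URL comes from the environment; the endpoints, env var name, and response shape here are hypothetical, not from the talk:

```python
# Hypothetical end-to-end smoke test run from CI: it exercises a provisioned
# instance through the real client path instead of mocking it.
import json
import os
import urllib.request

# Set by CI to a freshly provisioned test instance (never hardcode localhost).
BASE_URL = os.environ["SMOKE_TARGET_URL"]

def test_health_endpoint():
    with urllib.request.urlopen(f"{BASE_URL}/health") as resp:
        assert resp.status == 200

def test_round_trip():
    # Push a request through the same path a real client would use.
    payload = json.dumps({"item": "widget", "qty": 1}).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/orders",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    assert body.get("status") == "accepted"
```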
Slide 13
De-prioritizing Releases
Release stability is tied to system stability
Iron out your deploy process!
Dependencies on other systems make this even more important
Canary testing, dark launches, feature flags, etc. are good
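A minimal Python sketch of the canary/dark-launch idea: bucket users deterministically, run the new path in the shadow for a small percentage, and keep serving the old result. The flag name and percentage are illustrative only:

```python
# Percentage-based rollout: deterministically bucket each user so the same
# user always gets the same answer while the percentage ramps up.
import hashlib

def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """True if user_id falls in the first `percent` of 100 buckets for `feature`."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100
    return bucket < percent

def handle_request(user_id, old_path, new_path):
    # Dark launch: run the new code path in the shadow, compare results,
    # but always serve the old result until the new path earns trust.
    result = old_path()
    if in_rollout(user_id, "new-pricing", percent=5):  # hypothetical flag
        try:
            shadow = new_path()
            if shadow != result:
                print("dark-launch mismatch for", user_id)  # log, never fail the user
        except Exception as exc:
            print("dark-launch error:", exc)
    return result
```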
Slide 14
De-prioritizing Ops
Automation shortcuts taken while in a rush will come back to haunt you
Runbooks are a must-have
Localhost is the devil
Sloppy operational work is the mother of all evils
Slide 15
De-prioritizing Insight
Leaving monitoring to “future you” is bad; make it part of the MVP
Alert fatigue has a high cost, don’t let it get that far
Link alerts to runbooks (see the sketch below)
Routinely test your escalation paths
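One lightweight way to make the alert-to-runbook link stick is to require a runbook URL wherever alerts are defined. A hypothetical Python sketch; the fields and the example rule are illustrative, not a real monitoring API:

```python
# Hypothetical alert definition that refuses to exist without a runbook link.
from dataclasses import dataclass

@dataclass(frozen=True)
class Alert:
    name: str
    expr: str          # condition, in whatever query language your monitoring uses
    runbook_url: str   # required: responders should never have to guess

    def __post_init__(self):
        if not self.runbook_url.startswith("https://"):
            raise ValueError(f"alert {self.name} has no usable runbook link")

# Example (made-up expression and wiki URL):
high_error_rate = Alert(
    name="HighErrorRate",
    expr="rate(http_5xx[5m]) > 0.05",
    runbook_url="https://wiki.example.com/runbooks/high-error-rate",
)
```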
Slide 16
De-prioritizing Knowledge
The inner workings of data components matter. Learn about them
System boundaries ought to be made explicit
Deprecate your go-to person
Slide 17
De-prioritizing Security
The internet is an awful place
Expect DoS/DDoS
Think about your system, its connections, and their dependencies
Having the ability to turn off features/clients helps
Slide 18
What we learned
Service ownership implies leveling up operationally
Architectural choices made in a rush can have a long shelf life
Don’t sacrifice tests. Test the FULL system
Slide 19
What Matters
Slide 20
Mind System Design
Simple & utilitarian design takes you a long way
Use well-understood components
NIH (Not Invented Here) is a double-edged sword
Use feature flags & on/off switches (test them!)
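A minimal sketch of an on/off switch, with the test the slide asks for. This assumes environment-variable flags; the flag name is illustrative:

```python
# Minimal in-process feature switch with a safe default, plus a test of
# both positions. Untested switches tend not to work when you need them.
import os

def flag_enabled(name: str, default: bool = False) -> bool:
    """Read a feature flag from the environment, e.g. FLAG_NEW_CHECKOUT=on."""
    raw = os.environ.get(f"FLAG_{name.upper()}")
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "on"}

def test_flag_both_positions():
    os.environ["FLAG_NEW_CHECKOUT"] = "on"
    assert flag_enabled("new_checkout") is True
    os.environ["FLAG_NEW_CHECKOUT"] = "off"
    assert flag_enabled("new_checkout") is False
    del os.environ["FLAG_NEW_CHECKOUT"]
    assert flag_enabled("new_checkout") is False  # safe default when unset
```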
Slide 21
Meet Alice
Slide 22
Alice’s Testing Areas
Correctness: good output from good inputs
Error: reasonable reaction to incorrect input
Performance: Time to Task (TTT) for single node, multi node, clustered, and cache-enabled setups
Robustness: behavior after a given # of inputs/outputs and after a given uptime
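In test code, the Correctness and Error rows might look like the following pytest sketch; `parse_qty` is a hypothetical stand-in for Alice’s real code under test:

```python
# Correctness: good output from good inputs.
# Error: reasonable reaction to incorrect input.
import pytest

def parse_qty(raw: str) -> int:
    # Hypothetical function under test.
    qty = int(raw)
    if qty <= 0:
        raise ValueError("qty must be positive")
    return qty

@pytest.mark.parametrize("raw,expected", [("1", 1), ("42", 42)])
def test_correctness(raw, expected):
    assert parse_qty(raw) == expected

@pytest.mark.parametrize("raw", ["", "abc", "-5", "0"])
def test_error_handling(raw):
    with pytest.raises(ValueError):
        parse_qty(raw)
```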
Slide 23
A Testing Harness
Is a fantastic thing to have
Invest in QA automation engineers
Adding support for regressions & domain-specific testing pays off
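Regression support can start as simply as golden files the harness replays and diffs. A Python sketch, where `transform` and the `goldens/` layout are hypothetical:

```python
# Golden-file regression check: replay recorded inputs and diff the output
# against recorded expectations. A mismatch names the failing case.
import json
from pathlib import Path

def transform(record: dict) -> dict:
    # Hypothetical stand-in for the code under test.
    return {"total": record["qty"] * record["price"]}

def test_regressions():
    # Each goldens/*.json holds {"input": {...}, "expected": {...}}.
    for case in Path("goldens").glob("*.json"):
        data = json.loads(case.read_text())
        assert transform(data["input"]) == data["expected"], case.name
```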
Slide 24
Mind System Limits
Rate limit your API calls, especially if they are public or expensive to run (see the token-bucket sketch below)
Instrument / add metrics to track them
Rank your services & data (what can you drop?)
Capacity analysis is not dead
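A token bucket is one common way to implement the rate limiting above; a minimal Python sketch, with illustrative numbers rather than recommendations:

```python
# Token bucket: each call spends a token; tokens refill at a steady rate,
# which allows short bursts while capping the sustained request rate.
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should reject (e.g. HTTP 429) and count the rejection

limiter = TokenBucket(rate=10, capacity=20)  # ~10 req/s with bursts of 20 (example)
```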
Slide 25
Mind System Growth
Watch out for initial over-architecting
“The application that takes you to 100k users is not the same one that takes you to 1M, and so on…” @netik
Keep changes small!
Slide 26
Mind System Configs
System assumptions are dangerous, make them explicit
Standardize system configuration (data bags, config files, etc.); see the sketch below
Hardcoding is the devil
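A sketch of reading settings from one standardized place with explicit overrides, instead of hardcoding them; the file name and the `CFG_` prefix are hypothetical:

```python
# Load settings from a single config file, with explicit environment
# overrides. The point is one standardized source instead of hardcoded values.
import json
import os
from pathlib import Path

def load_config(path: str = "config.json") -> dict:
    config = json.loads(Path(path).read_text())
    # Explicit override hook: CFG_DB_HOST beats config["db_host"], and so on.
    for key in config:
        env_val = os.environ.get(f"CFG_{key.upper()}")
        if env_val is not None:
            config[key] = env_val
    return config
```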
Slide 27
Mind Resources
Redundancies (of resources, execution paths, checks, data, messages, etc.) build resilience
Mechanisms to guard system resources are good to have (see the sketch below)
Your system is also tied to the resources of its dependencies
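One common guard mechanism is a cap on in-flight calls to a dependency, so a slow dependency cannot exhaust your own resources. A minimal Python sketch with illustrative limits:

```python
# Bounded-concurrency guard: never let more than N in-flight calls pile up
# against a dependency; shed load fast instead of queueing forever.
import threading

class ConcurrencyGuard:
    def __init__(self, max_in_flight: int):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def call(self, fn, *args, **kwargs):
        if not self._slots.acquire(timeout=0.1):  # fail fast when saturated
            raise RuntimeError("dependency overloaded, shedding load")
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()

db_guard = ConcurrencyGuard(max_in_flight=50)  # example limit, tune per dependency
```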
Slide 28
Distrust is healthy
Distrust client behavior, even if the clients are internal (see the validation sketch below)
Decisions have an expiration date. Periodically re-evaluate them; past you was much dumber
A revisionist culture produces better systems
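In code, distrusting clients means validating and clamping whatever they send, internal or not. An illustrative Python sketch with made-up limits:

```python
# Server-side clamp on a client-supplied parameter: internal callers get the
# same treatment as external ones. The limits here are illustrative.
MAX_PAGE_SIZE = 100
DEFAULT_PAGE_SIZE = 25

def sanitized_page_size(raw) -> int:
    try:
        size = int(raw)
    except (TypeError, ValueError):
        return DEFAULT_PAGE_SIZE        # safe default for garbage input
    return max(1, min(size, MAX_PAGE_SIZE))
```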
Slide 29
What we learned
Keep track of your technical debt & repay it regularly
It’s about lowering the risk of change with tools & culture
Mind assumptions
Slide 30
TL;DR: Keep in Mind
Things that are easy to neglect may be harder to correct later
Think in terms of trade-offs
TESTING MATTERS!
Not all process is evil