Development is just the tip of the iceberg

André Arko @indirect    

DANGER PRODUCTION AHEAD

Metrics

Metrics are important

Metrics tell you what is happening

Metrics convince you you understand

Averages convince you you understand

but brains are pretty weird

you probably don’t understand averages

Average (right?)

Averages mask problems

Averages!

Instead, graph the full distribution

Instead, graph the median, mean, and 95th percentile
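A minimal sketch with made-up latencies shows how these three numbers disagree. When 10% of requests are slow, the median hides them entirely, the mean blurs them into a number nobody actually experienced, and the 95th percentile shows them. The nearest-rank percentile used here is one of several common definitions; the sample data is hypothetical.

```python
import math
import statistics

def summarize(latencies_ms):
    """Return the median, mean, and nearest-rank 95th percentile of latencies (ms)."""
    ordered = sorted(latencies_ms)
    # Nearest rank: the smallest value with at least 95% of samples at or below it.
    rank = math.ceil(len(ordered) * 95 / 100)
    return {
        "median": statistics.median(ordered),
        "mean": statistics.fmean(ordered),
        "p95": ordered[rank - 1],
    }

# Hypothetical sample: 90 fast requests and 10 very slow ones.
samples = [50] * 90 + [2000] * 10
print(summarize(samples))  # {'median': 50.0, 'mean': 245.0, 'p95': 2000}
```

Graphing all three side by side makes the gap visible: a median of 50 ms next to a 95th percentile of 2 seconds tells you one user in ten is having a bad time.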

Aggregates: another kind of average

Srsly guise: break out graphs

Srsly guise: alert on broken-out metrics

Srsly guise: alerts on aggregates are probably too late
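A sketch of why broken-out metrics matter, using made-up traffic: nine healthy servers and one struggling one. The fleet-wide mean stays under a hypothetical 500 ms alert threshold, while the per-server numbers point straight at the problem. The server names and threshold are illustrative only.

```python
from collections import defaultdict
from statistics import fmean

def breakout(requests):
    """Group (server, latency_ms) pairs and return the mean latency per server."""
    by_server = defaultdict(list)
    for server, latency in requests:
        by_server[server].append(latency)
    return {server: fmean(values) for server, values in by_server.items()}

def alerts(per_group, threshold_ms):
    """Return the group names whose latency exceeds the threshold, sorted by name."""
    return sorted(name for name, value in per_group.items() if value > threshold_ms)

# Hypothetical traffic: web1..web9 are fine, web10 is in trouble.
requests = [(f"web{i}", 100) for i in range(1, 10) for _ in range(10)]
requests += [("web10", 900)] * 10

overall = fmean(latency for _, latency in requests)
print(overall)                              # 180.0, looks tolerable
print(alerts({"fleet": overall}, 500))      # [], the aggregate alert stays quiet
print(alerts(breakout(requests), 500))      # ['web10'], the broken-out alert fires
```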

Servers

Servers: you have no idea what is going on

really.

it’s 3am. do you know where your application is?

Routing: your app has this

Routing: how slow is it?

Routing: does it back up?
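One way to answer both questions is to measure queue time: how long a request waits between the router and your app. This sketch assumes your router stamps each request with a header holding a Unix timestamp in whole milliseconds. `X-Request-Start` is a common name for such a header, but the name and format depend on your router, and the 100 ms "backed up" threshold is a placeholder.

```python
import statistics
import time

def queue_time_ms(headers, now=None):
    """Milliseconds between the router stamping the request and the app picking it up.

    Assumes an X-Request-Start header containing a Unix timestamp in milliseconds.
    Returns None if the header is missing.
    """
    stamp = headers.get("X-Request-Start")
    if stamp is None:
        return None
    now_ms = (time.time() if now is None else now) * 1000
    # Clock skew between router and app hosts can make this negative; clamp to zero.
    return max(0.0, now_ms - float(stamp))

def backing_up(recent_queue_ms, threshold_ms=100.0):
    """Hypothetical policy: the router is backing up if median queue time exceeds the threshold."""
    return statistics.median(recent_queue_ms) > threshold_ms
```

Graph queue time next to request time: if queue time climbs while your app's own timings stay flat, requests are piling up in front of the app.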

Request time

Request time: not your metrics, I mean for real

Request time: make requests from all over

Request time: graph them, alert on them, thank me later
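A minimal sketch of an outside-in probe: time a real HTTP request from the client side, the way a user experiences it, and flag it when it is too slow or fails outright. The function names, URL, and threshold are hypothetical; the `fetcher` parameter exists so the timing logic can be exercised without a network.

```python
import time
import urllib.request

def fetch(url, timeout=10.0):
    """Do a real HTTP GET and read the whole body, the way a user's client would."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()

def check(url, threshold_s, fetcher=fetch):
    """Time one request from the client side; return (elapsed_seconds, ok).

    Any exception (DNS failure, timeout, HTTP error status) counts as a failed check.
    """
    start = time.perf_counter()
    try:
        fetcher(url)
    except Exception:
        return time.perf_counter() - start, False
    elapsed = time.perf_counter() - start
    return elapsed, elapsed <= threshold_s
```

Usage, assuming a hypothetical endpoint: run `elapsed, ok = check("https://example.com/", threshold_s=1.0)` on a schedule from machines in several regions, send `elapsed` to your graphs, and page someone when `ok` is False.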

Runtime lag

Runtime lag (how can you tell that you lost consciousness?)

Runtime lag: do you have it? (yes)

Runtime lag: how bad is it?

Runtime lag: how do you track it?
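The slides don't say how to track it. One common approach, sketched here with Python's asyncio as a stand-in for whatever runtime you use, is to ask for a wakeup after a fixed interval and record how late it arrives: GC pauses, a blocked event loop, or a descheduled process all show up as lateness. The interval, sample count, and simulated blocking handler are illustrative assumptions.

```python
import asyncio
import time

async def measure_loop_lag(interval=0.05, samples=5):
    """Repeatedly ask to be woken after `interval` seconds; record how late each wakeup was."""
    lags = []
    for _ in range(samples):
        start = time.perf_counter()
        await asyncio.sleep(interval)
        # Anything beyond `interval` is time the runtime could not run us.
        lags.append(max(0.0, time.perf_counter() - start - interval))
    return lags

async def blocking_handler(seconds):
    """Simulate a badly behaved handler that blocks the event loop with synchronous work."""
    await asyncio.sleep(0)   # yield once so the monitor can start its first sleep
    time.sleep(seconds)      # blocks the whole loop; the monitor cannot wake up on time

async def main():
    monitor = asyncio.create_task(measure_loop_lag())
    await blocking_handler(0.3)
    return max(await monitor)

if __name__ == "__main__":
    print(f"worst wakeup lag: {asyncio.run(main()):.3f}s")
```

In production you would run the monitor forever and send each lag sample to your graphs rather than returning a list.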

VM lag

VM lag: do you have it?

VM lag: do you even check for it?

VM lag: do you know how to check for it?
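On a Linux guest, one place to check is the `steal` counter in `/proc/stat`: CPU time your virtual machine wanted but the hypervisor spent elsewhere. This sketch assumes Linux and the standard field order of the aggregate `cpu` line (user, nice, system, idle, iowait, irq, softirq, steal) and computes the stolen share between two samples. The helper names are hypothetical.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # guest/guest_nice (if present) are already counted in user/nice, so they are skipped.
    return dict(zip(FIELDS, (int(v) for v in parts[1:1 + len(FIELDS)])))

def read_cpu_sample(path="/proc/stat"):
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())

def steal_percent(before, after):
    """Percentage of CPU time stolen by the hypervisor between two samples."""
    total = sum(after.values()) - sum(before.values())
    if total <= 0:
        return 0.0
    return 100.0 * (after["steal"] - before["steal"]) / total
```

Usage: take `read_cpu_sample()`, sleep a few seconds, take another, and graph `steal_percent(before, after)`. The `st` column in `vmstat` and the `st` value in `top` report the same quantity if you just want a quick look.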

Data stores

Data stores in production are distributed

what does that mean?

your experience (so far) is wrong

Saving data

Saving data: tries to save your data

Saving data: might save your data

Replication

Replication doesn’t save you

Postgres: async replication

Postgres: network failures can lose saved data
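To see how async replication can lose data that was already reported as saved, here is a toy model (hypothetical classes, not PostgreSQL's actual implementation): the primary acknowledges a write as soon as it is in its own log and ships it to the replica later. If the primary is cut off and the replica is promoted in between, the acknowledged write is gone.

```python
class Node:
    """A toy database node: just an ordered log of writes."""
    def __init__(self, name):
        self.name = name
        self.log = []

class AsyncPrimary(Node):
    """Acknowledges a write once it is in its own log; the replica catches up later."""
    def __init__(self, name, replica):
        super().__init__(name)
        self.replica = replica
        self.unshipped = []  # acknowledged writes the replica has not received yet

    def write(self, value):
        self.log.append(value)
        self.unshipped.append(value)
        return "OK"  # the client is told the write is saved right now

    def ship(self):
        """Send unshipped writes to the replica (in a real system, continuously in the background)."""
        self.replica.log.extend(self.unshipped)
        self.unshipped.clear()

replica = Node("replica")
primary = AsyncPrimary("primary", replica)
primary.write("a")
primary.ship()
acked = primary.write("b")  # acknowledged to the client
# The network fails before "b" ships, and the replica is promoted to primary.
new_primary = replica
print(acked, new_primary.log)  # OK ['a']
```

PostgreSQL's synchronous replication (configured through `synchronous_standby_names`) closes this particular window by waiting for a standby before acknowledging a commit, at the cost of slower writes.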

Redis has no automatic failover

Redis Sentinel elects a new leader

Redis Sentinel keeps only one leader’s saves during failures
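The Sentinel failure mode can be modeled the same way. This toy sketch (hypothetical names, not Redis code) assumes a partition leaves the old leader on the minority side still accepting writes while the majority promotes a new one. When the partition heals, the old leader rejoins as a replica and resyncs from the winner, so whatever it accepted on its own disappears.

```python
def heal(winner_log, loser_log):
    """Resync the demoted leader from the winner; return (final_log, lost_writes)."""
    lost = [w for w in loser_log if w not in winner_log]
    return list(winner_log), lost

common = ["x=1"]                       # written before the partition, replicated everywhere
old_leader = common + ["y=2", "z=3"]  # minority side: clients kept writing here
new_leader = common + ["w=4"]         # majority side: promoted by the sentinels
final, lost = heal(new_leader, old_leader)
print(final)  # ['x=1', 'w=4']
print(lost)   # ['y=2', 'z=3']
```

Redis's `min-replicas-to-write` setting can shrink this window by making an isolated leader refuse writes, but it does not eliminate it.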

Mongo: returns before the first write

Mongo: your data is on zero disks (so far)

Mongo: demand N copies to survive N-1 failures
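The arithmetic behind "N copies survive N-1 failures" is small enough to write down. In MongoDB the knob is the write concern: `w` sets how many members must acknowledge a write (a number or `"majority"`), and `j: true` additionally waits for the on-disk journal. The helper names below are hypothetical, and they assume the worst case, where every failure hits a node holding a copy.

```python
def copies_required(failures_to_tolerate):
    """A write must be on this many nodes to still exist after that many node failures."""
    return failures_to_tolerate + 1

def survives(copies_acknowledged, failed_nodes):
    """Worst case: failures hit the nodes holding copies first, so we need one copy more."""
    return copies_acknowledged > failed_nodes

def majority(members):
    """Smallest strict majority of a replica set with this many members."""
    return members // 2 + 1

# w=0 ("fire and forget"): nothing confirmed, so not even zero failures are guaranteed safe.
print(survives(0, 0))            # False
# w=majority(5)=3 on a five-member set: two failures are fine, three are not.
print(survives(majority(5), 2))  # True
print(survives(majority(5), 3))  # False
```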

trust no one

if you didn’t try it, you are guessing

try it yourself
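Trying it yourself can start as simply as recording every write the database acknowledged during a failure test (kill a node, cut the network), then reading everything back and diffing. The sketch below, in the spirit of Jepsen-style tests, does only that bookkeeping; how you inject failures and collect the final read depends on your setup, and the function names are hypothetical.

```python
def lost_writes(acknowledged, final_read):
    """Writes the database said were saved but that are missing after the failure test."""
    survived = set(final_read)
    return [w for w in acknowledged if w not in survived]

def report(acknowledged, final_read):
    """One-line summary of acknowledged-write loss."""
    lost = lost_writes(acknowledged, final_read)
    rate = len(lost) / len(acknowledged) if acknowledged else 0.0
    return f"{len(lost)}/{len(acknowledged)} acknowledged writes lost ({rate:.1%})"

# Hypothetical run: 1000 writes acknowledged, every tenth one missing afterwards.
print(report(list(range(1000)), [i for i in range(1000) if i % 10]))
# 100/1000 acknowledged writes lost (10.0%)
```

Anything other than zero lost writes means the system's guarantees are weaker than its acknowledgments suggest.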

So what did we learn?

Production is fundamentally, systemically different

Failures will happen

Failures can be resisted

Failures should not result in one-off patches

Survival requires systematic deliberation & design

Survival requires systematic trials & testing

production is not development

don’t you forget it!

Development is just the tip of the iceberg
