Maslow's Hierarchy of Database Needs

It’s easy to say that you should always automate everything, monitor everything, and instrument every inch of your data infra. But overengineering in advance of your needs can be just as costly as the reverse, particularly for startups. Engineering cycles are scarce. How do you decide where to spend them?

Enter Maslow’s hierarchy of needs—for databases.

For humans, Maslow’s hierarchy of needs is a pyramid of desires that must be satisfied for us to flourish: survival, safety, love and belonging, esteem, and self-actualization. Each level depends on the preceding ones—we need survival before safety, safety before love and belonging, etc.

Really, databases aren’t so different from you and me. They need:

Survival: Do you even have a data store? How should you decide what storage systems to run? Is it up? Is it alive? Do you have backups? Are they valid?
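The "Do you have backups?" question can be turned into a check that runs on a schedule. Below is a minimal sketch in Python, assuming you can obtain the timestamp of the newest backup from wherever backups are stored; the function name `backup_is_fresh` and the 24-hour threshold are illustrative assumptions, not something the talk prescribes.

```python
from datetime import datetime, timedelta, timezone


def backup_is_fresh(last_backup_at, max_age, now=None):
    """Return True if the newest backup is no older than max_age."""
    now = now or datetime.now(timezone.utc)
    return now - last_backup_at <= max_age


# Hypothetical values: the org has decided it can afford to lose
# at most 24 hours of data.
now = datetime(2016, 6, 23, 12, 0, tzinfo=timezone.utc)
max_age = timedelta(hours=24)

recent = datetime(2016, 6, 23, 3, 0, tzinfo=timezone.utc)  # 9 hours old
stale = datetime(2016, 6, 21, 3, 0, tzinfo=timezone.utc)   # 57 hours old

recent_ok = backup_is_fresh(recent, max_age, now=now)
stale_ok = backup_is_fresh(stale, max_age, now=now)
```

A freshness check only shows that a backup exists and is recent. Whether it is valid can only be answered by actually restoring from it.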

Safety: Are there multiple live copies? Are they geographically distributed? What is your failover story? Are your humans redundant too?
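To make "multiple live copies" concrete, it helps to know how many nodes can fail before the cluster loses the ability to elect a primary or accept writes. The sketch below assumes a simple majority-vote scheme, which many replicated stores use in some form; the function names are hypothetical and real systems add details such as non-voting members and arbiters.

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that is more than half of the members."""
    return voting_members // 2 + 1


def tolerated_failures(voting_members: int) -> int:
    """How many voting members can be lost while a majority survives."""
    return voting_members - majority(voting_members)


# Compare a few cluster sizes under the majority-vote assumption.
for size in (3, 4, 5):
    print(size, majority(size), tolerated_failures(size))
```

Under this assumption a four-node cluster tolerates no more failures than a three-node one, which is one reason odd numbers of voters are a common choice.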

Love and belonging: Are your databases first-class citizens of your engineering processes, and do they share config management and tooling with the rest of the org? Are schema changes defined in code and revertible? Have you eliminated special snowflakes?
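One way to read "schema changes defined in code" is that every change ships with its own reverse step and goes through the same review and deploy process as application code. Here is a minimal sketch using Python's built-in `sqlite3` module; the `MIGRATIONS` layout and the `apply`/`revert` helpers are assumptions made for illustration, not the interface of any particular migration tool.

```python
import sqlite3

# Hypothetical migration list: each entry pairs a forward step with its reverse.
MIGRATIONS = [
    {
        "id": "001_create_users",
        "up": "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)",
        "down": "DROP TABLE users",
    },
]


def table_names(conn):
    """Return the set of table names currently defined in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return {name for (name,) in rows}


def apply(conn, migration):
    """Run the forward step of a migration."""
    conn.execute(migration["up"])
    conn.commit()


def revert(conn, migration):
    """Run the reverse step of a migration."""
    conn.execute(migration["down"])
    conn.commit()


conn = sqlite3.connect(":memory:")
apply(conn, MIGRATIONS[0])
after_up = table_names(conn)
revert(conn, MIGRATIONS[0])
after_down = table_names(conn)
```

A real setup would also record which migrations have been applied, and dropping a table is only a safe reverse step while the table holds no data worth keeping.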

Esteem: Can your observability stack surface problems before they impact production? Can you correlate events across the stack? Can you automatically remediate common failures without human intervention? How do your observability requirements evolve as your org matures?
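The questions about remediation and paging can be framed as a routing decision made for every detected problem. The sketch below is one possible policy, assuming each event has already been classified along three hypothetical dimensions (`user_facing`, `actionable`, `auto_remediable`); the labels it returns are placeholders for whatever your alerting system calls them.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    user_facing: bool       # does it degrade the service end to end?
    actionable: bool        # can a human do something about it right now?
    auto_remediable: bool   # is there a known, safe automated fix?


def route(event: Event) -> str:
    """Decide what to do with an event under this illustrative policy."""
    if event.auto_remediable:
        # Fix it automatically, but keep a record so trends stay visible.
        return "remediate_and_record"
    if event.user_facing and event.actionable:
        # Only wake a human for service-level problems they can act on.
        return "page"
    return "ticket"


# Hypothetical events for illustration.
replica_down = Event("replica down, service healthy", False, True, False)
primary_down = Event("primary down, writes failing", True, True, False)
log_disk_full = Event("log disk full on replica", False, True, True)

decisions = [route(e) for e in (replica_down, primary_down, log_disk_full)]
```

The point of the design is that the health of the service, not of an individual node, decides whether someone gets paged.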

Self-actualization: Is your data store its best possible self, and what does that even mean for your org? (The self-actualized, maturely instrumented storage layer for a website with 1 billion users will look very different from the self-actualized storage layer for a young and highly volatile startup environment.) How can you assess your appetite for risk, your stage of development, and the layers of process you should invest in organizationally at each stage?

Charity Majors

June 23, 2016

Transcript

  1. “self-actualized database” • Empowering for developers • Resilient to common failures • Friendly to operations (debuggable, understandable)
  2. Selecting a storage system • Choose boring technology, when you can • Reuse solutions. Resist software sprawl • Spend innovation tokens only on key differentiators h/t @mcfunley
  3. • If your company is very young, optimize for velocity and developer productivity. • The more mature your company is, the more operational impact over the long term trumps all.
  4. Backups • How much data can you afford to lose? • Monitor backup freshness • Archive remotely. Test your restore process.
  5. Replication • Multiple live copies • Consider write concern, replication factor, quorum, votes, dedicated backup nodes … • Distribute across AZs, regions, or DCs
  6. Failover • What happens when any node dies? • Practice and document under controlled circumstances • Human SPOFs are just as bad as machine SPOFs
  7. Ponder your path to horizontal scalability • Forecasting more than 10x growth is mostly impossible • Just try to not screw over your future self.
  8. Dev, Ops, DBA • Use the same code review and deploy processes • Use the same infrastructure provisioning and config management • Data-land should feel familiar and intuitive, not alienating or “special”
  9. Guard Rails • Backups, restores. Verified. • Unit tests • Boring failovers • Shared on call rotations • Network isolation between prod & other • Don’t get pissy when people mess up, help them fix it • Don’t swoop in and do it all yourself
  10. Lifecycle of instrumentation • Is it up? Is it slow? • Canned graphs for system and db metrics • Hand crafted dashboards • Lots of outages • Collection of “heroic” debugging commands • Automated query profiling / analysis • Auto-remediation • Unique request ids, full-stack tracing • Realtime exploratory tools • Predictive analysis / precognition
  11. Golden rules for alerting • Emphasize end-to-end checks on key indicators • Health of the service, not individual nodes • Page only if actionable • Track auto-remediation events • Shared dev/ops rotation by service owners
  12. Self-actualized database • Your data is safe and recoverable • Resilient to common errors • Understandable, debuggable • Shares processes and tooling with the rest of the stack • Empowers you to achieve your mission.