Maslow's Hierarchy of Database Needs

It’s easy to say that you should always automate everything, monitor everything, and instrument every inch of your data infrastructure. But overengineering in advance of your needs can be just as costly as underengineering, particularly for startups. Engineering cycles are scarce. How do you decide where to spend them?

Enter Maslow’s hierarchy of needs—for databases.

For humans, Maslow’s hierarchy of needs is a pyramid of desires that must be satisfied for us to flourish: survival, safety, love and belonging, esteem, and self-actualization. Each level depends on the preceding ones—we need survival before safety, safety before love and belonging, etc.

Really, databases aren’t so different from you and me. They need:

Survival: Do you even have a data store? How should you decide what storage systems to run? Is it up? Is it alive? Do you have backups? Are they valid?

Safety: Are there multiple live copies? Are they geographically distributed? What is your failover story? Are your humans redundant too?

Love and belonging: Are your databases first-class citizens of your engineering processes, and do they share config management and tooling with the rest of the org? Are schema changes defined in code and revertable? Have you eliminated special snowflakes?

Esteem: Can your observability stack surface problems before they impact production? Can you correlate events across the stack? Can you automatically remediate common failures without human intervention? How do your observability requirements evolve as your org matures?

Self-actualization: Is your data store its best possible self, and what does that even mean for your org? (The self-actualized, maturely instrumented storage layer for a website with 1 billion users will look very different from the self-actualized storage layer for a young and highly volatile startup environment.) How can you assess your appetite for risk, your stage of development, and the layers of process you should invest in organizationally at each stage?

Charity Majors

June 23, 2016

Transcript

  1. Maslow’s Hierarchy of Needs
    For Databases
    Charity Majors, CTO honeycomb.io
    @mipsytipsy

  2. Maslow’s Hierarchy of Needs
    For Databases
    Charity Majors, CTO honeycomb.io
    @mipsytipsy

  3. Dr. Abraham Maslow, Psychologist
    real talk about
    storage systems
    … for non-DBAs

  4. “Database Reliability
    Engineering”
    with Laine Campbell

  5. Dr DevOps is in the house
    breakin’ down yr silos

  6. Maslow’s hierarchy of needs

  7. Databases:
    They Should Be Just Like Other Services!

  8. “self-actualized database”
    • Empowering for developers
    • Resilient to common failures
    • Friendly to operations (debuggable,
    understandable)

  9. The “right database”
    is the one that empowers you
    to achieve your mission.

  10. Survival

  11. Selecting a storage system
    • Choose boring technology, when you can
    • Reuse solutions. Resist software sprawl
    • Spend innovation tokens only on key differentiators
    h/t @mcfunley

  12. • If your company is very young, optimize
    for velocity and developer productivity.
    • The more mature your company is, the
    more operational impact over the long
    term trumps all.

  13. Safety

  14. Backups
    • How much data can you afford to lose?
    • Monitor backup freshness
    • Archive remotely. Test your restore process.
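
    "Monitor backup freshness" can be as small as comparing the age of the newest backup against how much data you can afford to lose. The sketch below is only an illustration: the function name, the timestamps, and the 24-hour limit are assumptions, not something taken from the talk.

```python
from datetime import datetime, timedelta, timezone


def backup_is_fresh(last_backup_at, now, max_age):
    """Return True if the newest backup is no older than max_age.

    max_age stands in for "how much data can you afford to lose?".
    """
    return (now - last_backup_at) <= max_age


# Hypothetical values: the newest backup finished 27 hours before "now".
now = datetime(2016, 6, 23, 12, 0, tzinfo=timezone.utc)
last_backup_at = datetime(2016, 6, 22, 9, 0, tzinfo=timezone.utc)

if backup_is_fresh(last_backup_at, now, timedelta(hours=24)):
    print("backup freshness OK")
else:
    print("backup is stale: alert someone")
```

    A check like this only proves that a backup exists and is recent. It says nothing about whether the backup can be restored, which still has to be tested separately.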

  15. If you are not regularly testing restores,
    you have Schrödinger’s backups.

  16. Replication
    • Multiple live copies
    • Consider write concern, replication
    factor, quorum, votes, dedicated
    backup nodes …
    • Distribute across AZs, regions, or DCs
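
    When weighing replication factor and quorum, the arithmetic of a simple majority quorum is worth having in your head. The sketch below assumes a majority-vote scheme, which is one common design and not necessarily what your data store uses; the function names are made up for illustration.

```python
def majority(voting_members):
    """Smallest number of votes that is more than half of the members."""
    return voting_members // 2 + 1


def tolerated_failures(voting_members):
    """How many members can be lost while a majority can still be formed."""
    return voting_members - majority(voting_members)


for n in (1, 2, 3, 4, 5):
    print(f"{n} members: majority={majority(n)}, "
          f"can lose {tolerated_failures(n)}")
```

    Under this assumption, going from 3 to 4 voting members does not buy any extra failure tolerance, which is one reason odd-sized replica groups are common.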

  17. Failover
    • What happens when any node dies?
    • Practice and document under
    controlled circumstances
    • Human SPOFs are just as bad as
    machine SPOFs

  18. Ponder your path to horizontal scalability
    Forecasting more than 10x growth is mostly impossible
    Just try to not screw over your future self.

  19. Love & Belonging

  20. “Show me the
    DevOps!”
    Making your data a first-class
    citizen of both dev and ops
    processes

  21. Dev, Ops, DBA
    • Use the same code review and deploy processes
    • Use the same infrastructure provisioning and
    config management
    • Data-land should feel familiar and intuitive, not
    alienating or “special”
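
    One way data-land can share the same code review and deploy process is to keep each schema change in the repository as a reviewable, revertable pair of statements. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table name and the MIGRATION structure are hypothetical, and a real setup would more likely use a dedicated migration tool.

```python
import sqlite3

# Hypothetical migration, kept in version control and code reviewed
# like any other change. "down" must undo exactly what "up" did.
MIGRATION = {
    "up": "CREATE TABLE user_emails ("
          "user_id INTEGER PRIMARY KEY, email TEXT NOT NULL)",
    "down": "DROP TABLE user_emails",
}


def table_names(conn):
    """Return the set of table names currently in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return {row[0] for row in rows}


def run_migration(conn, migration, direction):
    """Apply ("up") or revert ("down") a migration inside a transaction."""
    with conn:
        conn.execute(migration[direction])


conn = sqlite3.connect(":memory:")
run_migration(conn, MIGRATION, "up")
print("after up:", table_names(conn))
run_migration(conn, MIGRATION, "down")
print("after down:", table_names(conn))
```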

  22. Embrace failure

  23. Guard Rails
    • Backups, restores. Verified.
    • Unit tests
    • Boring failovers
    • Shared on call rotations
    • Network isolation between prod &
    other
    • Don’t get pissy when people mess
    up, help them fix it
    • Don’t swoop in and do it all yourself

  24. Right-sizing your automation
    • Indexing
    • Scaling up or down
    • Failover
    • Killing bad queries
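
    As a small illustration of right-sized automation for killing bad queries, the decision of what to kill can be separated from the act of killing it, so the policy is easy to test and review. Everything below is hypothetical (the RunningQuery type, the field names, the 30-second limit); in a real system the list would come from the database's process list and the kill would use that database's own command.

```python
from dataclasses import dataclass


@dataclass
class RunningQuery:
    query_id: int
    user: str
    seconds_running: float


def queries_to_kill(queries, max_seconds, protected_users=frozenset()):
    """Return ids of queries that exceeded max_seconds.

    Queries from protected_users (for example a backup job) are never
    selected, however long they have been running.
    """
    return [
        q.query_id
        for q in queries
        if q.seconds_running > max_seconds and q.user not in protected_users
    ]


running = [
    RunningQuery(1, "app", 0.4),
    RunningQuery(2, "app", 95.0),
    RunningQuery(3, "backup", 3600.0),
]
print(queries_to_kill(running, max_seconds=30, protected_users={"backup"}))
```

    Starting with a policy that only logs its candidates, and letting it kill queries once the output looks sane, is one way to keep the automation proportionate to your needs.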

  25. Esteem

  26. Your services need:
    • Observability
    • Instrumentation
    • Graceful degradation

  27. Your humans need:
    • Tools for debugging
    • Checklists, documentation
    • Tractable levels of graphs and alerting

  28. Lifecycle of instrumentation
    Is it up?
    Is it slow?
    Canned graphs for system and db metrics
    Hand crafted dashboards
    Lots of outages
    Collection of “heroic” debugging commands
    Automated query profiling / analysis
    Auto-remediation
    Unique request ids, full-stack tracing
    Realtime exploratory tools
    Predictive analysis / precognition

  29. Golden rules for alerting
    • Emphasize end-to-end checks on key indicators
    • Health of the service, not individual nodes
    • Page only if actionable
    • Track auto-remediation events
    • Shared dev/ops rotation by service owners
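
    The first three rules can be sketched as a single paging decision: trust the end-to-end check first, and treat node failures as a page only when enough of them are down to threaten the service. The function name and the 50% threshold below are assumptions chosen for illustration, not recommendations from the talk.

```python
def should_page(end_to_end_ok, healthy_nodes, total_nodes,
                min_healthy_fraction=0.5):
    """Decide whether to wake a human up.

    Pages when the end-to-end check fails, or when the healthy share of
    nodes drops below min_healthy_fraction. One dead node in an otherwise
    healthy service does not page.
    """
    if not end_to_end_ok:
        return True
    if total_nodes == 0:
        return True  # nothing is serving: treat as an outage
    return healthy_nodes / total_nodes < min_healthy_fraction


print(should_page(end_to_end_ok=True, healthy_nodes=2, total_nodes=3))
print(should_page(end_to_end_ok=False, healthy_nodes=3, total_nodes=3))
```

    A node loss that does not page can still be recorded, in the same spirit as tracking auto-remediation events.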

  31. Self-Actualization

  32. basically …
    (credit: @zakgreant)

  33. Self-actualized database
    • Your data is safe and recoverable
    • Resilient to common errors
    • Understandable, debuggable
    • Shares processes and tooling with the rest of the stack
    • Empowers you to achieve your mission.

  34. in conclusion
    Don’t be scared. Mind your damn backups. Everything else will probably be ok.
