

Velocity San Jose 2017 - Database Reliability Engineering

Laine Campbell and Charity Majors


June 21, 2017


Transcript

  1. database reliability engineering: what. why. how.
     O’Reilly Velocity Santa Clara, 2017
     Laine Campbell, Sr. Dir, SRE, Fastly (@lainevcampbell)
     Charity Majors, Founder/CEO, honeycomb.io (@mipsytipsy)
  2. database reliability engineering
     • develop reliable and resilient datastores and shared components for provisioning
     • provide patterns and knowledge to support other teams’ processes and facilitate their work
     • understand and teach data access and storage nuances to ensure all service level objectives can be met
     • anchor teams with expertise for troubleshooting, recovery, and other tasks requiring depth, not breadth
  3. why DBRE?
     • reliability is only a smidgen of the operational mandate
     • reliability isn’t always required in operations
     • the closer to the data, the more reliability is necessary
     • “data requires paranoia, not chaos” (@jessietron, https://blog.codeship.com/growing-tech-stack-say-no/)
  4. the path to DBRE
     • Who’s your DBA? If you don’t know the answer, it’s probably you.
     • the DBA, continuing to evolve their skillset
     • the ops engineer, taking data ownership
     • software engineers developing data-driven applications
  5. polyglot persistence
     • relational is not the end of the line
     • new and shiny brings risk
     • one engine is less likely to suit all cases
     • we vet, find edge cases, learn patterns
     • integration between engines is crucial; the data must flow
  6. virtualization and cloud
     • commodity hardware for DBs forces designing for resilience
     • new storage paradigms
     • the power of rapid provisioning
     • DBaaS availability reduces toil
  7. infrastructure as code
     • forces componentization: no special snowflakes
     • requires coding chops
     • changes become deployments
     • infrastructure can be versioned
  8. continuous delivery
     • we must be teachers, not gatekeepers
     • testing and compliance become top priorities
     • we build patterns to replace reviews
     • build guardrails and tools
  9. DBRE guiding principles
     • protect the data
     • self-service for scale
     • elimination of toil
     • databases are not special snowflakes
     • reduce the barriers between software and operations
  10. protect the data
     • responsibility for data protection shared by cross-functional teams
     • standardization and automation, with simplicity over expensive/complicated infrastructures
     • durability and integrity baked into every part of the architecture and software development lifecycle
  11. self-service for scale
     • metrics knowledge base and automated discovery/collection
     • backup/recovery utilities and APIs auto-deployed for new builds
     • reference architectures and configs for datastore deployments
     • security standards for datastore deployments
     • safe deployment patterns and tests for database changesets
  12. DBs are not special snowflakes
     • moving databases into the “cattle, not pets” paradigm, à la Bill Baker
     • databases are the last holdouts of commoditization
     • keep it simple, stupid
  13. reduce barriers between teams
     • use the same code review and deploy processes
     • use the same provisioning and config management
     • data-land should feel familiar and intuitive, not alienating or “special”
  14. datastore decision making
     • choose boring tech if possible
     • testing consistency guarantees
     • benchmarking performance
     • building gold configurations
     • testing operability
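The slides don’t show what a “gold configuration” looks like in practice; one common shape is a vetted baseline that individual services layer overrides on top of. A minimal sketch — every key and value below is a hypothetical illustration, not a recommended setting:

```python
# Hypothetical "gold" baseline for a datastore; individual deployments
# layer overrides on top rather than hand-editing configs from scratch.
GOLD_CONFIG = {
    "max_connections": 500,
    "innodb_buffer_pool_size": "8G",
    "sync_binlog": 1,                     # durability over raw speed
    "innodb_flush_log_at_trx_commit": 1,
}

def render_config(overrides=None):
    """Layer service-specific overrides over the vetted baseline."""
    cfg = dict(GOLD_CONFIG)
    cfg.update(overrides or {})
    return cfg

# A reporting replica might deliberately relax durability:
replica_cfg = render_config({"sync_binlog": 0})
```

The point is that deviations from the baseline are explicit and reviewable, which is what makes operability testable across a fleet.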
  15. spend innovation tokens only on key differentiators
     • if your company is very young, optimize for velocity and developer productivity
     • the more mature your company is, the more long-term operational impact trumps all
  16. recoverability
     • build effective tiered backup solutions: the right backup for the right dataset
     • work with SWE to build data validation pipelines
     • integrate recovery into daily activities
     • build recovery testing automation
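“Recovery testing automation” implies that a backup only counts once a restore of it has been verified. A toy round trip under that assumption — the file copy and checksum stand in for a real backup tool, and all names here are illustrative:

```python
import hashlib
import os
import tempfile

def sha256_of(path):
    """Checksum a file in streaming chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def backup(src, dest_dir):
    """Copy the datafile and return the backup path plus its checksum."""
    dest = os.path.join(dest_dir, os.path.basename(src) + ".bak")
    with open(src, "rb") as s, open(dest, "wb") as d:
        d.write(s.read())
    return dest, sha256_of(dest)

def restore_is_valid(backup_path, expected_checksum):
    """A backup is only trusted after its restore has been verified."""
    return sha256_of(backup_path) == expected_checksum

# Exercise the round trip against a throwaway "datafile".
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "orders.db")
with open(src, "wb") as f:
    f.write(b"pretend these are table rows")
bak_path, checksum = backup(src, workdir)
```

Running this check on a schedule, against production-sized datasets, is what turns recovery from a hope into a daily activity.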
  17. (image-only slide)
  18. data distribution
     • determine effective number of copies
     • distribute broadly based on requirements
     • replication, partitioning, and sharding
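One way to make “number of copies” and “sharding” concrete is hash-based placement: a key maps deterministically to a shard, and replicas land on distinct shards. A sketch, with a hypothetical shard count and a deliberately naive consecutive-shard replica policy:

```python
import hashlib

N_SHARDS = 8  # hypothetical shard count

def shard_for(key, n_shards=N_SHARDS):
    """Map a key to a shard deterministically: the same key always
    lands on the same shard."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % n_shards

def replicas_for(key, copies=3, n_shards=N_SHARDS):
    """Place `copies` replicas of a key on distinct shards by walking
    consecutive shards from the primary placement."""
    first = shard_for(key, n_shards)
    return [(first + i) % n_shards for i in range(copies)]
```

Real systems use consistent hashing or range partitioning to avoid mass data movement when the shard count changes; the sketch only shows the determinism requirement.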
  19. failover and availability
     • understand availability strategies for each datastore
     • evaluate limitations and consequences of failover (cold caches, data loss, etc.)
     • practice and document failover in controlled circumstances
     • continued training of all staff on the process to get it into “muscle memory”
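The “consequences of failover” bullet (data loss in particular) comes down to which replica gets promoted. A sketch of that decision, assuming hypothetical node records with health and replication-lag fields:

```python
# Sketch of the promotion decision in a failover: prefer the healthy
# replica with the least replication lag to minimize data loss.
# Node names and fields are hypothetical.

def pick_new_primary(replicas):
    """Return the healthy replica with the smallest replication lag."""
    healthy = [r for r in replicas if r["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy replica to promote")
    return min(healthy, key=lambda r: r["lag_seconds"])

replicas = [
    {"name": "db2", "healthy": True,  "lag_seconds": 4},
    {"name": "db3", "healthy": True,  "lag_seconds": 1},
    {"name": "db4", "healthy": False, "lag_seconds": 0},  # down: skip
]
```

Practicing failover in controlled circumstances is largely about exercising exactly this path — and discovering what the candidate set looks like when things are actually broken.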
  20. scaling paths
     • plan for and test up to 10x growth
     • keep an eye out for constraints that will impact the future (aka don’t screw your future self)
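“Test up to 10x growth” can be phrased as a headroom check that capacity planning runs routinely. A trivial sketch — the load and capacity numbers are hypothetical stand-ins for measured values:

```python
# Sketch: does today's measured capacity absorb a growth target?
# The 10x default mirrors the "plan for 10x" rule of thumb.

def survives_growth(current_qps, capacity_qps, growth_factor=10):
    """True if capacity covers growth_factor times today's load."""
    return current_qps * growth_factor <= capacity_qps
```

The interesting work is finding the constraint that fails first (connections, disk, replication throughput); the check just makes the question explicit.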
  21. build guard rails
     • backups, restores. verified.
     • unit tests
     • migration and fallback testing
     • DDL migration patterns
     • migration heuristic analysis
     • boring failovers
     • shared on-call rotations
     • network isolation between prod and other environments
     • don’t get pissy when people mess up; help them fix it
     • don’t swoop in and do it all yourself
  22. automation
     • rolling DDL
     • scaling up or down
     • failovers and traffic shifts
     • detecting or killing bad queries
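A sketch of the “detecting or killing bad queries” item: flag user queries over a runtime budget while leaving system and replication threads alone. The process list below is a hypothetical stand-in for real server output (e.g. MySQL’s `SHOW PROCESSLIST`), and the budget is illustrative:

```python
RUNTIME_BUDGET_SECONDS = 30  # hypothetical per-query budget

def queries_to_kill(processlist, budget=RUNTIME_BUDGET_SECONDS):
    """Return ids of user queries over the runtime budget; never
    touch replication or system threads."""
    return [
        q["id"] for q in processlist
        if q["user"] not in ("system", "repl") and q["time"] > budget
    ]

processlist = [
    {"id": 11, "user": "app",  "time": 412},   # runaway report query
    {"id": 12, "user": "app",  "time": 2},     # fine
    {"id": 13, "user": "repl", "time": 9000},  # replication: leave alone
]
```

In production the kill step would issue the server’s own kill command per flagged id; automating the detection side first is the lower-risk place to start.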
  23. build for people
     • tools for debugging
     • checklists and documentation
     • tractable levels of graphs and alerting
  24. release management
     • develop patterns for migrations
     • heuristics to surface risky changes
     • migration and fallback testing
     • tiered dataset sizes for testing (development, integration, full)
     • continued developer training and education
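The “heuristics to surface risky changes” bullet can be sketched as a pattern scan over a changeset’s DDL that routes matches to human review. The pattern list below is illustrative, not exhaustive:

```python
import re

# Hypothetical risk heuristics for DDL review; each entry pairs a
# regex with the reason a human should look at the statement.
RISKY_PATTERNS = [
    (r"\bDROP\s+(TABLE|COLUMN)\b", "destructive: drops data"),
    (r"\bALTER\s+TABLE\b.*\bNOT\s+NULL\b", "may fail on existing NULLs"),
    (r"\bRENAME\b", "breaks old code paths mid-rollout"),
]

def flag_risky_ddl(sql):
    """Return (pattern, reason) warnings for one DDL statement."""
    return [
        (pat, reason)
        for pat, reason in RISKY_PATTERNS
        if re.search(pat, sql, re.IGNORECASE | re.DOTALL)
    ]
```

A heuristic like this is a guardrail, not a gate: it cheaply surfaces the changes worth a closer look while letting routine migrations flow through the normal deploy pipeline.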
  25. (image-only slide)
  26. self-actualized databases
     • your data is safe and recoverable
     • resilient to common errors
     • understandable, debuggable
     • shares processes and tooling with the rest of the stack
     • empowers you to achieve your mission