

Velocity San Jose 2017 - Database Reliability Engineering

Laine Campbell and Charity Majors


June 21, 2017


Transcript

  1. database reliability engineering: what. why. how.
     O’Reilly Velocity Santa Clara, 2017
     Laine Campbell, Sr. Dir, SRE, Fastly (@lainevcampbell)
     Charity Majors, Founder/CEO, honeycomb.io (@mipsytipsy)
  2. database reliability engineering
     • develop reliable and resilient datastores and shared components for provisioning
     • provide patterns and knowledge to support other teams’ processes and facilitate their work
     • understand and teach data access and storage nuances to ensure all service level objectives can be met
     • anchor teams with expertise for troubleshooting, recovery, and other tasks requiring depth, not breadth
  3. why DBRE?
     • reliability is only a smidgen of the operational mandate
     • reliability isn’t always required in operations
     • the closer to the data, the more reliability is necessary
     • “data requires paranoia, not chaos” (@jessietron, https://blog.codeship.com/growing-tech-stack-say-no/)
  4. the path to DBRE
     • Who’s your DBA? If you don’t know the answer, it’s probably you.
     • the DBA, continuing to evolve their skillset
     • the ops engineer, taking data ownership
     • software engineers developing data-driven applications
  5. polyglot persistence
     • relational is not the end of the line
     • new and shiny brings risk
     • one engine is less likely to suit all cases
     • we vet, find edge cases, learn patterns
     • integration between engines is crucial; the data must flow
  6. virtualization and cloud
     • commodity hardware for DBs forces designing for resilience
     • new storage paradigms
     • the power of rapid provisioning
     • DBaaS availability reduces toil
  7. infrastructure as code
     • forces componentization: no special snowflakes
     • requires coding chops
     • changes become deployments
     • infrastructure can be versioned
  8. continuous delivery
     • we must be teachers, not gatekeepers
     • testing and compliance become top priorities
     • we build patterns to replace reviews
     • build guardrails and tools
  9. DBRE guiding principles
     • protect the data
     • self-service for scale
     • elimination of toil
     • databases are not special snowflakes
     • reduce the barriers between software and operations
  10. protect the data
     • responsibility for data protection shared by cross-functional teams
     • standardization and automation, with simplicity over expensive/complicated infrastructures
     • durability and integrity baked into every part of the architecture and software development lifecycle
  11. self-service for scale
     • metrics knowledge base and automated discovery/collection
     • backup/recovery utilities and APIs auto-deployed for new builds
     • reference architectures and configs for datastore deployments
     • security standards for datastore deployments
     • safe deployment patterns and tests for database changesets
  12. DBs are not special snowflakes
     • moving databases into the “cattle, not pets” paradigm, à la Bill Baker
     • databases are the last holdouts of commoditization
     • keep it simple, stupid
  13. reduce barriers between teams
     • use the same code review and deploy processes
     • use the same provisioning and config management
     • data-land should feel familiar and intuitive, not alienating or “special”
  14. datastore decision making
     • choose boring tech if possible
     • testing consistency guarantees
     • benchmarking performance
     • building gold configurations
     • testing operability
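The slides don’t show what a “gold configuration” looks like in practice; one common shape is a vetted baseline that individual services layer overrides on top of. A minimal sketch — every key and value below is a hypothetical illustration, not a recommended setting:

```python
# Hypothetical "gold" baseline for a datastore; individual deployments
# layer overrides on top rather than hand-editing configs from scratch.
GOLD_CONFIG = {
    "max_connections": 500,
    "innodb_buffer_pool_size": "8G",
    "sync_binlog": 1,                     # durability over raw speed
    "innodb_flush_log_at_trx_commit": 1,
}

def render_config(overrides=None):
    """Layer service-specific overrides over the vetted baseline."""
    cfg = dict(GOLD_CONFIG)
    cfg.update(overrides or {})
    return cfg

# A reporting replica might deliberately relax durability:
replica_cfg = render_config({"sync_binlog": 0})
```

The point is that deviations from the baseline are explicit and reviewable, which is what makes operability testable across a fleet.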
  15. spend innovation tokens only on key differentiators
     • if your company is very young, optimize for velocity and developer productivity
     • the more mature your company is, the more long-term operational impact trumps all
  16. recoverability
     • build effective tiered backup solutions: the right backup for the right dataset
     • work with SWE to build data validation pipelines
     • integrate recovery into daily activities
     • build recovery testing automation
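“Recovery testing automation” implies that a backup only counts once a restore of it has been verified. A toy round trip under that assumption — the file copy and checksum stand in for a real backup tool, and all names here are illustrative:

```python
import hashlib
import os
import tempfile

def sha256_of(path):
    """Checksum a file in streaming chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def backup(src, dest_dir):
    """Copy the datafile and return the backup path plus its checksum."""
    dest = os.path.join(dest_dir, os.path.basename(src) + ".bak")
    with open(src, "rb") as s, open(dest, "wb") as d:
        d.write(s.read())
    return dest, sha256_of(dest)

def restore_is_valid(backup_path, expected_checksum):
    """A backup is only trusted after its restore has been verified."""
    return sha256_of(backup_path) == expected_checksum

# Exercise the round trip against a throwaway "datafile".
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "orders.db")
with open(src, "wb") as f:
    f.write(b"pretend these are table rows")
bak_path, checksum = backup(src, workdir)
```

Running this check on a schedule, against production-sized datasets, is what turns recovery from a hope into a daily activity.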
  17. (image-only slide)
  18. data distribution
     • determine effective number of copies
     • distribute broadly based on requirements
     • replication, partitioning, and sharding
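One way to make “number of copies” and “sharding” concrete is hash-based placement: a key maps deterministically to a shard, and replicas land on distinct shards. A sketch, with a hypothetical shard count and a deliberately naive consecutive-shard replica policy:

```python
import hashlib

N_SHARDS = 8  # hypothetical shard count

def shard_for(key, n_shards=N_SHARDS):
    """Map a key to a shard deterministically: the same key always
    lands on the same shard."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % n_shards

def replicas_for(key, copies=3, n_shards=N_SHARDS):
    """Place `copies` replicas of a key on distinct shards by walking
    consecutive shards from the primary placement."""
    first = shard_for(key, n_shards)
    return [(first + i) % n_shards for i in range(copies)]
```

Real systems use consistent hashing or range partitioning to avoid mass data movement when the shard count changes; the sketch only shows the determinism requirement.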
  19. failover and availability
     • understand availability strategies for each datastore
     • evaluate limitations and consequences of failover (cold caches, data loss, etc.)
     • practice and document failover in controlled circumstances
     • continued training of all staff on the process to get it into “muscle memory”
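The “consequences of failover” bullet (data loss in particular) comes down to which replica gets promoted. A sketch of that decision, assuming hypothetical node records with health and replication-lag fields:

```python
# Sketch of the promotion decision in a failover: prefer the healthy
# replica with the least replication lag to minimize data loss.
# Node names and fields are hypothetical.

def pick_new_primary(replicas):
    """Return the healthy replica with the smallest replication lag."""
    healthy = [r for r in replicas if r["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy replica to promote")
    return min(healthy, key=lambda r: r["lag_seconds"])

replicas = [
    {"name": "db2", "healthy": True,  "lag_seconds": 4},
    {"name": "db3", "healthy": True,  "lag_seconds": 1},
    {"name": "db4", "healthy": False, "lag_seconds": 0},  # down: skip
]
```

Practicing failover in controlled circumstances is largely about exercising exactly this path — and discovering what the candidate set looks like when things are actually broken.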
  20. scaling paths
     • plan for and test up to 10x growth
     • keep an eye out for constraints that will impact the future (aka don’t screw your future self)
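“Test up to 10x growth” can be phrased as a headroom check that capacity planning runs routinely. A trivial sketch — the load and capacity numbers are hypothetical stand-ins for measured values:

```python
# Sketch: does today's measured capacity absorb a growth target?
# The 10x default mirrors the "plan for 10x" rule of thumb.

def survives_growth(current_qps, capacity_qps, growth_factor=10):
    """True if capacity covers growth_factor times today's load."""
    return current_qps * growth_factor <= capacity_qps
```

The interesting work is finding the constraint that fails first (connections, disk, replication throughput); the check just makes the question explicit.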
  21. build guard rails
     • backups, restores. verified.
     • unit tests
     • migration and fallback testing
     • DDL migration patterns
     • migration heuristic analysis
     • boring failovers
     • shared on-call rotations
     • network isolation between prod and other environments
     • don’t get pissy when people mess up; help them fix it
     • don’t swoop in and do it all yourself
  22. automation
     • rolling DDL
     • scaling up or down
     • failovers and traffic shifts
     • detecting or killing bad queries
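A sketch of the “detecting or killing bad queries” item: flag user queries over a runtime budget while leaving system and replication threads alone. The process list below is a hypothetical stand-in for real server output (e.g. MySQL’s `SHOW PROCESSLIST`), and the budget is illustrative:

```python
RUNTIME_BUDGET_SECONDS = 30  # hypothetical per-query budget

def queries_to_kill(processlist, budget=RUNTIME_BUDGET_SECONDS):
    """Return ids of user queries over the runtime budget; never
    touch replication or system threads."""
    return [
        q["id"] for q in processlist
        if q["user"] not in ("system", "repl") and q["time"] > budget
    ]

processlist = [
    {"id": 11, "user": "app",  "time": 412},   # runaway report query
    {"id": 12, "user": "app",  "time": 2},     # fine
    {"id": 13, "user": "repl", "time": 9000},  # replication: leave alone
]
```

In production the kill step would issue the server’s own kill command per flagged id; automating the detection side first is the lower-risk place to start.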
  23. build for people
     • tools for debugging
     • checklists and documentation
     • tractable levels of graphs and alerting
  24. release management
     • develop patterns for migrations
     • heuristics to surface risky changes
     • migration and fallback testing
     • tiered dataset sizes for testing (development, integration, full)
     • continued developer training and education
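The “heuristics to surface risky changes” bullet can be sketched as a pattern scan over a changeset’s DDL that routes matches to human review. The pattern list below is illustrative, not exhaustive:

```python
import re

# Hypothetical risk heuristics for DDL review; each entry pairs a
# regex with the reason a human should look at the statement.
RISKY_PATTERNS = [
    (r"\bDROP\s+(TABLE|COLUMN)\b", "destructive: drops data"),
    (r"\bALTER\s+TABLE\b.*\bNOT\s+NULL\b", "may fail on existing NULLs"),
    (r"\bRENAME\b", "breaks old code paths mid-rollout"),
]

def flag_risky_ddl(sql):
    """Return (pattern, reason) warnings for one DDL statement."""
    return [
        (pat, reason)
        for pat, reason in RISKY_PATTERNS
        if re.search(pat, sql, re.IGNORECASE | re.DOTALL)
    ]
```

A heuristic like this is a guardrail, not a gate: it cheaply surfaces the changes worth a closer look while letting routine migrations flow through the normal deploy pipeline.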
  25. (image-only slide)
  26. self-actualized databases
     • your data is safe and recoverable
     • resilient to common errors
     • understandable, debuggable
     • shares processes and tooling with the rest of the stack
     • empowers you to achieve your mission