WTF is a Microservice - Rafael Schloming, Datawire

datawire
February 10, 2017

Rafael Schloming, Chief Architect at Datawire and AMQP spec author, breaks down an understanding of microservices into People, Process, and Technology. When adopting microservices, he recommends starting with People rather than with Technology.

Transcript

  1. datawire.io History
     Datawire
     • Founded in 2014
     • Focused on microservices
     Me
     • Lots of distributed systems experience
     • Starting from zero with microservices
  2. What is a microservice?
     Wikipedia: “...no industry consensus…”
     • “...implementation approach for SoA…”
     • “...processes that communicate with each other to fulfill a goal…”
     • “...Naturally enforces a modular structure...”
     Everything else:
     • Volumes of essays… good, bad, and ugly...
  3. Starting Point
     Technical:
     • An application composed of a network of small services
     • Building your application from microservices forces you to create clear boundaries, better abstractions, ...
     Process:
     • ???
     People:
     • ???
  4. The Expert Source
     Read just about every firsthand story out there
     Went to conferences
     Talked to everyone we could
     Started the practitioner summit
     And armed with a little bit of knowledge, we started filling in our picture…
  5. Technical Picture: Reference Architecture
     Control Plane
     • Service Discovery
     • Logging + Metrics
     • Configuration
     • Smart Endpoints
     Traffic Layer
     • HTTP
     • RPC
     • Messaging
  6. First Picture
     Technical:
     • A network of small services
     • Connected via a control plane and traffic layer
     Process:
     • ???
     People:
     • Platform team and service teams
  7. The Bootstrap Perspective
     Five engineers building an out of the box control plane...
     Ingest interesting application level events:
     • start, stop, heartbeat events
     • log messages
     Store them in an appropriate piece of infrastructure:
     • Service registry
     • Log store
     Transform and Present:
     • Realtime view of: routing table, service health
     • Historic view of: request traces, ...
  8. Ubiquitous Data Processing Pipeline
     Ingest -> Source of Truth -> Transform -> Present
     Template for many data driven businesses…
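The ingest, source of truth, transform, present stages from the last two slides can be sketched for the discovery case as follows. This is a minimal illustration, not Datawire's implementation: the `Event` and `Registry` names, the 30 second liveness window, and the example addresses are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str         # "start", "stop", or "heartbeat"
    service: str      # logical service name
    address: str      # address of the node that emitted the event
    timestamp: float  # seconds, any monotonic clock


@dataclass
class Registry:
    """Source of truth: when each (service, address) node was last seen."""
    ttl: float = 30.0  # assumed liveness window in seconds
    last_seen: dict = field(default_factory=dict)

    def ingest(self, event: Event) -> None:
        key = (event.service, event.address)
        if event.kind == "stop":
            self.last_seen.pop(key, None)
        else:
            # Any other event kind (start, heartbeat) refreshes liveness.
            self.last_seen[key] = event.timestamp

    def routing_table(self, now: float) -> dict:
        """Transform: derive service -> sorted list of live addresses."""
        table = {}
        for (service, address), seen in self.last_seen.items():
            if now - seen <= self.ttl:
                table.setdefault(service, []).append(address)
        return {service: sorted(addrs) for service, addrs in table.items()}


def present(table: dict) -> str:
    """Present: render the routing table as one line per service."""
    return "\n".join(
        f"{service}: {', '.join(addrs)}" for service, addrs in sorted(table.items())
    )


if __name__ == "__main__":
    registry = Registry(ttl=30.0)
    for event in [
        Event("start", "search", "10.0.0.1:8080", 0.0),
        Event("start", "search", "10.0.0.2:8080", 1.0),
        Event("heartbeat", "search", "10.0.0.1:8080", 25.0),
        Event("stop", "search", "10.0.0.2:8080", 26.0),
    ]:
        registry.ingest(event)
    print(present(registry.routing_table(now=40.0)))
```

Keeping the raw "last seen" data separate from the derived routing table is what lets the same store feed several views, such as a routing table and a service health display.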
  9. V1: Started with Discovery
     Requirements:
     • highly available
     • low throughput
     • low latency
     • low operational complexity
     • able to survive a complete restart
     • capable of handling spikes
     Initial Choices:
     • vert.x + hazelcast
     • websockets
     • smart clients
     • auth0 + python shim
     Total Services: 2
  10. V2: Added Tracing (PoC)
      Requirements:
      • high throughput
      • highish latency ok
      • cannot impact application
      Initial choices:
      • vert.x, hazelcast (only retained transient buffer of last 1000 log messages)
      • websockets
      • smart circular buffer minimized impact on application
      Total Services: 3
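A bounded buffer of the kind this slide describes might look like the sketch below. It assumes a capacity of 1000 to match the slide; the `TraceBuffer` class and its `dropped` counter are hypothetical names, and the original used vert.x and hazelcast rather than Python.

```python
from collections import deque


class TraceBuffer:
    """Keeps only the most recent messages so logging never grows unbounded."""

    def __init__(self, capacity: int = 1000):
        # A deque with maxlen silently discards the oldest item when full.
        self._messages = deque(maxlen=capacity)
        self.dropped = 0  # how many old messages were overwritten

    def record(self, message: str) -> None:
        if len(self._messages) == self._messages.maxlen:
            self.dropped += 1
        self._messages.append(message)

    def drain(self) -> list:
        """Hand the buffered messages to a sender and empty the buffer."""
        batch = list(self._messages)
        self._messages.clear()
        return batch
```

Because `record` is constant time and never blocks on the network, a slow or unavailable tracing backend costs the application old log messages rather than latency.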
  11. V3: Added Persistence for Tracing
      Requirements:
      • keep extended history
      • provide full text search
      • filtering, sorting, etc
      Initial Choices:
      • elasticsearch for storage/search
      • query service
      Total Services: 4
  12. First hint of pain...
      Rerouting data pathways:
      • touched multiple services
      • coupled changes
      Poor local dev experience:
      • manually fire up and wire the whole fabric
      Slow deployment pipeline:
      • bunched up changes
      All this resulted in a big scary cutover
  13. V4: Adding Persistence for Discovery
      Requirements:
      • track errors associated with particular service nodes
      • store routing strategies
      Initial Choices:
      • postgres (RDS) for persistence
      Yet another big cutover… enough is enough!
      Let’s fix our tooling once and for all...
  14. Deployment Requirements
      Stuff we had tried:
      • Deliver everything as a docker image
        ◦ Still too much wiring to bootstrap the system
      • Use kubernetes for everything
        ◦ Nice dev experience with minikube, but we use amazon services
      Need to meet both dev & operational requirements…
      • Fast dev cycle
      • Good visibility
      • Fast rollback
      • Ability to leverage commodity services
  15. Deployment Redesign
      • Complete system definition in git
        ◦ Contains all the information necessary to bootstrap the system from scratch in all of its operating environments…
      • System definition is well factored with respect to its environments…
        ◦ Abstract definition: “my service needs postgres and redis”
        ◦ Development: service -> docker image, postgres -> docker image, redis -> docker image
          ▪ Use minikube to run the whole system
        ◦ Test: <same as dev for now>
        ◦ Production: service -> docker image, postgres -> RDS, redis -> elasticache
          ▪ Kubernetes cluster for stateless services
      • Tooling caters to the needs of each environment
        ◦ Development: fast feedback cycle
        ◦ Test: repeatable environments
        ◦ Production: quick and safe updates/rollbacks
      • Tooling helps maintain environment parity
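One way to picture a system definition that is factored by environment is the sketch below. The mappings are taken from the slide; the data layout, the `resolve` function, and the service name `myservice` are hypothetical and only illustrate the idea of one abstract definition with per-environment bindings.

```python
# Abstract definition: what each service needs, independent of environment.
# "myservice" is a placeholder name, not one from the talk.
REQUIREMENTS = {"myservice": ["postgres", "redis"]}

# Per-environment bindings, as listed on the slide.
BINDINGS = {
    "development": {
        "service": "docker image",
        "postgres": "docker image",
        "redis": "docker image",
    },
    "production": {
        "service": "docker image",
        "postgres": "RDS",
        "redis": "elasticache",
    },
}
# The slide notes that test is the same as development for now.
BINDINGS["test"] = dict(BINDINGS["development"])


def resolve(service: str, environment: str) -> dict:
    """Turn the abstract definition into a concrete plan for one environment."""
    bindings = BINDINGS[environment]
    plan = {service: bindings["service"]}
    for dependency in REQUIREMENTS[service]:
        plan[dependency] = bindings[dependency]
    return plan


if __name__ == "__main__":
    for env in ("development", "test", "production"):
        print(env, resolve("myservice", env))
```

Since the requirements live in one place and only the bindings differ, adding a dependency to a service updates every environment at once, which is one way tooling can help maintain environment parity.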
  16. DevOps?
      DevOps is presented as a solution to an organizational problem, but we all sat in the same room…
      We were thinking about operational factors from day one:
      • throughput, latency, availability, …
      • building a service, not a server
      This forced us to follow an incremental process:
      • tooling for this process was inadequate
      • when we thought about the process it helped us figure out the tooling
  17. Process: Architecture vs Development (SoA vs SoD)
      Systems (their shape in particular) are traditionally architected
      Architecture
      • lots of up front thinking
      • slow feedback cycle
      Development
      • frequent small changes
      • quick feedback cycle
      • measure the impact at every step
      Microservices are about enabling a developmental methodology for systems
  18. Methodology for Developing Systems
      Principles
      • small frequent changes
      • rapid feedback and good visibility
      Applied to codebases:
      • Tooling for rapid feedback: compilers, incremental builds, test suites
      • Tooling for good visibility: printf, logging, debuggers, profilers
      Applied to systems:
      • Key characteristics go beyond just logic and correctness
      • Performance within specified tolerance of the running system is a critical feature
      Tests don’t cut it anymore...
  19. Update the Dev Cycle
      Tests assess impact on correctness...
      Build -> Test -> Deploy
      We need a way to assess impact on the system…
      Build -> Test -> Assess Impact -> Deploy
      How do you measure system level impact?
      • Measure impact against defined Service Level Objectives (SLOs):
        ◦ throughput, latency, and availability (error rate)
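The extra "Assess Impact" stage can be sketched as a gate that compares measured throughput, latency, and error rate against a service's SLOs before deploying. Everything here is illustrative: the `SLO` and `Measurement` types, the field names, and the choice of p99 latency as the latency metric are assumptions, since the talk does not specify how the objectives are expressed.

```python
from dataclasses import dataclass


@dataclass
class SLO:
    min_throughput_rps: float   # requests per second the service must sustain
    max_p99_latency_ms: float   # assumed latency metric: 99th percentile
    max_error_rate: float       # fraction of failed requests, 0.0 to 1.0


@dataclass
class Measurement:
    throughput_rps: float
    p99_latency_ms: float
    error_rate: float


def assess_impact(slo: SLO, measured: Measurement) -> list:
    """Return a list of SLO violations; an empty list means safe to deploy."""
    violations = []
    if measured.throughput_rps < slo.min_throughput_rps:
        violations.append("throughput below objective")
    if measured.p99_latency_ms > slo.max_p99_latency_ms:
        violations.append("latency above objective")
    if measured.error_rate > slo.max_error_rate:
        violations.append("error rate above objective")
    return violations


def dev_cycle(build, test, measure, deploy, slo: SLO) -> str:
    """Build -> Test -> Assess Impact -> Deploy, stopping at the first failure."""
    artifact = build()
    if not test(artifact):
        return "failed tests"
    violations = assess_impact(slo, measure(artifact))
    if violations:
        return "blocked: " + "; ".join(violations)
    deploy(artifact)
    return "deployed"
```

The `build`, `test`, `measure`, and `deploy` arguments are plain callables, so the same gate works whether the measurement comes from a canary, a staging environment, or a load test.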
  20. Back to the Experts...
      • Canary Testing
      • Circuit Breakers
      • Dark Launching
      • Tracing
      • Metrics
      • Deployment
      All ways to enable the dev cycle for running systems:
      • make small frequent changes
      • measure the impact on the running system
      • provide good visibility
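As one concrete example from this list, canary testing needs a way to send a small, stable share of traffic to the new version. The sketch below uses a hash of the request identifier to pick a bucket; this particular technique, the `choose_version` name, and the use of a request id as the routing key are assumptions, not something the talk prescribes.

```python
import hashlib


def choose_version(request_id: str, canary_percent: int) -> str:
    """Deterministically route roughly canary_percent of requests to the canary."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash onto one of 100 buckets.
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "canary" if bucket < canary_percent else "stable"


if __name__ == "__main__":
    sample = [f"request-{i}" for i in range(10_000)]
    share = sum(choose_version(r, 10) == "canary" for r in sample) / len(sample)
    print(f"canary share: {share:.1%}")
```

Hashing rather than random sampling means the same request id always lands on the same version, which makes the canary's metrics easier to compare against the stable version and makes rollback a matter of setting the percentage back to zero.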
  21. Second Picture
      Technical:
      • A network of small services
      • Scaffolding to safely enable small frequent changes
      Process:
      • Service oriented Development
      • Small frequent changes with good visibility and feedback
      People:
      • Platform team and service teams
  22. The Migration Perspective
      Variety of stages...
      • Monolith: django, rails, ...
      • Monolith++: mothership + several little ducklings
      • SoA-ish: small flock of services (maybe 5-10)
      • Inbetweeners…
      Some moving really slowly...
      • Months to create just one microservice…
      Some moving much faster…
      • What’s the difference?
  23. Migration is about people
      Starting point: team vs tech
      • Picking a tech stack for the entire eng org to adopt is slow
        ◦ lots of organizational friction
      • Replatforming/refactoring an entire existing monolith is slow
        ◦ lots of organizational and orchestrational friction
      • Creating a relatively autonomous team to tackle a particular problem in the form of a service
      Growing pains: stability vs progress
      • some orgs hit a sticking point, some didn’t
  24. The People Picture: Dividing up the Work
      The work has two aspects:
      • building the features (dev)
      • keeping the system running (ops)
      You can’t usefully divide up the work along these lines:
      • new features are the biggest source of instability (bugs)
      • separate roles create misaligned incentives ⇒ (devops)
      • yet a big part of the work is keeping things running
      Microservices is about how to go about dividing up work:
      • break the big app into smaller ones
      • divide operational responsibility in a way that aligns incentives
  25. Third Picture
      Technical:
      • A network of small services
      • Scaffolding to quickly and safely enable small frequent changes
      Process:
      • Service oriented Development
      • Small frequent changes with good visibility and feedback
      People:
      • Dividing up the work
      • Service teams deliver features to users
      • Platform team supports service teams
  26. The Hard Way
      1. Start with Tech
      2. Reverse Engineer The Process + People
      3. Make lots of mistakes along the way
      4. Learn from them
  27. The Easy Way
      1. Understand the principles of People and Process
      2. Use this as a framework to
         a. pick tech that fits
         b. learn from other people's mistakes
  28. Microservices Cheat Sheet (What, Why & How)
      People
      • Microservices are a way to divide the work of building a cloud application
      • This work falls into two categories:
        ◦ Keep the system running (ops)
        ◦ Build new features (dev)
      • Dividing work along these categories creates conflicting incentives between progress and stability. New features from dev eventually become the biggest source of instability for ops. Unifying these roles (devops) allows you to minimize the tradeoff between progress and stability, but you now need to divide up the work by dividing up the app. This results in a network of services.
      • Give your dev teams operational responsibility!
      • Define service level objectives & agreements for each service:
        ◦ SLOs: throughput, latency, availability
        ◦ SLAs: what happens when these aren’t met
      • Commoditize common operational overhead.
      Process
      • Microservices are built from a process of frequent small changes with rapid feedback and good visibility
      • This is the application of the traditional dev cycle to systems rather than codebases, and for it to work, key system properties must become first class features for developers.
      • Extend the dev cycle to include a stage to assess the impact on key system properties (SLOs)
      • Build -> Test -> Deploy ⇒ Build -> Test -> Assess Impact -> Deploy
      Technology
      • Microservices are an application that is made up of a network of small services
      • This requires dev tooling to support quickly and safely assessing system impact.
      • This requires fast deployment tooling and good visibility into key system level properties:
        ◦ Throughput
        ◦ Latency
        ◦ Availability (error-rate)
      • Depending on your system, this may require tooling for:
        ◦ Fancy request routing (for canary testing, dark launching)
      • Start with a fast deployment pipeline that incorporates basic system level metrics and monitoring for each service.
  29. Microservices Cheat Sheet (What, Why & How): People
      Microservices are a way to divide the work of building a cloud application
      Two aspects of work: keep it running (ops), build new features (dev)
      Dividing by aspect creates conflicting incentives between progress and stability.
      Unifying roles (devops) to minimize tradeoff... divide work by dividing the app
      Give your dev teams operational responsibility!
      Define service level objectives & agreements for each service:
      • SLOs: throughput, latency, availability
      • SLAs: what happens when these aren’t met
      Commoditize common operational overhead.
  30. Microservices Cheat Sheet (What, Why & How): Process
      Microservices are built from a process of frequent small changes with rapid feedback and good visibility
      This is the application of the traditional dev cycle to systems rather than codebases, and for it to work, key system properties must become first class features for developers.
      Extend the dev cycle to include a stage to assess the impact on key system properties (SLOs)
      Build -> Test -> Deploy ⇒ Build -> Test -> Assess Impact -> Deploy
  31. Microservices Cheat Sheet (What, Why & How): Technology
      Microservices are an application that is made up of a network of small services
      This requires dev tooling to support quickly and safely assessing system impact.
      This requires fast deployment tooling and good visibility into key system level properties:
      • Throughput
      • Latency
      • Availability (error-rate)
      Depending on your system, this may require tooling for:
      • Fancy request routing (for canary testing, dark launching)
      Start with a fast deployment pipeline that incorporates basic system level metrics and monitoring for each service.
  32. Dividing up Work
      [Diagram; labels: Dev, Infra, Ops, User]