Underestimated costs of microservice architectures

In many business success stories, our beautiful software systems degrade into monolithic Big Balls of Mud. To fix these monstrosities, we as developers and architects have begun to reach for microservices as our solution. Beautifully designed architecture diagrams and org charts clearly show the benefits in terms of coordination, batch size, and codebase understandability, but is it really all unicorns and rainbows?

As folks who have been down this road can tell you, microservice architectures don’t solve all our problems. Tradeoffs abound, and in this talk we’ll see the costs we need to be prepared to pay when we introduce microservices, including team dynamics as well as technical tradeoffs around consistency, failure handling, and observability.

Colin Jones

May 02, 2018

Transcript

  1. 8th Light, Inc.
    Colin Jones
    @trptcolin
    https://8thlight.com
    Underestimated costs of
    microservice architectures

  2. Microservices

  3. Happy!

  4. ☠ Sad ☠

  9. Avoid microservices?

  10. Avoid microservices?

  11. On the other hand
    But to gain any benefit from microservice thinking, you have to understand what it is, how to do it, and why you should usually do something else.
    - Martin Fowler

  12. Hype Cycle

  13. Accusations

  14. We underestimate costs

  15. Benefits

  16. independent
    deployability

  17. independent
    scalability
    independent
    deployability

  18. fault tolerance
    independent
    deployability
    independent
    scalability

  19. avoid
    dependency hell
    independent
    deployability
    independent
    scalability
    fault
    tolerance

  20. architectural
    boundaries
    independent
    deployability
    independent
    scalability
    fault
    tolerance
    avoid
    dependency hell

  21. small team
    ownership
    independent
    deployability
    independent
    scalability
    fault
    tolerance
    avoid
    dependency hell
    architectural
    boundaries

  22. eliminate legacy
    code
    independent
    deployability
    independent
    scalability
    fault
    tolerance
    avoid
    dependency hell
    architectural
    boundaries
    small team
    ownership

  23. eliminate
    legacy code
    independent
    deployability
    independent
    scalability
    fault
    tolerance
    avoid
    dependency hell
    architectural
    boundaries
    small team
    ownership



    microservices!

  24. Costs & Mitigations

  25. Well-understood
    costs

  26. Latency

  27. Latency
    Mitigations
    • Cache responses
    • Batch calls together
    • Coarse-grained service API
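
The first two mitigations can be combined in a few lines. The following is a minimal sketch, not anything shown in the talk: `TTLCache`, `fetch_users`, and the `fetch_batch` callable (standing in for one coarse-grained remote call that accepts many IDs) are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


class TTLCache:
    """Tiny time-based cache for remote responses (hypothetical helper)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it so the caller refetches.
            del self._entries[key]
            return None
        return value

    def put(self, key: str, value: dict) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


def fetch_users(user_ids: Iterable[str],
                fetch_batch: Callable[[List[str]], Dict[str, dict]],
                cache: TTLCache) -> Dict[str, dict]:
    """Return {id: user}, making at most one remote call for all cache misses."""
    found: Dict[str, dict] = {}
    missing: List[str] = []
    for uid in user_ids:
        cached = cache.get(uid)
        if cached is not None:
            found[uid] = cached
        elif uid not in missing:
            missing.append(uid)
    if missing:
        # One batched round trip instead of one call per ID.
        for uid, user in fetch_batch(missing).items():
            cache.put(uid, user)
            found[uid] = user
    return found
```

The cache trades latency for staleness, so the TTL has to be chosen with the data consistency costs discussed later in mind.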

  28. Additional
    infrastructure
    Latency

  29. Additional infrastructure
    Mitigations
    • Containers (e.g. Docker)
    • Infrastructure automation & configuration management
    • Virtual machines / cloud
    • Auto-scaling (metered cost)
    • Serverless

  30. Understanding !=
    Paying

  31. Underestimated
    Costs

  32. Data consistency
    Additional infrastructure
    Latency

  33. Data consistency
    Orders
    Service
    Main
    App
    Main DB

  34. Data consistency
    Orders
    Service
    Main
    App
    Orders DB
    Main DB

  35. Data consistency
    Mitigations
    • Design for eventual consistency
    • Canonical source for data (aka “system of record” / “source of truth”) and derived data
    • Backend sync processes
    • Service teams co-own ETL for analytics / business intelligence / data warehouse
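
A backend sync process can be as simple as periodically repairing a derived copy from the system of record. The sketch below assumes the Orders DB is the source of truth and the Main DB holds a derived copy; both are modeled as plain dictionaries, and `reconcile` is a hypothetical name, not something from the talk.

```python
from typing import Dict, List, Tuple


def reconcile(source_of_truth: Dict[str, dict],
              derived: Dict[str, dict]) -> List[Tuple[str, str]]:
    """Make `derived` match `source_of_truth` in place; return the repairs made."""
    repairs: List[Tuple[str, str]] = []
    for key, record in source_of_truth.items():
        if key not in derived:
            derived[key] = dict(record)
            repairs.append(("added", key))
        elif derived[key] != record:
            # The derived copy drifted; the system of record always wins.
            derived[key] = dict(record)
            repairs.append(("updated", key))
    for key in list(derived):
        if key not in source_of_truth:
            del derived[key]
            repairs.append(("removed", key))
    return repairs
```

Between runs the two stores can disagree, which is why the design has to tolerate eventual consistency. The returned repair list is also worth logging, since a steady stream of repairs suggests that updates are being lost somewhere upstream.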

  36. Data consistency

  37. Failure modes
    Data consistency
    Additional infrastructure
    Latency

  38. Failure modes
    A B

  39. Failure modes
    A B
    ???

  40. Failure modes
    A B

  41. Failure modes
    A B

  42. Failure modes
    A B
    ???

  43. Failure modes
    Mitigations
    • Use retries (with backoff; cap the max time)
    • Read the remote end to see if it succeeded
    • Use fallbacks for read timeouts
    • Use circuit breaker to limit cascading failures
    • Use bulkheads to protect independent modules
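
Two of these mitigations, retries with capped backoff and a circuit breaker, are sketched below. This is an illustration under assumptions rather than the speaker's implementation: all names and default values are hypothetical, and the retry helper assumes the operation is idempotent (otherwise, read the remote end first to see whether the earlier attempt succeeded).

```python
import random
import time


def call_with_retries(operation, max_attempts=4, base_delay=0.1,
                      max_total_seconds=2.0, sleep=time.sleep,
                      clock=time.monotonic):
    """Call `operation`, retrying on any exception with jittered exponential backoff.

    Gives up after `max_attempts` tries, or when the next wait would pass
    the overall time cap.
    """
    deadline = clock() + max_total_seconds
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.0)
            out_of_attempts = attempt == max_attempts - 1
            if out_of_attempts or clock() + delay > deadline:
                raise
            sleep(delay)


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the remote service."""


class CircuitBreaker:
    """Fail fast after repeated failures so one sick service cannot stall its callers."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def call(self, operation):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                raise CircuitOpenError("failing fast; remote assumed down")
            # Reset window elapsed: allow a trial call (half-open state).
            # A single failure now reopens the circuit immediately.
            self._opened_at = None
            self._failures = self._threshold - 1
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        return result
```

A caller that catches `CircuitOpenError` is a natural place for a fallback, such as returning a cached or default value for a read.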

  44. Failure modes

  45. Development &
    testing
    Failure modes
    Data consistency
    Additional infrastructure
    Latency

  46. Development & testing

  47. Development & testing
    from “Testing Microservices, the Sane Way” by Cindy Sridharan

  48. Development & testing
    Mitigations
    • Expand testing mindset to staging/production observability efforts
    • Integrate only a few services / API checking and rely more on unit tests
    • Unify on lightweight runtimes and a single database technology [only delays the problem; conflicts with team ownership]
    • Test against external environments with services set up [risks test pollution]
    • Orchestrate new isolated infrastructure for each test run

  49. Observability
    Failure modes
    Development & testing
    Data consistency
    Additional infrastructure
    Latency

  50. Observability

  51. Observability
    Mitigations
    • Log aggregation with correlation IDs
    • Error reporting / alerting [generally on symptoms, not causes]
    • Distributed tracing tools
    • Monitoring tools / dashboards
    • Fancier 3rd-party observability tools
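
Correlation IDs only help if every service logs them and passes them on. Here is a minimal sketch using Python's standard `logging` and `contextvars` modules. The `X-Correlation-ID` header name is a common convention assumed here, not something the talk specifies, and `build_logger` and `handle_request` are hypothetical helpers.

```python
import contextvars
import io
import logging
import uuid

# Holds the correlation ID for the request currently being handled.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Copy the current correlation ID onto every log record."""

    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True


def build_logger(name, stream):
    """Create a logger whose lines always include the correlation ID."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(name)s cid=%(correlation_id)s %(message)s"))
    handler.addFilter(CorrelationIdFilter())
    logger.addHandler(handler)
    return logger


def handle_request(logger, headers):
    """Reuse the caller's ID if present, otherwise mint one.

    Returns the headers to attach to any downstream calls.
    """
    cid = headers.get("X-Correlation-ID") or uuid.uuid4().hex
    token = correlation_id.set(cid)
    try:
        logger.info("handling request")
        return {"X-Correlation-ID": cid}
    finally:
        correlation_id.reset(token)
```

With every service logging the same ID, the log aggregator can reassemble one user action across service boundaries with a single search.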

  52. Tunnel vision
    Failure modes
    Development & testing
    Observability
    Data consistency
    Additional infrastructure
    Latency

  53. Tunnel vision

  54. Tunnel vision
    Mitigations
    • Measure business metrics, not team velocity
    • Make sure team / service incentives are aligned with the company’s
    • Rotations / team exchanges / dynamic re-teaming
    • Cross-org communication

  55. Implicit
    connection data
    Failure modes
    Development & testing
    Observability
    Tunnel vision
    Data consistency
    Additional infrastructure
    Latency

  56. Implicit connection data

  57. Implicit connection data
    Mitigations
    • Well-known API contracts / specifications (e.g. JSON Schema, Swagger, Protocol Buffers, Thrift)
    • Centralized/standardized repository to track service metadata for service discovery
    • Put all services in one codebase (monorepo) for easier searchability
    • Custom tooling based on log aggregation / monitoring
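
A central metadata repository makes the connections explicit enough to query. The sketch below uses an in-memory dictionary as a stand-in for such a repository; the service names, owners, and the `calls` field are all hypothetical.

```python
from collections import defaultdict

# Hypothetical registry entries: each service declares what it calls.
SERVICE_REGISTRY = {
    "main-app": {"owner": "web-team", "calls": ["orders", "billing"]},
    "orders": {"owner": "orders-team", "calls": ["billing"]},
    "billing": {"owner": "payments-team", "calls": []},
}


def consumers_of(registry):
    """Invert the declared `calls` edges: service -> sorted list of its callers."""
    callers = defaultdict(list)
    for name, meta in registry.items():
        for dependency in meta["calls"]:
            callers[dependency].append(name)
    return {service: sorted(names) for service, names in callers.items()}


def undeclared_services(registry):
    """Dependencies that are called but have no registry entry of their own."""
    called = {dep for meta in registry.values() for dep in meta["calls"]}
    return sorted(called - set(registry))
```

Before changing `billing`, for example, `consumers_of(SERVICE_REGISTRY)["billing"]` answers "who will this affect?" without grepping every repository. The registry is only as good as its declarations, so it pairs well with tooling that derives the same edges from logs or traces.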

  58. Inter-team
    priority conflicts
    Failure modes
    Development & testing
    Observability
    Tunnel vision
    Implicit connection data
    Data consistency
    Additional infrastructure
    Latency

  59. Inter-team priority conflicts
    Them
    Us

  60. Inter-team priority conflicts
    Them
    Consumer B
    Us (Consumer A) Consumer C Consumer C

  61. Inter-team priority conflicts
    Mitigations
    • Make our case really well to the service team, or management
    • Add staff on heavily-used microservices
    • Split heavily-used services further
    • Contribute to their project (aka “internal open-source”)
    • Rebuild a similar service with the changes we need

  62. Hard to change
    across boundaries
    Failure modes
    Development & testing
    Observability
    Tunnel vision
    Inter-team priority conflicts
    Implicit connection data
    Data consistency
    Additional infrastructure
    Latency

  63. Hard to change across
    boundaries

  64. Hard to change across
    boundaries

  65. Hard to change across boundaries
    Mitigations
    • Be deliberate about the choice to Extract Microservice
    • Version your API? [controversial]
    • If versioning / breaking: Have a well-defined way to communicate breaking changes / deadlines
    • Sticking with the same runtime (e.g. JVM) makes Inline Microservice possible
    • Cross-org communication

  66. Hard to change across boundaries
    Mitigations
    • Skill and culture of backwards compatibility (SemVer, Postel’s Law)
    • Don’t make breaking changes
    • Well-known API contracts / specifications (e.g. JSON Schema, Swagger, Protocol Buffers, Thrift)
    • Consumer-driven contract tests in CI
    • See also “Implicit connection data” mitigations
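
A consumer-driven contract can start as a plain statement of the fields one consumer actually relies on, which the provider checks in CI. The following sketch is hand-rolled for illustration and does not use any particular contract-testing tool; `ORDER_SUMMARY_CONTRACT` and both function names are hypothetical.

```python
# Hypothetical contract published by one consumer: field name -> required type.
ORDER_SUMMARY_CONTRACT = {"id": str, "total_cents": int, "status": str}


def contract_violations(contract, response):
    """List the ways `response` breaks `contract`; extra fields are allowed."""
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(response[field]).__name__}")
    return problems


def read_order_summary(response):
    """Tolerant reader: take only the fields this consumer needs, ignore the rest."""
    return {field: response[field] for field in ORDER_SUMMARY_CONTRACT}
```

Allowing extra fields is the Postel's Law half of the bargain: the provider can add data freely, while removing or retyping a field that some consumer declared fails the provider's build before it reaches production.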

  67. Failure modes
    Development & testing
    Observability
    Tunnel vision
    Inter-team priority conflicts
    Implicit connection data
    Data consistency
    Additional infrastructure
    Latency



    Hard to change
    across boundaries
    microservices!

  68. Alternatives

  69. Milliservices?
    Centiservices?

  70. Modules /
    encapsulation

  71. Modules / encapsulation

  72. Modules / encapsulation

  73. Modules / encapsulation

  74. Recommendations

  75. List problems.
    Then solutions.
    Then pros and cons.

  76. Don’t believe
    the hype.

  77. Be ready to
    pay the costs.

  78. Make sure you’re
    getting the benefits.

  79. Make the change easy.
    Then make the change.

  80. Make the change easy.
    Then make the change.
    (maybe)

  81. Learn more

  82. Learn more

  83. Learn more
    • Ben Christensen. “Don’t Build a Distributed Monolith”: https://www.microservices.com/talks/dont-build-a-distributed-monolith/
    • Michael Feathers. “Microservices and the Failure of Encapsulation”: https://michaelfeathers.silvrback.com/microservices-and-the-failure-of-encapsulaton
    • Michael Feathers. Working Effectively with Legacy Code: https://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/0131177052
    • Martin Fowler. Patterns of Enterprise Application Architecture: https://www.martinfowler.com/books/eaa.html
    • Martin Fowler. “MicroservicePremium”: https://www.martinfowler.com/bliki/MicroservicePremium.html
    • Martin Fowler. “Microservice Prerequisites”: https://www.martinfowler.com/bliki/MicroservicePrerequisites.html
    • Martin Fowler. “Microservices Resource Guide”: https://www.martinfowler.com/microservices/

  84. Learn more
    • Susan Fowler. Production-Ready Microservices: http://shop.oreilly.com/product/0636920053675.do
    • John Gall. The Systems Bible: https://www.amazon.com/Systems-Bible-Beginners-Guide-Large/dp/0961825170
    • David Heinemeier Hansson. “The Majestic Monolith”: https://m.signalvnoise.com/the-majestic-monolith-29166d022228
    • Rich Hickey. “Hammock Driven Development”: https://www.youtube.com/watch?v=f84n5oFoZBc
    • Gregor Hohpe and Bobby Woolf. Enterprise Integration Patterns: http://www.enterpriseintegrationpatterns.com/
    • Mike Knepper. “The Hidden Costs of Leaving a Monolith”: https://8thlight.com/blog/mike-knepper/2016/01/20/hidden-costs-of-leaving-a-monolith.html
    • Dan Manges. “The Modular Monolith: Rails Architecture”: https://medium.com/@dan_manges/the-modular-monolith-rails-architecture-fb1023826fc4
    • Sam Newman. Building Microservices: https://samnewman.io/books/building_microservices/

  85. Learn more
    • Michael Nygard. “The Entity Service Antipattern”: http://www.michaelnygard.com/blog/2017/12/the-entity-service-antipattern/
    • Michael Nygard. Release It!, 2nd edition: https://pragprog.com/book/mnee2/release-it-second-edition
    • Ozan Onay. “You are not Google”: https://blog.bradfieldcs.com/you-are-not-google-84912cf44afb
    • Arnon Rotem-Gal-Oz. “Fallacies of Distributed Computing Explained”: http://www.rgoarchitects.com/Files/fallacies.pdf
    • Cindy Sridharan. “Testing Microservices, the Sane Way”: https://medium.com/@copyconstruct/testing-microservices-the-sane-way-9bb31d158c16
    • Cindy Sridharan. “Testing in Production, the safe way”: https://medium.com/@copyconstruct/testing-in-production-the-safe-way-18ca102d0ef1
    • Jim Waldo, Geoff Wyant, Ann Wollrath, and Sam Kendall. “A Note on Distributed Computing”: http://web.cs.wpi.edu/~cs3013/a11/Papers/Waldo_NoteOnDistributedComputing.pdf

  86. 8th Light, Inc.
    Colin Jones
    @trptcolin
    https://8thlight.com
    Thank you!
