Cloud Chaos and Microservices Mayhem

The cloud is just someone else's data center, but it has fundamentally changed how we design software and what we expect from our platforms. Our applications have gotten bigger, more distributed, and more complicated, and there are whole new categories of mistakes we can make. Some things that were a good idea ten years ago turn out to be a terrible idea in the cloud, and what used to be ‘good enough’ for testing really isn’t anymore. Managing a microservices architecture demands a lot of us if we are to ensure observability, operational resiliency, and organisational agility. With a focus on Java, this talk introduces some of the new tools, patterns, and best practices for modern distributed application development. It also gives a tour of some of the most painful anti-patterns Holly has seen as a cloud consultant.

Holly Cummins

October 01, 2022

Transcript

  1. cloud chaos and microservices mayhem
    Holly Cummins
    Red Hat
    @holly_cummins
    Voxxed Athens

  2. #RedHat
    @holly_cummins
    things you need to do well in 2022:

  3. things you need to do well in 2022:
    handwashing

  4. things you need to do well in 2022:
    handwashing
    apps

  5. things you need to do well in 2022:
    handwashing
    apps
    ops

  6. things you need to do well in 2022:
    handwashing
    apps
    ops
    devops

  7. things you need to do well in 2022:
    handwashing
    apps
    ops
    devops
    devsecops

  8. things you need to do well in 2022:
    handwashing
    apps
    ops
    devops
    devsecops
    finops

  9. a few

  11. technology changes fast (duh)

  12. technology changes fast (duh)
    … but it’s taking us a while to catch up to cloud

  14. #IBMGarage + IBM Cloud © 2020 IBM Corporation
    @holly_cummins
    tracing

  15. observability

  16. [diagram] old way: hardware, logs

  17. [diagram] old way: hardware, logs; cloud way: container, logs

  18. [diagram] old way: hardware, logs; cloud way: logs

  19. [diagram] old way: hardware, logs; cloud way: oops

  23. [diagram] old way: hardware, logs; cloud way: logs; four containers, each labelled logs

  25. [diagram] as above, plus: aggregation + correlation

  26. [diagram] as above, plus: aggregation + correlation, open telemetry (open tracing)
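
The aggregation + correlation idea on the slides above can be sketched in a few lines. This is a minimal illustration using only the Java standard library, not the OpenTelemetry API: the class name, the `traceId=` log format, and the service names are all assumptions made up for the example. The point is only that once every service stamps the same ID on its log lines, a central store can pull one request's story back together.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/** Tags every log line with a trace ID so aggregated logs can be correlated. */
final class CorrelatedLogging {

    /** Formats one log line; the field names are illustrative, not a standard. */
    static String logLine(String traceId, String service, String message) {
        return "traceId=" + traceId + " service=" + service + " msg=\"" + message + "\"";
    }

    /** Stand-in for a log aggregator query: keep only the lines for one trace. */
    static List<String> linesForTrace(List<String> aggregated, String traceId) {
        List<String> result = new ArrayList<>();
        for (String line : aggregated) {
            // The trailing space stops "a1" from also matching "a12".
            if (line.startsWith("traceId=" + traceId + " ")) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String traceA = UUID.randomUUID().toString();
        String traceB = UUID.randomUUID().toString();
        // Lines from several (hypothetical) containers, interleaved as an aggregator would see them.
        List<String> aggregated = List.of(
            logLine(traceA, "orders", "order received"),
            logLine(traceB, "orders", "order received"),
            logLine(traceA, "payments", "card charged"),
            logLine(traceA, "shipping", "label printed"));
        linesForTrace(aggregated, traceA).forEach(System.out::println);
    }
}
```

In a real system a tracing library would normally generate and propagate the ID between services; the hand-rolled ID here only shows what the correlation step relies on.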

  28. good news: observability is ‘fixed’

  29. good news: observability is ‘fixed’
    bad news: not all of us are using it

  30. performance requirements

  31. memory footprint is money
    startup time is money

  32. ahead of time compilation is (now) a good idea

  33. performance optimisation

  36. the JVM tunes itself differently on servers

  37. the JVM tunes itself differently on servers
    … but what’s a server?

  38. answer(ish):

  39. answer(ish), from the actual hotspot source code:
    // This is the working definition of a server class machine:
    // >= 2 physical CPU's and >=2GB of memory
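
The comment quoted on the slide can be restated as a small predicate. This is a simplified sketch of the rule as the comment words it, not a copy of HotSpot's implementation, which may apply further adjustments; the class and method names are made up for illustration.

```java
/** Simplified restatement of the "server class machine" comment quoted on the slide. */
final class ServerClassCheck {

    static final long TWO_GB = 2L * 1024 * 1024 * 1024;

    /** True when the machine has at least 2 CPUs and at least 2 GB of memory. */
    static boolean looksLikeServerClass(int cpus, long memoryBytes) {
        return cpus >= 2 && memoryBytes >= TWO_GB;
    }

    public static void main(String[] args) {
        // A container limited to 1 CPU and 512 MB fails the check.
        System.out.println(looksLikeServerClass(1, 512L * 1024 * 1024));
        // 4 CPUs and 8 GB passes.
        System.out.println(looksLikeServerClass(4, 8L * 1024 * 1024 * 1024));
    }
}
```

The interesting case is the first one: a small container can fall below the threshold even when the host it runs on is a large server.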

  41. takeaway: don’t shrink your containers too much or your JVM will make bad decisions
    advice accurate as of June 2022. all performance advice should be independently verified. do not try this at home without testing first. your mileage may vary.
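
One way to follow the "verify independently" advice is to ask the JVM what it thinks it is running on. The sketch below uses standard `Runtime` and `java.lang.management` calls; the class name is an assumption for the example, and the output will differ from one environment to the next.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Prints the resources and garbage collectors the running JVM has detected. */
final class JvmSelfReport {

    /** Number of processors the JVM believes it may use. */
    static int visibleCpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** Maximum heap size, in bytes, the JVM will attempt to use. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Names of the garbage collectors active in this JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("CPUs visible to the JVM: " + visibleCpus());
        System.out.println("Max heap (MB): " + maxHeapBytes() / (1024 * 1024));
        System.out.println("Garbage collectors: " + collectorNames());
    }
}
```

Running this inside the container, with the limits you intend to ship, and comparing it against a run on a larger machine shows whether the JVM's choices change as the container shrinks.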

  42. GC: if you chain 100 microservices together, you are almost guaranteed a GC pause
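
The arithmetic behind that claim can be sketched as follows. The assumptions are loud ones: each hop is treated as independent, and the 1% per-hop chance of landing in a GC pause is a made-up number for illustration, not a measurement. Under those assumptions the chance that at least one hop pauses is 1 - (1 - p)^n.

```java
/** Probability that a request crossing n services hits at least one GC pause. */
final class GcPauseOdds {

    /** Computes 1 - (1 - p)^n, assuming each hop pauses independently with probability p. */
    static double atLeastOnePause(double perHopProbability, int hops) {
        return 1.0 - Math.pow(1.0 - perHopProbability, hops);
    }

    public static void main(String[] args) {
        double p = 0.01; // hypothetical per-hop probability
        for (int hops : new int[] {1, 10, 100}) {
            System.out.printf("%d hops: %.1f%% chance of at least one pause%n",
                hops, 100 * atLeastOnePause(p, hops));
        }
    }
}
```

Even with a small per-hop probability, the combined odds climb quickly as the chain gets longer, which is the point the slide is making.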

  43. wait, isn’t that the opposite of the advice in the previous section? should my containers be big or small?

  44. wait, isn’t that the opposite of the advice in the previous section? should my containers be big or small?
    this is the chaos part :)

  45. management

  46. keeping track of hardware

  47. The cloud makes it so easy to provision hardware.

  48. That doesn’t mean the hardware is free.

  50. That doesn’t mean the hardware is free.
    Or useful.

  51. That doesn’t mean the hardware is free.
    Or useful.
    Or shuts itself off.

  52. Hey boss, I created a Kubernetes cluster.

  53. Hey boss, I created a Kubernetes cluster.
    I forgot it for 2 months.

  54. Hey boss, I created a Kubernetes cluster.
    I forgot it for 2 months.
    … and it’s €1000 a month.

  55. Photo by daveynin, flickr

  56. 2017 survey
    25% of 16,000 servers doing no useful work

  57. 2017 survey
    25% of 16,000 servers doing no useful work
    “perhaps someone forgot to turn them off”

  58. finops
    figuring out who in your company forgot to turn off their cloud
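
As a rough sketch of what "figuring out who forgot to turn off their cloud" might look like in code: the example below scans an inventory for resources with very low CPU use and totals what they cost. Everything in it is hypothetical, including the `Resource` fields, the 2% idle threshold, and the sample data; no cloud provider's API or billing schema is being modelled. It uses a record, so it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

/** Flags resources that look idle and totals their monthly cost. */
final class IdleResourceReport {

    /** One provisioned resource; all fields are hypothetical. */
    record Resource(String name, String owner, double avgCpuPercent, double monthlyCostEuros) {}

    /** Returns the resources whose average CPU use is below the threshold. */
    static List<Resource> idle(List<Resource> all, double cpuThresholdPercent) {
        List<Resource> result = new ArrayList<>();
        for (Resource r : all) {
            if (r.avgCpuPercent() < cpuThresholdPercent) {
                result.add(r);
            }
        }
        return result;
    }

    /** Sums the monthly cost of the given resources. */
    static double monthlyWaste(List<Resource> idleOnes) {
        double total = 0;
        for (Resource r : idleOnes) {
            total += r.monthlyCostEuros();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Resource> inventory = List.of(
            new Resource("forgotten-cluster", "team-a", 0.5, 1000.0),
            new Resource("checkout-service", "team-b", 45.0, 300.0));
        List<Resource> idleOnes = idle(inventory, 2.0);
        for (Resource r : idleOnes) {
            System.out.println(r.name() + " (owner: " + r.owner() + ") looks idle");
        }
        System.out.println("Monthly cost of idle resources in euros: " + monthlyWaste(idleOnes));
    }
}
```

Low CPU use is only a crude signal, so a real report would want a human to confirm before anything is switched off.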

  59. companies that went out of business for cloud cost accidents

  63. $11,448.30

  64. $11,448.30
    “I did find a setting on your plan that set the max cacheable file size to 15GB and it looks like your zipfile is 18GB big”

  66. maybe this is the chaos part

  67. releasing

  69. microservices

  72. “we’re moving too slowly.

  73. “we’re moving too slowly.
    we should modernise our COBOL application into microservices.

  74. “we’re moving too slowly.
    we should modernise our COBOL application into microservices.
    … but our release board only meets twice a year.”

  75. modularity

  76. microservices are not the goal

  77. microservices are not the goal
    they are the means

  78. wishful mimicry

  79. cloud != microservices

  80. cloud native != microservices

  81. “every time we touch one microservice, all the others break.”

  82. distributed monolith

  83. distributed monolith
    but without compile-time checking
    … or guaranteed function execution

  87. “each of our microservices has duplicated the same object model … with twenty classes and seventy fields”

  88. Microservice
    Domain

  90. Microservice
    Domain
    (this is bad)

  91. distributed != decoupled

  92. “ohhhh, we weren’t expecting your service to do that …”

  93. “uh, what do you mean you corrected the typo in your json?”
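
The "corrected the typo" complaint is easy to reproduce. In this sketch a consumer was written against a misspelt field name, so it breaks the moment the provider fixes the spelling. The field name `adress`, the class name, and the sample data are invented for illustration, and a plain `Map` stands in for parsed JSON.

```java
import java.util.Map;

/** A consumer that silently depends on a provider's misspelt JSON field. */
final class BrittleConsumer {

    /** Reads the address using the original (misspelt) field name. */
    static String readAddress(Map<String, ?> response) {
        Object value = response.get("adress"); // hypothetical typo the consumer depends on
        if (value == null) {
            throw new IllegalStateException("expected field 'adress' is missing");
        }
        return value.toString();
    }

    public static void main(String[] args) {
        // The old provider response still works.
        System.out.println(readAddress(Map.of("adress", "1 Main Street")));
        // After the provider corrects the spelling, the consumer fails at runtime.
        try {
            readAddress(Map.of("address", "1 Main Street"));
        } catch (IllegalStateException e) {
            System.out.println("consumer broke: " + e.getMessage());
        }
    }
}
```

Nothing at compile time links the two services, so the break only shows up when the changed provider meets the unchanged consumer.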

  94. If you’re going to do microservices, you need to get good at automation.
    And testing.

  96. the test pyramid

  97. the test pyramid
    end-to-end tests

  98. the test pyramid
    end-to-end tests
    unit tests

  99. the test pyramid
    end-to-end tests
    unit tests
    contract tests

  100. the test pyramid (you can TDD at every level!)
    end-to-end tests
    unit tests
    contract tests

  101. How to test a fire alarm?

  102. how not to test a fire alarm

  105. unit testing a fire alarm

  106. uh … is that enough?

  107. contract testing a fire alarm

  109. to the code!

  110. demo recap:
    https://github.com/holly-cummins/house-of-microservices-quarkus-contract-testing-sample

  112. demo recap:
    • consumer-driven contract testing can save your bacon
    • pact is a mock for the consumer
    • pact is a functional test for the producer
    • shared json contracts align expectations across services
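
The two roles in the recap (a mock for the consumer, a functional test for the producer) can be sketched without any library. This is not the Pact API and not the code from the demo repository: the contract format, the field names `name` and `rooms`, and the class itself are assumptions made up to show the shape of the idea, with a `Map` standing in for parsed JSON.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** One shared contract, used as a stub by the consumer and as a check by the producer. */
final class ContractSketch {

    /** The shared contract: fields the consumer relies on, with their expected types. */
    static final Map<String, Class<?>> CONTRACT = Map.of(
        "name", String.class,
        "rooms", Integer.class);

    /** Producer side: checks a real response against the contract and lists any violations. */
    static List<String> violations(Map<String, ?> response) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Class<?>> field : CONTRACT.entrySet()) {
            Object value = response.get(field.getKey());
            if (value == null) {
                problems.add("missing field: " + field.getKey());
            } else if (!field.getValue().isInstance(value)) {
                problems.add("wrong type for field: " + field.getKey());
            }
        }
        return problems;
    }

    /** Consumer side: a canned response that satisfies the contract, usable as a stub. */
    static Map<String, Object> stubResponse() {
        return Map.of("name", "example", "rooms", 3);
    }

    public static void main(String[] args) {
        System.out.println("stub violations: " + violations(stubResponse()));
        System.out.println("renamed field violations: " + violations(Map.of("nmae", "example", "rooms", 3)));
    }
}
```

Because both sides test against the same contract, a provider change that would break the consumer fails the provider's own build first.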

  113. thank you
    @holly_cummins