Performance Testing From The Ground Up

(Talk given at https://devopsconference.de)

When a system is composed of many distributed components, each with its own performance and reliability characteristics, a single misconfiguration can cause a cascading failure under load. With automated deployment, that misconfiguration can travel from dev to production in a matter of hours, so a rigorous, well-understood, and easy-to-follow performance testing process needs to be in place. This talk looks at what performance testing is, how it fits into a delivery process, and how an effective performance testing process can be implemented from the ground up with the open-source Artillery.io toolkit.

hassy veldstra

December 05, 2018

Transcript

  1. Effective Performance Testing From The Ground Up When performance matters

    Hassy Veldstra h@artillery.io - @hveldstra DevOpsCon, Dec 2018, Munich
  2. What this talk is not about • Writing performant code

    (algorithms, optimization techniques) @hveldstra
  3. What this talk is not about • Writing performant code

    (algorithms, optimization techniques) • Profiling code to make it run faster, use less memory etc @hveldstra
  4. What this talk is not about • Writing performant code

    (algorithms, optimization techniques) • Profiling code to make it run faster, use less memory etc • Benchmarking code @hveldstra
  5. What this talk is about • Understanding where performance testing

    fits into delivery process • The people/roles involved @hveldstra
  6. What this talk is about • Understanding where performance testing

    fits into delivery process • The people/roles involved • Understanding the various pieces that make up a performance testing strategy @hveldstra
  7. What this talk is about • Understanding where performance testing

    fits into delivery process • The people/roles involved • Understanding the various pieces that make up a performance testing strategy • Understanding of what an effective performance testing approach may look like @hveldstra
  8. Questions, questions How do we define performance goals & SLOs*

    for our services? Can we have those checked automatically in CI? @hveldstra
  9. SLOs Service Level Objectives, e.g.: • Support up to 2000

    TPS • 99% of all requests should be served in under 200ms • No more than 0.1% of requests should return a 5xx response • For a detailed discussion, see the Google SRE book: https://landing.google.com/sre/sre-book/chapters/service-level-objectives/ @hveldstra
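The example SLOs on this slide can be expressed as data and checked mechanically. The following is a minimal sketch, not taken from the talk: the `results` shape (`totalRequests`, `serverErrors`, `p99LatencyMs`) and the function name `checkSlos` are assumptions made for illustration and do not correspond to the output format of any specific tool. The 2000 TPS figure is treated as the load level the test applies, so it is not checked here.

```javascript
// Hypothetical SLO thresholds, copied from the example values on the slide.
const slos = {
  p99LatencyMs: 200,    // 99% of requests served in under 200ms
  maxErrorRatePct: 0.1, // no more than 0.1% of requests return a 5xx
};

// `results` is an assumed summary of one test run:
// { totalRequests, serverErrors, p99LatencyMs }
function checkSlos(results, slos) {
  const failures = [];
  const errorRatePct =
    results.totalRequests === 0
      ? 0
      : (results.serverErrors / results.totalRequests) * 100;

  if (results.p99LatencyMs >= slos.p99LatencyMs) {
    failures.push(
      `p99 latency ${results.p99LatencyMs}ms is not under ${slos.p99LatencyMs}ms`
    );
  }
  if (errorRatePct > slos.maxErrorRatePct) {
    failures.push(
      `5xx rate ${errorRatePct.toFixed(3)}% exceeds ${slos.maxErrorRatePct}%`
    );
  }
  return { ok: failures.length === 0, failures };
}
```

In a CI job, a non-empty `failures` list would typically be printed and turned into a non-zero exit code so that the pipeline stage fails.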
  10. Questions Questions What types of performance tests are there? Which

    ones do we need for our services? @hveldstra
  11. Questions Questions How do we organize our test suites? What

    are the best practices around structuring those tests? @hveldstra
  12. Questions Questions How do we pick which services to test?

    Do we need to test all of our microservices? What about testing different environments and configurations easily? @hveldstra
  13. Questions Questions How do we encourage collaboration on performance testing

    and involve everyone: developers, testers, SREs and product managers? (performance is everyone’s responsibility!) @hveldstra
  14. Maybe some answers How to go from zero to a

    performance testing suite that’s: @hveldstra
  15. Maybe some answers How to go from zero to a

    performance testing suite that’s: • Extensible & maintainable @hveldstra
  16. Maybe some answers How to go from zero to a

    performance testing suite that’s: • Extensible & maintainable • Integrates into your CI/CD pipelines @hveldstra
  17. Maybe some answers How to go from zero to a

    performance testing suite that’s: • Extensible & maintainable • Integrates into your CI/CD pipelines • Helps verify SLOs automatically @hveldstra
  18. Maybe some answers How to go from zero to a

    performance testing suite that’s: • Extensible & maintainable • Integrates into your CI/CD pipelines • Helps verify SLOs automatically • Works for developers, testers and SREs @hveldstra
  19. Maybe some answers How to go from zero to a

    performance testing suite that’s: • Extensible & maintainable • Integrates into your CI/CD pipelines • Helps verify SLOs automatically • Works for developers, testers and SREs • Integrates with monitoring & reporting tools @hveldstra
  20. Maybe some answers @hveldstra • Based on projects & teams

    I’ve seen and questions typically raised • YMMV
  21. Definitions • Testing can suffer from lack of precision in

    terminology • What is “testing”? Several different dimensions to consider @hveldstra
  22. What is a test? An activity, automated or manual, which:

    1. Increases your confidence in some Thing @hveldstra
  23. What is a test? An activity, automated or manual, which:

    1. Increases your confidence in some Thing 2. Confirms that your understanding of a Thing or its behavior is still correct @hveldstra
  24. What is a test? An activity, automated or manual, which:

    1. Increases your confidence in some Thing 2. Confirms that your understanding of a Thing or its behavior is still correct 3. Increases your understanding of a Thing or its properties @hveldstra
  25. Increase confidence • A/B tests • Canarying • Traffic replay

    • Load test to add traffic above base level @hveldstra
  26. Confirm understanding • Unit tests - known good/bad input, known

    good/bad output • Contract-based / property-based testing • Load test on a known configuration that verifies some metrics afterwards (max response time <500ms) @hveldstra
  27. Increase understanding • Sprint 100m and take a heart rate

    reading • Try opening 10k concurrent connections and see what happens • Exploratory testing of all kinds • Chaos testing @hveldstra
  28. Performance testing? any test that tests some performance-related property of

    a Thing. Often used interchangeably with “load testing”. @hveldstra
  29. Or a combined API • Authenticate and get a token

    • Use the token to get geofiltering info @hveldstra
  30. What types of performance tests are there? • Load test

    in CI/CD to continuously verify SLOs (service or composite API) — pre-prod @hveldstra
  31. What types of performance tests are there? • Load test

    in CI/CD to continuously verify SLOs (service or composite API) — pre-prod • Load testing to help capacity planning — pre-prod, manual @hveldstra
  32. Capacity Planning • Identify amount of resources an instance of

    a service needs • Identify the number of instances needed to meet a performance target @hveldstra
  33. Capacity Planning • Identify amount of resources an instance of

    a service needs • Identify the number of instances needed to meet a performance target • Identify whether current capacity is sufficient or some limits will need to be raised pre-prod @hveldstra
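The second bullet above (number of instances needed to meet a performance target) reduces to simple arithmetic once a load test has measured what one instance can sustain. A rough sketch under stated assumptions: the function name, the `headroom` parameter, and its default of 20% are illustrative choices, not something the talk prescribes, and the calculation assumes throughput scales linearly with instance count.

```javascript
// Estimate how many instances are needed to serve `targetTps`,
// given the throughput one instance sustained in a load test.
// `headroom` is an assumed safety margin (0.2 = 20% extra capacity).
function instancesNeeded(targetTps, perInstanceTps, headroom = 0.2) {
  if (perInstanceTps <= 0) {
    throw new RangeError('perInstanceTps must be positive');
  }
  return Math.ceil((targetTps * (1 + headroom)) / perInstanceTps);
}
```

For example, a 2000 TPS target with instances that each sustain 350 TPS gives `instancesNeeded(2000, 350)`, which is 7 with the default headroom. Linear scaling rarely holds exactly, which is why the result is a starting point to verify with another load test.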
  34. What types of performance tests are there? • Load test

    in CI/CD to continuously verify SLOs (service or composite API) — pre-prod • Load testing to help capacity planning — pre-prod, manual • Load test a service to better understand its scaling properties and tune configs — pre-prod, manual @hveldstra
  35. What types of performance tests are there? • Load test

    in CI/CD to continuously verify SLOs (service or composite API) — pre-prod • Load testing to help capacity planning — pre-prod, manual • Load test a service to better understand its scaling properties and tune configs — pre-prod, manual • Stress test a service to find its limits & understand how it degrades — pre-prod, manual @hveldstra
  36. What types of performance tests are there? • Load test

    in CI/CD to continuously verify SLOs (service or composite API) — pre-prod • Load testing to help capacity planning — pre-prod, manual • Load test a service to better understand its scaling properties and tune configs — pre-prod, manual • Stress test a service to find its limits & understand how it degrades — pre-prod, manual • In prod, manual or automatic — add extra traffic as a safety margin; a form of chaos testing @hveldstra
  37. What types of performance tests are there? • Soak test

    - a load test that runs for a longer period of time (1-2 hours)
  38. What types of performance tests are there? • Soak test

    - a load test that runs for a longer period of time (1-2 hours) • Spike test - a load test that ramps up load very quickly
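Soak and spike tests differ mainly in their load profile, so both can be described with the same phase fields the deck uses later (`duration`, `arrivalRate`, `rampTo`, `name`). The sketch below is illustrative only: the helper names and the specific durations (120 seconds of baseline, a 10 second ramp) are assumptions, not values from the talk.

```javascript
// Soak test: hold a steady arrival rate for a long time (1-2 hours per the slide).
function soakPhases(arrivalRate, hours) {
  return [
    { duration: hours * 3600, arrivalRate, name: 'Soak at steady load' },
  ];
}

// Spike test: ramp from a baseline to a peak very quickly, then hold.
// The 120s/10s/120s durations are assumed values for illustration.
function spikePhases(baseRate, peakRate) {
  return [
    { duration: 120, arrivalRate: baseRate, name: 'Baseline' },
    { duration: 10, arrivalRate: baseRate, rampTo: peakRate, name: 'Spike' },
    { duration: 120, arrivalRate: peakRate, name: 'Hold at peak' },
  ];
}
```

The returned arrays could be serialized with `JSON.stringify({ config: { phases } })` and kept as per-service override files.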
  39. Acceptance/functional Testing • Verify that a service (or another unit!)

    conforms to its contract • Verify that the results it produces make sense @hveldstra
  40. Auth service • Produces JWTs and not just empty 2xx

    responses • The token contains expected fields @hveldstra
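An acceptance check along these lines could look like the sketch below. It decodes the JWT payload without verifying the signature, which is enough to confirm that the response is a token and that it carries the expected fields. The function names are hypothetical, and which fields to expect depends on the service.

```javascript
// Decode the payload (second segment) of a JWT. The signature is NOT verified;
// this only checks the token's shape for acceptance-testing purposes.
function decodeJwtPayload(token) {
  const parts = String(token).split('.');
  if (parts.length !== 3) {
    throw new Error('not a JWT: expected three dot-separated segments');
  }
  // JWT segments are base64url-encoded; convert to standard base64 first.
  const base64 = parts[1].replace(/-/g, '+').replace(/_/g, '/');
  return JSON.parse(Buffer.from(base64, 'base64').toString('utf8'));
}

// Return the expected fields that are missing from the token payload.
function missingClaims(token, expectedFields) {
  const payload = decodeJwtPayload(token);
  return expectedFields.filter((field) => !(field in payload));
}
```

An empty array from `missingClaims(token, ['sub', 'exp'])` means the token contains both fields; the field names here are just examples of registered JWT claims.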
  41. • It’s possible to re-use the same test code for

    both acceptance tests and load tests if your tooling supports it @hveldstra
  42. • It’s possible to re-use the same test code for

    both acceptance tests and load tests if your tooling supports it • (Artillery does!) @hveldstra
  43. Smoke tests • Main characteristic is their speed • A

    test that determines whether other tests should even run @hveldstra
  44. Smoke tests • Main characteristic is their speed • A

    test that determines whether other tests should even run • Is anything obviously broken? If we plug this thing in and turn it on, is there any smoke? @hveldstra
  45. Smoke tests • Main characteristic is their speed • A

    test that determines whether other tests should even run • Is anything obviously broken? If we plug this thing in and turn it on, is there any smoke? • Run a quick happy-path test case @hveldstra
  46. Early “write code” stage: help profile code or dependencies The

    rest typically is on a deployed service or API @hveldstra
  47. How do we run these in CI/CD? • Create a

    parameterized CI job that can run a load test against a $service in an $environment with a $load_profile and (optionally) verify $slos. @hveldstra
  48. How do we run these in CI/CD? • Create a

    parameterized CI job that can run a load test against a $service in an $environment with a $load_profile and (optionally) verify $slos. • This can act as a stage in other pipelines @hveldstra
  49. How do we run these in CI/CD? • Create a

    parameterized CI job that can run a load test against a $service in an $environment with a $load_profile and (optionally) verify $slos. • This can act as a stage in other pipelines • … or be used manually by a dev/tester for ad-hoc testing via a web UI or a CLI (e.g. on Jenkins or AWS CodeBuild) @hveldstra
  50. How do we run these in CI/CD? • The tests

    don’t run on the CI server itself @hveldstra
  51. How do we run these in CI/CD? • The tests

    don’t run on the CI server itself • Best to be able to run them on your own (cloud) infrastructure @hveldstra
  52. How do we run these in CI/CD? • The tests

    don’t run on the CI server itself • Best to be able to run them on your own (cloud) infrastructure • Flexibility when it comes to VPCs or regions @hveldstra
  53. How do we run these in CI/CD? • The tests

    don’t run on the CI server itself • Best to be able to run them on your own (cloud) infrastructure • Flexibility when it comes to VPCs or regions • Critical for testing internal microservices @hveldstra
  54. How do we run these in CI/CD? • The tests

    don’t run on the CI server itself • Best to be able to run them on your own (cloud) infrastructure • Flexibility when it comes to VPCs or regions • Critical for testing internal microservices • More cost-effective too @hveldstra
  55. How do we run these in CI/CD? • What about

    test frequency? • No one-size-fits-all approach @hveldstra
  56. How do we run these in CI/CD? • What about

    test frequency? • No one-size-fits-all approach • Possible to test every change for microservices on the critical path, but probably excessive for most services @hveldstra
  57. How do we run these in CI/CD? • What about

    test frequency? • No one-size-fits-all approach • Possible to test every change for microservices on the critical path, but probably excessive for most services • Run on a schedule (e.g. nightly) @hveldstra
  58. How do we run these in CI/CD? • What about

    test frequency? • No one-size-fits-all approach • Possible to test every change for microservices on the critical path, but probably excessive for most services • Run on a schedule (e.g. nightly) • Run before a final promotion of a change to prod @hveldstra
  59. Defining SLOs • Setting SLOs should be part of your

    team’s “creating a new microservice” checklist @hveldstra
  60. Defining SLOs • Setting SLOs should be part of your

    team’s “creating a new microservice” checklist • Use & adapt: https://github.com/SkeltonThatcher/run-book-template @hveldstra
  61. Defining SLOs • Setting SLOs should be part of your

    team’s “creating a new microservice” checklist • Documented alongside API specs, design docs (typically a Confluence/wiki page that follows a template) @hveldstra
  62. Defining SLOs • Setting SLOs should be part of your

    team’s “creating a new microservice” checklist • Documented alongside API specs, design docs (typically a Confluence/wiki page that follows a template) • Involve other teams that may rely on the service! @hveldstra
  63. Defining SLOs • Setting SLOs should be part of your

    team’s “creating a new microservice” checklist • Documented alongside API specs, design docs (typically a Confluence/wiki page that follows a template) • Involve other teams that may rely on the service! • Better to have some SLOs (& revise) than none at all @hveldstra
  64. Which services should we test? • What could be tested?

    Anything with an API spec. @hveldstra
  65. Which services should we test? • What could be tested?

    Anything with an API spec. • Individual services @hveldstra
  66. Which services should we test? • What could be tested?

    Anything with an API spec. • Individual services • Composite APIs (that’s a “unit” with its own properties & behavior) @hveldstra
  67. Which services should we test? • What could be tested?

    Anything with an API spec. • Individual services • Composite APIs (that’s a “unit” with its own properties & behavior) • Does a microservice have SLOs? Then it should have a performance test to verify those automatically. @hveldstra
  68. How do we encourage collaboration? • No magical solutions •

    Involve all functions in discussions about performance @hveldstra
  69. How do we encourage collaboration? • No magical solutions •

    Involve all functions in discussions about performance • Remove barriers: @hveldstra
  70. How do we encourage collaboration? • No magical solutions •

    Involve all functions in discussions about performance • Remove barriers: • Use tools that everyone has access to • Use tools that everyone can work with @hveldstra
  71. How do we encourage collaboration? • Monorepos help • Good

    tooling helps • Available to everyone, ideally open source @hveldstra
  72. How do we encourage collaboration? • Monorepos help • Good

    tooling helps • Available to everyone, ideally open source • Easy to install and get started with @hveldstra
  73. How do we encourage collaboration? • Monorepos help • Good

    tooling helps • Available to everyone, ideally open source • Easy to install and get started with • Uses a language that everyone is familiar with @hveldstra
  74. How do we encourage collaboration? • Monorepos help • Good

    tooling helps • Available to everyone, ideally open source • Easy to install and get started with • Uses a language that everyone is familiar with • Make load testing reports & findings available to everyone (e.g. on Confluence/your wiki or KB) @hveldstra
  75. Artillery 101 • Available on npm: npm install -g artillery

    • The artillery CLI is used to run tests and create HTML reports @hveldstra
  76. Artillery 101 • Available on npm: npm install -g artillery

    • The artillery CLI is used to run tests and create HTML reports • Tests are written in YAML and can be extended with JavaScript @hveldstra
  77. Artillery 101 • Supports HTTP, Socket.IO, WebSocket, Kinesis, HLS out

    of the box. • Third-party plugins for SQS, Lambda, SQL etc @hveldstra
  78. Artillery 101 • Supports HTTP, Socket.IO, WebSocket, Kinesis, HLS out

    of the box. • Third-party plugins for SQS, Lambda, SQL etc • Supports plugins. Out of the box: StatsD/Datadog/Librato integration. @hveldstra
  79. Artillery 101 • Supports HTTP, Socket.IO, WebSocket, Kinesis, HLS out

    of the box. • Third-party plugins for SQS, Lambda, SQL etc • Supports plugins. Out of the box: StatsD/Datadog/Librato integration. • Third-party plugins for other monitoring systems @hveldstra
  80. Artillery 101 • Designed to allow for complex, multi-step virtual

    user behavior to be scripted • Support for randomizing requests and capturing data from responses and re-using it in other requests @hveldstra
  81. Artillery 101 • Designed to allow for complex, multi-step virtual

    user behavior to be scripted • Support for randomizing requests and capturing data from responses and re-using it in other requests • Supports assertions on metrics such as HTTP latency, i.e. automated checking of SLOs @hveldstra
  82. Artillery 101 • Runs well in Docker, easy to run

    on ECS or Kubernetes or in a CI/CD pipeline @hveldstra
  83. Artillery 101 • Runs well in Docker, easy to run

    on ECS or Kubernetes or in a CI/CD pipeline • Can generate self-contained HTML reports with charts and graphs @hveldstra
  84. Artillery 101 • Runs well in Docker, easy to run

    on ECS or Kubernetes or in a CI/CD pipeline • Can generate self-contained HTML reports with charts and graphs • Features for creating modular test suites @hveldstra
  85. Config

    config:
      target: "" # we don't set a target by default
      environments:
        dev:
          target: "https://auth-service-dev.acme-corp.internal"
          defaults:
            headers:
              x-api-key: "0xcoffee"
        local:
          target: "http://localhost:8080"
      processor: "./functions.js"
      plugins:
        datadog: {}
      payload:
        - path: "./username-password.csv"
          fields:
            - username
            - password

    @hveldstra
  86. One or more scenarios

    scenarios:
      - name: Authenticate with valid credentials
        flow:
          - post:
              url: "/auth"
              json:
                username: "{{ username }}"
                password: "{{ password }}"
              expect:
                - statusCode: 200
                - contentType: json

    @hveldstra
  87. Organizing our test suite • Using a monorepo • Easier

    to get started with, extend & maintain @hveldstra
  88. Organizing our test suite • Using a monorepo • Easier

    to get started with, extend & maintain • Easier to share across teams @hveldstra
  89. Organizing our test suite • Using a monorepo • Easier

    to get started with, extend & maintain • Easier to share across teams • Helps code reuse @hveldstra
  90. Organizing our test suite • Using a monorepo • Easier

    to get started with, extend & maintain • Easier to share across teams • Helps code reuse • Easier to work with in CI/CD pipelines @hveldstra
  91. Organizing our test suite acme-corp-api-tests/ - services/ - auth-service/ -

    scenarios/ - config.yaml - functions.js - overrides.slo-response-time.json - package.json - common-config.yaml @hveldstra
  97. Why that structure? • Extensible: • can add a new

    service or API easily, or add a new scenario to an existing one @hveldstra
  98. Why that structure? • Extensible: • can add a new

    service or API easily, or add a new scenario to an existing one • allows for service-specific config such as environment URLs or data from external CSVs @hveldstra
  99. Why that structure? • Extensible: • can add a new

    service or API easily, or add a new scenario to an existing one • allows for service-specific config such as environment URLs or data from external CSVs • allows for service-specific custom code, e.g. to generate random data in a certain format @hveldstra
  100. Why that structure? • Extensible: • can add a new

    service or API easily, or add a new scenario to an existing one • allows for service-specific config such as environment URLs or data from external CSVs • allows for service-specific custom code, e.g. to generate random data in a certain format • Can encode service-specific load phases and SLOs @hveldstra
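As an illustration of the service-specific custom code mentioned above, a `functions.js` file might export a function that generates random data for each virtual user. This is a sketch under assumptions: the function name and the generated formats are made up for the example, and the `(context, events, done)` signature is how Artillery is understood to call custom functions from a scenario step, so check it against the documentation for the Artillery version in use.

```javascript
// functions.js (sketch): generate random credentials in a fixed format
// and store them as variables on the virtual user's context.
function setRandomCredentials(context, events, done) {
  // Up to 8 random base-36 characters, e.g. "k3x9q0ab".
  const suffix = Math.random().toString(36).slice(2, 10);
  context.vars.username = `user-${suffix}`;
  context.vars.password = `pw-${suffix}`;
  return done();
}

module.exports = { setRandomCredentials };
```

The variables set on `context.vars` would then be available to templates such as `{{ username }}` in the scenario definition.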
  101.

    {
      "config": {
        "phases": [
          { "duration": 120, "arrivalRate": 10, "rampTo": 20, "name": "Warm up the service" },
          { "duration": 240, "arrivalRate": 20, "rampTo": 100, "name": "Ramp to high load" },
          { "duration": 600, "arrivalRate": 100, "name": "Sustained high load" }
        ],
        "ensure": { "maxErrorRate": 0.1, "p99": 200 }
      }
    }

    @hveldstra
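To get a feel for how much load these phases generate, the arithmetic can be sketched as follows. This assumes `arrivalRate` means new virtual users per second and that a ramp is linear from `arrivalRate` to `rampTo`, so the totals are an estimate rather than an exact count. The function name `summarizePhases` is hypothetical.

```javascript
// Phases copied from the override file on the slide.
const phases = [
  { duration: 120, arrivalRate: 10, rampTo: 20, name: 'Warm up the service' },
  { duration: 240, arrivalRate: 20, rampTo: 100, name: 'Ramp to high load' },
  { duration: 600, arrivalRate: 100, name: 'Sustained high load' },
];

// Estimate total test time and total virtual-user arrivals.
// For a linear ramp, the average rate is the mean of the start and end rates.
function summarizePhases(phases) {
  let totalSeconds = 0;
  let totalArrivals = 0;
  for (const phase of phases) {
    const endRate =
      phase.rampTo === undefined ? phase.arrivalRate : phase.rampTo;
    totalSeconds += phase.duration;
    totalArrivals += (phase.duration * (phase.arrivalRate + endRate)) / 2;
  }
  return { totalSeconds, totalArrivals };
}
```

Under those assumptions the three phases run for 960 seconds (16 minutes) and start roughly 1800 + 14400 + 60000 = 76200 virtual users. Each virtual user may make several requests, so the request count depends on the scenario.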
  102. Running a test

    artillery run \
      --config ./services/auth-service/config.yaml \
      --overrides "$(cat ./services/auth-service/overrides.slos.json)" \
      -e dev ./services/auth-service/login.yaml

    @hveldstra
  104. Running a test

    artillery run \
      --config ./services/auth-service/config.yaml \
      --overrides "$(cat ./services/auth-service/overrides.slos.json)" \
      -e dev ./services/auth-service/login.yaml

    Service name, load/SLO override, environment and optionally a scenario → generic CI job @hveldstra
  105. Running a test • Reusable by other jobs / pipeline

    stages • Or via the UI for ad hoc testing - e.g. in Jenkins or AWS CodeBuild @hveldstra
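One way a generic, parameterized CI job could assemble the command is sketched below. The helper name `buildArtilleryArgs` and its parameter names are assumptions for illustration. The directory layout and the flags (`--config`, `--overrides`, `-e`) follow the "Running a test" slides. For simplicity the sketch treats the scenario file as required.

```javascript
// Build the argument list for `artillery run` from CI job parameters.
// `overridesJson` is the contents of an overrides file as a string (optional).
function buildArtilleryArgs({ service, environment, overridesJson, scenario }) {
  const serviceDir = `./services/${service}`;
  const args = ['run', '--config', `${serviceDir}/config.yaml`];
  if (overridesJson) {
    args.push('--overrides', overridesJson);
  }
  args.push('-e', environment, `${serviceDir}/${scenario}`);
  return args;
}
```

A CI script could pass the result to Node's `child_process.spawnSync('artillery', args, { stdio: 'inherit' })` and fail the stage when the exit status is non-zero. Passing arguments as an array avoids shell quoting problems with the JSON overrides string.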
  106. Organizing our test suite acme-corp-api-tests/ - services/ - auth-service/ -

    scenarios/ - config.yaml - functions.js - overrides.slo-response-time.json - package.json - common-config.yaml @hveldstra
  107. Where to start? • Pick one service • Write tests

    using the template & set up a CI job to run them @hveldstra
  108. Where to start? • Pick one service • Write tests

    using the template & set up a CI job to run them • Show & tell to the rest of the team @hveldstra
  109. Where to start? • A good candidate service: • Small

    API surface • Has experienced performance issues, or @hveldstra
  110. Where to start? • A good candidate service: • Small

    API surface • Has experienced performance issues, or • On the critical path for other components, or @hveldstra
  111. Where to start? • A good candidate service: • Small

    API surface • Has experienced performance issues, or • On the critical path for other components, or • Has high performance requirements @hveldstra
  112. Where to start? • A good candidate service: • Small

    API surface • Has experienced performance issues, or • On the critical path for other components, or • Has high performance requirements • For example: an authentication service @hveldstra
  113. So… we’ve looked at • What performance testing is, and

    different types of performance tests @hveldstra
  114. So… we’ve looked at • What performance testing is, and

    different types of performance tests • Where performance testing fits into the development, testing, and delivery process @hveldstra
  115. So… we’ve looked at • What performance testing is, and

    different types of performance tests • Where performance testing fits into the development, testing, and delivery process • Running performance tests in CI/CD pipelines @hveldstra
  116. So… we’ve looked at • What performance testing is, and

    different types of performance tests • Where performance testing fits into the development, testing, and delivery process • Running performance tests in CI/CD pipelines • Setting and verifying SLOs @hveldstra
  117. So… we’ve looked at • What performance testing is, and

    different types of performance tests • Where performance testing fits into the development, testing, and delivery process • Running performance tests in CI/CD pipelines • Setting and verifying SLOs • The mechanics of setting up a test suite with Artillery @hveldstra