Performance Testing From The Ground Up

(Talk given at https://devopsconference.de)

When systems are composed of many distributed components, each with its own performance and reliability characteristics, a misconfiguration that causes a cascading failure under load can be deployed automatically across environments, all the way from dev to production, in a matter of hours. That makes a rigorous, well-understood, and easy-to-follow performance testing process essential. This talk looks at what performance testing is, how it fits into a delivery process, and how an effective performance testing process can be implemented from the ground up with the open-source Artillery.io toolkit.

hassy veldstra

December 05, 2018

Transcript

  1. What this talk is not about • Writing performant code

    (algorithms, optimization techniques) @hveldstra
  2. What this talk is not about • Writing performant code

    (algorithms, optimization techniques) • Profiling code to make it run faster, use less memory etc @hveldstra
  3. What this talk is not about • Writing performant code

    (algorithms, optimization techniques) • Profiling code to make it run faster, use less memory etc • Benchmarking code @hveldstra
  4. What this talk is about • Understanding where performance testing

    fits into delivery process • The people/roles involved @hveldstra
  5. What this talk is about • Understanding where performance testing

    fits into delivery process • The people/roles involved • Understanding the various pieces that make up a performance testing strategy @hveldstra
  6. What this talk is about • Understanding where performance testing

    fits into delivery process • The people/roles involved • Understanding the various pieces that make up a performance testing strategy • Understanding of what an effective performance testing approach may look like @hveldstra
  7. Questions, questions How do we define performance goals & SLOs*

    for our services? Can we have those checked automatically in CI? @hveldstra
  8. SLOs Service Level Objectives, e.g.: • Support up to 2000

    TPS • 99% of all requests should be served in under 200ms • No more than 0.1% of requests should return a 5xx response • For a detailed discussion, see the Google SRE book: https://landing.google.com/sre/sre-book/chapters/service-level-objectives/ @hveldstra
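    The example SLOs above can be turned into an automated check. The following is a minimal sketch rather than how any particular tool does it: the sample shape (`latencyMs`, `status`), the `checkSlos` name, and the nearest-rank percentile method are all assumptions made for illustration.

```javascript
// Thresholds taken from the example SLOs on the slide; treat them as placeholders.
const SLOS = { p99Ms: 200, maxErrorRatePct: 0.1 };

// Nearest-rank percentile over an array of latencies in milliseconds.
function percentile(latencies, p) {
  const sorted = [...latencies].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// samples: [{ latencyMs: number, status: number }, ...] (hypothetical shape)
function checkSlos(samples, slos = SLOS) {
  const p99 = percentile(samples.map((s) => s.latencyMs), 99);
  const errors = samples.filter((s) => s.status >= 500).length;
  const errorRatePct = (errors * 100) / samples.length;
  return {
    p99,
    errorRatePct,
    ok: p99 < slos.p99Ms && errorRatePct <= slos.maxErrorRatePct,
  };
}
```

    A CI job could fail the build whenever `ok` is false, which is the kind of automatic check the earlier question about CI asks for.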
  9. Questions Questions What types of performance tests are there? Which

    ones do we need for our services? @hveldstra
  10. Questions Questions How do we organize our test suites? What

    are the best practices around structuring those tests? @hveldstra
  11. Questions Questions How do we pick which services to test?

    Do we need to test all of our microservices? What about testing different environments and configurations easily? @hveldstra
  12. Questions Questions How do we encourage collaboration on performance testing

    and involve everyone: developers, testers, SREs and product managers? (performance is everyone’s responsibility!) @hveldstra
  13. Maybe some answers How to go from zero to a

    performance testing suite that’s: @hveldstra
  14. Maybe some answers How to go from zero to a

    performance testing suite that’s: • Extensible & maintainable @hveldstra
  15. Maybe some answers How to go from zero to a

    performance testing suite that’s: • Extensible & maintainable • Integrates into your CI/CD pipelines @hveldstra
  16. Maybe some answers How to go from zero to a

    performance testing suite that’s: • Extensible & maintainable • Integrates into your CI/CD pipelines • Helps verify SLOs automatically @hveldstra
  17. Maybe some answers How to go from zero to a

    performance testing suite that’s: • Extensible & maintainable • Integrates into your CI/CD pipelines • Helps verify SLOs automatically • Works for developers, testers and SREs @hveldstra
  18. Maybe some answers How to go from zero to a

    performance testing suite that’s: • Extensible & maintainable • Integrates into your CI/CD pipelines • Helps verify SLOs automatically • Works for developers, testers and SREs • Integrates with monitoring & reporting tools @hveldstra
  19. Maybe some answers • Based on projects & teams

    I’ve seen and questions typically raised • YMMV @hveldstra
  20. Definitions • Testing can suffer from lack of precision in

    terminology • What is “testing”? Several different dimensions to consider @hveldstra
  21. What is a test? An activity, automated or manual, which:

    1. Increases your confidence in some Thing @hveldstra
  22. What is a test? An activity, automated or manual, which:

    1. Increases your confidence in some Thing 2. Confirms that your understanding of a Thing or its behavior is still correct @hveldstra
  23. What is a test? An activity, automated or manual, which:

    1. Increases your confidence in some Thing 2. Confirms that your understanding of a Thing or its behavior is still correct 3. Increases your understanding of a Thing or its properties @hveldstra
  24. Increase confidence • A/B tests • Canarying • Traffic replay

    • Load test to add traffic above base level @hveldstra
  25. Confirm understanding • Unit tests - known good/bad input, known

    good/bad output • Contract-based / property-based testing • Load test on a known configuration that verifies some metrics afterwards (max response time <500ms) @hveldstra
  26. Increase understanding • Sprint 100m and take a heart rate

    reading • Try opening 10k concurrent connections and see what happens • Exploratory testing of all kinds • Chaos testing @hveldstra
  27. Performance testing? Any test that tests some performance-related property of

    a Thing. Often used interchangeably with “load testing”. @hveldstra
  28. Or a combined API • Authenticate and get a token

    • Use the token to get geofiltering info @hveldstra
  29. What types of performance tests are there? • Load test

    in CI/CD to continuously verify SLOs (service or composite API) — pre-prod @hveldstra
  30. What types of performance tests are there? • Load test

    in CI/CD to continuously verify SLOs (service or composite API) — pre-prod • Load testing to help capacity planning — pre-prod, manual @hveldstra
  31. Capacity Planning • Identify amount of resources an instance of

    a service needs • Identify the number of instances needed to meet a performance target @hveldstra
  32. Capacity Planning • Identify amount of resources an instance of

    a service needs • Identify the number of instances needed to meet a performance target • Identify whether current capacity is sufficient or some limits will need to be raised pre-prod @hveldstra
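    The second bullet (number of instances needed to meet a performance target) comes down to simple arithmetic once a load test has measured what one instance can sustain. A rough sketch, assuming throughput scales linearly with instance count (often not exactly true) and using a hypothetical `instancesNeeded` helper with a configurable headroom percentage:

```javascript
// targetTps: the performance target, e.g. the 2000 TPS SLO mentioned earlier.
// perInstanceTps: sustained throughput of ONE instance, as measured by a load test.
// headroomPct: extra safety margin in percent (an assumption, tune to taste).
function instancesNeeded(targetTps, perInstanceTps, headroomPct = 20) {
  if (perInstanceTps <= 0) {
    throw new RangeError('perInstanceTps must be positive');
  }
  return Math.ceil((targetTps * (100 + headroomPct)) / (100 * perInstanceTps));
}

// Example with made-up numbers: one instance sustains 350 TPS.
console.log(instancesNeeded(2000, 350)); // 7
```

    Re-running the load test against the resulting instance count is still needed to confirm the estimate, since shared dependencies can become the bottleneck first.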
  33. What types of performance tests are there? • Load test

    in CI/CD to continuously verify SLOs (service or composite API) — pre-prod • Load testing to help capacity planning — pre-prod, manual • Load test a service to better understand its scaling properties and tune configs — pre-prod, manual @hveldstra
  34. What types of performance tests are there? • Load test

    in CI/CD to continuously verify SLOs (service or composite API) — pre-prod • Load testing to help capacity planning — pre-prod, manual • Load test a service to better understand its scaling properties and tune configs — pre-prod, manual • Stress test a service to find its limits & understand how it degrades — pre-prod, manual @hveldstra
  35. What types of performance tests are there? • Load test

    in CI/CD to continuously verify SLOs (service or composite API) — pre-prod • Load testing to help capacity planning — pre-prod, manual • Load test a service to better understand its scaling properties and tune configs — pre-prod, manual • Stress test a service to find its limits & understand how it degrades — pre-prod, manual • In prod, manual or automatic — add extra traffic as a safety margin; a form of chaos testing @hveldstra
  36. What types of performance tests are there? • Soak test

    - a load test that runs for a longer period of time (1-2 hours)
  37. What types of performance tests are there? • Soak test

    - a load test that runs for a longer period of time (1-2 hours) • Spike test - a load test that ramps up load very quickly
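    The difference between a soak test and a spike test is mostly the shape of the load over time. As an illustration only, here are two load profiles written with the same phase fields used later in this deck (`duration` in seconds, `arrivalRate`, `rampTo`, `name`); every number is a made-up placeholder.

```javascript
// Soak: moderate, steady load held for a long time (2 hours here).
const soakPhases = [
  { duration: 2 * 60 * 60, arrivalRate: 20, name: 'Soak: steady load' },
];

// Spike: a short baseline, then a very fast ramp to a much higher rate.
const spikePhases = [
  { duration: 60, arrivalRate: 10, name: 'Baseline' },
  { duration: 30, arrivalRate: 10, rampTo: 200, name: 'Spike: rapid ramp-up' },
  { duration: 120, arrivalRate: 200, name: 'Hold at peak' },
];

// Total run time of a profile in seconds.
const totalSeconds = (phases) => phases.reduce((sum, p) => sum + p.duration, 0);

console.log(totalSeconds(soakPhases), totalSeconds(spikePhases)); // 7200 210
```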
  38. Acceptance/functional Testing • Verify that a service (or another unit!)

    conforms to its contract • Verify that the results it produces make sense @hveldstra
  39. Auth service • Produces JWTs and not just empty 2xx

    responses • The token contains expected fields @hveldstra
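    A functional check like "the token contains expected fields" could look roughly like this. It is a sketch under assumptions: the claim names `sub` and `exp` are placeholders for whatever the service's contract specifies, the helper names are hypothetical, and the code only decodes the payload without verifying the signature.

```javascript
// Decode the payload (middle part) of a JWT. Does NOT verify the signature.
function decodeJwtPayload(token) {
  const parts = token.split('.');
  if (parts.length !== 3) {
    throw new Error('not a JWT: expected 3 dot-separated parts');
  }
  // JWTs use base64url; map it back to the standard base64 alphabet.
  const base64 = parts[1].replace(/-/g, '+').replace(/_/g, '/');
  return JSON.parse(Buffer.from(base64, 'base64').toString('utf8'));
}

// Returns the names of expected claims that are absent from the token.
function missingClaims(token, expected = ['sub', 'exp']) {
  const payload = decodeJwtPayload(token);
  return expected.filter((name) => !(name in payload));
}
```

    An acceptance test would assert that `missingClaims(token)` is empty for the token returned by the auth endpoint.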
  40. • It’s possible to re-use the same test code for

    both acceptance tests and load tests if your tooling supports it @hveldstra
  41. • It’s possible to re-use the same test code for

    both acceptance tests and load tests if your tooling supports it • (Artillery does!) @hveldstra
  42. Smoke tests • Main characteristic is their speed • A

    test that determines whether other tests should even run @hveldstra
  43. Smoke tests • Main characteristic is their speed • A

    test that determines whether other tests should even run • Is anything obviously broken? If we plug this thing in and turn it on, is there any smoke? @hveldstra
  44. Smoke tests • Main characteristic is their speed • A

    test that determines whether other tests should even run • Is anything obviously broken? If we plug this thing in and turn it on, is there any smoke? • Run a quick happy-path test case @hveldstra
  45. Early “write code” stage: help profile code or dependencies The

    rest typically is on a deployed service or API @hveldstra
  46. How do we run these in CI/CD? • Create a

    parameterized CI job that can run a load test against a $service in an $environment with a $load_profile and (optionally) verify $slos. @hveldstra
  47. How do we run these in CI/CD? • Create a

    parameterized CI job that can run a load test against a $service in an $environment with a $load_profile and (optionally) verify $slos. • This can act as a stage in other pipelines @hveldstra
  48. How do we run these in CI/CD? • Create a

    parameterized CI job that can run a load test against a $service in an $environment with a $load_profile and (optionally) verify $slos. • This can act as a stage in other pipelines • … or be used manually by a dev/tester for ad-hoc testing via a web UI or a CLI (e.g. on Jenkins or AWS CodeBuild) @hveldstra
  49. How do we run these in CI/CD? • The tests

    don’t run on the CI server itself @hveldstra
  50. How do we run these in CI/CD? • The tests

    don’t run on the CI server itself • Best to be able to run them on your own (cloud) infrastructure @hveldstra
  51. How do we run these in CI/CD? • The tests

    don’t run on the CI server itself • Best to be able to run them on your own (cloud) infrastructure • Flexibility when it comes to VPCs or regions @hveldstra
  52. How do we run these in CI/CD? • The tests

    don’t run on the CI server itself • Best to be able to run them on your own (cloud) infrastructure • Flexibility when it comes to VPCs or regions • Critical for testing internal microservices @hveldstra
  53. How do we run these in CI/CD? • The tests

    don’t run on the CI server itself • Best to be able to run them on your own (cloud) infrastructure • Flexibility when it comes to VPCs or regions • Critical for testing internal microservices • More cost-effective too @hveldstra
  54. How do we run these in CI/CD? • What about

    test frequency? • No one-size-fits-all approach @hveldstra
  55. How do we run these in CI/CD? • What about

    test frequency? • No one-size-fits-all approach • Possible to test every change for microservices on the critical path, but probably excessive for most services @hveldstra
  56. How do we run these in CI/CD? • What about

    test frequency? • No one-size-fits-all approach • Possible to test every change for microservices on the critical path, but probably excessive for most services • Run on a schedule (e.g. nightly) @hveldstra
  57. How do we run these in CI/CD? • What about

    test frequency? • No one-size-fits-all approach • Possible to test every change for microservices on the critical path, but probably excessive for most services • Run on a schedule (e.g. nightly) • Run before a final promotion of a change to prod @hveldstra
  58. Defining SLOs • Setting SLOs should be part of your

    team’s “creating a new microservice” checklist @hveldstra
  59. Defining SLOs • Setting SLOs should be part of your

    team’s “creating a new microservice” checklist • Use & adapt: https://github.com/SkeltonThatcher/run-book-template @hveldstra
  60. Defining SLOs • Setting SLOs should be part of your

    team’s “creating a new microservice” checklist • Documented alongside API specs, design docs (typically a Confluence/wiki page that follows a template) @hveldstra
  61. Defining SLOs • Setting SLOs should be part of your

    team’s “creating a new microservice” checklist • Documented alongside API specs, design docs (typically a Confluence/wiki page that follows a template) • Involve other teams that may rely on the service! @hveldstra
  62. Defining SLOs • Setting SLOs should be part of your

    team’s “creating a new microservice” checklist • Documented alongside API specs, design docs (typically a Confluence/wiki page that follows a template) • Involve other teams that may rely on the service! • Better to have some SLOs (& revise) than none at all @hveldstra
  63. Which services should we test? • What could be tested?

    Anything with an API spec. @hveldstra
  64. Which services should we test? • What could be tested?

    Anything with an API spec. • Individual services @hveldstra
  65. Which services should we test? • What could be tested?

    Anything with an API spec. • Individual services • Composite APIs (that’s a “unit” with its own properties & behavior) @hveldstra
  66. Which services should we test? • What could be tested?

    Anything with an API spec. • Individual services • Composite APIs (that’s a “unit” with its own properties & behavior) • Does a microservice have SLOs? Then it should have a performance test to verify those automatically. @hveldstra
  67. How do we encourage collaboration? • No magical solutions •

    Involve all functions in discussions about performance @hveldstra
  68. How do we encourage collaboration? • No magical solutions •

    Involve all functions in discussions about performance • Remove barriers: @hveldstra
  69. How do we encourage collaboration? • No magical solutions •

    Involve all functions in discussions about performance • Remove barriers: • Use tools that everyone has access to • Use tools that everyone can work with @hveldstra
  70. How do we encourage collaboration? • Monorepos help • Good

    tooling helps • Available to everyone, ideally open source @hveldstra
  71. How do we encourage collaboration? • Monorepos help • Good

    tooling helps • Available to everyone, ideally open source • Easy to install and get started with @hveldstra
  72. How do we encourage collaboration? • Monorepos help • Good

    tooling helps • Available to everyone, ideally open source • Easy to install and get started with • Uses a language that everyone is familiar with @hveldstra
  73. How do we encourage collaboration? • Monorepos help • Good

    tooling helps • Available to everyone, ideally open source • Easy to install and get started with • Uses a language that everyone is familiar with • Make load testing reports & findings available to everyone (e.g. on Confluence/your wiki or KB) @hveldstra
  74. Artillery 101 • Available on npm: npm install -g artillery

    • The artillery CLI is used to run tests and create HTML reports @hveldstra
  75. Artillery 101 • Available on npm: npm install -g artillery

    • The artillery CLI is used to run tests and create HTML reports • Tests are written in YAML and can be extended with JavaScript @hveldstra
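    As an illustration of extending a YAML test with JavaScript, here is a sketch of a custom function of the kind that could live in the `functions.js` file referenced by `processor` in the config shown later. It assumes Artillery's convention of calling scenario functions with `(context, events, done)` and exposing `context.vars` to templates such as `{{ username }}`; the function name and the generated value formats are hypothetical.

```javascript
const crypto = require('crypto');

// Generates random credentials and stores them as scenario variables.
// Assumed to be invoked from a scenario flow step such as:
//   - function: "setRandomCredentials"
function setRandomCredentials(context, events, done) {
  const suffix = crypto.randomBytes(4).toString('hex'); // 8 hex characters
  context.vars.username = `loadtest-${suffix}`;
  context.vars.password = crypto.randomBytes(12).toString('base64');
  return done();
}

module.exports = { setRandomCredentials };
```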
  76. Artillery 101 • Supports HTTP, Socket.io, WebSocket, Kinesis, HLS out

    of the box. • Third-party plugins for SQS, Lambda, SQL etc @hveldstra
  77. Artillery 101 • Supports HTTP, Socket.io, WebSocket, Kinesis, HLS out

    of the box. • Third-party plugins for SQS, Lambda, SQL etc • Supports plugins. Out of the box: StatsD/Datadog/Librato integration. @hveldstra
  78. Artillery 101 • Supports HTTP, Socket.io, WebSocket, Kinesis, HLS out

    of the box. • Third-party plugins for SQS, Lambda, SQL etc • Supports plugins. Out of the box: StatsD/Datadog/Librato integration. • Third-party plugins for other monitoring systems @hveldstra
  79. Artillery 101 • Designed to allow for complex, multi-step virtual

    user behavior to be scripted • Support for randomizing requests and capturing data from responses and re-using it in other requests @hveldstra
  80. Artillery 101 • Designed to allow for complex, multi-step virtual

    user behavior to be scripted • Support for randomizing requests and capturing data from responses and re-using it in other requests • Supports assertions on metrics such as HTTP latency, i.e. automated checking of SLOs @hveldstra
  81. Artillery 101 • Runs well in Docker, easy to run

    on ECS or Kubernetes or in a CI/CD pipeline @hveldstra
  82. Artillery 101 • Runs well in Docker, easy to run

    on ECS or Kubernetes or in a CI/CD pipeline • Can generate self-contained HTML reports with charts and graphs @hveldstra
  83. Artillery 101 • Runs well in Docker, easy to run

    on ECS or Kubernetes or in a CI/CD pipeline • Can generate self-contained HTML reports with charts and graphs • Features for creating modular test suites @hveldstra
  84. Config

    config:
      target: "" # we don't set a target by default
      environments:
        dev:
          target: "https://auth-service-dev.acme-corp.internal"
          defaults:
            headers:
              x-api-key: "0xcoffee"
        local:
          target: "http://localhost:8080"
      processor: "./functions.js"
      plugins:
        datadog: {}
      payload:
        - path: "./username-password.csv"
          fields:
            - username
            - password

    @hveldstra
  85. One or more scenarios

    scenarios:
      - name: Authenticate with valid credentials
        flow:
          - post:
              url: "/auth"
              json:
                username: "{{ username }}"
                password: "{{ password }}"
              expect:
                - statusCode: 200
                - contentType: json

    @hveldstra
  86. Organizing our test suite • Using a monorepo • Easier

    to get started with, extend & maintain @hveldstra
  87. Organizing our test suite • Using a monorepo • Easier

    to get started with, extend & maintain • Easier to share across teams @hveldstra
  88. Organizing our test suite • Using a monorepo • Easier

    to get started with, extend & maintain • Easier to share across teams • Helps code reuse @hveldstra
  89. Organizing our test suite • Using a monorepo • Easier

    to get started with, extend & maintain • Easier to share across teams • Helps code reuse • Easier to work with in CI/CD pipelines @hveldstra
  90. Organizing our test suite acme-corp-api-tests/ - services/ - auth-service/ -

    scenarios/ - config.yaml - functions.js - overrides.slo-response-time.json - package.json - common-config.yaml @hveldstra
  96. Why that structure? • Extensible: • can add a new

    service or API easily, or add a new scenario to an existing one @hveldstra
  97. Why that structure? • Extensible: • can add a new

    service or API easily, or add a new scenario to an existing one • allows for service-specific config such as environment URLs or data from external CSVs @hveldstra
  98. Why that structure? • Extensible: • can add a new

    service or API easily, or add a new scenario to an existing one • allows for service-specific config such as environment URLs or data from external CSVs • allows for service-specific custom code, e.g. to generate random data in a certain format @hveldstra
  99. Why that structure? • Extensible: • can add a new

    service or API easily, or add a new scenario to an existing one • allows for service-specific config such as environment URLs or data from external CSVs • allows for service-specific custom code, e.g. to generate random data in a certain format • Can encode service-specific load phases and SLOs @hveldstra
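    Service-specific load phases can also be reasoned about before a test is run, for example to estimate how long it takes and roughly how many virtual users it will create. The sketch below uses the phase values from the overrides file shown in this deck and assumes that `arrivalRate` means new virtual users per second and that `rampTo` is a linear ramp over the phase; `estimateArrivals` is a hypothetical helper and its result is only an approximation.

```javascript
const phases = [
  { duration: 120, arrivalRate: 10, rampTo: 20, name: 'Warm up the service' },
  { duration: 240, arrivalRate: 20, rampTo: 100, name: 'Ramp to high load' },
  { duration: 600, arrivalRate: 100, name: 'Sustained high load' },
];

// Approximate number of virtual users created across all phases:
// average arrival rate of each phase multiplied by its duration.
function estimateArrivals(phaseList) {
  return phaseList.reduce((total, p) => {
    const endRate = p.rampTo ?? p.arrivalRate;
    return total + ((p.arrivalRate + endRate) / 2) * p.duration;
  }, 0);
}

const totalDuration = phases.reduce((sum, p) => sum + p.duration, 0);

console.log(totalDuration);            // 960 (seconds, i.e. 16 minutes)
console.log(estimateArrivals(phases)); // 76200
```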
  100.

    {
      "config": {
        "phases": [
          { "duration": 120, "arrivalRate": 10, "rampTo": 20, "name": "Warm up the service" },
          { "duration": 240, "arrivalRate": 20, "rampTo": 100, "name": "Ramp to high load" },
          { "duration": 600, "arrivalRate": 100, "name": "Sustained high load" }
        ],
        "ensure": { "maxErrorRate": 0.1, "p99": 200 }
      }
    }

    @hveldstra
  101. Running a test

    artillery run \
      --config ./services/auth-service/config.yaml \
      --overrides "$(cat ./services/auth-service/overrides.slos.json)" \
      -e dev ./services/auth-service/login.yaml

    @hveldstra
  103. Running a test

    artillery run \
      --config ./services/auth-service/config.yaml \
      --overrides "$(cat ./services/auth-service/overrides.slos.json)" \
      -e dev ./services/auth-service/login.yaml

    Service name, load/SLO override, environment and optionally a scenario → generic CI job @hveldstra
  104. Running a test • Reusable by other jobs / pipeline

    stages • Or via the UI for ad hoc testing - e.g. in Jenkins or AWS CodeBuild @hveldstra
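    The generic CI job described above mostly has to assemble that command line from its parameters. A minimal sketch, assuming the directory layout and file names used in the command on the slides; the `artilleryRunArgs` helper and its parameter names are hypothetical.

```javascript
// Builds the argument list for `artillery run` from the CI job's parameters.
function artilleryRunArgs({ service, environment, overridesJson, scenario = 'login.yaml' }) {
  const dir = `./services/${service}`;
  const args = ['run', '--config', `${dir}/config.yaml`];
  if (overridesJson) {
    // Load phases and SLO checks are passed as a JSON string.
    args.push('--overrides', overridesJson);
  }
  args.push('-e', environment, `${dir}/${scenario}`);
  return args;
}

// The result could be handed to child_process.spawn('artillery', args).
console.log(artilleryRunArgs({ service: 'auth-service', environment: 'dev' }).join(' '));
// run --config ./services/auth-service/config.yaml -e dev ./services/auth-service/login.yaml
```

    Passing the arguments as an array rather than a shell string avoids quoting problems with the JSON overrides.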
  105. Organizing our test suite acme-corp-api-tests/ - services/ - auth-service/ -

    scenarios/ - config.yaml - functions.js - overrides.slo-response-time.json - package.json - common-config.yaml @hveldstra
  106. Where to start? • Pick one service • Write tests

    using the template & set up a CI job to run them @hveldstra
  107. Where to start? • Pick one service • Write tests

    using the template & set up a CI job to run them • Show & tell to the rest of the team @hveldstra
  108. Where to start? • A good candidate service: • Small

    API surface • Has experienced performance issues, or @hveldstra
  109. Where to start? • A good candidate service: • Small

    API surface • Has experienced performance issues, or • On the critical path for other components, or @hveldstra
  110. Where to start? • A good candidate service: • Small

    API surface • Has experienced performance issues, or • On the critical path for other components, or • Has high performance requirements @hveldstra
  111. Where to start? • A good candidate service: • Small

    API surface • Has experienced performance issues, or • On the critical path for other components, or • Has high performance requirements • For example: an authentication service @hveldstra
  112. So… we’ve looked at • What performance testing is, and

    different types of performance tests @hveldstra
  113. So… we’ve looked at • What performance testing is, and

    different types of performance tests • Where performance testing fits into the development, testing, and delivery process @hveldstra
  114. So… we’ve looked at • What performance testing is, and

    different types of performance tests • Where performance testing fits into the development, testing, and delivery process • Running performance tests in CI/CD pipelines @hveldstra
  115. So… we’ve looked at • What performance testing is, and

    different types of performance tests • Where performance testing fits into the development, testing, and delivery process • Running performance tests in CI/CD pipelines • Setting and verifying SLOs @hveldstra
  116. So… we’ve looked at • What performance testing is, and

    different types of performance tests • Where performance testing fits into the development, testing, and delivery process • Running performance tests in CI/CD pipelines • Setting and verifying SLOs • The mechanics of setting up a test suite with Artillery @hveldstra