Performance Testing Serverless

I gave this talk at the AWS User Group in Cologne in April 2019 (https://www.meetup.com/aws-cologne/events/260502656/).

It is about performance testing, with an outlook on its implications for Serverless systems. I'll go over some challenges you can expect with such systems and try to outline general strategies for approaching the topic. I also talk about a common performance pitfall when using Serverless with Node.js.

Sebastian Cohnen

April 25, 2019

Transcript

  1. 7.

    @tisba • AWS User Group CGN • Sebastian Cohnen (@tisba)

    • 9+ years consulting & development
    • focus on performance and architecture
    • founder & CTO StormForger
  2. 8.

    StormForger: Start Perf Testing Now! (StormForger.com)

    • Performance Testing SaaS for DevOps Teams
    • Fully managed, focused on integration and continuous performance testing
  3. 11.

    Serverless?*

    [Diagram: the layers Application, State / Data, Runtime, OS, Networking and Hardware, compared across Bare Metal, VM, Containers and Serverless]

    * … a gross simplification
  4. 12.

    Serverless…
    • …might have an impact on how you build your systems
    • …systems will still be quite complex and distributed
    • …might be used for (µ-)Services
  7. 15.

    Performance Testing Serverless!
    • Learning about system behaviour
    • Correct sizing & cost optimisations
    • Stateless
    • Issues with observability
  8. 16.

    Strategies
    • Get started! Pragmatism over Perfection
    • Divide and Conquer
    • Two perspectives: “End-to-End” & “per Unit”
    • Move downward in the stack
  9. 17.

    End-to-End
    • Scenarios modelled around actual usage
    • Perspective is a perimeter view
    • Can generate quite some noise when problems occur
    • Usually answers business questions best

    Unit
    • Test components or units in isolation
    • Very useful for teams to debug & troubleshoot performance issues
    • Good for checking technical SLOs
    • Very hard to model in a way that is representative
  10. 21.

    Moving down the Stack
    [Diagram: Test Traffic against an architecture of CloudFront, API GW, Lambda and DynamoDB]
  11. 22.

    Moving down the Stack
    [Diagram: Test Traffic against an architecture of CloudFront, API GW, Lambda, DynamoDB and ElastiCache]
  12. 24.

    Can you quickly figure out the reason for an increased latency or error rate for a specific endpoint?
  14. 26.

    Observability
    • Logs
    • Metrics
    • Tracing
    • Check out theburningmonk.com, e.g. https://theburningmonk.com/2018/04/serverless-observability-what-can-you-use-out-of-the-box/
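A low-tech way to answer the per-endpoint question is one structured log line per invocation, which log tooling can then group by endpoint for latency and error rate. A minimal Node.js sketch, assuming a Lambda-style `async (event, context)` handler; the wrapper `withRequestLog` and its log field names are hypothetical, while `context.awsRequestId` is the request ID the Lambda Node.js runtime passes in.

```javascript
// Hypothetical wrapper: emits one JSON log line per invocation with endpoint,
// outcome and duration, so latency and errors can be grouped per endpoint.
function withRequestLog(endpoint, handler, log = (line) => console.log(line)) {
  return async (event, context) => {
    const start = process.hrtime.bigint();
    let outcome = "ok";
    try {
      return await handler(event, context);
    } catch (err) {
      outcome = "error";
      throw err; // re-throw so the platform still records the failure
    } finally {
      // Runs after success or failure; duration in milliseconds.
      const durationMs = Number(process.hrtime.bigint() - start) / 1e6;
      log(
        JSON.stringify({
          endpoint,
          outcome,
          durationMs,
          requestId: context && context.awsRequestId,
        })
      );
    }
  };
}

// Usage (hypothetical handler):
// exports.handler = withRequestLog("GET /users", async (event) => ({ statusCode: 200 }));
```

The wrapper re-throws errors, so failures still show up in the platform's own error metrics; it only adds the per-endpoint dimension.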
  16. 31.

    HTTP Keep-Alive
    • Reusing network connections for multiple requests
      (otherwise extremely wasteful in terms of resources)
    • Keep-Alive significantly decreases latency and resource utilisation
    • e.g. Node.js does not keep alive HTTP client connections by default ⚠

    [Chart: "connection: ~54%"]
  17. 32.

    HTTP Keep-Alive
    • AWS Lambda functions are stateless
    • State is externalised, often over HTTP
    • Our upstream services typically use HTTP
    • …and almost all AWS services are accessed over HTTP as well!
  18. 33.

    Remember Observability?
    • Networking and OS-level operations are hard to observe!
    • Instrumenting network operations is next to impossible (AFAIK)
    • ephemeral port exhaustion, socket statistics, …

    [Diagram: the Serverless stack (Application, State / Data, Runtime, OS, Networking), with the lower layers marked "?"]
  19. 35.

    definition.session("keep-alive", function(session) {
      session.times(25, function(context) {
        // HTTP 1.1 with HTTP Keep-Alive is the default
        context.get("http://testapp.loadtest.party/", { tag: "keep-alive" });
        context.waitExp(0.5);
      });
    });

    definition.session("no-keep-alive", function(session) {
      session.times(25, function(context) {
        context.get("https://testapp.loadtest.party/", {
          tag: "no-keep-alive",
          // instruct client to close connection after request is done
          headers: { Connection: "close" },
        });
        context.waitExp(0.5);
      });
    });
  20. 38.

    HTTP Keep-Alive Impact* (response time in ms)

                   no keep-alive   keep-alive   change
      median              122           29        -76%
      p99.9               242           85        -65%
      max               1,158          262        -77%

    * Traffic generated from two instances to a single target, over 5 minutes, 25 requests per connection (keep-alive), ~200k requests in total, ~337 TCP&TLS handshakes/sec, avg test cluster CPU utilisation ~50%.
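The percentages on the slide are relative changes: (keep-alive value - no-keep-alive value) / no-keep-alive value. A quick check using the values read off the chart (median 122 → 29 ms, p99.9 242 → 85 ms, max 1,158 → 262 ms); which bar belongs to which variant is inferred from how the percentages line up, and the helper name is hypothetical.

```javascript
// Values in milliseconds, as read off the slide's chart.
const results = {
  median: { noKeepAlive: 122, keepAlive: 29 },
  "p99.9": { noKeepAlive: 242, keepAlive: 85 },
  max: { noKeepAlive: 1158, keepAlive: 262 },
};

// Relative change in percent from `before` to `after`.
function relativeChangePercent(before, after) {
  return ((after - before) / before) * 100;
}

for (const [stat, { noKeepAlive, keepAlive }] of Object.entries(results)) {
  const pct = Math.round(relativeChangePercent(noKeepAlive, keepAlive));
  console.log(`${stat}: ${noKeepAlive} ms -> ${keepAlive} ms (${pct}%)`);
}
// median: 122 ms -> 29 ms (-76%)
// p99.9: 242 ms -> 85 ms (-65%)
// max: 1158 ms -> 262 ms (-77%)
```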
  21. 42.

    Private VPCs
    • You can run AWS Lambda functions inside your private VPCs
    • Keep in mind that they allocate IP addresses from the VPC they are running in
    • If you run out of address space (hello IPv4), weird things will happen!
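Under the Lambda VPC networking model in place at the time of this talk, each network interface a function used took one private IP from its subnet. A back-of-the-envelope check, assuming the approximate formula AWS documented back then (ENIs ≈ peak concurrent executions × memory in GB / 3 GB); since the VPC networking change rolled out from late 2019, Lambda uses shared Hyperplane ENIs and this estimate no longer applies. AWS reserves five addresses in every subnet. The function names are hypothetical.

```javascript
// AWS reserves 5 addresses in every VPC subnet (the first four and the last).
function usableSubnetAddresses(prefixLength) {
  return 2 ** (32 - prefixLength) - 5;
}

// Approximate ENI demand under the pre-2019 Lambda VPC model (legacy formula):
// peak concurrent executions * (memory in GB / 3 GB). Each ENI uses one IP.
function legacyEniEstimate(peakConcurrency, memoryMb) {
  return Math.ceil(peakConcurrency * (memoryMb / 1024 / 3));
}

function fitsInSubnet(peakConcurrency, memoryMb, prefixLength) {
  return legacyEniEstimate(peakConcurrency, memoryMb) <= usableSubnetAddresses(prefixLength);
}

// Example: 1,000 concurrent executions at 1,536 MB need ~500 addresses,
// more than a single /24 subnet (251 usable) can provide.
console.log(legacyEniEstimate(1000, 1536), usableSubnetAddresses(24), fitsInSubnet(1000, 1536, 24));
// 500 251 false
```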
  22. 43.

    Cascading Failures & Timeouts
    • Know your timeouts for all layers
    • Circuit breakers, exponential backoff
    • Ideally: Deadline & Cancellation Propagation
      https://landing.google.com/sre/sre-book/chapters/addressing-cascading-failures/
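Deadline propagation can start inside the function itself: the Lambda Node.js context exposes `context.getRemainingTimeInMillis()`, from which you can derive an absolute deadline and pass the remaining budget down to every call. A minimal sketch combining that with capped exponential backoff with full jitter; `callWithDeadline`, its options, `callUpstream` and the 200 ms safety margin in the usage comment are hypothetical. Aborting the `AbortSignal` only cancels work the operation actually wires it into (for example the `signal` option of `http.request`).

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry `operation` with capped exponential backoff and full jitter, never
// past the absolute `deadline` (ms since epoch). Each attempt receives the
// remaining budget and an AbortSignal that fires when the deadline is reached.
async function callWithDeadline(
  operation,
  { deadline, baseDelayMs = 50, maxDelayMs = 1000, maxAttempts = 5 }
) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const remainingMs = deadline - Date.now();
    if (remainingMs <= 0) break; // budget exhausted: fail fast instead of piling on

    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), remainingMs);
    try {
      return await operation({ attempt, remainingMs, signal: controller.signal });
    } catch (err) {
      lastError = err;
    } finally {
      clearTimeout(timer);
    }

    if (attempt === maxAttempts - 1) break; // no sleep after the last attempt
    // Full jitter: random delay in [0, min(maxDelayMs, baseDelayMs * 2^attempt)).
    const delay = Math.random() * Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
    if (Date.now() + delay >= deadline) break; // waiting would overrun the deadline
    await sleep(delay);
  }
  const error = new Error("deadline exceeded or retries exhausted");
  error.cause = lastError;
  throw error;
}

// Usage inside a Lambda handler (hypothetical upstream call and margin):
// const deadline = Date.now() + context.getRemainingTimeInMillis() - 200;
// const data = await callWithDeadline(({ signal }) => callUpstream({ signal }), { deadline });
```

A circuit breaker would sit one level up, skipping calls entirely while an upstream keeps failing, and is not shown here.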