The Life of a Load Generator

This presentation shows how StormForger uses AWS EC2 to run load and performance tests. These tests scale transparently for our customers from very small scenarios to over 1 million requests per second. Sebastian and Stephan explain how our AMI is built automatically and how the life cycle of a load generator works at StormForger.

Sebastian Cohnen

September 16, 2020

Transcript

  1. • Sebastian Cohnen (@tisba), Founder & CTO, StormForger
     • Stephan Zeissler (@zeisss), Software Engineer, StormForger
  2. StormForger.com
     • Performance Testing as a Service (SaaS)
     • Founded in 2014, based in Cologne
     • Fully managed, scalable platform to run perf tests
     • Available from 17 AWS regions
  3. General Design Goals
     • Keep things simple: Monolithic application + Load Generators.
     • Always default to the safe option and terminate running tests.
     • EC2 VMs are ephemeral.
     • Load generators are dynamically configured and have very little prior knowledge of the world.
     • Things fail, so retry where possible and use timeouts.
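The "retry where possible and use timeouts" goal can be illustrated with a minimal sketch. The function name `retry`, the exponential backoff and the default values are assumptions for illustration, not taken from the talk; the wrapped operation is expected to enforce its own timeout (for example a request timeout).

```python
import time


def retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation() until it succeeds or attempts are exhausted.

    Waits base_delay, 2 * base_delay, ... between attempts and re-raises
    the last error if every attempt fails. Hypothetical helper, not
    StormForger's implementation.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.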
  4. Setup
     • Ruby on Rails based Command & Control application (aka “the forge”)
     • Background worker operations are designed to be idempotent and retry-able.
     • Golang based custom load generator
     • A load generator cluster consists of a controller and generators
     • Infrastructure is fully managed, running on EC2 on-demand instances
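What "idempotent and retry-able" means for a background job can be shown with a small sketch. It is written in Python for brevity although the forge itself is a Rails application; `launch_once`, the in-memory `launched` mapping and the IDs are hypothetical. A real system would need a persistent store and locking.

```python
def launch_once(launched, test_run_id, launch):
    """Launch instances for a test run at most once.

    `launched` maps test_run_id -> instance IDs already launched, so a
    retried job returns the recorded result instead of launching again.
    """
    if test_run_id in launched:
        return launched[test_run_id]
    launched[test_run_id] = launch()
    return launched[test_run_id]
```

Running the job twice for the same test run calls `launch` only once, which is what makes a blind retry safe.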
  5. Launch
     • Instances are managed by the forge via AWS Ruby SDK
     • AWS Region is selected by customer
     • AMI ID, instance type and engine version are determined
     • Instances are launched in batches
     • Launch typically takes ~20-30 seconds (from API call to ready)
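Launching in batches amounts to splitting the requested instance count into chunks. The sketch below assumes a hypothetical batch size; the talk does not state the actual value or how the forge computes it.

```python
def batches(total, batch_size):
    """Split `total` instances into launch batches of at most `batch_size`."""
    if total < 0 or batch_size <= 0:
        raise ValueError("total must be >= 0 and batch_size > 0")
    full, rest = divmod(total, batch_size)
    # Full batches first, then one smaller batch for the remainder, if any.
    return [batch_size] * full + ([rest] if rest else [])
```

For example, 25 instances with a batch size of 10 would be requested as three launch calls of 10, 10 and 5 instances.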
  6. Launch (cont.)
     • cloud-init performs the final provisioning steps:
       • Write environment specific configuration
       • Download our engine (pre-signed S3 URL)
       • Update internal X509 root certificate (CA)
       • Download AWS EC2 Instance Identity Documents
       • Explicitly wait for the clock to sync
  7. Authentication
     • Each instance initially authenticates itself to our API with…
       • AWS Instance Identity Documents (provided by the EC2 metadata API)
       • a Certificate Signing Request (CSR)
     • Identity of the instance is verified against AWS public certificates.
     • A short-lived X509 certificate is generated and passed back to the instance for mutually authenticated communication.
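One part of this flow can be sketched: once the document's signature has been verified against the AWS public certificate (omitted here), its fields can be compared with what the control plane recorded at launch. The field names are those of the EC2 instance identity document; which fields StormForger actually checks is an assumption, and `check_identity` is a hypothetical helper.

```python
import json


def check_identity(document_json, expected):
    """Compare fields of an already signature-verified identity document
    with the launch record kept by the control plane.

    Returns the list of mismatching field names (empty means accepted).
    """
    document = json.loads(document_json)
    fields = ("instanceId", "region", "imageId", "instanceType")
    return [f for f in fields if document.get(f) != expected.get(f)]
```

Only when the list is empty would the API go on to sign the CSR and return the short-lived certificate.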
  8. Running a Test
     • Assign available instances to the test
     • Deploy test configuration, required test data and start test
     • Monitor running test, pre-processing metrics
     • After completion, fetch logs and measurements, do post-processing and analysis
     • Release instance
  9. Termination
     • After a certain idle time, instances are terminated.
     • Terminating instances are tracked until full termination is confirmed.
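Both termination rules can be sketched as small functions. The idle threshold, the data shapes and the function names are assumptions for illustration; `terminated` is the EC2 instance state name that marks completed termination.

```python
def instances_to_terminate(last_used, now, max_idle_seconds):
    """Return IDs of instances idle for longer than max_idle_seconds.

    `last_used` maps instance ID -> timestamp (seconds) of its last release.
    """
    return sorted(
        instance_id
        for instance_id, released_at in last_used.items()
        if now - released_at > max_idle_seconds
    )


def still_terminating(tracked, reported_states):
    """Keep tracking instances until they are reported as 'terminated'."""
    return {i for i in tracked if reported_states.get(i) != "terminated"}
```

An instance with no reported state stays in the tracked set, which matches the "default to the safe option" goal.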
  10. Instance Pooling
      • Instances are not terminated immediately after use.
      • Minimize “Time to Test” by keeping recently used resources around.
      • At all times: Make sure that instances are used exclusively!
      • Instances have a limited upper life span (ephemeral design).
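A minimal in-memory sketch of such a pool follows. The class and its method names are hypothetical, and a real control plane would persist this state and guard it against concurrent access; the sketch only shows exclusive assignment and the upper life span.

```python
class InstancePool:
    """Tracks idle instances; each instance is handed to at most one test."""

    def __init__(self, max_age_seconds):
        self.max_age_seconds = max_age_seconds
        self.idle = {}      # instance ID -> launch timestamp
        self.assigned = {}  # instance ID -> (test ID, launch timestamp)

    def add(self, instance_id, launched_at):
        self.idle[instance_id] = launched_at

    def acquire(self, test_id, now):
        """Assign one idle, not-too-old instance to test_id, or return None."""
        for instance_id, launched_at in list(self.idle.items()):
            if now - launched_at >= self.max_age_seconds:
                continue  # past its life span; leave it for termination
            del self.idle[instance_id]
            self.assigned[instance_id] = (test_id, launched_at)
            return instance_id
        return None

    def release(self, instance_id):
        """Return an instance to the idle pool after its test finished."""
        _, launched_at = self.assigned.pop(instance_id)
        self.idle[instance_id] = launched_at
```

Because `acquire` removes the instance from `idle` before returning it, a second test cannot receive the same instance until `release` is called.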
  11. Robustness
      • Heart beats
      • Test Runs: Push state changes to API & Poll state periodically
      • Retries & Timeouts (for requests & state machines)
      • Deadman Switches
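Heart beats and deadman switches fit together: the switch fires when heart beats stop arriving. A minimal sketch, with a hypothetical class name and timeout value:

```python
class DeadmanSwitch:
    """Fires unless a heartbeat arrived within `timeout` seconds."""

    def __init__(self, timeout, now):
        self.timeout = timeout
        self.last_beat = now

    def heartbeat(self, now):
        """Record that the monitored party is still alive."""
        self.last_beat = now

    def expired(self, now):
        """True once no heartbeat has been seen for more than `timeout`."""
        return now - self.last_beat > self.timeout
```

When `expired` returns true, the safe default from the design goals applies: stop the test or terminate the instance.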
  12. • Software issues…?
      • Network issues…?
      • VM issues…?
      • Terminate the EC2 instance (Break Glass Procedure)
  13. AMI Provisioning
      • Packer for provisioning and to copy AMIs to all regions
      • Packer uses Ansible to…
        • remove unused configs and packages, disable unattended updates, remove AWS SSM Agent, etc.
        • apply network and kernel optimisations
  14. Other Tooling
      • Base AMI Update Checker
      • Orphaned Server Detection
      • Exception Tracking (Rollbar)
      • Integration Testing (automated tests for new AMI and engine releases)
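Orphaned server detection can be thought of as a set difference between what the cloud reports and what the application tracks. The sketch below assumes both lists are already available as instance IDs; how StormForger gathers them is not described in the talk.

```python
def find_orphans(running_in_cloud, known_to_control_plane):
    """Instances reported as running that the application does not track."""
    return sorted(set(running_in_cloud) - set(known_to_control_plane))
```

For example, if EC2 reports `i-a`, `i-b` and `i-c` while the application only knows `i-a` and `i-c`, then `i-b` is flagged as orphaned.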
  15. Future Work
      • Closing the gap to get fully automated AMI updates and assessments
      • Cost optimisation using EC2 Spot instances with defined durations
      • Support dedicated IPs for customers
      • Dedicated Customer Pools & self-hosted setups
      • Improve troubleshooting in case we have to auto terminate instances