Load Testing with 1,000,000 Users!

I gave this talk in March 2017 at the Web Engineering Düsseldorf Meetup (https://www.meetup.com/Web-Engineering-Duesseldorf/events/237783925/). Read more about the story at https://stormforger.com/blog/2014/05/27/load-testing-an-interactive-tv-show-with-over-1-million-users. Check out https://stormforger.com!

Abstract:

This is a battle story talk about how we helped debug and load test the interactive German TV quiz show on ARD back in 2014. The show failed within the first 5 minutes of the first episode. The talk outlines some background, the architecture, what happened, why it failed, and what we did to help fix it within one week. In the end, the broadcaster (ARD) demanded external load tests to verify that the optimized and fixed system could sustain 1M concurrently active players…

Sebastian Cohnen

March 17, 2017

Transcript

  1. EHLO webengdus • Sebastian Cohnen (@tisba) • 7+ years consulting & development • focus on performance and architecture • founder & CTO StormForger.com
  2. The year was 2014… • Still no global IPv6 rollout • But we finally have .technology, .domains, .xyz and .guru TLDs • TV shows are getting interactive
  3. Quizduell • alias "QuizClash", "QuizReto", … • Mobile Quiz Game/App • >30M players worldwide • >14M in Germany
  4. • In-App Web View using AngularJS • HTTP & JSON API written in Go • Hosted on Google App Engine • Built by Behind the Scenes
  5. Nothing worked… • The very first round of “Team Germany” failed • Service overwhelmed, slow, unresponsive… • Bad press; much speculation about hackers, [D]DoS etc.
  6. During the next days… • large scale load testing to provide insights • lots of configuration testing! • modeled game agents using StormForger to play the show • profiling, debugging, refactoring, …
  7. Load Gen Setup • 50 Load Generators (AWS EC2 Ireland) • 800 cores, 1.5 TB RAM, lots of bandwidth • 3.3 TB data moved in over 2B requests • 1M Active Users, 330k rps peak
  8. TV Synchronicity • This is what makes the show “interactive” • API polling every 1-10 sec • “server-side DDoS orchestration” (synchronized state polling & you have to answer questions within 15 sec)
  9. New load tests: Up to 330k rps, ~1M Users • pre-launch load tests: Up to 85k rps (~250k Users)
  10. Issues • Google DoS Protection • Understand Google App Engine’s tuning & scaling knobs • Runtime environment on App Engine is not transparent
  11. The Actual Problem • Customer insisted on last minute changes to the backend, mostly related to real-time statistics • no time to load test again prior to show premiere
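
The figures quoted on slides 7 to 9 can be cross-checked with a little arithmetic. The sketch below is not from the talk; it only divides the published totals. It assumes decimal terabytes, treats "over 2B requests" as exactly 2 billion (so the per-request size is an upper bound), and assumes load was spread evenly across generators and users. The names `TEST_RUNS` and `mean_request_interval` are made up for this illustration.

```python
# Figures quoted on slides 7 and 9 of the transcript.
LOAD_GENERATORS = 50
TOTAL_CORES = 800
TOTAL_REQUESTS = 2_000_000_000  # "over 2B requests", so treated as a lower bound
TOTAL_BYTES = 3.3e12            # "3.3 TB", assuming decimal terabytes

# (label, peak requests per second, active users)
TEST_RUNS = [
    ("pre-launch", 85_000, 250_000),
    ("new", 330_000, 1_000_000),
]


def mean_request_interval(rps: float, users: int) -> float:
    """Average seconds between two requests of one simulated user at peak.

    Assumes the peak rate is spread evenly over all active users.
    """
    return users / rps


# Even split across generators is an assumption, not a statement from the talk.
cores_per_generator = TOTAL_CORES / LOAD_GENERATORS
peak_rps_per_generator = TEST_RUNS[1][1] / LOAD_GENERATORS

# Upper bound, because the real request count was higher than 2B.
max_bytes_per_request = TOTAL_BYTES / TOTAL_REQUESTS

print(f"cores per generator:     {cores_per_generator:.0f}")
print(f"peak rps per generator:  {peak_rps_per_generator:.0f}")
print(f"bytes per request (max): {max_bytes_per_request:.0f}")
for label, rps, users in TEST_RUNS:
    interval = mean_request_interval(rps, users)
    print(f"{label}: one request per user every {interval:.2f} s at peak")
```

Under these assumptions both test runs work out to roughly one request per user every 3 seconds at peak, which sits inside the 1-10 second polling range mentioned on slide 8.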