Load Testing with 1M Users

I gave this talk on several occasions (meetups, conferences, …). There are also some blog posts on this topic at https://stormforger.com/blog/2014/05/27/load-testing-an-interactive-tv-show-with-over-1-million-users. Check out https://stormforger.com if you are interested in doing load testing yourself.

Abstract:

This is a battle-story talk about how we helped debug and load test the interactive German TV quiz show on ARD back in 2014. The show failed within the first 5 minutes of its first episode. The talk outlines some background, the architecture, what happened, why it failed, and what we did to help fix it within one week. In the end, the broadcaster (ARD) demanded external load tests to verify that the optimized/fixed system could sustain 1M concurrently active players…

Sebastian Cohnen

July 01, 2019

Transcript

  1. 3.

    @tisba EHLO $MEETUP
    • Sebastian Cohnen (@tisba)
    • 9+ years consulting & development
    • focus on performance and architecture
    • founder & CTO StormForger.com
  2. 4.
  3. 9.

    The year is 2014…
    • Still no global IPv6 rollout
    • But we finally have .technology, .domains, .xyz and .guru TLDs
    • TV shows are getting interactive
  4. 10.

    Quizduell
    • alias "QuizClash", "QuizReto", …
    • Mobile Quiz Game/App
    • >30M players worldwide
    • >14M in Germany
  5. 13.

    Behind the Scenes
    • In-App Web View using AngularJS
    • HTTP & JSON API written in Golang
    • Hosted on Google App Engine
  6. 14.

    Show Premiere
    • May 12th, 2014
    • ~1.6M viewers
    • 200,000 pre-registered players
  7. 15.
  8. 16.
  9. 20.

    Nothing worked…
    • The very first round of “Team Germany” failed
    • Service overwhelmed, slow, unresponsive…
    • Bad press; much speculation about hackers, [D]DoS, etc.
  10. 23.

    During the next days…
    • modeled game agents using StormForger to play the show
    • large-scale load testing to provide insights
    • lots of configuration testing!
    • profiling, debugging, refactoring, …
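The slides do not show how the game agents were modeled, and StormForger's own scenario format is not reproduced here. As a rough illustration only, here is a Python sketch of one hypothetical agent on a simulated clock: it polls the game state at a fixed interval, notices a question on the first poll at or after it goes live, and answers at a random moment before the 15-second answer window (from the TV Synchronicity slide) closes. The function `agent_timeline`, its parameters, and the uniform "think time" are assumptions, not the actual agent model.

```python
import random

# From the slides: answers must arrive within 15 s of a question going live.
ANSWER_WINDOW_S = 15.0


def agent_timeline(question_times, poll_interval_s, show_end_s, seed=0):
    """Return a sorted list of (time_s, action) tuples for one hypothetical agent.

    The agent polls every poll_interval_s seconds until show_end_s. A question
    is only noticed on the first poll at or after its start time; if that poll
    is already past the answer deadline, the question is missed.
    """
    rng = random.Random(seed)  # seeded so a run is reproducible
    pending = sorted(question_times)
    actions = []
    t = 0.0
    while t < show_end_s:
        actions.append((t, "poll"))
        # Handle every question that went live since the previous poll.
        while pending and pending[0] <= t:
            deadline = pending.pop(0) + ANSWER_WINDOW_S
            if t < deadline:
                # Assumed "think time": answer at a uniform random moment before the deadline.
                actions.append((rng.uniform(t, deadline), "answer"))
        t += poll_interval_s
    return sorted(actions)


if __name__ == "__main__":
    timeline = agent_timeline(question_times=[10, 40], poll_interval_s=3, show_end_s=60)
    for when, action in timeline:
        if action == "answer":
            print(f"answer at t={when:.1f}s")
```

A shorter poll interval notices questions sooner but issues more requests; with an interval longer than the answer window, a question can be missed entirely.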
  11. 24.

    Load Gen Setup
    • 50 load generators (AWS EC2 Ireland)
    • 800 cores, 1.5 TB RAM, lots of bandwidth
    • 3.3 TB of data moved in over 2B requests
    • 1M active users, 330k rps peak
    Remember 2014? No eu-central-1 yet!
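The slide's totals can be broken down per load generator with a little arithmetic. This sketch assumes decimal units (1 TB = 10^12 bytes) and a perfectly even spread of load across all 50 generators, neither of which the slide states. Because the slide says "over 2B requests", the average traffic per request computed here is an upper bound. The helper names are hypothetical.

```python
# Totals from the "Load Gen Setup" slide.
GENERATORS = 50
TOTALS = {
    "cores": 800,
    "ram_gb": 1_500,            # 1.5 TB, assuming decimal units
    "active_users": 1_000_000,
    "peak_rps": 330_000,
}
TOTAL_BYTES = 3.3e12            # 3.3 TB moved, assuming decimal units
MIN_REQUESTS = 2e9              # "over 2B requests" is a lower bound


def per_generator(totals, generators):
    """Split each total evenly across generators (assumes perfectly even load)."""
    return {name: value / generators for name, value in totals.items()}


def max_avg_bytes_per_request(total_bytes, min_requests):
    """Upper bound on average bytes per request, since requests >= min_requests."""
    return total_bytes / min_requests


if __name__ == "__main__":
    for name, value in per_generator(TOTALS, GENERATORS).items():
        print(f"{name} per generator: {value:,.0f}")
    bound = max_avg_bytes_per_request(TOTAL_BYTES, MIN_REQUESTS)
    print(f"avg bytes/request: <= {bound:,.0f}")
```

Under those assumptions, each generator accounts for about 16 cores, 30 GB of RAM, 20,000 active users, and 6,600 rps at peak, and the traffic averages at most about 1,650 bytes per request.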
  12. 26.

    TV Synchronicity
    • This is what makes the show “interactive”
    • API polling every 1-10 sec
    • “server-side DDoS orchestration” (synchronized state polling & you have to answer questions within 15 sec)
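Why synchronized polling hurts: on average, N clients polling every T seconds produce N/T requests per second, but if they all start polling at the same instant (for example, right when a question appears on screen), each cycle's requests pile into the same second. The Python simulation below is a hypothetical illustration, not a model of the actual app; the 1-second bucketing and the uniform random start offset used for the unsynchronized case are assumptions.

```python
import random
from collections import Counter


def average_rps(users, poll_interval_s):
    """Mean request rate when every user polls once per interval."""
    return users / poll_interval_s


def peak_rps(users, poll_interval_s, duration_s, synchronized, seed=0):
    """Simulate poll timestamps and return the request count of the busiest 1-second bucket."""
    rng = random.Random(seed)
    per_second = Counter()
    for _ in range(users):
        # Synchronized clients all start at t=0; others start at a random offset.
        t = 0.0 if synchronized else rng.uniform(0, poll_interval_s)
        while t < duration_s:
            per_second[int(t)] += 1
            t += poll_interval_s
    return max(per_second.values())


if __name__ == "__main__":
    users, interval = 10_000, 5
    print("average:", average_rps(users, interval))
    print("peak, synchronized:", peak_rps(users, interval, 30, synchronized=True))
    print("peak, spread out:", peak_rps(users, interval, 30, synchronized=False))
```

With 10,000 hypothetical users polling every 5 s, the average is 2,000 rps, but in the synchronized case all 10,000 requests of each cycle land in a single second. In the show, the broadcast and the 15-second answer window impose this kind of synchronization, which is presumably what the slide means by "server-side DDoS orchestration".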
  13. 29.

    New load tests:
    • Up to 330k rps, ~1M users
    Pre-launch load tests:
    • Up to 85k rps (~250k users)
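Dividing users by aggregate request rate gives the average interval between requests per simulated user. Treating the "~" figures as exact (an assumption), both test series work out to roughly one request every 3 seconds per user, inside the 1-10 sec polling range from the TV Synchronicity slide. This suggests, though the slides do not say so, that the new tests mainly scaled up the user count rather than changing per-user behavior. The helper name is hypothetical.

```python
# Figures from the load test slide ("~" values used as-is).
TEST_RUNS = {
    "pre-launch": {"users": 250_000, "rps": 85_000},
    "new": {"users": 1_000_000, "rps": 330_000},
}


def implied_interval_s(users, rps):
    """Average seconds between requests per user (users / aggregate rate)."""
    return users / rps


if __name__ == "__main__":
    for name, run in TEST_RUNS.items():
        interval = implied_interval_s(run["users"], run["rps"])
        print(f"{name}: one request every {interval:.2f} s per user")
    scale = TEST_RUNS["new"]["rps"] / TEST_RUNS["pre-launch"]["rps"]
    print(f"peak request rate scaled up {scale:.1f}x")
```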
  14. 32.
  15. 33.

    Issues
    • Google DoS Protection
    • Understanding Google App Engine's tuning & scaling knobs
    • Runtime environment on App Engine is not transparent
  16. 34.

    The Actual Problem
    • Customer insisted on last-minute changes to the backend, mostly related to real-time statistics
    • no time to load test again prior to the show premiere
  17. 36.