EHLO webengdus
• Sebastian Cohnen (@tisba)
• 7+ years consulting & development
• focus on performance and architecture
• founder & CTO StormForger.com
Slide 5
Slide 5 text
I …
Slide 6
Slide 6 text
Performance Testing
Slide 7
Slide 7 text
… from 10 Users to …
Slide 8
Slide 8 text
Load Testing with
1,000,000 Users
Sebastian Cohnen, @tisba
stormforger.com, @StormForgerApp
at
March 2017
Slide 9
Slide 9 text
The year was 2014…
• Still no global IPv6 rollout
• But we finally have .technology, .domains, .xyz and .guru TLDs
• TV Shows are getting interactive
Slide 10
Slide 10 text
Quizduell
• alias "QuizClash", "QuizReto", …
• Mobile Quiz Game/App
• >30M players worldwide
• >14M in Germany
Slide 11
Slide 11 text
Let’s make an interactive
TV show out of it!
Slide 12
Slide 12 text
"Quizduell im Ersten"
VS
Slide 13
Slide 13 text
Behind the Scenes
• In-App Web View using AngularJS
• HTTP & JSON API written in Go
• Hosted on Google App Engine
• Built by
Slide 14
Slide 14 text
Show Premiere
• May 12th, 2014
• ~1.6M viewers
• 200,000 pre-registered players
Slide 18
Slide 18 text
Hacker Cyber Attack!
Slide 19
Slide 19 text
Not quite…
Slide 20
Slide 20 text
Nothing worked…
• The very first round of “Team Germany” failed
• Service overwhelmed, slow, unresponsive…
• Bad press; much speculation about hackers, [D]DoS etc.
Slide 21
Slide 21 text
Disaster Recovery
Slide 22
Slide 22 text
I was called
Slide 23
Slide 23 text
During the next days…
• large scale load testing to provide insights
• lots of configuration testing!
• modeled game agents using StormForger to play the show
• profiling, debugging, refactoring, …
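To illustrate what "game agents that play the show" could look like, here is a minimal sketch in Go. It is not StormForger's API and not the show's real backend: the `/state` and `/answer` endpoints, the JSON fields, and the in-process fake backend are all hypothetical, chosen only to make the example self-contained.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
	"sync/atomic"
)

// state is a hypothetical response shape; the talk does not describe the real API.
type state struct {
	QuestionID int `json:"question_id"` // 0 means "no open question"
}

// playRound lets one agent poll /state up to maxPolls times and answer the
// first open question it sees. It reports whether an answer was accepted.
func playRound(client *http.Client, baseURL string, maxPolls int) (bool, error) {
	for i := 0; i < maxPolls; i++ {
		resp, err := client.Get(baseURL + "/state")
		if err != nil {
			return false, err
		}
		var s state
		err = json.NewDecoder(resp.Body).Decode(&s)
		resp.Body.Close()
		if err != nil {
			return false, err
		}
		if s.QuestionID == 0 {
			continue // a real agent would wait 1-10 s before polling again
		}
		body := fmt.Sprintf(`{"question_id":%d,"choice":0}`, s.QuestionID)
		ans, err := client.Post(baseURL+"/answer", "application/json", strings.NewReader(body))
		if err != nil {
			return false, err
		}
		ans.Body.Close()
		return ans.StatusCode == http.StatusOK, nil
	}
	return false, nil
}

// newFakeBackend is an in-process stand-in for the game API (assumption).
// It always reports question 1 as open and counts submitted answers.
func newFakeBackend(answers *int64) *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/state", func(w http.ResponseWriter, r *http.Request) {
		json.NewEncoder(w).Encode(state{QuestionID: 1})
	})
	mux.HandleFunc("/answer", func(w http.ResponseWriter, r *http.Request) {
		atomic.AddInt64(answers, 1)
		w.WriteHeader(http.StatusOK)
	})
	return httptest.NewServer(mux)
}

// runAgents starts n concurrent agents against the fake backend and returns
// how many answers the backend received.
func runAgents(n int) int64 {
	var answers int64
	srv := newFakeBackend(&answers)
	defer srv.Close()

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			playRound(srv.Client(), srv.URL, 3) // errors ignored in this sketch
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&answers)
}

func main() {
	fmt.Println("answers received:", runAgents(20))
}
```

A real test would add think time between polls and run far more agents, spread over many load generators.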
Slide 24
Slide 24 text
Load Gen Setup
• 50 Load Generators (AWS EC2 Ireland)
• 800 cores, 1.5 TB RAM, lots of bandwidth
• 3.3 TB data moved in over 2B requests
• 1M Active Users, 330k rps peak
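The totals above are easier to picture per load generator and per request. The following sketch does that arithmetic in Go. It assumes decimal units (1 TB = 10^12 bytes), an even split across the 50 generators, and exactly 2B requests; since the slide says "over 2B", the per-request figure is an upper bound.

```go
package main

import "fmt"

// Figures quoted on the slide. TB is assumed to be decimal (10^12 bytes).
const (
	generators    = 50
	totalCores    = 800
	totalRAMGB    = 1500.0 // 1.5 TB expressed in GB
	dataMovedB    = 3.3e12 // 3.3 TB expressed in bytes
	totalRequests = 2e9    // "over 2B", so the per-request size is an upper bound
	peakRPS       = 330000
)

// perGenerator divides a fleet-wide total evenly across the load generators.
func perGenerator(total float64) float64 {
	return total / generators
}

// avgBytesPerRequest returns the mean amount of data moved per request.
func avgBytesPerRequest(bytes, requests float64) float64 {
	return bytes / requests
}

func main() {
	fmt.Printf("cores per generator:    %.0f\n", perGenerator(totalCores))
	fmt.Printf("RAM per generator (GB): %.0f\n", perGenerator(totalRAMGB))
	fmt.Printf("peak rps per generator: %.0f\n", perGenerator(peakRPS))
	fmt.Printf("avg bytes per request:  %.0f\n", avgBytesPerRequest(dataMovedB, totalRequests))
}
```

Under these assumptions each generator has 16 cores and 30 GB of RAM, drives about 6,600 rps at peak, and the average request moves roughly 1.65 KB.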
Slide 25
Slide 25 text
The Challenge?
Slide 26
Slide 26 text
TV Synchronicity
• This is what makes the show “interactive”
• API polling every 1-10 sec
• “server-side DDoS orchestration” (synchronized state polling, and you have to answer questions within 15 sec)
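A back-of-the-envelope estimate shows why the polling interval matters so much at this scale. The sketch below assumes polls are spread evenly over time, which is the best case: synchronized polling makes the bursts worse than this average. The function names are illustrative only.

```go
package main

import "fmt"

// aggregateRPS estimates the steady-state request rate when every user polls
// once per interval and the polls are spread evenly over time.
func aggregateRPS(users, intervalSeconds float64) float64 {
	return users / intervalSeconds
}

// impliedInterval returns the average number of seconds between two requests
// of a single user for an observed aggregate request rate.
func impliedInterval(users, rps float64) float64 {
	return users / rps
}

func main() {
	const users = 1000000
	fmt.Printf("polling every 1 s:  %.0f rps\n", aggregateRPS(users, 1))
	fmt.Printf("polling every 10 s: %.0f rps\n", aggregateRPS(users, 10))
	fmt.Printf("330k rps: one request per user every %.1f s\n", impliedInterval(users, 330000))
}
```

With 1M users, the 1-10 sec polling range corresponds to between 100,000 and 1,000,000 rps. The 330k rps peak sits inside that range, at about one request per user every 3 seconds.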
Slide 27
Slide 27 text
Y U NO
LOAD TEST BEFORE!?
Slide 28
Slide 28 text
Pre-launch load tests:
Up to 85k rps (~250k Users)
Slide 29
Slide 29 text
New load tests:
Up to 330k rps, ~1M Users
pre-launch load tests:
Up to 85k rps (~250k Users)
Slide 30
Slide 30 text
Remember…
⚠ Call* before load testing with 1,000,000 Users! ⚠
*even Google
Slide 31
Slide 31 text
May 21st, 2014:
First try with App…
Slide 33
Slide 33 text
Issues
• Google DoS Protection
• Understand Google App Engine’s tuning & scaling knobs
• Runtime environment on App Engine is not transparent
Slide 34
Slide 34 text
The Actual Problem
• Customer insisted on last-minute changes to the backend, mostly related to real-time statistics
• No time to load test again prior to show premiere
Slide 35
Slide 35 text
Perf Test early, Perf Test often!
Can you do continuous load testing?