
OverloadImpact - MadMaxing LoadImpact through its emancipating API

A presentation of the hows and whys of the OverloadImpact CLI tool and framework, written in Python on top of loadimpact.com's APIs and SDKs.

Pål de Vibe

February 02, 2016


Transcript

  1. OverloadImpact: CLI tool written in Python over loadimpact.com's SDK/API. Authors: Pål de Vibe (@paaldevibe), Pedro Barbosa. Schibsted Products and Technology: http://www.schibsted.com. Peter and Paul, by El Greco
  2. Hacking something beyond … ◉ recognition ◉ permissions ◉ original intentions … to inflict maximum damage (madmaxing)
  3. ◉ Schibsted is a global classifieds and media company ◉ 120 businesses in 40 markets, enormous diverse problem space, exciting and unique challenges ◉ Disruptive startup rhythm, financial backing, global scale ◉ Huge societal impact ◉ Scandinavian values, integrity ◉ World class colleagues Schibsted Products and Technology
  4. ◉ Cross site login, identity and payment service ◉ Classifieds (Finn, Blocket), newspapers (VG, Aftonbladet) ◉ Analytics, needed to compete in all markets SPID
  5. ◉ Business case: SPID rollout to more countries ◉ Login and identity (and payment) by Schibsted ◉ Building OverloadImpact (oimp) to take better advantage of loadimpact.com (LI) ◉ What needs did oimp satisfy? ◉ What does oimp provide? ◉ Some findings from load testing AWS with LI Load testing SPID with loadimpact.com, building OverloadImpact
  6. > 6 million registered users in SE and NO. Classified media (Finn, Blocket), newspapers (Aftonbladet, Dagbladet, VG etc.). Schibsted worldwide: currently engaging more than 200 million users worldwide. SPID rollout: how do we want to receive that traffic? scale
  7. "You are like precious gold that is laid on a hard anvil and hammered, for you were hammered with every kind of tribulation" Saint Bridget (Birgitta) of Vadstena
  8. ◉ Automated, on-demand performance/load testing ◉ Load test your web, mobile or API with up to 1.2 million concurrent users ◉ Tests written in Lua ◉ API: run, manipulate tests, get results ◉ Python and Java SDKs loadimpact.com
  9. Load testing SPID: why loadimpact.com? ◉ LI already used for a few limited cases in SPID testing ◉ Evaluated Gatling, locust.io, JMeter ◉ Why loadimpact? ◉ No need to build and scale test client infrastructure ourselves ◉ Lua => performance awareness => scalability ◉ A lot of stuff done for us (stats, scaling, reporting) ◉ If you're selling surfboards, don't reinvent the sea. You might even drown in the process...
  10. ◉ node.js JavaScript session service, target: 20k req/s ◉ PHP web for login, target: 700 req/s ◉ PHP API, target: 1000 req/s ◉ Shared session database between node and PHP, so main use cases should be tested together ◉ Find weaknesses, bugs and iterate on architecture and implementation details Services and requests
  11. ◉ Loadimpact has great tools ◉ for recording a scenario (≈ use case) ◉ statistics ◉ traffic scheduling and scaling ◉ monitoring tests ◉ custom metrics ◉ but when you start coding several similar test cases... Why we started using the API
  12. ◉ Scenario: test case, e.g. web login, User API request ◉ TestConfig: set of scenarios ◉ Schedule for traffic scale-up and peak time ◉ Load zones (Virginia, Ireland etc.) ◉ Data Store: data used by tests, e.g. emails/passwords ◉ Virtual Users: emulates a user ◉ Can run simultaneous web requests like a browser would when visiting a page LoadImpact.com core concepts
  13. ◉ Scenarios use data stores for sets of test data ◉ We couldn't update data stores (worse in multiple scenarios) ◉ Response: upload scenarios w/custom script using API/SDK: ◉ datastore.open('foo.com-users-DS_VERSION') ◉ query API for data store names and find latest match ▪ replace in code before updating scenario code through API ◉ upload new data store versions with CLI script ▪ connected by naming convention: 'foo.com-users-VER_20150813121212' ◉ the oimp command line tool arose incrementally Why we started using the API: data store repetition
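The naming-convention trick on this slide can be sketched as a couple of small Python helpers. This is an illustration, not oimp's actual code: the function names are hypothetical, and the list of store names would in reality come from a LoadImpact API query.

```python
import re

def latest_data_store(placeholder, store_names):
    """Resolve a 'foo.com-users-DS_VERSION' placeholder to the newest
    matching versioned store, e.g. 'foo.com-users-VER_20150813121212'.
    Timestamp suffixes sort lexicographically, so max == latest."""
    prefix = placeholder.replace('DS_VERSION', 'VER_')
    matches = sorted(n for n in store_names if n.startswith(prefix))
    if not matches:
        raise LookupError('no data store matching %s' % placeholder)
    return matches[-1]

def substitute_data_stores(scenario_code, store_names):
    """Rewrite every datastore.open('...-DS_VERSION') reference in the
    scenario source to the latest uploaded version before pushing it."""
    def repl(match):
        name = latest_data_store(match.group(1), store_names)
        return "datastore.open('%s')" % name
    return re.sub(r"datastore\.open\('([^']*DS_VERSION)'\)", repl, scenario_code)
```

The key point is that the scenario source only ever names the stable placeholder; the CLI picks the freshest upload at push time.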
  14. $ oimp # no args
     USAGE:
       oimp setup_project [NAME] [DEST_DIR]
       oimp sequence [NAME] [RUN_DESCRIPTION]
       oimp program [NAME] [RUN_DESCRIPTION]
       oimp test_config [NAME]
       oimp scenario [ACTION] [NAME]
       oimp report program [ACTION] [PROGRAM_RUN_ID]
       oimp report test_config [ACTION] [RUN_ID] [TITLE]
       oimp target
       oimp api_method [NAME] [ARGS ...]
       oimp help
     oimp (OverloadImpact), open source Python CLI tool: https://github.com/schibsted/overloadimpact, created by Pål de Vibe and Pedro Barbosa at Schibsted Products and Technology. Main features will now be explained.
  15. scenario update
     $ oimp scenario update web-login
     Updating scenario: web-login
     ==> Building scenario web-login
     Using datastore foo.com-users.VER_20150731100338
     Updated test scenario 3071922
     Starting validation #3128508...
     0 [...16:38:01...]: https://foo.com/... returned a 302 (Found) response
     1 [...16:38:01...]: https://foo.com/flow/login/...?login=1 returned a 302 (Found) response
     Validation completed with status 'finished'
     Wrote validation results to /tmp/.../validation.3128508.json
     CLI tool oimp example: update scenario web-login. All editing, validation and execution done on the local machine instead of the LI web UI!
  16. ◉ No support for code reuse between scenarios ◉ Response: composer allows including custom libraries: ◉ --- import foo/foo ◉ Libs prepended to scenario code, pushed to LI through the API ◉ Custom util functions, logging, debugging, metrics ◉ BIG WIN: very fast and reliable API ◉ API nomenclature could have been more precise (naming of testing tools can be tricky) No code reuse across scenarios => composer with lib support
  17. scenario web-login.lua
     -- general oimp setup prepended here by oimp composer
     --- import common
     --- import flows/login
     local res = flow_login(email, password, client_id)
     if oimp.fail(oimp.TOP_PAGE, 'login', res, nil) then oimp.done(0); return end
     -- on success, tear down and custom metrics appended by oimp composer
  18. from lib flows/login.lua
     function login_credentials(email, password...)
       ...
       -- oimp.request() does custom metrics, logging, xhprof
       local res = oimp.request(page, { 'POST', uri, headers = headers, data = post, auto_redirect = false }, IS_LOGIN_SCENARIO)
       -- oimp.check_status() does custom metrics, logging
       if not oimp.check_status(page, res, status_code) then return end
       ...
  19. Update process overview ◉ prepend libs, setup (standard, custom) ◉ append tear down ◉ replace data-store placeholders with latest versions ◉ update composed Lua file through API, including data-store references
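The update steps above can be condensed into one small composition function. A simplified sketch under assumed conventions (libraries live as `.lua` files in a local directory, imports use the `--- import name` directive shown earlier); the real implementation is in the overloadimpact repo:

```python
import os
import re

def compose_scenario(source, lib_dir, setup, teardown):
    """Expand '--- import name' directives against a local library
    directory, then wrap the result in the standard setup header and
    the tear-down/custom-metrics footer, ready to push through the API."""
    def inline(match):
        lib_path = os.path.join(lib_dir, match.group(1) + '.lua')
        with open(lib_path) as f:
            return f.read()  # library source replaces the directive line
    body = re.sub(r'^--- import (\S+)$', inline, source, flags=re.M)
    return '\n'.join([setup, body, teardown])
```

After this step the data-store placeholder substitution runs over the composed source, and the final Lua is what gets uploaded.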
  20. ◉ When running certain sets of scenarios in the same test_config the tests "segfaulted" instead of reporting errors ◉ Response: programs (1 program has multiple test_configs) ◉ Simultaneously fire sets of test_configs ◉ separate test_configs to work around loadimpact scaling issues ◉ allows conceptual grouping of scenarios test_config segfaults => programs with region spread
  21. oimp programs: 1 program * N test_configs; 1 test_config * N scenarios. Example program sessions-and-web.yaml:
     configs:
       web-flows:
         users: 2500
         warmup: 20
         stable: 10
         scenarios:
           web-login:
             region: "amazon:us:ashburn"
             percent-of-users: 70
           web-login-rememberme:
             region: "amazon:us:portland"
             percent-of-users: 30
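Given a program file parsed into a dict with the layout above, splitting a config's virtual users across its scenarios by percent-of-users is a few lines. An illustrative helper, not oimp's actual code:

```python
def scenario_user_counts(program):
    """program: a parsed program YAML file as a dict (layout as in
    sessions-and-web.yaml). Returns {(config, scenario): virtual_users}."""
    counts = {}
    for config_name, config in program['configs'].items():
        total = config['users']
        for scenario, opts in config['scenarios'].items():
            # Integer split: e.g. 2500 users at 70% -> 1750 virtual users.
            counts[(config_name, scenario)] = total * opts['percent-of-users'] // 100
    return counts
```

For the example above this puts 1750 virtual users on web-login in Ashburn and 750 on web-login-rememberme in Portland.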
  22. program execution example: execute program sessions-and-web.yaml
     $ oimp program sessions-and-web "40 web fronts"
     …
     ==> Updated config sessions
     Duration: 20 Users: 2500
     Duration: 10 Users: 2500
     …
     ==> Updated config web-flows
     Duration: 20 Users: 1000
     Duration: 10 Users: 1000
     ==> Started test sessions
     ==> Started test web-flows
     $
  23. loadimpact.com has great charts, but every time you run a test_config and have to manually set up the reports, you are like... reports and charts
  24. also… we needed some custom metrics • custom action fail/success rate • sub action duration • peak load period averages • custom metrics API is awesome reports and charts
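One of those aggregations, the peak-load-period average, amounts to filtering the metric samples pulled from the API down to the stable window before averaging, so the ramp-up phase does not drag the number down. A hypothetical stand-alone helper, not oimp's actual code:

```python
def peak_average(samples, peak_start, peak_end):
    """samples: list of (timestamp, value) pairs for one metric, as
    pulled from the results API. Average only the values that fall
    inside the peak (stable) window; None if nothing falls inside it."""
    values = [v for t, v in samples if peak_start <= t <= peak_end]
    if not values:
        return None
    return sum(values) / len(values)
```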
  25. ◉ Response: Python charts from API metrics ◉ We extract custom and regular metrics from the API ◉ Charts are generated with pygal. Looks great… ehh, ehh… ◉ Comparative charts between runs, with targets ◉ Compare different server setups, implementations, DBs, etc. Targets. ◉ Headers and footers are automatically added to scenarios to generate a lot of the custom metrics Custom reusable stats and charts => Python charts from API metrics
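A minimal sketch of the comparative-chart idea, assuming per-run metric samples arrive as lists; pygal is the third-party charting library named on the slide, but the function names and data shapes here are illustrative, not oimp's report code:

```python
def comparative_series(runs, target):
    """runs: {run_id: [requests/s samples]}. Returns labelled series,
    one line per run plus a flat 'target' reference line, in a shape
    that maps directly onto pygal's Line.add(label, values)."""
    length = max(len(values) for values in runs.values())
    series = [(run_id, values) for run_id, values in sorted(runs.items())]
    series.append(('target', [target] * length))
    return series

def render_chart(series, path):
    """Feed the series to pygal and write an SVG chart to disk."""
    import pygal  # third-party: pip install pygal
    chart = pygal.Line(title='requests/s per run vs target')
    for label, values in series:
        chart.add(label, values)
    chart.render_to_file(path)
```

Overlaying a flat target line on every run makes "did this setup hit 700 req/s?" readable at a glance, which is the point of the comparative reports.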
  26. What does oimp provide today? ◉ Open sourced: https://github.com/schibsted/overloadimpact ◉ Library support ◉ Automatic report generation ◉ Single command meta-suites ◉ Target evaluation ◉ IDE liberation ◉ Faster code-run-fix roundtrip ◉ Xhprof support
  27. Some findings ◉ Thanks LI. Great API, great support. The API saved us! ◉ We combined LI with Apache ab for higher volumes on specific requests ◉ Race conditions found and fixed thanks to load testing ◉ AWS great at duplicating cloud setups for testing ◉ Realistic user data generator and obfuscator took time ◉ Realistic set of scenarios also took time, especially cookies ◉ DB type and setup was more important for throughput than app code/language, since the app can scale horizontally
  28. How to get started ◉ Get a loadimpact.com license ◉ Install oimp, instructions here: https://github.com/schibsted/overloadimpact ◉ Madmax oimp even further
  29. We need help with... ◉ Improving inexpert Lua and Python code ◉ Wrap scenario code in functions to simplify setup and tear down (e.g. reporting error state, timing etc.) ◉ Enable local execution of tests with the luasocket HTTP lib (?). Would magnify efficiency of debugging scenarios. ◉ Make charts less ugly and more useful ◉ Write tests for oimp itself ◉ Take advantage of LoadImpact API v3 ◉ Utility libs ◉ Pull metrics from Datadog or similar
  30. Any questions? Pål de Vibe @paaldevibe / @menneskemaskin / paal.de.vibe__schibsted.com Schibsted Products and Tech: http://www.schibsted.com Presentation template by SlidesCarnival THANKS!