A Locust Storm Is A Good Thing!

Given at Stir Trek 2016

Have you ever wondered what is involved in load testing web applications? What if you had to support 100,000 simultaneous users on Day 0? Do you have to integrate with third-party services? How do you strategically isolate and test those dependencies? What do you do when your tests reveal performance problems? Join us in a talk about web load testing using open source tools, the AWS cloud, dependency isolation, DB profiling, OS tuning, and not being targeted as a botnet.

stevenjackson

May 06, 2016
Transcript

  1. A Locust Swarm Can Be a Good Thing! Steve Jackson @stevejxsn steve.jackson@leandog.com
  2. A Favor?

  3. A Favor? • Something you would change • Something you would NOT change • Something you found surprising • Something you found boring
  4. None
  5. None
  6. None
  7. A Locust Swarm Can Be a Good Thing! Steve Jackson @stevejxsn steve.jackson@leandog.com
  8. A Story in 5 Acts Or…

  9. A Story in 5 Acts Or…

  10. Goals • How to start? • Why Locust? • How to deal with problems in the architecture? • Make an argument for starting load testing early • Tell a compelling story
  11. Timeline • .realtor starts - Feb 2014 • Load Test start - 15 Sep 2014 • Soft Launch - 20 Oct 2014 • Launch - 23 Oct 2014
  12. Dramatis Personae

  13. Act 1 Baby Steps 38

  14. System Architecture • Send Emails • DNS • Mail Forwarding • Payment Processor • CREA Membership • NAR Membership • Registrar • Hosted Website
  15. User Funnel

  16. Prioritizing Dependencies • NAR Membership • System • Send Emails • DNS • Mail Forwarding • Payment Processor • CREA Membership • Registrar • Hosted Website
  17. Stubbing Dependencies • System • Fake Member Service (Sinatra) • Fake DNS (Sinatra) • Fake Registrar (EventMachine) • Fake Payment (Sinatra)
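
The deck's fakes were built with Sinatra and EventMachine (Ruby). As a rough illustration of the same idea, here is a minimal sketch of a stubbed payment dependency using only the Python standard library. The `/charge` path, the `{"status": "approved"}` payload, and the names `FakePaymentHandler`, `start_fake_payment`, and `charge` are all hypothetical and do not come from the talk.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakePaymentHandler(BaseHTTPRequestHandler):
    """Answers every POST with a canned 'approved' response (hypothetical payload)."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # drain the request body; the stub ignores it
        body = json.dumps({"status": "approved"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # stay quiet so logging does not become the bottleneck


def start_fake_payment(port=0):
    """Start the stub on a background thread; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), FakePaymentHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def charge(host, port):
    """Post a dummy charge, the way the system under test might call the stub."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    conn.request("POST", "/charge", body=b"{}",
                 headers={"Content-Type": "application/json"})
    response = conn.getresponse()
    data = json.loads(response.read().decode("utf-8"))
    conn.close()
    return data
```

Pointing the system under test at a stub like this keeps load off the real third-party service while still exercising the code path that calls it.
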
  18. Picking a Test Tool

  19. Why Locust? • Could interact with Rails CSRF tokens • Could execute end-to-end user interaction with sessions and cookies • Expands to multiple slaves to increase load capacity • Allows for distributed user paths based on percentages
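
Two of these points can be sketched in plain Python without Locust itself: pulling the CSRF token out of a Rails page, and choosing a user path by percentage. This is not the Locust API and not the speaker's script. The path names and weights in `USER_PATHS` are made-up examples, and the sketch assumes the token is exposed through the `csrf-token` meta tag that Rails' `csrf_meta_tags` helper renders.

```python
import random
from html.parser import HTMLParser


class CsrfTokenParser(HTMLParser):
    """Captures the content of <meta name="csrf-token" content="..."> if present."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name") == "csrf-token":
            self.token = attributes.get("content")


def extract_csrf_token(html):
    """Return the CSRF token found in the page, or None if there is none."""
    parser = CsrfTokenParser()
    parser.feed(html)
    return parser.token


# Hypothetical user paths with relative weights (percent of simulated users).
USER_PATHS = {"browse": 70, "search": 25, "purchase": 5}


def pick_path(rng):
    """Choose one user path at random, in proportion to its weight."""
    names = list(USER_PATHS)
    weights = [USER_PATHS[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]
```

A simulated user would fetch a page, call `extract_csrf_token` on the body, and send the token back with its next form post, while `pick_path(random.Random())` decides which journey that user follows.
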
  20. Locust Script

  21. Locust Script

  22. Locust Script

  23. None
  24. None
  25. Locust Infrastructure Each instance could easily support 12 slave processes

  26. Methodology • Start Small (1000 simultaneous users) • Gather data • Grow infrastructure as we hit bottlenecks
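
One way to picture this methodology is a loop that raises the load step by step until something breaks. The sketch below is only an illustration. The 1000-user start and the 100,000-user target come from the deck, while the doubling schedule, the 1% failure threshold, and the `run_test` callback (which would launch a real test and return its failure rate) are assumptions.

```python
def find_bottleneck(run_test, start_users=1000, target_users=100000,
                    max_fail_rate=0.01):
    """Step the load up until a run fails or the target load passes.

    run_test(users) is a caller-supplied function (hypothetical here) that
    runs one load test and returns the fraction of failed requests.
    Returns (users, fail_rate) for the first failing step, or None if every
    step up to target_users stayed under max_fail_rate.
    """
    users = start_users
    while True:
        fail_rate = run_test(users)
        if fail_rate > max_fail_rate:
            return users, fail_rate  # bottleneck found: fix it, then rerun
        if users >= target_users:
            return None  # the target load passed
        users = min(users * 2, target_users)  # assumed doubling schedule
```

Each returned bottleneck marks a point to gather data and grow or tune the infrastructure before running the loop again.
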
  27. Easy Wins Indexes Fake Data YSlow

  28. First External Test 95% Fail Rate 1000 users

  29. What did I learn?

  30. Act 2 Go Big or Go Home 28

  31. None
  32. None
  33. 100,000 200,000

  34. What was the breaking point of our database!?

  35. 32 CPU 244GB RAM Provisioned IOPS 200GB Multi-AZ db.r3.8xlarge

  36. Web Servers

  37. r3.8xlarge 32 CPU 104 ECU 244GB RAM

  38. https://support.cloud.engineyard.com/hc/en-us/articles/205407758-Worker-Allocation-on-Engine-Yard-Cloud

  39. • Compression = YES, please • Serve static assets directly • Reverse proxy (proxy_pass) the rest to unicorn
  40. Take a look at these (nginx.conf, sites-available/sitename): worker_processes 32; worker_rlimit_nofile 65536; use epoll; worker_connections 65536; upstream realtor_unicorn { server unix:/tmp/realtor.sock fail_timeout=0; }; keepalive_requests 0; proxy_read_timeout; proxy_write_timeout; client_max_body_size
  41. config/unicorn.rb config/timeout.rb Started at 400

  42. /etc/security/limits.conf: rails_user hard nofile 65536 • ulimits to max • /etc/sysctl.conf: # Increase size of file handles and inode cache, fs.file-max = 100000
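
A limits change like this is easy to get wrong, so it helps to confirm what a running process actually sees. The following is a minimal sketch using Python's standard `resource` module (Unix only). The 65536 default mirrors the `nofile` value on the slide, and the helper names `nofile_limits` and `meets_target` are hypothetical.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def meets_target(limit, target=65536):
    """True if a limit satisfies the target; an unlimited value always does."""
    return limit == resource.RLIM_INFINITY or limit >= target
```

Calling `meets_target(nofile_limits()[1])` from a process started as the application user shows whether the hard limit from limits.conf was picked up. The soft limit, the first element of the pair, is the one the process is actually held to.
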
  43. Logging and Metrics • CPU • Memory??!? • CPU • Memory • Swap • Latency • Queue Depth • Connections
  44. None
  45. None
  46. What did I learn? • Don’t wait so long to start load testing • The conversations drive new requirements • This stuff is hard to figure out under pressure • Too late to do big changes confidently
  47. Act 3 The Best Laid Plans… 21

  48. None
  49. None
  50. None
  51. None
  52. PGBouncer

  53. None
  54. None
  55. None
  56. Virginia

  57. Virginia Oregon

  58. Virginia Oregon Atlanta Toronto Singapore Ireland California São Paulo

  59. None
  60. Bash Wizard

  61. 120,000

  62. /etc/sysctl.conf

  63. 15

  64. None
  65. None
  66. None
  67. System Architecture • Send Emails • DNS • Mail Forwarding • Payment Processor • CREA Membership • NAR Membership • Registrar • Hosted Website
  68. 08

  69. 07

  70. 06

  71. What did I learn?

  72. Act 4 Game Time 03

  73. • Pre-warm ELB • On-demand limits • Starting new instances doesn’t always work
  74. • Validate assumptions • Quick fixes • HACKS!

  75. None
  76. None
  77. None
  78. None
  79. None
  80. Act 5 Retrospective 00

  81. None
  82. None
  83. None
  84. None
  85. Script is not reality Testing is Expensive Analysis is complicated

  86. None
  87. “How do I convince others to start earlier?” “How to do this cheaper?” “When should we start?”
  88. Thanks! @stevejxsn steve.jackson@leandog.com