Going to Eleven

100,000 simultaneous users. 38 days. 9 systems. One launch. Ready?

In this retelling of a real-life product "land rush", you'll learn about load testing a website that went from zero users to all of its users in one day. This talk will cover open source tools, the AWS cloud, dependency isolation, background jobs, and profiling. Hear about some hard lessons learned while negotiating with external partners, tuning the OS, and avoiding being flagged as a botnet. Walk away with some idea of how to get started with load testing and how to deal with architectural problems as they come up.

stevenjackson

March 08, 2017

Transcript

  1. Going to Eleven @stevejxsn

  2. Or… A Story in 5 Acts

  3. Or… A Story in 5 Acts

  4. I’m @stevejxsn steve@testdouble.com

  5. Goals • How to start? • Why Locust? • How to deal with problems in the architecture? • Make an argument for starting load testing early • Tell a compelling story
  6. Timeline • .realtor starts - Feb 2014 • Load test start - 15 Sep 2014 • Soft launch - 20 Oct 2014 • Launch - 23 Oct 2014
  7. Dramatis Personae

  8. Act 1: Baby Steps (38 days to launch)

  9. System Architecture: Send Emails, DNS, Mail Forwarding, Payment Processor, CREA Membership, NAR Membership, Registrar, Hosted Website
  10. User Funnel

  11. Prioritizing Dependencies: NAR Membership, Send Emails, DNS, Mail Forwarding, Payment Processor, CREA Membership, Registrar, Hosted Website
  12. Stubbing Dependencies • Fake Member Service (Sinatra) • Fake DNS (Sinatra) • Fake Registrar (EventMachine) • Fake Payment (Sinatra)
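The stubbing idea on this slide can be illustrated with a small stand-in service. The talk's fakes were built with Sinatra and EventMachine; the sketch below swaps in Python's standard library to show the same approach. The `/members/<id>` route, the JSON shape, the latency value, and the "ids starting with 9 are rejected" rule are all hypothetical, chosen only so a load script has both a success and a failure path to hit.

```python
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen

# Hypothetical knob: artificial delay that imitates the real partner's response time.
SIMULATED_LATENCY_SECONDS = 0.05


class FakeMemberHandler(BaseHTTPRequestHandler):
    """Answers GET /members/<id> with a canned JSON payload."""

    def do_GET(self):
        prefix = "/members/"
        if not self.path.startswith(prefix):
            self.send_error(404, "unknown endpoint")
            return
        member_id = self.path[len(prefix):]
        time.sleep(SIMULATED_LATENCY_SECONDS)
        # Hypothetical rule: ids starting with "9" are treated as non-members,
        # so test scripts can exercise the rejection path too.
        payload = {"id": member_id, "active": not member_id.startswith("9")}
        body = json.dumps(payload).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet under load


def start_fake_member_service(port=0):
    """Start the stub on a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), FakeMemberHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def lookup(server, member_id):
    """Query the stub the way the application under test would."""
    host, port = server.server_address
    with urlopen(f"http://{host}:{port}/members/{member_id}") as response:
        return json.loads(response.read())
```

Pointing the application's membership URL at a stub like this keeps load tests from hammering a partner's real system, and the latency constant can be raised to see how the site behaves when a dependency slows down.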
  13. Picking a Test Tool

  14. Why Locust? • Could interact with Rails CSRF tokens • Could execute end-to-end user interaction with sessions and cookies • Expands to multiple slaves to increase load capacity • Allows for distributed user actions based on percentages
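Two of these points, CSRF tokens and percentage-based actions, can be sketched with the standard library alone. This is not the talk's locustfile: it assumes the page carries the token in the `<meta name="csrf-token">` tag that Rails emits from `csrf_meta_tags`, and the action names and weights are hypothetical. In a real locustfile, helpers like these would be called from inside the user tasks, with Locust's own task weights handling the mix.

```python
import random
from html.parser import HTMLParser


class CsrfMetaParser(HTMLParser):
    """Collects the content of <meta name="csrf-token" content="...">."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name") == "csrf-token":
            self.token = attributes.get("content")


def extract_csrf_token(html):
    """Return the CSRF token found in the page, or None if there is none."""
    parser = CsrfMetaParser()
    parser.feed(html)
    return parser.token


# Hypothetical funnel mix: action name -> relative weight.
ACTION_WEIGHTS = {"browse": 70, "search": 20, "register": 10}


def pick_action(rng):
    """Choose the next simulated user action in proportion to its weight."""
    names = list(ACTION_WEIGHTS)
    weights = list(ACTION_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=1)[0]
```

A simulated user would fetch a form page, pull the token out with `extract_csrf_token`, and send it back with the POST, reusing the same HTTP session so cookies persist across requests.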
  15. None
  16. None
  17. None
  18. None
  19. None
  20. Locust Infrastructure Each instance could easily support 12 slave processes

  21. Methodology • Start Small (1000 simultaneous users) • Gather data • Grow infrastructure as we hit bottlenecks
  22. Easy Wins • Indexes • Fake Data • YSlow

  23. First External Test • 1000 users • 95% Fail Rate

  24. What did I learn?

  25. Act 2: Go Big or Go Home (28 days to launch)

  26. None
  27. None
  28. 100,000 200,000

  29. What was the breaking point of our database!?

  30. db.r3.8xlarge • 32 CPU • 244GB RAM • Provisioned IOPS 200GB • Multi-AZ

  31. Web Servers

  32. r3.8xlarge • 32 CPU • 104 ECU • 244GB RAM

  33. https://support.cloud.engineyard.com/hc/en-us/articles/205407758-Worker-Allocation-on-Engine-Yard-Cloud

  34. • Compression = YES, please • Serve static assets directly • Reverse proxy (proxy_pass) the rest to unicorn
  35. nginx.conf: worker_processes 32; worker_rlimit_nofile 65536; use epoll; worker_connections 65536 • sites-available/sitename: upstream realtor_unicorn { server unix:/tmp/realtor.sock fail_timeout=0; } • Take a look at these: keepalive_requests 0, proxy_read_timeout, proxy_send_timeout, client_max_body_size
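To see what the nginx numbers on this slide add up to, here is a rough arithmetic sketch. It assumes that every connection costs one file descriptor and that each proxied request holds two connections, one to the client and one to the unicorn socket. The function name and the `connections_per_request` parameter are illustrative, and the result is an upper bound on what nginx itself allows, not a measured capacity.

```python
def nginx_connection_budget(worker_processes, worker_connections,
                            worker_rlimit_nofile, connections_per_request=2):
    """Estimate the connection ceiling implied by the nginx settings.

    Each worker is capped by whichever is smaller: its configured
    connection count or its open-file limit.
    """
    per_worker = min(worker_connections, worker_rlimit_nofile)
    total_connections = worker_processes * per_worker
    return {
        "per_worker": per_worker,
        "total_connections": total_connections,
        # Assumption: a proxied request uses a client-side and an upstream-side connection.
        "proxied_clients": total_connections // connections_per_request,
    }


# Values taken from the slide.
budget = nginx_connection_budget(
    worker_processes=32,
    worker_connections=65536,
    worker_rlimit_nofile=65536,
)
print(budget["total_connections"])  # 2097152
print(budget["proxied_clients"])    # 1048576
```

Under these assumptions nginx has headroom far beyond 100,000 simultaneous users, which suggests the practical ceiling sits elsewhere in the stack, such as the application workers or the database.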
  36. config/unicorn.rb config/timeout.rb Started at 400

  37. ulimits • /etc/security/limits.conf: rails_user hard nofile 65536 • /etc/sysctl.conf: # Increase size of file handles and inode cache; fs.file-max = 100000
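A quick way to confirm that limits like these actually took effect is to ask the running process. The sketch below assumes a Linux host, where the system-wide ceiling is exposed at `/proc/sys/fs/file-max`; the helper names are made up for illustration, and 65536 is the target from the slide.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def system_file_max(path="/proc/sys/fs/file-max"):
    """Return the system-wide file handle ceiling, or None if unavailable."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def meets_target(limit, target):
    """RLIM_INFINITY means 'unlimited' and always satisfies the target."""
    return limit == resource.RLIM_INFINITY or limit >= target


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft nofile ok:", meets_target(soft, 65536))
    print("hard nofile ok:", meets_target(hard, 65536))
    print("fs.file-max:", system_file_max())
```

Run it as the same user and from the same kind of session that starts the app server, since limits.conf entries are applied per user when a session begins.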
  38. Logging and Metrics • CPU, Memory??!? • CPU, Memory, Swap, Latency, Queue Depth, Connections
  39. What did I learn? • Don’t wait so long to start load testing • The conversations drive new requirements • This stuff is hard to figure out under pressure • Too late to do big changes confidently
  40. Act 3: The Best Laid Plans… (21 days to launch)

  41. None
  42. None
  43. None
  44. None
  45. PGBouncer

  46. None
  47. None
  48. None
  49. Virginia

  50. Virginia Oregon

  51. Virginia, Oregon, Atlanta, Toronto, Singapore, Ireland, California, São Paulo

  52. @kumichou

  53. Bash Wizard @kumichou

  54. 120,000

  55. /etc/sysctl.conf

  56. 15 days to launch

  57. None
  58. None
  59. System Architecture: Send Emails, DNS, Mail Forwarding, Payment Processor, CREA Membership, NAR Membership, Registrar, Hosted Website
  60. 8 days to launch

  61. 7 days to launch

  62. 6 days to launch

  63. What did I learn?

  64. Act 4: Game Time (3 days to launch)

  65. • Pre-warm ELB • On-demand limits • Starting new instances doesn’t always work
  66. • Validate assumptions • Quick fixes • HACKS!

  67. None
  68. None
  69. None
  70. None
  71. None
  72. Act 5: Retrospective (0 days to launch)

  73. None
  74. None
  75. None
  76. None
  77. • Script is not reality • Testing is expensive • Analysis is complicated

  78. None
  79. “How do I convince others to start earlier?” “How to do this cheaper?” “When should we start?”
  80. I’m @stevejxsn steve@testdouble.com Thanks!