Making It Through the New Year's Sale, One of the Biggest Events at ZOZOTOWN

Kaito Akita (ZOZO / Technology Division SRE Department ZOZO SRE Section / SRE)

https://tech-verse.me/ja/sessions/227
https://tech-verse.me/en/sessions/227
https://tech-verse.me/ko/sessions/227

Tech-Verse2022

November 17, 2022

Transcript

  1. Self Introduction

    I joined ZOZO Technologies, Inc. (currently ZOZO, Inc.) as a new graduate in April 2020. I was assigned to the team that managed a wide range of ZOZOTOWN's production infrastructure, including networks, databases, and web servers. I was in charge of building and operating a hybrid environment of on-premise and cloud services. I am currently leading projects in the SRE domain of the front-end replacement project.
    Technology Division, SRE Department, ZOZO SRE Section
  2. I will talk about the load tests for the New Year's sale at ZOZOTOWN

    - The first load test, in December 2020
    - The load test in December 2021, which implemented various improvements
    - The load test to be performed in December 2022
  3. Agenda

    - Background and Purpose of the Load Test
    - Characteristics of the Load Test
    - New Year's Sale on January 1, 2020
    - Load Test in December 2020
    - Load Test in December 2021
    - Load Test in December 2022
    - Summary
  5. Background

    ZOZOTOWN is currently in the middle of a replacement project. We had performed load tests on each microservice individually. Until December 2020, however, we had never performed a load test that simulated the New Year's sale through the web servers and API servers that users access. The reason was that both the web servers and the API servers ran on-premise, so such a test would have had to use the production environment.
  6. Background

    Because no such load test had been performed, the surge in access immediately after the New Year's sale started on January 1, 2020 made it difficult for users to access the site.
  7. Background

    It is necessary to perform a load test to enable users to use ZOZOTOWN comfortably during the New Year's sale.
  8. Objectives

    1. Run a load test in an environment equivalent to the sale, following users' actual access flows, to identify and fix potential bottlenecks in advance
    2. Emergency drill for quick coordination with the relevant parties when trouble occurs
  9. Objective

    The load test enables engineers to run the New Year's sale safely so that users can use ZOZOTOWN comfortably.
  11. Characteristics

    - Perform the load test in the production environment
    - Perform the load test late at night, when traffic is low

    We repeatedly scaled the on-premise environment up and down as ZOZOTOWN grew. Today it is too costly to prepare a separate infrastructure equivalent to production just for load testing.
  13. System Configuration (January 1, 2020)

    [Diagram. Components: browser and iOS/Android clients; IIS (Web); IIS (API); SQL Server instances running stored procedures, with replication and a read-only SQL Server; Goods Service. Annotations: data center (DC) extended to the cloud using VMware Cloud on AWS; transfer of the on-premise stored procedures to the API.]
  14. Service Availability

    SLO measurement
    - Purpose: understand the health of ZOZOTOWN from a medium- to long-term perspective
    - Indicator: service availability, the success rate within a fixed period (1 day)
    - Formula: (1 - total number of errors / total number of requests) × 100
    - Threshold: service availability falls below 99.9% on fewer than 3 days per month
    - The graphs that follow show availability per hour; check whether any hour falls below 99.9%
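The availability formula and threshold above can be expressed as a short Python sketch. It assumes error and request counts are already aggregated per hour or per day. How ZOZOTOWN collects those counts is not described in the deck, and the function names are illustrative.

```python
SLO_THRESHOLD = 99.9  # percent, the threshold on the slide


def availability(errors: int, requests: int) -> float:
    """Service availability in percent: (1 - errors / requests) * 100."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    return (1 - errors / requests) * 100


def hours_below_slo(hourly: list[tuple[int, int]],
                    threshold: float = SLO_THRESHOLD) -> list[int]:
    """Return the hours of the day (list index) whose availability is below the threshold."""
    return [hour for hour, (errors, requests) in enumerate(hourly)
            if availability(errors, requests) < threshold]


def slo_met(daily: list[tuple[int, int]], threshold: float = SLO_THRESHOLD,
            max_bad_days: int = 3) -> bool:
    """True if fewer than max_bad_days days in the month fall below the threshold."""
    bad_days = sum(1 for errors, requests in daily
                   if availability(errors, requests) < threshold)
    return bad_days < max_bad_days


if __name__ == "__main__":
    # Hypothetical counts: 50 errors out of 100,000 requests in one hour.
    print(round(availability(50, 100_000), 2))
```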
  15. Service Availability, January 1, 2020 (Web)

    [Graph: hourly service availability (%) for web from 0:00 to 22:00; y-axis from 94 to 100]
  16. Service Availability, January 1, 2020 (App)

    [Graph: hourly service availability (%) for the app from 0:00 to 22:00; y-axis from 91.5 to 100]
  18. Organization

    Team in charge of the load test: two people (project lead and support)
    - Planning the schedule
    - Creating scenarios
    - Determining the target value
    - Executing the load test
    Each team (6 teams)
    - Estimating infrastructure resources
    - Reinforcing infrastructure resources
    - Monitoring resources on the test day
    - Cooperating in scenario creation
    Breakdown of the teams: Backend (2 teams), SRE (3 teams), DB (1 team)
  19. Schedule (October to December 2020)

    - Late October: determination of the persons in charge
    - Consideration of the schedule and target value by the persons in charge of the load test
    - Kick-off with the parties concerned and request for cooperation
    - Preparation of the load test execution environment
    - Creation of scenarios
    - Correction of scenarios
    - Execution of the load test
  20. Schedule

    Load test schedule
    - Two consecutive days (including one backup day)
    - Two non-consecutive days (including one backup day)
    - Decide whether to use a backup day based on the test results
    - Schedule everything within these four days, including the backup days
    Schedule on the day of the test
    - 0:00 to 2:00 AM: preparation by each team
    - 2:00 to 5:00 AM: run the scenarios several times
    - 5:00 to 6:00 AM: backup and cleanup after the test
    - 6:00 to 7:00 AM: retrospective
  21. Remote Communication

    Utilization of Discord
    - Because of the COVID-19 pandemic, some members came to the office and others did not, so we used Discord voice chat on the test day to make the situation easy to follow
    - The team in charge of the load test recorded the results
  22. Execution Environment of the Load Test

    [Diagram. Components: operator; image build pushed to AWS ECR; AWS Fargate running Gatling-Runner containers and an Aggregate-Runner container in a VPC across subnets 1a, 1c, and 1d; internet gateway; AWS S3; ZOZOTOWN; another AWS account ("Other Accounts") with its own internet gateway. Annotation: the operator calls the RunTask API for the two account environments.]
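The operator step above, calling RunTask for two account environments, might look like the following Python sketch. It assumes boto3-style ECS clients, one per AWS account. The sketch uses the real ECS `RunTask` request fields (`cluster`, `taskDefinition`, `launchType`, `count`, `networkConfiguration`, `overrides`). The cluster, task definition, subnet and container names, and the `SIMULATION` environment variable are hypothetical. Because `RunTask` starts at most 10 tasks per call, the sketch splits larger runner counts into batches.

```python
from typing import Any, Protocol

MAX_TASKS_PER_CALL = 10  # ECS RunTask starts at most 10 tasks per call


class EcsClient(Protocol):
    """Anything with a boto3-style ecs.run_task(**kwargs) method."""

    def run_task(self, **kwargs: Any) -> dict: ...


def build_run_task_requests(cluster: str, task_definition: str, subnets: list[str],
                            total_tasks: int, container_name: str,
                            simulation: str) -> list[dict]:
    """Split total_tasks Gatling-Runner tasks into RunTask requests of <= 10 tasks."""
    requests = []
    remaining = total_tasks
    while remaining > 0:
        count = min(remaining, MAX_TASKS_PER_CALL)
        requests.append({
            "cluster": cluster,
            "taskDefinition": task_definition,
            "launchType": "FARGATE",
            "count": count,
            # Assumption: runners reach ZOZOTOWN via the internet gateway, so they get public IPs.
            "networkConfiguration": {
                "awsvpcConfiguration": {"subnets": subnets, "assignPublicIp": "ENABLED"}
            },
            # Hypothetical: the runner image reads the simulation name from an env var.
            "overrides": {"containerOverrides": [{
                "name": container_name,
                "environment": [{"name": "SIMULATION", "value": simulation}],
            }]},
        })
        remaining -= count
    return requests


def launch(clients: dict[str, EcsClient], requests: list[dict]) -> int:
    """Send the same requests to every account's client; return the number of tasks started."""
    started = 0
    for client in clients.values():
        for request in requests:
            response = client.run_task(**request)
            started += len(response.get("tasks", []))
    return started
```

With boto3, one client per account could come from `boto3.Session(profile_name=...).client("ecs")`. The `failures` list in each response is worth checking before the test starts.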
  27. Target Value

    Determination method
    - Traffic to PayPay Mall had doubled in a preceding event, so we set the target traffic for the New Year's sale on January 1, 2021 at twice that of January 1, 2020
    - Traffic was hard to predict, partly because of the COVID-19 pandemic
  28. System Configuration (January 1, 2021)

    [Diagram. Components: browser and iOS/Android clients; Akamai; IIS (Web) and IIS (API) in the data center; SQL Server instances running stored procedures, with replication; on AWS, API Gateway, Session, ID Authentication Service, Goods Service, and Search Service. Annotations: DC extension using VMware Cloud on AWS; access via Akamai; change in the access destination of the API; shift to microservices; offloading of session information.]
  29. Scenario

    Key points in creating scenarios
    - Analyzed request information using Splunk
    - Access ratio of each function
    - Instantaneous maximum Request/Sec
    - Bulk sampling of query parameters
    - Created 80 or more scenario patterns for web and app
    - Recreated the access immediately after the start of the New Year's sale
    - Created a dashboard for analysis in Splunk
    - The access logs alone did not show how each function is actually called, so the teams had to cooperate to create correct scenarios
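Two of the quantities pulled from Splunk above are the access ratio of each function and the instantaneous maximum Request/Sec. This Python sketch computes both. It assumes the access log has already been exported as (epoch second, function name) pairs. The real analysis was done with Splunk searches and dashboards, which are not reproduced here, and the function names are illustrative.

```python
from collections import Counter


def access_ratios(records: list[tuple[int, str]]) -> dict[str, float]:
    """Share of all requests that hit each function."""
    counts = Counter(function for _, function in records)
    total = sum(counts.values())
    return {function: n / total for function, n in counts.items()}


def peak_rps(records: list[tuple[int, str]]) -> int:
    """Instantaneous maximum Request/Sec: the busiest single second."""
    per_second = Counter(second for second, _ in records)
    return max(per_second.values(), default=0)


def per_function_rps(ratios: dict[str, float], target_rps: float) -> dict[str, float]:
    """Distribute a target Request/Sec across functions by their observed ratio."""
    return {function: target_rps * ratio for function, ratio in ratios.items()}


if __name__ == "__main__":
    # Hypothetical sample: three requests in second 0, one in second 1.
    sample = [(0, "home"), (0, "search"), (0, "home"), (1, "home")]
    ratios = access_ratios(sample)
    print(ratios, peak_rps(sample), per_function_rps(ratios, 48_000))
```

Scaling the observed ratios to the target Request/Sec keeps the mix of functions realistic while raising the overall load.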
  30. Scenario

    Examples of web scenarios: HOME, asynchronous processing, search, product details, sale product details, view favorites, sale LP, login
    Examples of app scenarios: HOME-related, announcements, product details, search, measurement account information, favorite product information, product ranking, product inventory, brand list, login, addition of favorites
  31. Scenario

    Determination of Request/Sec
    - January 1, 2020: 24,000 Request/Sec
    - Target: 48,000 Request/Sec, because the target traffic is twice as high
    Scenario execution timetable (each scenario runs for 5 minutes)
    - 2:00 AM: 0.50 × the Request/Sec of January 1, 2020
    - 2:30 AM: 1.0 ×
    - 3:00 AM: 1.25 ×
    - 3:30 AM: 1.5 ×
    - 4:00 AM: 1.75 ×
    - 4:30 AM: 2.0 ×
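The step-up plan in the timetable above is simple arithmetic on the January 1, 2020 baseline. The Python sketch below computes the Request/Sec for each step. The baseline, multipliers, and 5-minute duration come from the slide; the function and variable names are illustrative.

```python
BASELINE_RPS = 24_000  # Request/Sec on January 1, 2020
STEP_MINUTES = 5       # each scenario run lasts 5 minutes

# (start time, multiplier of the baseline) from the timetable
STEPS = [("2:00", 0.50), ("2:30", 1.0), ("3:00", 1.25),
         ("3:30", 1.5), ("4:00", 1.75), ("4:30", 2.0)]


def timetable(baseline: int, steps: list[tuple[str, float]]) -> list[tuple[str, float, int]]:
    """Return (start, multiplier, target Request/Sec) for each step."""
    return [(start, multiplier, round(baseline * multiplier)) for start, multiplier in steps]


if __name__ == "__main__":
    for start, multiplier, rps in timetable(BASELINE_RPS, STEPS):
        print(f"{start} AM  x{multiplier:<4}  {rps:>6} Request/Sec for {STEP_MINUTES} min")
```

The last step reaches the 48,000 Request/Sec target. The same calculation with a 28,800 baseline and a final multiplier of 1.6 gives the 46,080 Request/Sec target used in December 2021.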
  32. Reinforcement Status of Servers

    Number of servers in the first and second load tests
    Number of servers in the third load test
    - On-premise/VMware Cloud on AWS: Web 180, API 343
    - On-premise/VMware Cloud on AWS: Web 427, API 940
    Number of servers on January 1, 2020
    - On-premise/VMware Cloud on AWS: Web 307, API 820
    Number of servers on January 1, 2021
    - On-premise/VMware Cloud on AWS: Web 452, API 812
    More than double
  33. Summary of the Load Test in December 2020

    First test: We found some potential bottlenecks. Although the target was double, we could only generate the same Request/Sec as on January 1, 2020. There were errors in the scenarios and elsewhere that even the developers had not noticed, which showed us how much ZOZOTOWN had changed.
    Second test: We restarted at 0.5 times the load because the scenarios had been partially modified. For web, resources were exhausted at the same load, even with the number of servers obtained by trial calculation. Only the app withstood twice the load. The web results still left concerns, so we redid the test.
    Third test (final test): Web withstood 1.25 times the load → the target load could not be fully applied. App withstood twice the load.
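The "number of servers obtained by trial calculation" in the second test is a capacity estimate. The Python sketch below shows one common way to do it; this is an assumption, not ZOZOTOWN's documented method. It divides the target Request/Sec by what one server can handle, keeping each server below a utilization ceiling. The per-server capacity and the 70% default ceiling are hypothetical inputs. As the second test showed, such estimates only hold once they are validated under real load.

```python
import math


def servers_needed(target_rps: float, rps_per_server: float,
                   max_utilization: float = 0.7) -> int:
    """Servers required so that none runs above max_utilization of its measured capacity."""
    if rps_per_server <= 0:
        raise ValueError("rps_per_server must be positive")
    if not 0 < max_utilization <= 1:
        raise ValueError("max_utilization must be in (0, 1]")
    return math.ceil(target_rps / (rps_per_server * max_utilization))


if __name__ == "__main__":
    # Hypothetical: each web server handles 200 Request/Sec at full capacity.
    print(servers_needed(48_000, 200, max_utilization=0.75))
```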
  34. Summary of the Load Test in December 2020

    Run a load test in an environment equivalent to the sale, following users' actual access flows, to identify and fix potential bottlenecks in advance
    - Each team identified its bottlenecks
    - Reduced the amount of content cutting
    - Obtained a guideline for how much load the current resources can withstand
    Emergency drill for quick coordination with the relevant parties when trouble occurs
    - We shared an understanding of what each team should do when an error occurs
  35. Service Availability, January 1, 2021 (Web)

    [Graph: hourly service availability (%) for web from 0:00 to 22:00; y-axis from 97 to 100]
  36. Service Availability, January 1, 2021 (App)

    [Graph: hourly service availability (%) for the app from 0:00 to 22:00; y-axis from 93.5 to 100]
  37. Summary of the Load Test in December 2020

    Improvements identified in the KPT retrospective
    - Plan a schedule with more slack, because night shifts over a short period were physically demanding
    - Improve the accuracy of the load test execution tool
    - Execute scenarios for adding products to the cart and for payment
    - Strengthen collaboration with departments where it had been insufficient
    - Perform the test on days with many coupons
  39. Organization

    Team in charge of the load test: two people (project lead and support)
    - Planning the schedule
    - Creating scenarios
    - Determining the target value
    - Executing the load test
    Each team (12 teams)
    - Estimating infrastructure resources
    - Reinforcing infrastructure resources
    - Monitoring resources on the test day
    - Cooperating in scenario creation
    Breakdown of the teams: Backend (7 teams), SRE (3 teams), MLOps (1 team), DB (1 team)
  40. Schedule (October to December 2021)

    - Determination of the persons in charge of the load test
    - Consideration of the schedule and target value by the persons in charge of the load test
    - Kick-off with the parties concerned and request for cooperation
    - Preparation of the load test execution environment
    - Creation of scenarios
    - Execution of the load test
    - Correction of scenarios
    - Execution of the load test
  41. Schedule

    Load test schedule
    - Plan the schedule considering the number of coupons
    - Two days with many coupons (including one backup day)
    - Two non-consecutive days for executing the load test
    - Schedule everything within these four days, including the backup days
    Schedule on the day of the test
    - 0:00 to 2:00 AM: preparation by each team
    - 2:00 to 5:00 AM: run the scenarios several times
    - 5:00 to 6:00 AM: cleanup
    - 6:00 to 7:00 AM: retrospective and decision on the next action
  42. Remote Communication

    Utilization of Google Meet
    - All members participated remotely and followed the situation on Google Meet on the day of the test
    - After the load test, all teams entered their results into a shared document and reviewed the test immediately, so the next action could be decided quickly
  43. Execution Environment of the Load Test

    [Diagram. Components: operator; image build and deploy to AWS ECR; AWS EKS running the Gatling Operator, multiple Gatling Runner pods, and the Gatling Reporter across subnets 1a, 1c, and 1d; NAT gateways; internet gateway; AWS Secrets Manager; AWS S3; ZOZOTOWN as the load target.]
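In this environment several Gatling Runner pods generate load in parallel. The Python sketch below shows one way to divide a target Request/Sec among them, assuming an even split where shares differ by at most 1 Request/Sec. The slide does not state how the Gatling Operator actually distributes work or how many runners were used; the runner count below is hypothetical.

```python
def split_rate(total_rps: int, runners: int) -> list[int]:
    """Divide total_rps across runners; shares differ by at most 1 and sum to total_rps."""
    if runners <= 0:
        raise ValueError("runners must be positive")
    base, extra = divmod(total_rps, runners)
    # The first `extra` runners each take one additional request per second.
    return [base + 1 if i < extra else base for i in range(runners)]


if __name__ == "__main__":
    # Hypothetical: 7 runner pods sharing the 46,080 Request/Sec target.
    print(split_rate(46_080, 7))
```

Distributing the remainder keeps the combined load exactly at the target. Plain integer division would drop it and leave the load slightly short.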
  49. Target Value

    Determination method
    - Set an appropriate target value considering the growth rate
    - We analyzed the growth rate from the traffic and the numbers of unique users (UUs) and page views (PVs) on January 1, 2021. Based on this, we assumed that traffic for the New Year's sale on January 1, 2022 would be 1.6 times that of January 1, 2021
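The slide does not say how the growth rates of traffic, UUs, and PVs were combined into the 1.6 factor. The Python sketch below shows one possible rule, stated here as an assumption: take the largest year-over-year ratio and round it up to the next 0.1. The metric values in the example are hypothetical; only the 28,800 Request/Sec baseline and the 1.6 factor come from the deck.

```python
import math
from fractions import Fraction


def growth_ratios(previous: dict[str, int], current: dict[str, int]) -> dict[str, Fraction]:
    """Year-over-year ratio for each metric (exact fractions avoid float rounding)."""
    return {metric: Fraction(current[metric], previous[metric]) for metric in previous}


def target_factor(ratios: dict[str, Fraction], step: Fraction = Fraction(1, 10)) -> Fraction:
    """Assumed rule: the largest ratio, rounded up to a multiple of `step`."""
    largest = max(ratios.values())
    return math.ceil(largest / step) * step


if __name__ == "__main__":
    # Hypothetical metric values for two consecutive New Year's sales.
    previous = {"traffic": 100, "uu": 100, "pv": 100}
    current = {"traffic": 152, "uu": 140, "pv": 131}
    factor = target_factor(growth_ratios(previous, current))
    print(float(factor), int(28_800 * factor))
```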
  50. System Configuration (January 1, 2022)

    [Diagram. Components: browser and iOS/Android clients; Akamai; IIS (Web) and IIS (API) in the data center; SQL Server instances running stored procedures, with replication; on AWS, API Gateway, Session, ID Authentication Service, Goods Service, Search Service, Payment service, BFF (Aggregation), and IIS (API). Annotations: lift to EC2; replacement of the HOME side; replacement of the payment function; Phase 1; introduction of Akamai.]
  51. Scenario

    Key points in creating scenarios
    - Analyzed request information using Splunk
    - Access ratio of each function
    - Instantaneous maximum Request/Sec
    - Bulk sampling of query parameters
    - Created 80 or more scenario patterns for web and app
    - Recreated the access immediately after the start of the New Year's sale
    - Created a dashboard for analysis in Splunk
    - Added scenarios for the payment function
    - Prepared test accounts and test product numbers
    - Split the scenarios into payment, login, and main
  52. Scenario

    Examples of web scenarios: HOME, asynchronous processing, search, product details, sale product details, view favorites, sale LP, login, addition of products to the cart, order
    Examples of app scenarios: HOME-related, announcements, product details, search, favorite product information, product ranking, product inventory, brand list, login, addition of favorites, new HOME side, addition of products to the cart
  53. Scenario

    Determination of Request/Sec
    - January 1, 2021: 28,800 Request/Sec
    - Target: 46,080 Request/Sec, because the target traffic is 1.6 times as high
    Scenario execution timetable (each scenario runs for 5 minutes)
    - 2:00 AM: 0.25 × the Request/Sec of January 1, 2021
    - 2:30 AM: 0.75 ×
    - 3:00 AM: 1.0 ×
    - 3:30 AM: 1.25 ×
    - 4:00 AM: 1.4 ×
    - 4:30 AM: 1.6 ×
  54. Reinforcement Status of Servers

    Transfer of the API servers to EC2 (execution in the production environment: 40 hours → 4 hours)
    Issues with VMware Cloud on AWS
    - Host startup and construction: preparation takes 40 hours
    - Upper limit on the quantity of procurable stock
    - Connection timeouts caused by CPU overcommitment
    - Some parts had to be configured manually
    EC2
    - Preparation takes 4 hours
    - Resources are easy to secure in advance thanks to preliminary negotiations with AWS
    - Initial settings, monitoring settings, and so on become easier
    - Operations are automated
  55. Reinforcement Status of Servers

    Number of servers in the first load test
    Number of servers in the second and third load tests
    - On-premise/VMware Cloud on AWS: Web 452, API 812
    - On-premise/EC2: Web 480, API 1018 (EC2: 900)
    Number of servers on January 1, 2021
    - On-premise/EC2: Web 480, API 1020 (EC2: 820)
    Number of servers on January 1, 2021
    - On-premise/EC2: Web 588, API 1160 (EC2: 1160)
    Number of servers in the second load test
    - On-premise/EC2: Web 468, API 1160 (EC2: 1160)
    The API servers can now run on EC2 alone
  56. Summary of the Load Test in December 2021

    First test: Because of mistakes in the first few scenarios, we could only generate the same Request/Sec as on January 1, 2021. In addition, partly because we had never run a payment function scenario before, more load than expected was applied.
    Second test: Both web and the app withstood 1.25 times the load, and resources were not exhausted. A defect in a scenario applied unusual load to some microservices, so we decided to use the backup day.
    Third test: The load in the payment function scenario was too high, so we changed it to the same load. Both web and the app withstood … times the load. We ran the test on the last day to verify the throttling of the payment function.
    Fourth test (final test): Web withstood 1.6 times the load, and the app withstood 1.6 times the load. We were able to apply the target load.
  57. Summary of the Load Test in December 2021

    Run a load test in an environment equivalent to the sale, following users' actual access flows, to identify and fix potential bottlenecks in advance
    - Detected IP address depletion problems in EKS
    - Took performance measures by cutting content at effective points
    Emergency drill for quick coordination with the relevant parties when trouble occurs
    - It was a good rehearsal for remote communication
    - We shared an understanding of what each team should do when an error occurs
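The EKS IP depletion found in this test comes down to an address budget. Under the Amazon VPC CNI plugin's default behavior, every pod receives an IP address from the VPC subnet, and AWS reserves five addresses in every subnet. The Python sketch below checks that budget, assuming the runners used the VPC CNI defaults. The subnet CIDRs, node counts, and the flat "warm IPs per node" allowance are hypothetical. The warm-pool model is also a simplification of the CNI's actual settings.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses and the last address of each subnet


def usable_ips(cidrs: list[str]) -> int:
    """Addresses available to nodes and pods across the given subnets."""
    return sum(ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET
               for cidr in cidrs)


def ip_shortfall(cidrs: list[str], nodes: int, pods_per_node: int,
                 warm_ips_per_node: int = 0) -> int:
    """Positive result = addresses missing; assumes one subnet IP per node and per pod."""
    needed = nodes * (1 + pods_per_node + warm_ips_per_node)
    return needed - usable_ips(cidrs)


if __name__ == "__main__":
    # Hypothetical: three /24 subnets (one per AZ) and 40 nodes running 20 runner pods each.
    subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
    print(usable_ips(subnets), ip_shortfall(subnets, nodes=40, pods_per_node=20))
```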
  58. Service Availability, January 1, 2022 (Web)

    [Graph: hourly service availability (%) for web from 0:00 to 22:00; y-axis from 97 to 100]
  59. Service Availability, January 1, 2022 (App)

    [Graph: hourly service availability (%) for the app from 0:00 to 22:00; y-axis from 97 to 100]
  60. Summary of the Load Test in December 2021

    Improvements identified in the KPT retrospective
    - Improve the accuracy of scenarios
    - Create an architecture diagram showing the path from web/app access to the microservices
    - Build an environment where preliminary tests are easy to perform
    - Manage the schedule on the test day thoroughly
    - Create a dashboard and console that participants should watch
  62. Organization

    Team in charge of the load test: a working group of seven people
    - Planning the schedule
    - Creating scenarios
    - Determining the target value
    - Executing the load test
    Each team (12 teams)
    - Estimating infrastructure resources
    - Reinforcing infrastructure resources
    - Monitoring resources on the test day
    - Cooperating in scenario creation
    Breakdown of the teams: Backend (7 teams), SRE (3 teams), MLOps (1 team), DB (1 team)
  63. Schedule (September to December 2022)

    - End of September 2022: determination of the chief of the working group
    - Determination of the participants
    - Consideration of the schedule and target value by the persons in charge of the load test
    - Kick-off with the parties concerned and request for cooperation
    - Preparation of the load test execution environment
    - Creation of scenarios
    - Correction of scenarios
    - Execution of the load test
    - Execution of the load test
  64. Schedule

    Load test schedule
    - Plan the schedule considering the number of coupons
    - Two days with many coupons (including one backup day)
    - Two non-consecutive days for executing the load test
    - Schedule everything within these four days, including the backup days
    Schedule on the day of the test
    - 2:00 to 5:00 AM: run the scenarios several times
    - 5:00 to 6:00 AM: cleanup
    - 6:00 to 6:30 AM: same-day retrospective and decision on the next action
  65. Execution Environment of the Load Test

    [Diagram. Components: operator; image build and deploy to AWS ECR; AWS EKS running the Gatling Operator, multiple Gatling Runner pods, and the Gatling Reporter across subnets 1a, 1c, and 1d; NAT gateways; internet gateway; AWS Secrets Manager; AWS S3; ZOZOTOWN as the load target.]
  66. Target Value

    Determination method
    - We analyzed the growth rate from the traffic and the numbers of UUs and PVs on January 1, 2022 and found no significant change. We therefore assumed that traffic for the New Year's sale on January 1, 2023 would be 1.6 times that of January 1, 2022
    - We analyzed the latency during the New Year's sale and set a target latency for each scenario
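Checking a measured scenario against its target latency usually means comparing a high percentile of the measured latencies with the target. The Python sketch below does this, assuming a nearest-rank 95th percentile and hypothetical per-scenario targets in milliseconds. The deck does not state which percentile or which targets were actually used.

```python
import math


def percentile(samples_ms: list[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    if not samples_ms or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples_ms)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]


def check_latency(results_ms: dict[str, list[float]], targets_ms: dict[str, float],
                  p: float = 95) -> dict[str, bool]:
    """True for each scenario whose p-th percentile latency is within its target."""
    return {name: percentile(results_ms[name], p) <= target
            for name, target in targets_ms.items()}


if __name__ == "__main__":
    # Hypothetical latencies measured during one load-test step.
    results = {"home": [float(ms) for ms in range(1, 101)], "search": [300.0] * 10}
    print(check_latency(results, {"home": 100.0, "search": 250.0}))
```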
  67. System Configuration (January 1, 2023)

    [Diagram. Components: browser and iOS/Android clients; Akamai; Web Gateway; Web Front; IIS (Web) and IIS (API) in the data center; SQL Server instances running stored procedures, with replication; on AWS, API Gateway, Session, ID Authentication Service, Goods Service, Search Service, Payment service, BFF (Aggregation), and IIS (API). Annotation: Front-end Replacement Phase 1.]
  68. Scenario

    Key points in creating scenarios
    - Analyzed request information using Splunk
    - Access ratio of each function
    - Instantaneous maximum Request/Sec
    - Bulk sampling of query parameters
    - Created 80 or more scenario patterns for web and app
    - Added scenarios for Front-end Replacement Phase 1
    - Access scenario for hot items
    - Access volume scenario assuming the rush when the date changes at midnight
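A rush when the date changes at midnight can be modeled as a load profile that jumps from a background rate to a peak. The Python sketch below builds such a per-second target-rate profile. The linear ramp shape and every number in it are hypothetical, since the deck does not show the actual profile. In Gatling, a similar shape could be expressed with open-model injection steps such as `rampUsersPerSec` followed by `constantUsersPerSec`.

```python
def midnight_rush_profile(base_rps: int, peak_rps: int, pre_seconds: int,
                          ramp_seconds: int, hold_seconds: int) -> list[int]:
    """Per-second target rate: base before 0:00, linear ramp to peak, then hold at peak."""
    profile = [base_rps] * pre_seconds
    for i in range(1, ramp_seconds + 1):
        # Integer interpolation; the last ramp second reaches peak_rps exactly.
        profile.append(base_rps + (peak_rps - base_rps) * i // ramp_seconds)
    profile += [peak_rps] * hold_seconds
    return profile


if __name__ == "__main__":
    # Hypothetical: 60 s of background traffic, a 10 s surge at midnight, then 5 min at peak.
    profile = midnight_rush_profile(5_000, 40_000, pre_seconds=60, ramp_seconds=10,
                                    hold_seconds=300)
    print(len(profile), max(profile), sum(profile))
```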
  69. Scenario

    Examples of web scenarios: HOME, asynchronous processing, search, product details, sale product details, view favorites, sale LP, login, addition of products to the cart, order
    Examples of app scenarios: HOME-related, announcements, product details, search, favorite product information, product ranking, product inventory, brand list, login, addition of favorites, new HOME side, addition of products to the cart
  70. Scenario

    Determination of Request/Sec
    - January 1, 2021: 28,800 Request/Sec
    - Target: 46,080 Request/Sec, because the target traffic is 1.6 times as high
    Scenario execution timetable (each scenario runs for 5 minutes)
    - 2:00 AM: 0.25 × the Request/Sec of January 1, 2021
    - 2:30 AM: 0.5 ×
    - 3:00 AM: 1.0 ×
    - 3:30 AM: 1.3 ×
    - 4:00 AM: 1.6 ×
    - 4:30 AM: reserve
  71. Agenda

    - Background and Purpose of the Load Test
    - Characteristics of the Load Test
    - New Year's Sale on January 1, 2020
    - Load Test in 2020
    - Load Test in 2021
    - Results of January 1, 2021 and January 1, 2022
    - Load Test in December 2022
    - Summary
  72. Summary

    The load test enables engineers to run the New Year's sale safely so that users can use ZOZOTOWN comfortably.
    - Create scenarios that keep up with changes in ZOZOTOWN
    - Set an appropriate target value that reproduces the load of the New Year's sale
    - Reinforce resources enough to cover the load of the New Year's sale
    - Identify bottlenecks through the load test
    - Communicate in advance, with the New Year's sale in mind