Our approach to New Year's traffic of LINE STICKER @ TECHPULSE 2023

Agenda
› New Year’s campaign
› Preparations for handling spikes of about 9x

New Year’s campaign in Japan

Challenges
› We need to use other teams’ APIs to send recommendations or fetch user information
› It’s the biggest sticker campaign, and we don’t want it to affect our existing sticker services during the New Year season

New Year’s campaign in Japan
› Buy a campaign sticker to get a fortune slip
› From 12/26, you can send fortune slips to your friends or draw one yourself
› One of the biggest sticker campaigns in Japan

Design: fully asynchronous, isolated service, high throughput
› Use the Kafka provided by the IMF team
› Configuration can be adjusted dynamically for throttling
  › Kafka message processing speed
  › Calling speed of external teams’ APIs such as Messaging, Point, etc.
› Separate modules rather than implementing the functionality in existing services
› Use Decaton to process Kafka events
› Use RxJava/R2DBC to make our code fully asynchronous

System overview
› The API server does only the minimum necessary processing and sends events that can be processed later to Kafka
› Diagram: API/Batch → Kafka → Decaton Processor
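A minimal Python sketch of this split, assuming an in-process `asyncio.Queue` as a stand-in for Kafka and a background task as a stand-in for the Decaton processor. The handler and event names are hypothetical, and nothing here uses the real Kafka or Decaton APIs; it only shows the API answering quickly while the deferred work is processed asynchronously.

```python
import asyncio
import contextlib


async def handle_purchase(user_id: str, queue: asyncio.Queue) -> dict:
    """Hypothetical API handler: do the minimum, defer the rest as an event."""
    # The minimum necessary processing (e.g. recording the purchase) would go here.
    await queue.put({"type": "fortune_slip", "user_id": user_id})
    return {"status": "accepted"}


async def event_processor(queue: asyncio.Queue, processed: list) -> None:
    """Background consumer standing in for the Kafka consumer / Decaton processor."""
    while True:
        event = await queue.get()
        # Slow follow-up work, such as calling other teams' APIs, would go here.
        processed.append(event)
        queue.task_done()


async def main() -> tuple[list, list]:
    queue: asyncio.Queue = asyncio.Queue()
    processed: list = []
    worker = asyncio.create_task(event_processor(queue, processed))
    responses = [await handle_purchase(f"user-{i}", queue) for i in range(3)]
    await queue.join()  # block until every queued event has been processed
    worker.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await worker
    return responses, processed


responses, processed = asyncio.run(main())
print(responses)
print([event["user_id"] for event in processed])
```

In the real system the queue is durable and the consumer runs in a separate service, so a slow downstream API delays only the processor, not the API response.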

Rate limited services
› Make our Kafka processing speed and API client calling speed configurable dynamically
› Most API calls can be retried; if not, failures are logged and handled manually
› Communicate with other teams so that we stay within the maximum traffic they can handle
› Perform load testing with other teams
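A minimal Python sketch of the two ideas on this slide, under stated assumptions: a token-bucket limiter whose rate can be changed at runtime (how the new rate is delivered, e.g. from a config service, is left out), and a retry wrapper that logs calls needing manual handling. The class and function names are hypothetical, and the slides do not say which throttling algorithm is actually used.

```python
import time


class AdjustableRateLimiter:
    """Token bucket whose rate can be changed while the service is running."""

    def __init__(self, rate_per_sec: float, burst: float, clock=time.monotonic):
        self._clock = clock  # injectable so the sketch can be tested
        self._rate = rate_per_sec
        self._burst = burst
        self._tokens = burst
        self._last = clock()

    def _refill(self) -> None:
        now = self._clock()
        self._tokens = min(self._burst, self._tokens + (now - self._last) * self._rate)
        self._last = now

    def set_rate(self, rate_per_sec: float) -> None:
        """Apply a new rate, e.g. pushed from a dynamic config source (not shown)."""
        self._refill()  # settle the time elapsed under the old rate first
        self._rate = rate_per_sec

    def try_acquire(self) -> bool:
        """Take one token if available; callers delay or reject the work otherwise."""
        self._refill()
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


def call_with_retry(call, attempts: int = 3, retryable: bool = True, log=print):
    """Retry a call; if it still fails (or cannot be retried), log it for manual handling."""
    max_attempts = attempts if retryable else 1
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:  # a real client would catch specific errors and back off
            if attempt == max_attempts:
                log(f"manual handling needed after {attempt} attempt(s): {exc!r}")
    return None
```

Lowering the rate with `set_rate` slows calls to a partner team immediately, without a redeploy, which is the property the slide asks for.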

Failover testing
› For our storage systems
  › Redis
  › MySQL
  › MongoDB
› This year we discovered a race condition in the database client library during failover

Appropriate estimation from planning
› The event itself is not just a single day but spans more than a month
  › From 2022/12/1 to 2023/1/13
› Well-estimated OA messages
  › Campaign-related information is sent during the campaign
› The system was stable during the campaign period

Preparations for handling spikes of about 9x

Features provided by LINE STICKER
› Provider side (Official & Creator)
  › Create and update products
› User side
  › Listing, including search and recommendation
  › Purchase and download resources
  › Send and receive stickers


System overview
› Components: Talk server, Open Chat, Home content, CDN, API gateway, Web site, API/Search server, ES, MySQL, MongoDB, Capability server, Image server, Object storage
› Main flows: Send sticker, Listing/Recommend, Downloading images

What happens on New Year’s Eve

Difficulties of New Year’s Eve
› Annual event
  › Hard to estimate the load due to implementation or architecture changes
  › Not easy to figure out the traffic of new features
› Spikes all at once in a short time
  › About 9 times the traffic of a minute earlier
› Increased sales for a few hours
  › Japan (UTC+9) → Taiwan (UTC+8) → Thailand (UTC+7)
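Assuming the spike follows local midnight in each market, the UTC offsets on this slide mean the peaks arrive about one hour apart. A small Python check, using fixed offsets (none of these three regions observes daylight saving time); the function name is only for illustration:

```python
from datetime import datetime, timedelta, timezone

# UTC offsets from the slide; fixed offsets are fine because none of these observe DST.
MARKETS = {"Japan": 9, "Taiwan": 8, "Thailand": 7}


def new_year_in_utc(year: int) -> dict[str, datetime]:
    """UTC instant at which January 1 of `year` starts in each market."""
    return {
        name: datetime(year, 1, 1, tzinfo=timezone(timedelta(hours=offset))).astimezone(timezone.utc)
        for name, offset in MARKETS.items()
    }


for name, instant in new_year_in_utc(2023).items():
    print(f"{name:<9} {instant:%Y-%m-%d %H:%M} UTC")
# Japan     2022-12-31 15:00 UTC
# Taiwan    2022-12-31 16:00 UTC
# Thailand  2022-12-31 17:00 UTC
```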

Average growth rate
› The per-year geometric mean of the growth rates over multiple years
› It absorbs some of the ups and downs in the annual growth rate
› Average growth rate: +44%
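A minimal Python sketch of the geometric-mean growth rate. The yearly values in the usage line are made up so that the result matches the +44% on the slide; they are not LINE data.

```python
import math


def average_growth_rate(yearly_values: list[float]) -> float:
    """Geometric mean of the year-over-year growth ratios, minus 1."""
    if len(yearly_values) < 2:
        raise ValueError("need at least two years of data")
    ratios = [after / before for before, after in zip(yearly_values, yearly_values[1:])]
    return math.prod(ratios) ** (1 / len(ratios)) - 1


# Hypothetical values: +80% then +15.2% average out to +44% per year.
print(f"{average_growth_rate([1000, 1800, 2073.6]):+.0%}")  # +44%
```

The geometric mean is used rather than the arithmetic mean because growth compounds: applying +44% twice gives the same two-year total as +80% followed by +15.2%.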

The year-on-year ratio of the number of accesses under normal conditions
› Calculate the ratio between the peak accesses on a November weekday in the previous year and in this year
› Treat this ratio as the growth rate and multiply it by the number of accesses on New Year’s Day of the previous year
› If the previous year’s access numbers are not reliable, the load cannot be predicted this way

                       2021       2022       2023
Weekday of November    1000 rps   2500 rps   5000 rps
New Year’s Day         2000 rps   4500 rps   ??? rps
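Applying the method to the illustrative figures in the table: the November ratio is 5000 / 2500 = 2, so the estimate for the ??? cell is 4500 rps × 2 = 9000 rps. A minimal Python sketch (the function name is hypothetical):

```python
def year_on_year_estimate(prev_november_rps: float, this_november_rps: float,
                          prev_new_year_rps: float) -> float:
    """Scale last New Year's peak by the growth seen in normal (November weekday) traffic."""
    growth_ratio = this_november_rps / prev_november_rps
    return prev_new_year_rps * growth_ratio


# Illustrative numbers from the table: 4500 rps * (5000 / 2500)
print(year_on_year_estimate(2500, 5000, 4500))  # 9000.0
```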

Estimate the number
› We choose the maximum of the estimates from the average growth rate and from the year-on-year ratio
› If neither is available, refer to other similar services

                 A: have data    A: no data
B: have data     Max(A, B)       B
B: no data       A               Refer to other similar services

(A and B are the estimates from the two methods.)
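The decision table reduces to "take the larger of whichever estimates exist". A minimal Python sketch, assuming `None` stands for "no data"; the function name and numbers are hypothetical:

```python
from typing import Optional


def choose_estimate(a: Optional[float], b: Optional[float]) -> Optional[float]:
    """Pick the peak estimate following the table: max if both exist, else whichever exists.

    A None result means neither method applies and the estimate has to come
    from a comparable service instead.
    """
    available = [x for x in (a, b) if x is not None]
    return max(available) if available else None


print(choose_estimate(6000.0, 9000.0))  # 9000.0
print(choose_estimate(None, 9000.0))    # 9000.0
print(choose_estimate(None, None))      # None
```

Taking the maximum errs on the side of over-provisioning, which fits a one-night spike where running short is more costly than idle capacity.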

Preparing the instances
› Services with confidence in the estimation
  › Target 70% CPU usage
› Services with concerns about the estimation
  › Target 50% CPU usage
  › Had an outage last year, not enough information, or other concerns
› Use metrics to find services where CPU usage grows far faster than requests
  › Adjust the configuration and check the code to find problems
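A minimal Python sketch of how the two CPU targets translate into instance counts, and of the "CPU grows faster than requests" check. It assumes CPU grows linearly with request rate; the measurements, the 1.5 tolerance, and the function names are hypothetical.

```python
import math


def instances_needed(peak_rps: float, measured_rps: float, measured_cpu_pct: int,
                     target_cpu_pct: int) -> int:
    """Instances needed so each stays at or below the target CPU at the estimated peak.

    Assumes CPU grows linearly with request rate, starting from one instance's
    measured throughput at a measured CPU level (e.g. from a load test).
    """
    rps_at_target = measured_rps * target_cpu_pct / measured_cpu_pct
    return math.ceil(peak_rps / rps_at_target)


def cpu_outpaces_requests(rps_before: float, rps_after: float, cpu_before: float,
                          cpu_after: float, tolerance: float = 1.5) -> bool:
    """Flag services whose CPU grew much faster than their request rate."""
    return (cpu_after / cpu_before) > (rps_after / rps_before) * tolerance


# Hypothetical service: one instance handles 240 rps at 60% CPU; estimated peak 9000 rps.
print(instances_needed(9000, 240, 60, target_cpu_pct=70))  # confident estimate -> 33
print(instances_needed(9000, 240, 60, target_cpu_pct=50))  # uncertain estimate -> 45
# Requests doubled but CPU went from 20% to 70%: worth checking config and code.
print(cpu_outpaces_requests(1000, 2000, 20, 70))  # True
```

A service flagged by the second check breaks the linearity assumption of the first, which is why its instance count cannot be trusted until the cause is found.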

Load testing in the production environment
› Scale in and check the latency and error rate
› Reach the 70% CPU usage target
› Resolve bottlenecks when they are easy to resolve
(Diagram: a load balancer (LB) in front of Server 1 to Server 5)
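Removing servers from behind the load balancer concentrates the same traffic on fewer machines, which is how production traffic alone can push CPU toward 70%. A minimal Python sketch of the stop/continue decision, assuming even balancing and linear CPU; the latency and error thresholds and all names are hypothetical, not values from the talk.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    servers: int          # servers behind the load balancer
    cpu: float            # average CPU utilization, 0.0-1.0
    p99_latency_ms: float
    error_rate: float     # fraction of failed requests


def cpu_after_scale_in(obs: Observation, remove: int = 1) -> float:
    """Same total traffic over fewer servers, assuming even balancing and linear CPU."""
    if remove >= obs.servers:
        raise ValueError("must keep at least one server")
    return obs.cpu * obs.servers / (obs.servers - remove)


def keep_scaling_in(obs: Observation, target_cpu: float = 0.7,
                    max_p99_ms: float = 200.0, max_error_rate: float = 0.001) -> bool:
    """Decide whether removing one more server is safe during the test."""
    if obs.p99_latency_ms > max_p99_ms or obs.error_rate > max_error_rate:
        return False  # stop here and look for the bottleneck
    return cpu_after_scale_in(obs) <= target_cpu


print(keep_scaling_in(Observation(5, 0.50, 120.0, 0.0)))   # True: 4 servers -> 62.5% CPU
print(keep_scaling_in(Observation(4, 0.625, 130.0, 0.0)))  # False: 3 servers -> ~83% CPU
```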

Monitoring and Operations

Assign monitoring members by context
› Overloading may affect related services within the same context
  › Sticker
  › Store
  › Image
  › etc.
› This avoids spending all monitoring members’ resources on a single outage

Monitoring
› Dedicated dashboard for New Year
› Focus on the most used services, such as the sending/receiving, listing, and image-downloading APIs
› Make a panel for each service to avoid overloading Grafana
› Use rps per node, because total rps is not meaningful when we can easily scale out
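Why total rps misleads after scaling out, as a small Python sketch: the total can rise while each node is actually less loaded. The per-node capacity figure and the function names are hypothetical.

```python
def rps_per_node(total_rps: float, nodes: int) -> float:
    return total_rps / nodes


def node_load(total_rps: float, nodes: int, capacity_rps_per_node: float) -> float:
    """Share of one node's tested capacity in use; stays comparable after scaling out."""
    return rps_per_node(total_rps, nodes) / capacity_rps_per_node


# Hypothetical: each node was load-tested to 400 rps at the CPU target.
print(node_load(4000, 10, 400))  # 1.0  -> nodes at capacity
print(node_load(6000, 20, 400))  # 0.75 -> total rps rose, but per-node load fell
```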

Playbook for unexpected requests
› More than we estimated
  › Share inside the team
  › Check whether it will increase further in a short time
› More than we can handle
  › Share in the company’s emergency channel
  › Take immediate action as planned
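A minimal Python sketch of the two playbook thresholds plus the "will it keep rising?" check. It assumes capacity is at least the estimate and uses a naive linear projection from one-minute samples; the names, numbers, and five-minute horizon are hypothetical.

```python
from enum import Enum


class Level(Enum):
    NORMAL = "within the estimate"
    OVER_ESTIMATE = "share inside the team and watch whether it keeps rising"
    OVER_CAPACITY = "share in the emergency channel and take the planned action"


def playbook_level(current_rps: float, estimated_rps: float, capacity_rps: float) -> Level:
    """Map current traffic to the playbook step (assumes capacity >= estimate)."""
    if current_rps > capacity_rps:
        return Level.OVER_CAPACITY
    if current_rps > estimated_rps:
        return Level.OVER_ESTIMATE
    return Level.NORMAL


def will_exceed_soon(recent_rps: list[float], capacity_rps: float, minutes_ahead: int = 5) -> bool:
    """Naive linear projection from the last two one-minute samples."""
    slope = recent_rps[-1] - recent_rps[-2]
    return recent_rps[-1] + slope * minutes_ahead > capacity_rps


level = playbook_level(current_rps=9500, estimated_rps=9000, capacity_rps=12000)
print(level.name, "->", level.value)
print(will_exceed_soon([8900, 9500], capacity_rps=12000))  # True: 9500 + 5 * 600 > 12000
```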

Preparing for priority load shedding
› We have a throttling feature for each API
› How can we know which API affects which service, screen, API, etc.?
› What is the UX when an error occurs?
› Workshop with stakeholders such as engineers, QA, planners, etc.
  › Start from the most requested APIs
  › Check the results in the dev environment
  › It is not necessary to check all APIs, only to identify those with the greatest impact
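The workshop’s output is essentially a priority ranking of APIs. A minimal Python sketch of how such a ranking could drive shedding, assuming whole APIs are throttled in reverse priority order; the API names, priorities, and rps values are hypothetical, and the slides do not describe how the actual throttling feature decides.

```python
from dataclasses import dataclass


@dataclass
class ApiTraffic:
    name: str
    priority: int  # 0 = most important; larger numbers are shed first
    rps: float


def plan_shedding(apis: list[ApiTraffic], capacity_rps: float) -> list[str]:
    """Throttle whole APIs, least important first, until the rest fits in capacity."""
    total = sum(api.rps for api in apis)
    shed = []
    for api in sorted(apis, key=lambda a: a.priority, reverse=True):
        if total <= capacity_rps:
            break
        shed.append(api.name)
        total -= api.rps
    return shed


apis = [
    ApiTraffic("send_sticker", priority=0, rps=5000),
    ApiTraffic("listing", priority=1, rps=3000),
    ApiTraffic("recommendation", priority=2, rps=2000),
]
print(plan_shedding(apis, capacity_rps=8500))  # ['recommendation']
print(plan_shedding(apis, capacity_rps=6000))  # ['recommendation', 'listing']
```

A real throttle would more likely reduce an API’s rate than block it entirely; the ranking is the part the workshop supplies.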

Result
› Improved our services during the preparation
› Able to predict the numbers more accurately each year
› The overall service was stable on New Year’s Eve 2022 and 2023

Thank you

LINE Developers Taiwan
