Server Architecture Behind Safety Check at LINE

Speakers
Alfredo Osorio, Zhixin Li
Server-side engineers, CSI Dev B Team and CSI Dev A Team

Agenda
- About Safety Check
- Disaster management and fast notification
- Event-driven architecture for updates
- Load testing
- Summary

Safety Check

Disaster Banner
When a disaster happens in the user’s region, a red banner appears on the home tab of LINE.

Input your status
In addition to setting your status to Safe or Affected, you can add further information, either from a message template or by typing it yourself.

View friends’ statuses
You can check the statuses of others on the main service page.

Goal
• Prevent paralyzed communication
• Minimize battery and data consumption
• Quick information confirmation
• Give users peace of mind during disasters

Disaster Management

Application Scenario
Earthquake → Decide whether to enable feature → Register using CMS → Banner shows for target users

Safety Check CMS Architecture
[Diagram. Components: Operator, Web CMS, Safety Check CMS Server, Safety Check DB, Notification Service, API Gateway, Safety Check Server, Server cache, User, Local DB. Labels: REST Request, getDisasterCases, Read & Write, Read.]

CMS
• Content Management System
• Manage service metadata easily
• Front-end: Web HTTP service with authentication
• Back-end: REST API implemented with Armeria

CMS
• Disaster Case
• Message Template
• Notification Service

Disaster Case / Message Template
[Screenshots: Disaster Case, Message Template]

Notification Service
• A lightweight event delivery system for LINE
• Sends a signal to the client side when content changes in the CMS
• iOS/Android clients request the server content after receiving the signal
• Delay time and client filter are provided
[Diagram: Notification Service and Safety Check Server. 1. signal, 2. request, 3. response]

Safety Check DB
• MongoDB is used as the primary database

Fields:
  _id : ObjectId
  archived : Boolean
  enabled : Boolean
  region : String
  localizedTitles : Object
  localizedDescriptions : Object
  seeMoreUrl : String
  createdAtMillis : NumberLong
  updatedAtMillis : NumberLong

Example document:
  {
    "_id": ObjectId("6265efc5c5083d73240ac6c2"),
    "archived": true,
    "enabled": false,
    "region": "JP",
    "localizedTitles": { "en_US": "Fukushima 7.3 Earthquake", "ja_JP": "..." },
    "localizedDescriptions": { "en_US": "Earthquake occurred on 3/16 23:36", "ja_JP": "..." },
    "seeMoreUrl": "...",
    "createdAtMillis": NumberLong(1650847685900),
    "updatedAtMillis": NumberLong(1653374701559)
  }

The slide also shows the same case as plain JSON, here with archived false and enabled true:
  {
    "_id": "6265efc5c5083d73240ac6c2",
    "archived": false,
    "enabled": true,
    "region": "JP",
    "localizedTitles": { "en_US": "Fukushima 7.3 Earthquake", "ja_JP": "..." },
    "localizedDescriptions": { "en_US": "Earthquake occurred on 3/16 23:36", "ja_JP": "..." },
    "seeMoreUrl": "...",
    "createdAtMillis": 1650847685900,
    "updatedAtMillis": 1653374701559
  }
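As a rough illustration of how a document of this shape might be consumed, the sketch below picks a title for a user's locale and decides whether a case applies to a region. The helper names `pick_localized` and `is_visible`, the fallback to `en_US`, and the visibility rule itself are assumptions made for this example; the slides do not describe them.

```python
from typing import Mapping

# Trimmed copy of the plain-JSON example from the slide.
disaster_case = {
    "_id": "6265efc5c5083d73240ac6c2",
    "archived": False,
    "enabled": True,
    "region": "JP",
    "localizedTitles": {"en_US": "Fukushima 7.3 Earthquake", "ja_JP": "..."},
}


def pick_localized(texts: Mapping[str, str], locale: str, fallback: str = "en_US") -> str:
    """Return the text for `locale`, or the `fallback` locale's text (assumed policy)."""
    if locale in texts:
        return texts[locale]
    return texts[fallback]


def is_visible(case: Mapping[str, object], region: str) -> bool:
    """Hypothetical rule: show a case only if it is enabled, not archived, and in the user's region."""
    return bool(case["enabled"]) and not case["archived"] and case["region"] == region
```

With the example above, a user in region `JP` with an unsupported locale would see the `en_US` title, and a user in another region would see nothing.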

Server API
• getDisasterCases is defined in Thrift
• The server gets the user's country and language via the API request

  struct GetDisasterCasesResponse {
    1: list<DisasterInfo> disasters,
    2: list<string> messageTemplates,
    /**
     * Indicates the TTL (time to live) in milliseconds for the response so that
     * the clients can cache it and request updated information when it becomes stale.
     */
    3: i64 ttl,
  }

Server-side Cache
• A server-local in-memory cache is used to reduce DB requests
• Disaster data does not change frequently, and it is the same for all users with the same country and language
• The cache is updated asynchronously at a fixed interval
[Diagram: the Safety Check Server answers 1. request with 2. response from the server cache; the cache gets an async update from the Safety Check DB.]
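The server-side cache described above can be sketched roughly as follows. This is a minimal illustration, not the Safety Check implementation: the class name `RefreshingCache`, the `loader` callback standing in for the DB query, and the `(country, language)` key are all assumptions.

```python
import threading
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str]  # (country, language), assumed cache key


class RefreshingCache:
    """In-memory cache whose whole snapshot is replaced by a background refresh."""

    def __init__(self, loader: Callable[[], Dict[Key, List[str]]], interval_seconds: float):
        self._loader = loader              # stand-in for the database query
        self._interval = interval_seconds
        self._data = loader()              # warm the cache once at startup
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()

    def refresh(self) -> None:
        # Build the new snapshot first, then swap the reference in one step,
        # so request threads never observe a half-updated cache.
        self._data = self._loader()

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() was called.
        while not self._stop.wait(self._interval):
            self.refresh()

    def get(self, country: str, language: str) -> List[str]:
        # Request path: served from memory, never touches the database.
        return self._data.get((country, language), [])
```

The trade-off is staleness: a change in the database becomes visible only after the next refresh, which is acceptable when data changes rarely.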

Client-side Cache
• iOS/Android clients keep a local DB of the disaster cases from the previous request
• The banner keeps showing for some users when a notification is delayed or lost
• Strategies of the disaster API:
  • The client requests the API (getDisasterCases) when the user opens the end page of Safety Check
  • The client requests the API when the TTL expires
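The client-side strategies above (refetch on signal, on opening the end page, and on TTL expiry) could look roughly like this. The class and method names are hypothetical, and the `fetch` callback stands in for the real getDisasterCases call, returning the disaster list and the TTL in milliseconds.

```python
import time
from typing import Callable, List, Optional, Tuple

# Stand-in for GetDisasterCasesResponse: (disasters, ttl in milliseconds).
Response = Tuple[List[str], int]


class DisasterClientCache:
    """Hypothetical client cache that refetches on signal, page open, or TTL expiry."""

    def __init__(self, fetch: Callable[[], Response], clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._clock = clock
        self._disasters: List[str] = []
        self._expires_at: Optional[float] = None  # None means nothing cached yet

    def _refresh(self) -> None:
        disasters, ttl_ms = self._fetch()
        self._disasters = disasters
        self._expires_at = self._clock() + ttl_ms / 1000.0

    def on_signal(self) -> None:
        # A Notification Service signal means content changed, so refetch now.
        self._refresh()

    def on_end_page_opened(self) -> List[str]:
        # Opening the end page always refetches, covering delayed or lost signals.
        self._refresh()
        return self._disasters

    def get(self) -> List[str]:
        # Banner path: reuse the local copy until the server-provided TTL expires.
        if self._expires_at is None or self._clock() >= self._expires_at:
            self._refresh()
        return self._disasters
```

The TTL acts as a safety net: even if no signal arrives and the user never opens the end page, a stale banner disappears once the cached response expires.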

Event-driven architecture for updates

Publish/Subscribe Model
[Diagram: the Producer writes to the Broker (Kafka); the Consumer reads from it.]

Decaton
• Stream task processing framework built on top of Kafka, developed by LINE
• Design goals included enabling concurrent processing of records from a single partition
• https://github.com/line/decaton

Safety Check Update Architecture
[Diagram. Components: Users A, B and C (friends), API Gateway, Safety Check Server, Safety Check Decaton, Local DB, Notification Service. Labels: updateSafetyStatus (Thrift), produce and consume (two pairs).]
Record format: {"sourceUser", "targetUser", "disasterId", "status", "message"}
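To make the produce/consume flow concrete, here is a small in-memory sketch. It does not use Kafka or Decaton: `InMemoryTopic` is a stand-in for a topic, and the fan-out of one record per friend is an assumption inferred from the record format having both a source and a target user.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List


@dataclass(frozen=True)
class StatusRecord:
    # Field names follow the record format shown on the slide.
    source_user: str
    target_user: str
    disaster_id: str
    status: str
    message: str


class InMemoryTopic:
    """Stand-in for a Kafka topic: producers append, a consumer drains in order."""

    def __init__(self) -> None:
        self._records: Deque[StatusRecord] = deque()

    def produce(self, record: StatusRecord) -> None:
        self._records.append(record)

    def consume_all(self, handler: Callable[[StatusRecord], None]) -> int:
        handled = 0
        while self._records:
            handler(self._records.popleft())
            handled += 1
        return handled


def update_safety_status(topic: InMemoryTopic, friends: Dict[str, List[str]],
                         source_user: str, disaster_id: str, status: str, message: str) -> None:
    """Assumed fan-out: the API call only enqueues one record per friend and returns."""
    for friend in friends.get(source_user, []):
        topic.produce(StatusRecord(source_user, friend, disaster_id, status, message))
```

The point of the pattern is that the request handler returns as soon as the records are enqueued; the per-friend work happens later in the consumer, which can be scaled independently.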

Load Testing

Mocking Services

Testing
[Diagram: the System Under Test (SUT) sends HTTP requests to Service A and Service B.]

Testing
[Diagram: the System Under Test (SUT) sends HTTP requests to a Mock Server.]

Benefits of Mocking HTTP Services
• Guarantee the service latency
• Avoid rate limits
• Avoid modifying state
• Test edge cases
• Test error scenarios
• Results are deterministic
• Useful for integration tests and load tests
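Several of these benefits (fixed latency, error scenarios, deterministic results) can be shown with a tiny mock server. The sketch below uses only the Python standard library rather than Spring Cloud Contract or WireMock, and the stub paths and payloads are made up for illustration.

```python
import json
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical stubs: path -> (status code, JSON body, artificial latency in seconds).
STUBS = {
    "/friends": (200, {"friends": ["B", "C"]}, 0.05),
    "/broken": (500, {"error": "simulated failure"}, 0.0),
}


class StubHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body, latency = STUBS.get(self.path, (404, {"error": "no stub"}, 0.0))
        time.sleep(latency)  # fixed latency the test can rely on
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format: str, *args) -> None:
        pass  # keep test output quiet


def start_mock_server() -> ThreadingHTTPServer:
    server = ThreadingHTTPServer(("127.0.0.1", 0), StubHandler)  # port 0 picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def get_json(url: str):
    # Bypass any proxy configured in the environment so localhost is reached directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=5) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as error:
        return error.code, json.load(error)
```

A test would start the server, point the system under test at `http://127.0.0.1:<port>`, and call `server.shutdown()` when done.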

Spring Cloud Contract
• Spring Cloud Contract is a tool that enables Consumer Driven Contract (CDC) development.
• It consists of:
  • Spring Cloud Contract Verifier Plugin (Gradle and Maven)
  • Spring Cloud Contract Stub Runner (Mock Server)

Contract Project Structure

Contract Example

Project’s build.gradle

Mocking HTTP Services Overview
• Define Contracts (Kotlin, Java, YML)
• Publish Stubs Artifact (JAR) to Nexus
• K8s Stub Runner Resource Definition
  • Define num. of replicas
  • Configure Service
  • Specify Parameters: stubs, server port
• Stub Runner (Mock Server) fetches stubs

Kubernetes Resources
• Deployment
• Service

Load Testing Tool

Load Testing with Ayaperf
• A LINE-developed tool for distributed load testing, based on Locust
• Lets you define your tests in Java
• Command-line tool to set up, on Kubernetes, the secondary nodes (workers) that generate the load for the server under test
• Metrics in Grafana

Load Testing (Ayaperf)
[Diagram: in a Kubernetes cluster, a Locust master and Workers 1 to 3, each running a Java client; target: Application (SUT).]

Load Test Definition

Load Test Architecture
[Diagram: Ayaperf in one Kubernetes cluster (Locust master and Workers 1 to 3, each running a Java client). In another Kubernetes cluster: the Safety Check Server deployment (Pod 1 to Pod N) with a load balancer, and the Spring Cloud Contract (WireMock) deployment (Pod 1 to Pod N) with a Service.]

Load Testing Results

Estimation

Safety Check Server updateSafetyStatus
• Num of VM: 1
• Spec: 8 vCPU, 16 GB RAM

Safety Check Server getDisasterCases
• Num of VM: 1
• Spec: 8 vCPU, 16 GB RAM

Load Test both APIs
• RPS: 1,600
• RPS per API
  • updateSafetyStatus: 47.61 %, 761 RPS (1600 * 0.4761)
  • getDisasterCases: 52.38 %, 838 RPS (1600 * 0.5238)

Load Test both APIs
A single server handled about 1,600 RPS at about 30 % CPU, so 14 servers (22,000 / 1,600 ≈ 13.75, rounded up) were enough to support our initial estimation of 22,000 RPS.
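The arithmetic behind these figures is easy to reproduce. The snippet below uses only numbers from the slides; the variable names are illustrative, and truncating the per-API rates to whole requests is an assumption that happens to match the slide values.

```python
import math

measured_rps_per_server = 1_600  # throughput of one 8 vCPU / 16 GB server in the test
estimated_peak_rps = 22_000      # initial traffic estimation from the slides

# Round up: a fractional server still has to be provisioned as a whole one.
servers_needed = math.ceil(estimated_peak_rps / measured_rps_per_server)
fleet_capacity_rps = servers_needed * measured_rps_per_server

# Traffic mix used in the combined test, truncated to whole requests per second.
update_safety_status_rps = int(measured_rps_per_server * 0.4761)
get_disaster_cases_rps = int(measured_rps_per_server * 0.5238)

print(servers_needed, fleet_capacity_rps)
print(update_safety_status_rps, get_disaster_cases_rps)
```

Fourteen servers give a nominal capacity of 22,400 RPS, slightly above the 22,000 RPS estimate.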

Summary

Summary
• A robust CMS allows easy configuration and fast distribution.
• Cache strategies increase traffic tolerance and keep clients up to date.
• Event-driven architecture decouples your microservices.
• Decaton makes it possible to achieve higher throughput with a small number of partitions.
• A mock server helps you control the test scenarios and the service latency.
• Load testing lets you measure application throughput and resource utilization.
