Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

Speakers CSI Dev B Team Server-side engineer CSI Dev A Team Server-side engineer Alfredo Osorio Zhixin Li

Slide 3

Slide 3 text

Agenda
- About Safety Check
- Disaster management and fast notification
- Event-driven architecture for updates
- Load testing
- Summary

Slide 4

Slide 4 text

Agenda
- About Safety Check
- Disaster management and fast notification
- Event-driven architecture for updates
- Load testing
- Summary

Slide 5

Slide 5 text

Safety Check

Slide 6

Slide 6 text

Disaster Banner

When a disaster happens in the user’s region, a red banner is shown on the home tab of LINE.

Slide 7

Slide 7 text

Input your status

In addition to setting your status as Safe or Affected, you can enter additional information, either by choosing a message template or by typing it yourself.

Slide 8

Slide 8 text

View friends’ statuses

You can check the statuses of others on the main service page.

Slide 9

Slide 9 text

Goal
• Prevent paralyzed communication
• Minimize battery and data consumption
• Quick information confirmation
• Give users peace of mind during disasters

Slide 10

Slide 10 text

Disaster Management

Slide 11

Slide 11 text

Application Scenario (diagram labels)
• Earthquake
• Decide whether to enable feature
• Register using CMS
• Banner shows for target users

Slide 12

Slide 12 text

Safety Check CMS Architecture (diagram)
Components: Operator, Web CMS, Safety Check CMS Server, Safety Check DB, Safety Check Server (with server cache), API Gateway, Notification Service, User (with local DB)
Labels: REST Request, getDisasterCases, Read & Write, Read

Slide 13

Slide 13 text

Safety Check CMS Architecture (diagram repeated from Slide 12; focus: CMS)

Slide 14

Slide 14 text

CMS
• Content Management System
• Manage service metadata easily
• Front-end: Web HTTP service with authentication
• Back-end: REST API implemented with Armeria

Slide 15

Slide 15 text

CMS Disaster Case Message Template Notification Service

Slide 16

Slide 16 text

Message Template Disaster Case Disaster Case / Message Template

Slide 17

Slide 17 text

Safety Check CMS Architecture (diagram repeated from Slide 12)

Slide 18

Slide 18 text

Notification Service
• A lightweight event delivery system for LINE
• Sends a signal to the client side when contents change in the CMS
• iOS/Android clients request the content from the server after receiving the signal
• A delay time and a client filter are provided

Diagram labels: Notification Service, Safety Check Server; 1. signal, 2. request, 3. response

Slide 19

Slide 19 text

Safety Check CMS Architecture (diagram repeated from Slide 12)

Slide 20

Slide 20 text

Safety Check DB
• MongoDB is used as primary database

Fields:
  _id : ObjectId
  archived : Boolean
  enabled : Boolean
  region : String
  localizedTitles : Object
  localizedDescriptions : Object
  seeMoreUrl : String
  createdAtMillis : NumberLong
  updatedAtMillis : NumberLong

Example document:
  {
    "_id": ObjectId("6265efc5c5083d73240ac6c2"),
    "archived": true,
    "enabled": false,
    "region": "JP",
    "localizedTitles": { "en_US": "Fukushima 7.3 Earthquake", "ja_JP": "..." },
    "localizedDescriptions": { "en_US": "Earthquake occurred on 3/16 23:36", "ja_JP": "..." },
    "seeMoreUrl": "...",
    "createdAtMillis": NumberLong(1650847685900),
    "updatedAtMillis": NumberLong(1653374701559)
  }

Second example (plain JSON types):
  {
    "_id": "6265efc5c5083d73240ac6c2",
    "archived": false,
    "enabled": true,
    "region": "JP",
    "localizedTitles": { "en_US": "Fukushima 7.3 Earthquake", "ja_JP": "..." },
    "localizedDescriptions": { "en_US": "Earthquake occurred on 3/16 23:36", "ja_JP": "..." },
    "seeMoreUrl": "...",
    "createdAtMillis": 1650847685900,
    "updatedAtMillis": 1653374701559
  }

Slide 21

Slide 21 text

Safety Check CMS Architecture (diagram repeated from Slide 12; focus: Server API)

Slide 22

Slide 22 text

Server API
• getDisasterCases is defined in Thrift
• The server gets the user's country and language via the API request

struct GetDisasterCasesResponse {
  1: list disasters,
  2: list messageTemplates,
  /**
   * Indicates the TTL (time to live) in milliseconds for the response so that
   * the clients can cache it and request updated information when it becomes stale.
   */
  3: i64 ttl,
}

Slide 23

Slide 23 text

Safety Check CMS Architecture (diagram repeated from Slide 12)

Slide 24

Slide 24 text

Server-side Cache
• A server-local in-memory cache is used to reduce DB requests
• Disaster data does not change frequently, and it is the same for all users with the same country and language
• The cache is updated asynchronously at a fixed interval

Diagram labels: Safety Check Server, Server cache, Safety Check DB; 1. request, 2. response, Async update
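
The asynchronous cache update described on this slide could look roughly like the following. This is a minimal sketch, not the Safety Check implementation: the class name RefreshingCache, the loader function (standing in for the DB read), and the refresh interval are all assumptions made for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical in-memory cache that is refreshed in the background at a fixed interval. */
final class RefreshingCache<T> implements AutoCloseable {
    private final AtomicReference<T> snapshot;
    private final Supplier<T> loader; // stands in for the DB query (assumption)
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "cache-refresh");
                t.setDaemon(true); // do not keep the JVM alive just for refreshing
                return t;
            });

    RefreshingCache(Supplier<T> loader, long interval, TimeUnit unit) {
        this.loader = loader;
        // Load once synchronously so requests never observe an empty cache.
        this.snapshot = new AtomicReference<>(loader.get());
        scheduler.scheduleWithFixedDelay(this::refresh, interval, interval, unit);
    }

    /** Request path: returns the cached snapshot without touching the database. */
    T get() {
        return snapshot.get();
    }

    /** Reloads the snapshot; on failure the previous snapshot stays in place. */
    void refresh() {
        try {
            snapshot.set(loader.get());
        } catch (RuntimeException e) {
            // Keep serving the previous snapshot and retry on the next tick.
        }
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Because request threads only read an AtomicReference, the DB load depends on the refresh interval and the number of servers rather than on user traffic.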

Slide 25

Slide 25 text

Client-side Cache
• iOS/Android clients keep the disaster cases from the previous request in a local DB
• The banner keeps showing for some users when the notification is delayed or lost
• Strategies for the disaster API
  • The client requests the API (getDisasterCases) when the user accesses the end page of Safety Check
  • The client requests the API when the TTL expires
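
The TTL strategy above can be sketched as follows. This is an illustration only: the real clients are iOS/Android apps, and the class name TtlCachedResponse, the fetcher (standing in for the getDisasterCases call), the fixed ttlMillis (standing in for the ttl field of the response), and the injectable clock are assumptions.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical client-side holder for the last getDisasterCases response. */
final class TtlCachedResponse<T> {
    private final Supplier<T> fetcher;  // stands in for the API call (assumption)
    private final long ttlMillis;       // stands in for the response's ttl field (assumption)
    private final LongSupplier clock;   // e.g. System::currentTimeMillis
    private T cached;
    private long fetchedAtMillis;
    private boolean hasValue;

    TtlCachedResponse(Supplier<T> fetcher, long ttlMillis, LongSupplier clock) {
        this.fetcher = fetcher;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached value, requesting the server only when it is missing or stale. */
    synchronized T get() {
        long now = clock.getAsLong();
        if (!hasValue || now - fetchedAtMillis >= ttlMillis) {
            cached = fetcher.get();
            fetchedAtMillis = now;
            hasValue = true;
        }
        return cached;
    }

    /** Called when a notification signal arrives: the next get() goes to the server. */
    synchronized void invalidate() {
        hasValue = false;
    }
}
```

The TTL acts as a safety net: even if the notification signal is delayed or lost, a stale banner disappears once the cached response expires.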

Slide 26

Slide 26 text

Agenda
- About Safety Check
- Disaster management and fast notification
- Event-driven architecture for updates
- Load testing
- Summary

Slide 27

Slide 27 text

Event-driven architecture for updates

Slide 28

Slide 28 text

Publish/Subscribe Model (diagram)
The Producer writes to the Broker (Kafka); the Consumer reads from the Broker.

Slide 29

Slide 29 text

Decaton
• Stream task processing framework built on top of Kafka, developed by LINE.
• Design goals included enabling concurrent processing of records from a single partition.
• https://github.com/line/decaton
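
To illustrate what "concurrent processing of records from a single partition" can mean, here is a plain-Java sketch. It does not use Decaton's API and is not how Decaton is implemented; the class KeyOrderedProcessor and the choice to keep records with the same key in order are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

/** Hypothetical dispatcher: one polling thread feeds several ordered processing lanes. */
final class KeyOrderedProcessor implements AutoCloseable {
    private final List<ExecutorService> lanes = new ArrayList<>();
    private final BiConsumer<String, String> handler;

    KeyOrderedProcessor(int concurrency, BiConsumer<String, String> handler) {
        this.handler = handler;
        for (int i = 0; i < concurrency; i++) {
            // Each lane is single-threaded, so records within a lane run in order.
            lanes.add(Executors.newSingleThreadExecutor());
        }
    }

    /** Called by the thread that reads the partition; returns without waiting for processing. */
    void submit(String key, String value) {
        // Same key -> same lane, so per-key order is preserved across lanes.
        int lane = Math.floorMod(key.hashCode(), lanes.size());
        lanes.get(lane).execute(() -> handler.accept(key, value));
    }

    /** Stops accepting records and waits for queued work to finish. */
    @Override
    public void close() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        try {
            for (ExecutorService lane : lanes) {
                lane.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With this shape, throughput for I/O-bound tasks scales with the number of lanes instead of the number of Kafka partitions.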

Slide 30

Slide 30 text

Safety Check Update Architecture (diagram)
Components: User A, friends (User B, User C), local DB, API Gateway, Safety Check Server, Safety Check Decaton, Notification Service
Labels: updateSafetyStatus (Thrift), produce, consume
Record format: {"sourceUser", "targetUser", "disasterId", "status", "message"}
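
The record format suggests that a status update from one user is turned into records addressed to that user's friends. The sketch below assumes one record per friend; that fan-out rule, the class names, and the example values are assumptions for illustration, and only the five field names come from the slide.

```java
import java.util.ArrayList;
import java.util.List;

/** Mirrors the five fields of the record format shown on the slide. */
final class SafetyStatusRecord {
    final String sourceUser;
    final String targetUser;
    final String disasterId;
    final String status;
    final String message;

    SafetyStatusRecord(String sourceUser, String targetUser, String disasterId,
                       String status, String message) {
        this.sourceUser = sourceUser;
        this.targetUser = targetUser;
        this.disasterId = disasterId;
        this.status = status;
        this.message = message;
    }
}

/** Hypothetical helper that builds the records to produce for one status update. */
final class StatusFanOut {
    private StatusFanOut() {}

    /** Assumption: one record per friend, each carrying the same status and message. */
    static List<SafetyStatusRecord> toRecords(String sourceUser, List<String> friends,
                                              String disasterId, String status, String message) {
        List<SafetyStatusRecord> records = new ArrayList<>(friends.size());
        for (String friend : friends) {
            records.add(new SafetyStatusRecord(sourceUser, friend, disasterId, status, message));
        }
        return records;
    }
}
```

Producing these records asynchronously lets the updateSafetyStatus request return without waiting for every friend to be notified.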

Slide 31

Slide 31 text

Load Testing

Slide 32

Slide 32 text

Mocking Services

Slide 33

Slide 33 text

Testing System Under Test (SUT) Service B Service A HTTP Request

Slide 34

Slide 34 text

Testing System Under Test (SUT) Mock Server HTTP Request

Slide 35

Slide 35 text

Benefits of Mocking HTTP Services
• Guarantee the service latency
• Avoid rate limits
• Avoid modifying state
• Test edge cases
• Test error scenarios
• Results are deterministic
• Useful for integration tests and load tests
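
As a rough illustration of "guarantee the service latency" and "results are deterministic", the sketch below starts a tiny mock server using the HTTP server bundled with the JDK. It is not Spring Cloud Contract or WireMock; the class FixedLatencyMock and the example path and body are assumptions for this example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical mock server: one path, one canned response, one fixed latency. */
final class FixedLatencyMock implements AutoCloseable {
    private final HttpServer server;

    private FixedLatencyMock(HttpServer server) {
        this.server = server;
    }

    static FixedLatencyMock start(String path, int status, String body, long latencyMillis) {
        try {
            // Port 0 asks the OS for any free port.
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            byte[] payload = body.getBytes(StandardCharsets.UTF_8);
            server.createContext(path, exchange -> {
                try {
                    Thread.sleep(latencyMillis); // simulate the dependency's response time
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                // A length of -1 means "no response body" for this API.
                exchange.sendResponseHeaders(status, payload.length == 0 ? -1 : payload.length);
                if (payload.length > 0) {
                    try (OutputStream out = exchange.getResponseBody()) {
                        out.write(payload);
                    }
                }
                exchange.close();
            });
            server.start(); // default executor: requests are handled one at a time
            return new FixedLatencyMock(server);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int port() {
        return server.getAddress().getPort();
    }

    @Override
    public void close() {
        server.stop(0);
    }
}
```

Pointing the system under test at `http://127.0.0.1:<port>` instead of the real dependency gives the same response and delay on every run; for load tests an executor with more threads would be needed.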

Slide 36

Slide 36 text

Spring Cloud Contract
• Spring Cloud Contract is a tool that enables Consumer Driven Contract (CDC) development.
• It consists of:
  • Spring Cloud Contract Verifier Plugin (Gradle and Maven)
  • Spring Cloud Contract Stub Runner (Mock Server)

Slide 37

Slide 37 text

Contract Project Structure

Slide 38

Slide 38 text

Contract Example

Slide 39

Slide 39 text

Project’s build.gradle

Slide 40

Slide 40 text

Mocking HTTP Services Overview (diagram)
• Define Contracts (Kotlin, Java, YML)
• Publish Stubs Artifact (JAR) to Nexus
• K8s Stub Runner Resource Definition: define the number of replicas, configure the Service, specify parameters (stubs, server port)
• Stub Runner (Mock Server) fetches the stubs

Slide 41

Slide 41 text

Kubernetes Resources Deployment Service

Slide 42

Slide 42 text

Load Testing Tool

Slide 43

Slide 43 text

Load Testing with Ayaperf
• A tool developed at LINE for distributed load testing, based on Locust.
• Allows you to define your tests in Java.
• Command-line tool to set up, on Kubernetes, the secondary nodes (workers) that generate the load for the server under test.
• Metrics in Grafana.

Slide 44

Slide 44 text

Load Testing (Ayaperf) (diagram)
Components: Locust Master; Worker 1, Worker 2, Worker 3 (each with a Java Client); Kubernetes Cluster; Application (SUT)

Slide 45

Slide 45 text

Load Test Definition

Slide 46

Slide 46 text

Load Test Definition

Slide 47

Slide 47 text

Load Test Architecture (diagram)
Ayaperf (Kubernetes Cluster): Locust Master; Worker 1, Worker 2, Worker 3 (each with a Java Client)
Safety Check Server (Kubernetes Cluster): Load Balancer; Deployment with Pod 1, Pod 2, ... Pod N
Spring Cloud Contract (WireMock): Service; Deployment with Pod 1, Pod 2, ... Pod N

Slide 48

Slide 48 text

Load Testing Results

Slide 49

Slide 49 text

Estimation

Slide 50

Slide 50 text

Safety Check Server: updateSafetyStatus
• Num of VM: 1
• Spec: 8 vCPU, 16 GB RAM

Slide 51

Slide 51 text

Safety Check Server: getDisasterCases
• Num of VM: 1
• Spec: 8 vCPU, 16 GB RAM

Slide 52

Slide 52 text

Load Test both APIs
• RPS: 1,600
• RPS per API
  • updateSafetyStatus: 47.61 %, 761 RPS (1600 * 0.4761)
  • getDisasterCases: 52.38 %, 838 RPS (1600 * 0.5238)

Slide 53

Slide 53 text

Load Test both APIs

A single server handled about 1,600 RPS at about 30 % CPU, so 14 servers were enough to support our initial estimate of 22,000 RPS.
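
The server count follows from dividing the estimated load by the measured per-server throughput and rounding up: 22,000 / 1,600 = 13.75, which rounds up to 14. A small sketch of that arithmetic (the class and method names are made up for this example):

```java
/** Hypothetical helper for the capacity arithmetic on this slide. */
final class CapacityEstimate {
    private CapacityEstimate() {}

    /** Smallest number of servers such that each handles at most rpsPerServer. */
    static int serversNeeded(int targetRps, int rpsPerServer) {
        // Ceiling division for positive integers.
        return (targetRps + rpsPerServer - 1) / rpsPerServer;
    }
}
```

`CapacityEstimate.serversNeeded(22_000, 1_600)` returns 14. Since the measured 1,600 RPS used only about 30 % CPU, this count leaves headroom on each server.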

Slide 54

Slide 54 text

Summary

Slide 55

Slide 55 text

Summary
• A robust CMS allows easy configuration and fast distribution.
• Cache strategies increase traffic tolerance and keep clients up to date.
• Event-driven architecture decouples your microservices.
• Decaton makes it possible to achieve higher throughput with a small number of partitions.
• A mock server helps you control the test scenarios and the service latency.
• Load testing lets you measure application throughput and resource utilization.