
Server Architecture Behind Safety Check at LINE

Osorio Alfredo (LINE / CSI Dev B Team / Server-side engineer)
Zhixin Li (LINE / CSI Dev A Team / Server-side engineer)

https://tech-verse.me/ja/sessions/4
https://tech-verse.me/en/sessions/4
https://tech-verse.me/ko/sessions/4

Tech-Verse2022

November 18, 2022

Transcript

  1. None
  2. Speakers

    • Alfredo Osorio: CSI Dev B Team, server-side engineer
    • Zhixin Li: CSI Dev A Team, server-side engineer
  3. Agenda

    - About Safety Check
    - Disaster management and fast notification
    - Event-driven architecture for updates
    - Load testing
    - Summary
  4. Agenda (repeated)
  5. Safety Check

  6. Disaster Banner

    When a disaster happens in the user's region, a red banner is shown on the home tab of LINE.
  7. Input your status

    In addition to setting your status to Safe or Affected, you can add further information, either by picking a message template or by typing it yourself.
  8. View friends' statuses

    You can check the statuses of others on the main service page.
  9. Goal

    • Prevent paralyzed communication
    • Minimize battery and data consumption
    • Quick information confirmation
    • Give users peace of mind during disasters
  10. Disaster Management

  11. Application Scenario

    Earthquake → Decide whether to enable feature → Register using CMS → Banner shows for target users
  12. Safety Check CMS Architecture

    [Diagram] Components: Operator, Web CMS, Safety Check CMS Server, Safety Check DB, Notification Service, Safety Check Server (with server cache), API Gateway, User (with local DB). Edge labels: REST Request, Read & Write, Read, getDisasterCases.
  13. Safety Check CMS Architecture (diagram repeated; highlighted: CMS)
  14. CMS

    • Content Management System
    • Manages service metadata easily
    • Front-end: Web HTTP service with authentication
    • Back-end: REST API implemented with Armeria
  15. CMS: Disaster Case, Message Template, Notification Service

  16. Disaster Case / Message Template

  17. Safety Check CMS Architecture (diagram repeated)
  18. Notification Service

    • A lightweight event delivery system for LINE
    • Sends a signal to the client side when contents change in the CMS
    • iOS/Android clients request the content from the server after receiving the signal
    • Delay time and client filter are provided

    [Diagram] Notification Service, Safety Check Server: 1. signal, 2. request, 3. response
  19. Safety Check CMS Architecture (diagram repeated)
  20. Safety Check DB

    • MongoDB is used as the primary database.

    Fields:

        _id : ObjectId
        archived : Boolean
        enabled : Boolean
        region : String
        localizedTitles : Object
        localizedDescriptions : Object
        seeMoreUrl : String
        createdAtMillis : NumberLong
        updatedAtMillis : NumberLong

    Example document:

        {
          "_id": ObjectId("6265efc5c5083d73240ac6c2"),
          "archived": true,
          "enabled": false,
          "region": "JP",
          "localizedTitles": { "en_US": "Fukushima 7.3 Earthquake", "ja_JP": "..." },
          "localizedDescriptions": { "en_US": "Earthquake occurred on 3/16 23:36", "ja_JP": "..." },
          "seeMoreUrl": "...",
          "createdAtMillis": NumberLong(1650847685900),
          "updatedAtMillis": NumberLong(1653374701559)
        }

    The slide also shows the same case in plain JSON, with archived false and enabled true:

        {
          "_id": "6265efc5c5083d73240ac6c2",
          "archived": false,
          "enabled": true,
          "region": "JP",
          "localizedTitles": { "en_US": "Fukushima 7.3 Earthquake", "ja_JP": "..." },
          "localizedDescriptions": { "en_US": "Earthquake occurred on 3/16 23:36", "ja_JP": "..." },
          "seeMoreUrl": "...",
          "createdAtMillis": 1650847685900,
          "updatedAtMillis": 1653374701559
        }
  21. Safety Check CMS Architecture (diagram repeated; highlighted: Server API)
  22. Server API

    • getDisasterCases is defined in Thrift.
    • The server gets the user's country and language via the API request.

        struct GetDisasterCasesResponse {
          1: list<DisasterInfo> disasters,
          2: list<string> messageTemplates,
          /**
           * Indicates the TTL (time to live) in milliseconds for the response so that
           * the clients can cache it and request updated information when it becomes stale.
           */
          3: i64 ttl,
        }
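The slides do not show how the server turns the stored cases into a per-user response, so the following is only a minimal sketch of one plausible approach. The class names, the "enabled and not archived" filter, and the fallback to `en_US` are all assumptions made for illustration and are not taken from the talk.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical in-memory view of one disaster case document. */
final class DisasterCase {
    final String id;
    final boolean archived;
    final boolean enabled;
    final String region;
    final Map<String, String> localizedTitles;

    DisasterCase(String id, boolean archived, boolean enabled, String region,
                 Map<String, String> localizedTitles) {
        this.id = id;
        this.archived = archived;
        this.enabled = enabled;
        this.region = region;
        this.localizedTitles = localizedTitles;
    }
}

/** Hypothetical entry returned to the client: the case id plus a title in one language. */
final class LocalizedDisaster {
    final String id;
    final String title;

    LocalizedDisaster(String id, String title) {
        this.id = id;
        this.title = title;
    }
}

final class DisasterCaseSelector {
    // Assumption: fall back to English when the user's language has no translation.
    static final String FALLBACK_LOCALE = "en_US";

    /** Keeps enabled, non-archived cases for the user's region and localizes the title. */
    static List<LocalizedDisaster> select(List<DisasterCase> cases, String region, String locale) {
        return cases.stream()
                .filter(c -> c.enabled && !c.archived && c.region.equals(region))
                .map(c -> new LocalizedDisaster(c.id, titleFor(c, locale)))
                .collect(Collectors.toList());
    }

    private static String titleFor(DisasterCase c, String locale) {
        String title = c.localizedTitles.get(locale);
        return title != null ? title : c.localizedTitles.getOrDefault(FALLBACK_LOCALE, "");
    }
}
```

Because the result depends only on region and language, it is the same for every user in that group, which is what makes server-side caching effective.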
  23. Safety Check CMS Architecture (diagram repeated)
  24. Server-side Cache

    • A local in-memory cache on the server is used to reduce DB requests.
    • Disaster data does not change frequently, and it is the same for all users with the same country and language.
    • The cache is updated asynchronously at a fixed interval.

    [Diagram] Safety Check Server: 1. request, 2. response, served from the server cache; the cache is filled from the Safety Check DB by an async update.
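The server-side cache described above can be sketched as follows. This is an illustrative sketch only: the class name, the `Supplier` standing in for the DB query, and the choice to keep serving the old value when a reload fails are assumptions, not details from the talk.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** In-memory cache whose value is reloaded in the background at a fixed interval. */
final class RefreshingCache<T> implements AutoCloseable {
    private final Supplier<T> loader; // stands in for the DB query (assumption)
    private final AtomicReference<T> current;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "cache-refresh");
                t.setDaemon(true); // do not keep the JVM alive for refreshes
                return t;
            });

    RefreshingCache(Supplier<T> loader, long intervalMillis) {
        this.loader = loader;
        // Load once up front so the first request already has data.
        this.current = new AtomicReference<>(loader.get());
        scheduler.scheduleAtFixedRate(this::refresh, intervalMillis, intervalMillis,
                TimeUnit.MILLISECONDS);
    }

    /** Reloads the value; if the load fails, the previous value keeps being served. */
    void refresh() {
        try {
            current.set(loader.get());
        } catch (RuntimeException e) {
            // Keep the stale value; the next scheduled run retries.
        }
    }

    /** Request path: reads memory only and never waits for the DB. */
    T get() {
        return current.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With this shape, request latency does not depend on the database, and the DB load is one query per interval per server regardless of traffic.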
  25. Client-side Cache

    • iOS/Android clients keep the disaster cases from the previous request in a local DB.
    • The banner keeps showing for some users when the notification is delayed or lost.
    • Strategies for the disaster API:
        • The client calls the API (getDisasterCases) when the user opens the Safety Check end page.
        • The client calls the API when the TTL expires.
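The client-side refresh rules can be condensed into a small decision function. This is a sketch under assumptions: the real clients are iOS/Android apps, and the type names and the `Trigger` enum below are hypothetical; only the three triggers themselves (signal received, end page opened, TTL expired) come from the slides.

```java
/** Client-side record of when the last getDisasterCases response was stored. */
final class CachedDisasterResponse {
    private final long fetchedAtMillis;
    private final long ttlMillis; // the ttl field of the response

    CachedDisasterResponse(long fetchedAtMillis, long ttlMillis) {
        this.fetchedAtMillis = fetchedAtMillis;
        this.ttlMillis = ttlMillis;
    }

    /** True once the TTL has fully elapsed since the response was fetched. */
    boolean isStale(long nowMillis) {
        return nowMillis - fetchedAtMillis >= ttlMillis;
    }
}

final class RefreshPolicy {
    /** Hypothetical names for the situations in which the client considers a refresh. */
    enum Trigger { SIGNAL_RECEIVED, END_PAGE_OPENED, TTL_CHECK }

    /** Decides whether the client should call getDisasterCases now. */
    static boolean shouldRequest(CachedDisasterResponse cached, Trigger trigger, long nowMillis) {
        if (cached == null) {
            return true; // nothing in the local DB yet
        }
        switch (trigger) {
            case SIGNAL_RECEIVED:
            case END_PAGE_OPENED:
                return true; // always refresh on these events
            default:
                return cached.isStale(nowMillis); // otherwise only when the TTL expired
        }
    }
}
```

The TTL path is what bounds how long a stale banner can survive when a signal is delayed or lost.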
  26. Agenda (repeated)
  27. Event-driven architecture for updates

  28. Publish/Subscribe Model

    The Producer writes to the Broker (Kafka); the Consumer reads from it.

  29. Decaton

    • Stream task processing framework built on top of Kafka, developed by LINE.
    • Design goals included enabling concurrent processing of records from a single partition.
    • https://github.com/line/decaton
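To illustrate the general idea of processing one partition's records concurrently, here is a plain-Java sketch. It does not use Decaton's API, and all names are hypothetical. It assumes that records sharing a key must stay in order, so each key is pinned to one single-threaded lane while different keys run in parallel.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

/** Processes the records of one partition on several threads, keeping per-key order. */
final class KeyedPartitionProcessor implements AutoCloseable {
    private final ExecutorService[] lanes;
    private final BiConsumer<String, String> handler;

    KeyedPartitionProcessor(int concurrency, BiConsumer<String, String> handler) {
        this.handler = handler;
        this.lanes = new ExecutorService[concurrency];
        for (int i = 0; i < concurrency; i++) {
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    /** Records with the same key always go to the same lane, so their order is kept. */
    void submit(String key, String value) {
        int lane = Math.floorMod(key.hashCode(), lanes.length);
        lanes[lane].execute(() -> handler.accept(key, value));
    }

    /** Stops accepting records and waits for the queued ones to finish. */
    @Override
    public void close() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        try {
            for (ExecutorService lane : lanes) {
                lane.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The point of such a design is that throughput can grow with the number of lanes rather than with the number of Kafka partitions.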
  30. Safety Check Update Architecture

    [Diagram] Components: User A, API Gateway, Safety Check Server, Safety Check Decaton, Notification Service, User B and User C (friends), Local DB. Edge labels: updateSafetyStatus (Thrift), produce, consume, produce, consume.

    Record format: {"sourceUser", "targetUser", "disasterId", "status", "message"}
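The record format carries both a `sourceUser` and a `targetUser`, which suggests one record per friend to notify. The sketch below shows that fan-out step in plain Java. It is an assumption about how the records might be built, not the speakers' implementation: the friend lookup map, the class names, and the status string are hypothetical, and no Kafka or Decaton calls are shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** One update record, using the field names shown on the slide. */
final class StatusRecord {
    final String sourceUser;
    final String targetUser;
    final String disasterId;
    final String status;
    final String message;

    StatusRecord(String sourceUser, String targetUser, String disasterId,
                 String status, String message) {
        this.sourceUser = sourceUser;
        this.targetUser = targetUser;
        this.disasterId = disasterId;
        this.status = status;
        this.message = message;
    }
}

final class StatusFanOut {
    /**
     * Builds one record per friend of the source user. In a real pipeline each
     * record would then be produced to the broker; here they are just returned.
     */
    static List<StatusRecord> fanOut(String sourceUser, String disasterId, String status,
                                     String message, Map<String, List<String>> friendsByUser) {
        List<StatusRecord> records = new ArrayList<>();
        List<String> friends = friendsByUser.getOrDefault(sourceUser, Collections.emptyList());
        for (String friend : friends) {
            records.add(new StatusRecord(sourceUser, friend, disasterId, status, message));
        }
        return records;
    }
}
```

Keeping this step behind the broker means the updateSafetyStatus call can return as soon as the update is produced, without waiting for every friend to be notified.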
  31. Load Testing

  32. Mocking Services

  33. Testing

    The System Under Test (SUT) sends HTTP requests to Service A and Service B.
  34. Testing

    The System Under Test (SUT) sends HTTP requests to a Mock Server.

  35. Benefits of Mocking HTTP Services

    • Guarantee the service latency
    • Avoid rate limits
    • Avoid modifying state
    • Test edge cases
    • Test error scenarios
    • Results are deterministic
    • Useful for integration tests and load tests
  36. Spring Cloud Contract

    • Spring Cloud Contract is a tool that enables Consumer Driven Contract (CDC) development.
    • It consists of:
        • Spring Cloud Contract Verifier Plugin (Gradle and Maven)
        • Spring Cloud Contract Stub Runner (Mock Server)
  37. Contract Project Structure

  38. Contract Example

  39. Project’s build.gradle

  40. Mocking HTTP Services Overview

    • Define contracts (Kotlin, Java, YML)
    • Publish the stubs artifact (JAR) to Nexus
    • K8s Stub Runner resource definition: define the number of replicas, configure the Service, and specify parameters (stubs, server port)
    • The Stub Runner (Mock Server) fetches the stubs
  41. Kubernetes Resources: Deployment, Service

  42. Load Testing Tool

  43. Load Testing with Ayaperf

    • A LINE-developed tool for distributed load testing, based on Locust.
    • Allows you to define your tests in Java.
    • Command-line tool to set up, using Kubernetes, the secondary nodes (workers) that generate the load for the server under test.
    • Metrics in Grafana.
  44. Load Testing (Ayaperf)

    [Diagram] Locust Master; Worker 1, Worker 2 and Worker 3 (each running a Java client); Kubernetes Cluster; Application (SUT).
  45. Load Test Definition

  46. Load Test Definition

  47. Load Test Architecture

    [Diagram] Ayaperf: Locust Master with Worker 1, Worker 2 and Worker 3 (each running a Java client) in a Kubernetes cluster. A second Kubernetes cluster holds the Safety Check Server Deployment (Pod 1, Pod 2, ... Pod N) behind a Load Balancer and the Spring Cloud Contract (WireMock) Deployment (Pod 1, Pod 2, ... Pod N) behind a Service.
  48. Load Testing Results

  49. Estimation

  50. Safety Check Server: updateSafetyStatus

    • Number of VMs: 1
    • Spec: 8 vCPU, 16 GB RAM
  51. Safety Check Server: getDisasterCases

    • Number of VMs: 1
    • Spec: 8 vCPU, 16 GB RAM
  52. Load Test both APIs

    • RPS: 1,600
    • RPS per API
        • updateSafetyStatus: 47.61%, 761 RPS (1600 * 0.4761)
        • getDisasterCases: 52.38%, 838 RPS (1600 * 0.5238)
  53. Load Test both APIs

    Given that a single server can handle about 1,600 RPS at about 30% CPU, 14 servers (22,000 / 1,600 = 13.75, rounded up) were enough to support our initial estimate of 22,000 RPS.
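The arithmetic behind these load-test figures can be checked with a few lines of Java. The class and method names are made up for this sketch; the inputs (1,600 RPS per server, a 22,000 RPS target, and the 47.61% / 52.38% split) are the values from the slides, and the per-API numbers are truncated to whole requests as the slides appear to do.

```java
final class CapacityEstimate {
    /** Smallest number of servers whose combined capacity covers the target RPS. */
    static long serversNeeded(long targetRps, long rpsPerServer) {
        return (targetRps + rpsPerServer - 1) / rpsPerServer; // integer ceiling
    }

    /** RPS for one API given its share of the total, truncated to a whole number. */
    static long rpsForShare(long totalRps, double share) {
        return (long) (totalRps * share);
    }

    public static void main(String[] args) {
        System.out.println("servers: " + serversNeeded(22_000, 1_600));
        System.out.println("updateSafetyStatus RPS: " + rpsForShare(1_600, 0.4761));
        System.out.println("getDisasterCases RPS: " + rpsForShare(1_600, 0.5238));
    }
}
```

Fourteen servers at 1,600 RPS each give a combined 22,400 RPS, slightly above the 22,000 RPS estimate.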
  54. Summary

  55. Summary

    • A robust CMS allows easy configuration and fast distribution.
    • Cache strategies increase traffic tolerance and keep clients up to date.
    • Event-driven architecture decouples your microservices.
    • Decaton lets you achieve higher throughput with a small number of partitions.
    • A mock server helps you control the test scenarios and the service latency.
    • Load testing lets you measure application throughput and resource utilization.