Server Architecture Behind Safety Check at LINE

Osorio Alfredo (LINE / CSI Dev B Team / Server-side engineer)
Zhixin Li (LINE / CSI Dev A Team / Server-side engineer)

https://tech-verse.me/ja/sessions/4
https://tech-verse.me/en/sessions/4
https://tech-verse.me/ko/sessions/4

Tech-Verse2022

November 18, 2022

Transcript

  1. Speakers
    • Alfredo Osorio: CSI Dev B Team, Server-side engineer
    • Zhixin Li: CSI Dev A Team, Server-side engineer
  2. Agenda
    - About Safety Check
    - Disaster management and fast notification
    - Event-driven architecture for updates
    - Load testing
    - Summary
  3. Agenda
    - About Safety Check
    - Disaster management and fast notification
    - Event-driven architecture for updates
    - Load testing
    - Summary
  4. Disaster Banner
    When a disaster happens in the user’s region, a red banner is shown on the Home tab of LINE.
  5. Input your status
    In addition to setting your status as Safe or Affected, you can add more information, either by choosing a message template or by typing it yourself.
  6. View friends’ statuses
    You can check the statuses of others on the main service page.
  7. Goal
    • Prevent paralyzed communication
    • Minimize battery and data consumption
    • Quick information confirmation
    • Give users peace of mind during disasters
  8. Safety Check CMS Architecture
    (Diagram components: Operator, Web CMS, Safety Check CMS Server, Safety Check DB, Safety Check Server with a server cache, Notification Service, API Gateway, and User with a client Local DB. Labeled flows: Read & Write, Read, REST Request, getDisasterCases.)
  9. Safety Check CMS Architecture
    (Same diagram as slide 8, highlighting the CMS.)
  10. CMS
    • Content Management System
    • Manages service metadata easily
    • Front-end: web HTTP service with authentication
    • Back-end: REST API implemented with Armeria
  11. Safety Check CMS Architecture
    (Same diagram as slide 8.)
  12. Notification Service
    • A lightweight event delivery system for LINE
    • Sends a signal to the client side when content in the CMS changes
    • iOS/Android clients request the server content after receiving the signal
    • Delay time and client filters are supported
    (Diagram: Notification Service and Safety Check Server; 1. signal, 2. request, 3. response.)
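The signal-then-fetch pattern above can be sketched on the client side as follows. This is a minimal sketch under stated assumptions, not LINE's client code: the names `SignalHandler`, `Signal`, `region`, and `maxDelayMillis` are hypothetical, and it assumes the signal carries a target region (acting as the client filter) plus a delay window, with each client waiting a random time inside that window before calling getDisasterCases so that requests are spread out. The real Notification Service may apply the delay on its side instead.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

/** Hypothetical client-side handler: the signal carries no disaster data, it only triggers a fetch. */
final class SignalHandler {
    /** Hypothetical signal payload: target region (client filter) and maximum delay window. */
    record Signal(String region, long maxDelayMillis) {}

    private final String clientRegion;
    private final Runnable fetchDisasterCases; // e.g. a call to getDisasterCases
    private final ScheduledExecutorService scheduler;

    SignalHandler(String clientRegion, Runnable fetchDisasterCases,
                  ScheduledExecutorService scheduler) {
        this.clientRegion = clientRegion;
        this.fetchDisasterCases = fetchDisasterCases;
        this.scheduler = scheduler;
    }

    /** Uniformly random delay in [0, maxDelayMillis), so clients do not all request at once. */
    static long randomDelayMillis(long maxDelayMillis) {
        return maxDelayMillis <= 0 ? 0 : ThreadLocalRandom.current().nextLong(maxDelayMillis);
    }

    /** Returns the scheduled fetch, or null when the client filter drops the signal. */
    ScheduledFuture<?> onSignal(Signal signal) {
        if (!clientRegion.equals(signal.region())) {
            return null; // client filter: this signal is not for this client's region
        }
        long delay = randomDelayMillis(signal.maxDelayMillis());
        return scheduler.schedule(fetchDisasterCases, delay, TimeUnit.MILLISECONDS);
    }
}
```

Spreading the fetches over a window trades a few seconds of freshness for a much flatter request spike on the Safety Check Server when a signal reaches many clients at once.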
  13. Safety Check CMS Architecture
    (Same diagram as slide 8.)
  14. Safety Check DB
    • MongoDB is used as the primary database
    Schema:
      _id: ObjectId, archived: Boolean, enabled: Boolean, region: String,
      localizedTitles: Object, localizedDescriptions: Object, seeMoreUrl: String,
      createdAtMillis: NumberLong, updatedAtMillis: NumberLong
    Example document (as stored in MongoDB):
      {
        "_id": ObjectId("6265efc5c5083d73240ac6c2"),
        "archived": true,
        "enabled": false,
        "region": "JP",
        "localizedTitles": { "en_US": "Fukushima 7.3 Earthquake", "ja_JP": "..." },
        "localizedDescriptions": { "en_US": "Earthquake occurred on 3/16 23:36", "ja_JP": "..." },
        "seeMoreUrl": "...",
        "createdAtMillis": NumberLong(1650847685900),
        "updatedAtMillis": NumberLong(1653374701559)
      }
    Plain JSON representation:
      {
        "_id": "6265efc5c5083d73240ac6c2",
        "archived": false,
        "enabled": true,
        "region": "JP",
        "localizedTitles": { "en_US": "Fukushima 7.3 Earthquake", "ja_JP": "..." },
        "localizedDescriptions": { "en_US": "Earthquake occurred on 3/16 23:36", "ja_JP": "..." },
        "seeMoreUrl": "...",
        "createdAtMillis": 1650847685900,
        "updatedAtMillis": 1653374701559
      }
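The Safety Check DB document on slide 14 maps naturally onto a Java record. The sketch below mirrors its fields one-to-one; the class name `DisasterCases`, the `activeFor` filter (serve only cases that are `enabled`, not `archived`, and in the user's `region`), and the fallback to `en_US` in `title` are assumptions made for illustration, not rules stated in the deck.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class DisasterCases {
    /** Mirrors the fields of the Safety Check DB document (ObjectId shown as its hex string). */
    record DisasterCase(
            String id,
            boolean archived,
            boolean enabled,
            String region,
            Map<String, String> localizedTitles,
            Map<String, String> localizedDescriptions,
            String seeMoreUrl,
            long createdAtMillis,
            long updatedAtMillis) {

        /** Assumption: fall back to en_US when the user's locale has no title. */
        Optional<String> title(String locale) {
            return Optional.ofNullable(
                    localizedTitles.getOrDefault(locale, localizedTitles.get("en_US")));
        }
    }

    /** Assumption: only enabled, non-archived cases for the user's region are served. */
    static List<DisasterCase> activeFor(String region, List<DisasterCase> all) {
        return all.stream()
                .filter(c -> c.enabled() && !c.archived())
                .filter(c -> c.region().equals(region))
                .toList();
    }
}
```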
  15. Safety Check CMS Architecture
    (Same diagram as slide 8, highlighting the Server API.)
  16. Server API
    • getDisasterCases is defined in Thrift
    • The server gets the user's country and language from the API request

      struct GetDisasterCasesResponse {
        1: list<DisasterInfo> disasters,
        2: list<string> messageTemplates,
        /**
         * Indicates the TTL (time to live) in milliseconds for the response so that
         * clients can cache it and request updated information when it becomes stale.
         */
        3: i64 ttl,
      }
  17. Safety Check CMS Architecture
    (Same diagram as slide 8.)
  18. Server-side Cache
    • A local in-memory cache on the server reduces DB requests
    • Disaster data does not change frequently, and it is the same for all users with the same country and language
    • The cache is updated asynchronously at a fixed interval
    (Diagram: Safety Check Server answers 1. request with 2. response from the server cache; the server cache is updated asynchronously from the Safety Check DB.)
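A minimal sketch of the server-side cache described above, assuming a hypothetical loader that reads all disaster cases from the DB grouped by a country/language key (the key format is an assumption). Request threads only read an immutable in-memory snapshot, and a single background thread replaces that snapshot at a fixed interval, so the DB sees one query per interval regardless of traffic.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical local cache: reads hit memory only; a background task reloads from the DB. */
final class DisasterCaseCache<T> implements AutoCloseable {
    private final Supplier<Map<String, List<T>>> loadByKey; // hypothetical DB loader
    private final AtomicReference<Map<String, List<T>>> snapshot;
    private final ScheduledExecutorService refresher =
            Executors.newSingleThreadScheduledExecutor();

    DisasterCaseCache(Supplier<Map<String, List<T>>> loadByKey, long periodMillis) {
        this.loadByKey = loadByKey;
        this.snapshot = new AtomicReference<>(Map.copyOf(loadByKey.get())); // warm up once
        refresher.scheduleAtFixedRate(this::refresh, periodMillis, periodMillis,
                TimeUnit.MILLISECONDS);
    }

    /** Swaps in a fresh snapshot; on failure keeps serving the previous one. */
    void refresh() {
        try {
            snapshot.set(Map.copyOf(loadByKey.get()));
        } catch (RuntimeException e) {
            // Swallowed on purpose: an exception escaping here would cancel
            // all future runs of scheduleAtFixedRate. The next run retries.
        }
    }

    /** Request path: never touches the DB. */
    List<T> get(String countryAndLanguage) {
        return snapshot.get().getOrDefault(countryAndLanguage, List.of());
    }

    @Override
    public void close() {
        refresher.shutdownNow();
    }
}
```

Replacing the whole map atomically means readers always see a consistent snapshot, at the cost of serving data that can be up to one refresh period old, which is acceptable here because disaster data changes rarely.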
  19. Client-side Cache
    • iOS/Android clients keep the disaster cases from the previous request in a local DB
    • The banner keeps showing for some users when a notification is delayed or lost
    • Request strategy for the disaster API:
      • The client calls getDisasterCases when the user opens the Safety Check end page
      • The client calls the API when the TTL expires
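The two request triggers on slide 19 can be sketched as a small client-side cache entry that stores the last response together with the `ttl` field from `GetDisasterCasesResponse`. The class and method names are hypothetical, and disaster cases are reduced to strings for brevity; on a real device the entry would live in the client's local DB.

```java
import java.util.List;

/** Hypothetical client cache entry for the last getDisasterCases response. */
final class ClientDisasterCache {
    private List<String> disasters = List.of(); // persisted in the local DB in a real client
    private long fetchedAtMillis = -1;          // -1 means "never fetched"
    private long ttlMillis = 0;

    /** Stores a fresh response together with the server-provided TTL. */
    void store(List<String> disasters, long ttlMillis, long nowMillis) {
        this.disasters = List.copyOf(disasters);
        this.ttlMillis = ttlMillis;
        this.fetchedAtMillis = nowMillis;
    }

    boolean isStale(long nowMillis) {
        return fetchedAtMillis < 0 || nowMillis - fetchedAtMillis >= ttlMillis;
    }

    /** Triggers from the slide: the user opens the end page, or the TTL has expired. */
    boolean shouldRequest(boolean userOpenedEndPage, long nowMillis) {
        return userOpenedEndPage || isStale(nowMillis);
    }

    /** Cached cases drive the banner; the TTL bounds how stale they get if a signal is lost. */
    List<String> cached() {
        return disasters;
    }
}
```

Because the server decides the TTL, it can shorten it during an active disaster and lengthen it in quiet periods without a client release.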
  20. Agenda
    - About Safety Check
    - Disaster management and fast notification
    - Event-driven architecture for updates
    - Load testing
    - Summary
  21. Decaton
    • A stream task processing framework built on top of Kafka, developed by LINE
    • Its design goals included enabling concurrent processing of records from a single partition
    • https://github.com/line/decaton
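To show why concurrent processing within one partition is possible without losing per-key ordering, here is a plain-JDK sketch of key-based lanes. It is only an illustration of the idea and does not use or reproduce Decaton's API: records polled from a single partition are routed by key to one of several single-threaded executors, so records with the same key (for example the same user) stay in order while different keys run in parallel. The class name `KeyedLanes` is hypothetical, and offset commit handling, which a real framework must manage, is deliberately omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Illustrative only (not Decaton's API): per-key ordering with concurrency inside one partition. */
final class KeyedLanes<V> implements AutoCloseable {
    private final List<ExecutorService> lanes = new ArrayList<>();
    private final Consumer<V> processor;

    KeyedLanes(int concurrency, Consumer<V> processor) {
        for (int i = 0; i < concurrency; i++) {
            lanes.add(Executors.newSingleThreadExecutor()); // one thread per lane keeps FIFO order
        }
        this.processor = processor;
    }

    /** Same key -> same lane -> same thread -> records for that key keep their order. */
    static int laneFor(String key, int laneCount) {
        return Math.floorMod(key.hashCode(), laneCount);
    }

    void submit(String key, V value) {
        lanes.get(laneFor(key, lanes.size())).execute(() -> processor.accept(value));
    }

    /** Stops accepting work and waits for queued records to finish. */
    @Override
    public void close() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        for (ExecutorService lane : lanes) {
            try {
                lane.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

With this routing, throughput scales with the number of lanes rather than the number of Kafka partitions, which matches the summary's point that higher throughput is possible with a small number of partitions.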
  22. Safety Check Update Architecture
    (Diagram components: User A and User A's friends User B and User C; API Gateway; Safety Check Server receiving updateSafetyStatus over Thrift; Safety Check Decaton; Notification Service; client Local DB. Arrows are labeled produce and consume.)
    Record format: {"sourceUser", "targetUser", "disasterId", "status", "message"}
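The record format on slide 22 has both a `sourceUser` and a `targetUser`, which suggests one record per friend who should see the update. The sketch below models the record with those five fields and shows such a fan-out; the fan-out step itself, the `SafetyStatusFanOut` and `Status` names, and the idea of one record per friend are assumptions inferred from the record format, not a description confirmed by the slides.

```java
import java.util.List;

final class SafetyStatusFanOut {
    /** Statuses from slide 5. */
    enum Status { SAFE, AFFECTED }

    /** Fields follow the record format on the slide. */
    record UpdateRecord(String sourceUser, String targetUser, String disasterId,
                        Status status, String message) {}

    /**
     * Hypothetical fan-out: one record per friend, so each record can be
     * consumed and delivered (for example via the Notification Service) independently.
     */
    static List<UpdateRecord> fanOut(String sourceUser, List<String> friends,
                                     String disasterId, Status status, String message) {
        return friends.stream()
                .map(friend -> new UpdateRecord(sourceUser, friend, disasterId, status, message))
                .toList();
    }
}
```

Keying such records by `targetUser` would keep each friend's incoming updates in order when they are processed concurrently.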
  23. Benefits of Mocking HTTP Services
    • Guarantee the service latency
    • Avoid Rate Limits
    • Avoid Modifying State
    • Test Edge Cases
    • Test Error Scenarios
    • Results are deterministic
    • Useful for integration tests and load tests
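The "guarantee the service latency" benefit can be illustrated with a tiny stub built on the JDK's own `com.sun.net.httpserver.HttpServer`. This is a stand-in for the idea only; the deck's actual setup uses Spring Cloud Contract Stub Runner (WireMock). The class name, path, and body are hypothetical. Every request gets the same canned JSON after a fixed sleep, so the dependency's latency and response are known and repeatable during a test.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stub: fixed body, fixed latency, no state, no rate limits. */
final class FixedLatencyStub implements AutoCloseable {
    private final HttpServer server;
    private final ExecutorService pool = Executors.newCachedThreadPool();

    FixedLatencyStub(String path, String jsonBody, long latencyMillis) throws IOException {
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0); // port 0 = any free port
        byte[] body = jsonBody.getBytes(StandardCharsets.UTF_8);
        server.createContext(path, exchange -> {
            try {
                Thread.sleep(latencyMillis); // controlled, deterministic latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length == 0 ? -1 : body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.setExecutor(pool); // handle requests in parallel so sleeps do not queue under load
        server.start();
    }

    int port() {
        return server.getAddress().getPort();
    }

    @Override
    public void close() {
        server.stop(0);
        pool.shutdownNow();
    }
}
```

Without `setExecutor`, the JDK server handles requests on a single thread, so under load the sleeps would queue and the measured latency would no longer be the configured one.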
  24. Spring Cloud Contract
    • Spring Cloud Contract is a tool that enables Consumer Driven Contract (CDC) development.
    • It consists of:
      • Spring Cloud Contract Verifier Plugin (Gradle and Maven)
      • Spring Cloud Contract Stub Runner (Mock Server)
  25. Mocking HTTP Services Overview
    (Diagram: define contracts (Kotlin, Java, YML) → publish the stubs artifact (JAR) to Nexus; a K8s Stub Runner resource definition defines the number of replicas, configures the Service, and specifies parameters (stubs, server port) → the Stub Runner (Mock Server) fetches the stubs.)
  26. Load Testing with Ayaperf
    • A LINE-developed tool for distributed load testing based on Locust
    • Lets you define your tests in Java
    • A command-line tool sets up, on Kubernetes, the secondary nodes (workers) that generate the load for the server under test
    • Metrics are shown in Grafana
  27. Load Testing (Ayaperf)
    (Diagram: a Locust Master and Workers 1, 2, and 3, each running a Java Client, inside a Kubernetes cluster, generating load against the Application (SUT, system under test).)
  28. Load Test Architecture
    (Diagram: Ayaperf, with a Locust Master and Workers 1, 2, and 3 each running a Java Client, in one Kubernetes cluster; in a second Kubernetes cluster, the Safety Check Server Deployment (Pods 1 to N) behind a Load Balancer, and a Spring Cloud Contract (WireMock) Deployment (Pods 1 to N) exposed through a Service.)
  29. Load Test both APIs
    • RPS: 1,600
    • RPS per API:
      • updateSafetyStatus: 47.61% ≈ 761 (1,600 × 0.4761)
      • getDisasterCases: 52.38% ≈ 838 (1,600 × 0.5238)
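The per-API numbers on slide 29 are the total rate multiplied by each API's share and rounded down. The sketch below reproduces that arithmetic and adds a weighted picker of the kind a load generator can use to hit each API in proportion to its share (similar in spirit to Locust task weights). `LoadMix`, `perApiRps`, and `pick` are hypothetical names; this is not Ayaperf's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LoadMix {
    /** Per-API target RPS = total RPS x share, rounded down to whole requests. */
    static Map<String, Integer> perApiRps(int totalRps, Map<String, Double> shares) {
        Map<String, Integer> result = new LinkedHashMap<>();
        shares.forEach((api, share) -> result.put(api, (int) Math.floor(totalRps * share)));
        return result;
    }

    /** Picks an API with probability proportional to its share; u must be uniform in [0, 1). */
    static String pick(Map<String, Double> shares, double u) {
        double total = shares.values().stream().mapToDouble(Double::doubleValue).sum();
        double target = u * total;
        String last = null;
        for (Map.Entry<String, Double> e : shares.entrySet()) {
            last = e.getKey();
            target -= e.getValue();
            if (target < 0) {
                return last;
            }
        }
        return last; // guards against floating-point round-off at u close to 1
    }
}
```

With shares 0.4761 and 0.5238 and a total of 1,600 RPS, `perApiRps` yields 761 for updateSafetyStatus and 838 for getDisasterCases, matching the slide.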
  30. Load Test both APIs
    Given that a single server can handle about 1,600 RPS at about 30% CPU, 14 servers were enough to support our initial estimate of 22,000 RPS.
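The server count on slide 30 follows from dividing the estimated load by the measured per-server throughput and rounding up:

```latex
N_{\text{servers}}
  = \left\lceil \frac{22{,}000\ \text{RPS}}{1{,}600\ \text{RPS per server}} \right\rceil
  = \lceil 13.75 \rceil
  = 14,
\qquad 14 \times 1{,}600 = 22{,}400 \ge 22{,}000 > 13 \times 1{,}600 = 20{,}800 .
```

Because the 1,600 RPS figure was measured at only about 30% CPU, this count is based on observed throughput rather than on CPU saturation; the slides do not say what additional headroom, if any, was planned.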
  31. Summary
    • A robust CMS allows easy configuration and fast distribution.
    • Cache strategies increase traffic tolerance and keep clients up to date.
    • Event-driven architecture decouples your microservices.
    • Decaton lets you achieve higher throughput with a small number of partitions.
    • A mock server helps you control the test scenarios and the service latency.
    • Load testing lets you measure application throughput and resource utilization.