Server Architecture Behind Safety Check at LINE

Alfredo Osorio (LINE / CSI Dev B Team / Server-side engineer)
Zhixin Li (LINE / CSI Dev A Team / Server-side engineer)

https://tech-verse.me/ja/sessions/4
https://tech-verse.me/en/sessions/4
https://tech-verse.me/ko/sessions/4

Tech-Verse2022
November 18, 2022

Transcript

  1. (Title slide)

  2. Speakers
    Alfredo Osorio: CSI Dev B Team, Server-side engineer
    Zhixin Li: CSI Dev A Team, Server-side engineer

  3. Agenda
    - About Safety Check
    - Disaster management and fast notification
    - Event-driven architecture for updates
    - Load testing
    - Summary

  4. Agenda (repeated)

  5. Safety Check

  6. Disaster Banner
    When a disaster happens in the user’s region, a red banner is shown on the Home tab of LINE.

  7. Input your status
    In addition to setting your status as Safe or Affected, you can also enter additional information, either by choosing a message template or by typing it yourself.

  8. View friends’ statuses
    You can check the statuses of others on the main service page.

  9. Goal
    • Prevent paralyzed communication
    • Minimize battery and data consumption
    • Quick information confirmation
    • Give users peace of mind during disasters

  10. Disaster Management

  11. Application Scenario
    Earthquake → Decide whether to enable the feature → Register using the CMS → Banner shows for target users

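  12. Safety Check CMS Architecture
    Diagram components: Operator, Web CMS, Safety Check CMS Server, Safety Check DB, Notification Service, API Gateway, Safety Check Server (with a server cache), User (with a local DB).
    • Operator → Web CMS → REST request → Safety Check CMS Server (Read & Write on the Safety Check DB)
    • Safety Check Server → Read → Safety Check DB
    • User → getDisasterCases → API Gateway → Safety Check Server
    • Notification Service → User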

  13. Safety Check CMS Architecture
    (Same diagram as slide 12, labeled “CMS”.)

  14. CMS
    • Content Management System
    • Manage service metadata easily
    • Front-end: web HTTP service with authentication
    • Back-end: REST API implemented with Armeria

  15. CMS
    Disaster Case
    Message Template
    Notification Service

  16. Disaster Case / Message Template
    (Screenshots of the Disaster Case and Message Template pages in the CMS)

  17. Safety Check CMS Architecture
    (Same diagram as slide 12.)

  18. Notification Service
    • A lightweight event delivery system for LINE
    • Sends a signal to the client side when content in the CMS changes
    • After receiving the signal, iOS/Android clients request the content from the server
    • A delivery delay and client filters are supported
    Diagram: Notification Service → (1) signal → client; client → (2) request → Safety Check Server → (3) response
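
A minimal Java sketch of the signal-then-fetch flow on the client, under stated assumptions. The `Signal` record, its `targetRegions` and `delayMillis` fields, and the `SignalHandler` class are all hypothetical. The sketch also applies the delay and the filter on the client, whereas the real Notification Service may apply them on its own side. The point it illustrates is that the signal carries no disaster data: it only tells the client to call the server.

```java
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical signal payload: which clients it targets and how long to wait before fetching.
record Signal(Set<String> targetRegions, long delayMillis) {}

// Hypothetical client-side handler: the signal only triggers a fetch, it carries no content.
final class SignalHandler implements AutoCloseable {
    private final String clientRegion;
    private final Runnable fetchFromServer; // e.g. a call to getDisasterCases
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    SignalHandler(String clientRegion, Runnable fetchFromServer) {
        this.clientRegion = clientRegion;
        this.fetchFromServer = fetchFromServer;
    }

    // Returns true if a fetch was scheduled, false if the client filter dropped the signal.
    boolean onSignal(Signal signal) {
        if (!signal.targetRegions().contains(clientRegion)) {
            return false; // client filter: this signal is not for this client
        }
        // (1) signal received; (2) request and (3) response happen after the delay.
        scheduler.schedule(fetchFromServer, signal.delayMillis(), TimeUnit.MILLISECONDS);
        return true;
    }

    // Already scheduled fetches still run after shutdown() (default executor policy).
    @Override
    public void close() {
        scheduler.shutdown();
    }
}
```

Because only a tiny signal is pushed, the disaster content itself always comes from the Safety Check Server.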

  19. Safety Check CMS Architecture
    (Same diagram as slide 12.)

  20. Safety Check DB
    • MongoDB is used as the primary database
    Schema:
      _id : ObjectId,
      archived : Boolean,
      enabled : Boolean,
      region : String,
      localizedTitles : Object,
      localizedDescriptions : Object,
      seeMoreUrl : String,
      createdAtMillis : NumberLong,
      updatedAtMillis : NumberLong
    Example document (MongoDB):
      {
        "_id": ObjectId("6265efc5c5083d73240ac6c2"),
        "archived": true,
        "enabled": false,
        "region": "JP",
        "localizedTitles": {
          "en_US": "Fukushima 7.3 Earthquake",
          "ja_JP": "..."
        },
        "localizedDescriptions": {
          "en_US": "Earthquake occurred on 3/16 23:36",
          "ja_JP": "..."
        },
        "seeMoreUrl": "...",
        "createdAtMillis": NumberLong(1650847685900),
        "updatedAtMillis": NumberLong(1653374701559)
      }
    Example as plain JSON:
      {
        "_id": "6265efc5c5083d73240ac6c2",
        "archived": false,
        "enabled": true,
        "region": "JP",
        "localizedTitles": {
          "en_US": "Fukushima 7.3 Earthquake",
          "ja_JP": "..."
        },
        "localizedDescriptions": {
          "en_US": "Earthquake occurred on 3/16 23:36",
          "ja_JP": "..."
        },
        "seeMoreUrl": "...",
        "createdAtMillis": 1650847685900,
        "updatedAtMillis": 1653374701559
      }
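
To make the schema concrete, here is a hypothetical Java model of one Safety Check DB document. Two parts are assumptions for illustration, not the documented behavior of getDisasterCases:
- the filtering rule (show a case only if it is enabled, not archived, and in the user's region);
- the `en_US` fallback for titles.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical Java model of one Safety Check DB document (field names from the schema above).
record DisasterCase(
        String id,
        boolean archived,
        boolean enabled,
        String region,
        Map<String, String> localizedTitles,
        Map<String, String> localizedDescriptions,
        String seeMoreUrl,
        long createdAtMillis,
        long updatedAtMillis) {

    // Assumption: a case is visible only if enabled, not archived, and in the user's region.
    boolean isActiveFor(String userRegion) {
        return enabled && !archived && region.equals(userRegion);
    }

    // Assumption: fall back to en_US when there is no title for the user's locale.
    Optional<String> titleFor(String locale) {
        return Optional.ofNullable(
                localizedTitles.getOrDefault(locale, localizedTitles.get("en_US")));
    }
}

final class DisasterCaseFilter {
    private DisasterCaseFilter() {}

    // Keeps only the cases a user in the given region should see.
    static List<DisasterCase> activeCases(List<DisasterCase> all, String userRegion) {
        return all.stream().filter(c -> c.isActiveFor(userRegion)).toList();
    }
}
```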

  21. Safety Check CMS Architecture
    (Same diagram as slide 12, labeled “Server API”.)

  22. Server API
    • getDisasterCases is defined in Thrift
    • The server obtains the user's country and language from the API request
      struct GetDisasterCasesResponse {
        1: list disasters,
        2: list messageTemplates,
        /**
         * Indicates the TTL (time to live) in milliseconds for the response so that
         * clients can cache it and request updated information when it becomes stale.
         */
        3: i64 ttl,
      }

  23. Safety Check CMS Architecture
    (Same diagram as slide 12.)

  24. Server-side Cache
    • A local in-memory cache on the server is used to reduce DB requests
    • Disaster data changes infrequently and is the same for all users with the same country and language
    • The cache is updated asynchronously at a fixed interval
    Diagram: (1) request → Safety Check Server → (2) response, served from the server cache; the server cache is updated asynchronously from the Safety Check DB.
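
A minimal sketch of a server-local cache that is refreshed asynchronously at a fixed interval. It assumes the DB can be read as one map keyed by a hypothetical "country/language" string (for example `JP/ja`). `DisasterCache` and the loader are illustrative names. Requests read only the in-memory snapshot, so DB load depends on the refresh interval rather than on traffic.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Sketch: local cache refreshed at a fixed interval.
// Key "country/language" is an assumption; values are disaster case titles.
final class DisasterCache implements AutoCloseable {
    private final Supplier<Map<String, List<String>>> loadAllFromDb; // hypothetical DB loader
    private final ScheduledExecutorService refresher =
            Executors.newSingleThreadScheduledExecutor();
    // volatile: request threads always see the latest fully built snapshot
    private volatile Map<String, List<String>> snapshot;

    DisasterCache(Supplier<Map<String, List<String>>> loadAllFromDb, long refreshSeconds) {
        this.loadAllFromDb = loadAllFromDb;
        this.snapshot = Map.copyOf(loadAllFromDb.get()); // initial synchronous load
        refresher.scheduleAtFixedRate(this::refresh, refreshSeconds, refreshSeconds, TimeUnit.SECONDS);
    }

    private void refresh() {
        try {
            snapshot = Map.copyOf(loadAllFromDb.get()); // replace the whole map at once
        } catch (RuntimeException e) {
            // Keep serving the previous snapshot if the DB read fails.
        }
    }

    // Request path: never touches the DB.
    List<String> get(String country, String language) {
        return snapshot.getOrDefault(country + "/" + language, List.of());
    }

    @Override
    public void close() {
        refresher.shutdownNow();
    }
}
```

Catching the exception inside `refresh` matters. With `scheduleAtFixedRate`, a task that throws once is never run again, so an uncaught DB error would silently freeze the cache.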

  25. Client-side Cache
    • iOS/Android clients keep the disaster cases from the previous request in a local DB
    • The banner will keep showing for some users when a notification is delayed or lost
    • Strategies for calling the disaster API:
      • The client requests the API (getDisasterCases) when the user accesses the Safety Check end page
      • The client requests the API when the TTL expires
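
The client-side strategy can be sketched as a small policy object. The `ttl` value is the one returned in `GetDisasterCasesResponse` (slide 22). Three details are assumptions of this sketch:
- the class name;
- the injectable clock;
- treating "nothing cached yet" as expired.

```java
import java.util.function.LongSupplier;

// Hypothetical policy deciding when the client should call getDisasterCases again.
final class DisasterCasesRefreshPolicy {
    private final LongSupplier nowMillis; // injectable clock, e.g. System::currentTimeMillis
    private long cachedAtMillis = -1;     // -1 means nothing is cached in the local DB yet
    private long ttlMillis;

    DisasterCasesRefreshPolicy(LongSupplier nowMillis) {
        this.nowMillis = nowMillis;
    }

    // Call after each successful response; ttlFromServer is the response's ttl field.
    void onResponse(long ttlFromServer) {
        cachedAtMillis = nowMillis.getAsLong();
        ttlMillis = ttlFromServer;
    }

    boolean isExpired() {
        return cachedAtMillis < 0 || nowMillis.getAsLong() - cachedAtMillis >= ttlMillis;
    }

    // Request when the user opens the Safety Check end page, or when the TTL has expired.
    boolean shouldRequest(boolean userOpenedEndPage) {
        return userOpenedEndPage || isExpired();
    }
}
```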

  26. Agenda (repeated)

  27. Event-driven architecture for updates

  28. Publish/Subscribe Model
    The Producer writes to the Broker (Kafka); the Consumer reads from it.

  29. Decaton
    • A stream task processing framework built on top of Kafka, developed by LINE.
    • Its design goals included enabling concurrent processing of records from a single partition.
    • https://github.com/line/decaton
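
Decaton's own API is not shown here. As a simplified illustration of concurrent processing within one partition, the sketch below (hypothetical class `KeyedParallelProcessor`) hashes each record key to one of N single-threaded lanes. Records with different keys run in parallel, while records that share a key stay in order. It omits Kafka, offset commits and retries, which a real framework has to handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

// Simplified illustration (not Decaton's API): records from ONE partition are spread
// over N single-threaded lanes by key hash. Different keys run in parallel;
// records with the same key always go to the same lane, so they keep their order.
final class KeyedParallelProcessor implements AutoCloseable {
    private final List<ExecutorService> lanes = new ArrayList<>();
    private final BiConsumer<String, String> handler; // (key, value) -> process

    KeyedParallelProcessor(int concurrency, BiConsumer<String, String> handler) {
        for (int i = 0; i < concurrency; i++) {
            lanes.add(Executors.newSingleThreadExecutor());
        }
        this.handler = handler;
    }

    // Called in partition (offset) order by the consumer loop.
    void submit(String key, String value) {
        int lane = Math.floorMod(key.hashCode(), lanes.size());
        lanes.get(lane).execute(() -> handler.accept(key, value));
    }

    // Stops accepting records and waits for in-flight ones to finish.
    @Override
    public void close() {
        lanes.forEach(ExecutorService::shutdown);
        try {
            for (ExecutorService lane : lanes) {
                lane.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```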

  30. Safety Check Update Architecture
    • User A → API Gateway → Safety Check Server: updateSafetyStatus (Thrift)
    • Safety Check Server → produce → Safety Check Decaton → consume
    • Record format: {"sourceUser", "targetUser", "disasterId", "status", "message"}
    • produce → Notification Service → consume: User A's friends (User B, User C) are notified
    • Also shown: Local DB (client side)
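
One plausible reading of the record format is that each record addresses one friend, so a single status update fans out into one record per friend. The sketch below assumes that reading. Three details are hypothetical choices, not confirmed by the slide:
- the class name `SafetyStatusFanOut`;
- the status value `SAFE`;
- using `targetUser` as the partition key.

```java
import java.util.List;

// Field names follow the record format on the slide.
record SafetyStatusRecord(String sourceUser, String targetUser,
                          String disasterId, String status, String message) {}

final class SafetyStatusFanOut {
    private SafetyStatusFanOut() {}

    // Hypothetical: one update from sourceUser becomes one record per friend.
    static List<SafetyStatusRecord> fanOut(String sourceUser, List<String> friends,
                                           String disasterId, String status, String message) {
        return friends.stream()
                .map(friend -> new SafetyStatusRecord(sourceUser, friend, disasterId, status, message))
                .toList();
    }

    // Hypothetical Kafka key: the target user, so each friend's updates keep their order.
    static String partitionKey(SafetyStatusRecord record) {
        return record.targetUser();
    }
}
```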

  31. Load Testing

  32. Mocking Services

  33. Testing
    Diagram: System Under Test (SUT) → HTTP Request → Service A, Service B

  34. Testing
    Diagram: System Under Test (SUT) → HTTP Request → Mock Server

  35. Benefits of Mocking HTTP Services
    • Guarantee the service latency
    • Avoid Rate Limits
    • Avoid Modifying State
    • Test Edge Cases
    • Test Error Scenarios
    • Results are deterministic
    • Useful for integration tests and load tests
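
A minimal illustration of these benefits using only the JDK's built-in `com.sun.net.httpserver.HttpServer`, not Spring Cloud Contract. The hypothetical `FixedLatencyMockServer` always returns the same status and body after a fixed delay, so latency and results are deterministic. An error scenario only requires passing a different status, say 503.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Minimal stand-in for a dependency: fixed status, fixed body, fixed latency.
final class FixedLatencyMockServer implements AutoCloseable {
    private final HttpServer server;

    FixedLatencyMockServer(String path, int status, String body, long latencyMillis) throws IOException {
        // Port 0 lets the OS pick any free port.
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext(path, exchange -> {
            try {
                Thread.sleep(latencyMillis); // the latency the SUT will observe
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            // -1 signals "no response body" to HttpServer; otherwise send the exact length.
            exchange.sendResponseHeaders(status, bytes.length == 0 ? -1 : bytes.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(bytes);
            }
        });
        server.start();
    }

    int port() {
        return server.getAddress().getPort();
    }

    @Override
    public void close() {
        server.stop(0);
    }
}
```

In the talk, this role is played by the Spring Cloud Contract Stub Runner, which serves stubs generated from contracts instead of hand-written handlers.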

  36. Spring Cloud Contract
    • Spring Cloud Contract is a tool that enables Consumer Driven Contract (CDC)
    development.
    • It consists of:
    • Spring Cloud Contract Verifier Plugin (Gradle and Maven)
    • Spring Cloud Contract Stub Runner (Mock Server)

  37. Contract Project Structure

  38. Contract Example

  39. Project’s build.gradle

  40. Mocking HTTP Services Overview
    • Define contracts (Kotlin, Java, YML)
    • Publish the stubs artifact (JAR) to Nexus
    • K8s Stub Runner resource definition: define the number of replicas, configure the Service, and specify parameters (stubs, server port)
    • The Stub Runner (Mock Server) fetches the stubs from Nexus

  41. Kubernetes Resources
    Deployment, Service

  42. Load Testing Tool

  43. Load Testing with Ayaperf
    • A LINE-developed tool for distributed load testing based on Locust.
    • Lets you define your tests in Java.
    • A command-line tool sets up, on Kubernetes, the secondary nodes (workers) that generate the load for the server under test.
    • Metrics are available in Grafana.

  44. Load Testing (Ayaperf)
    Diagram: inside a Kubernetes cluster, a Locust master coordinates Worker 1, Worker 2 and Worker 3, each running a Java client that sends load to the Application (SUT).

  45. Load Test Definition

  46. Load Test Definition

  47. Load Test Architecture
    • Ayaperf (Kubernetes cluster): Locust master; Worker 1, Worker 2 and Worker 3, each running a Java client
    • Load Balancer in front of the Safety Check Server Deployment (Pod 1 to Pod N) in a second Kubernetes cluster
    • Spring Cloud Contract (WireMock) Deployment (Pod 1 to Pod N) exposed through a Service

  48. Load Testing Results

  49. Estimation

  50. Safety Check Server: updateSafetyStatus
    • Number of VMs: 1
    • Spec: 8 vCPUs, 16 GB RAM

  51. Safety Check Server: getDisasterCases
    • Number of VMs: 1
    • Spec: 8 vCPUs, 16 GB RAM

  52. Load Test both APIs
    • Total RPS: 1,600
    • RPS per API:
      • updateSafetyStatus: 47.61% → 761 (1,600 × 0.4761)
      • getDisasterCases: 52.38% → 838 (1,600 × 0.5238)

  53. Load Test both APIs
    Given that a single server can handle about 1,600 RPS at about 30% CPU, 14 servers (22,000 / 1,600 = 13.75, rounded up) were enough to support our initial estimate of 22,000 RPS.
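
The arithmetic on slides 52 and 53 as a small Java helper (class and method names are illustrative):
- The per-API split truncates to whole requests, which is why 761 + 838 adds up to 1,599 rather than 1,600.
- The server count is a ceiling division: 22,000 / 1,600 = 13.75, rounded up to 14.

```java
// Back-of-the-envelope capacity math from the load test slides.
final class CapacityEstimate {
    private CapacityEstimate() {}

    // Smallest number of servers whose combined throughput covers the target (ceiling division).
    static long serversNeeded(long targetRps, long rpsPerServer) {
        return (targetRps + rpsPerServer - 1) / rpsPerServer;
    }

    // Portion of the total load that one API receives, truncated to whole requests.
    static long rpsForApi(long totalRps, double share) {
        return (long) Math.floor(totalRps * share);
    }
}
```

The estimate is based on measured throughput per server. Each server ran at only about 30% CPU at that load, so the 14-server figure leaves CPU headroom.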

  54. Summary

  55. Summary
    • A robust CMS allows easy configuration and fast distribution.
    • Cache strategies increase traffic tolerance and keep clients up to date.
    • Event-driven architecture decouples your microservices.
    • Decaton allows higher throughput with a small number of partitions.
    • A mock server helps you control the test scenarios and the service latency.
    • Load testing lets you measure application throughput and resource utilization.