
LINE LIVE: Features & optimizations to support artists & fans

LINE DevDay 2020

November 26, 2020

Transcript

  1. None
  2. Agenda › Overview of LINE LIVE › 2 Challenges in 2020 › New Features › LINE Face2Face › LINE LIVE-VIEWING › Load Testing for Performance Visualization
  3. LINE LIVE › Video streaming and hosting service › Streams from celebrities, companies and other users › Chat communication
  4. Traffic Characteristics › Spike access caused by popular broadcasts (chart: RPS jumps from about 3K to 32K between 18:50 and 18:58)
  5. LINE LIVE Architecture (diagram): the Host streams to Media Servers over RTMP; Media Servers upload HLS files to Object Storage; the CDN caches them and the LINE LIVE Player fetches .m3u8 and .ts; the player also talks to the API Server (JSON API) and the Chat Server (WebSocket); LINE Talk Server, Billing and CMS sit alongside
  6. LINE LIVE Architecture (same diagram as slide 5)
  7. LINE LIVE Architecture (same diagram as slide 5)
  8. LINE LIVE Architecture (same diagram as slide 5)
  9. LINE LIVE Architecture (same diagram as slide 5, with the LINE App shown as the client)
  10. LINE LIVE Architecture (same diagram, zooming into the API service: API Servers, Redis, MySQL, Kafka, Consumer Servers)
  11. History of LINE LIVE › 2015 Dec.: service release › 2016: gift item sending, broadcasts from end users › 2017: Twitter login, design renewal › 2018: collaboration broadcast › 2019: paid broadcast, subscription channel › 2020: direct messaging, LINE LIVE-VIEWING, Face2Face
  12. 2 Challenges in the COVID-19 Situation: New Features Development, Performance Improvement

  13. New Features and Behind the Scenes: LINE Face2Face, LINE LIVE-VIEWING

  14. LINE Face2Face › An online “akushu-kai” (handshake event) › 1-on-1 video communication with celebrities › Pre-distributed codes are needed to use it
  15. Overview of Face2Face (diagram: fans line up in a queue and are popped one by one for timed slots with the idol, 15–45 s in the example) › Fans can talk to their idol during their allotted time › Fans wait in the queue › The video call is implemented using RTMP
  16. Video Call via RTMP › RTMP (Real Time Messaging Protocol) › Runs on TCP › Uses 2 RTMP streams (diagram: the Media Servers give RTMP URLs to both sides; the Idol upstreams on the Idol URL and plays back the Fan URL, while the Fan upstreams on the Fan URL and plays back the Idol URL)
  17. Video Call via RTMP (same diagram as slide 16)
  18. Video Call via RTMP (same diagram as slide 16)
  19. Backend of Face2Face › Flow of Face2Face: 1. enter the Fan Queue (after token and other auth), 2. pop a fan from the queue, 3. RTMP resource allocation & sharing, 4. start Face2Face & finish
  20. Backend of Face2Face › Flow step 1: enter the Fan Queue (after token and other auth) — diagram with Fan, Idol, Chat Servers, Batch Servers, API Servers, Media Servers and DB: 1. the fan submits tickets, 2. the tickets are verified, 3. the fan is pushed to the “Fan Queue”
  21. Backend of Face2Face › Flow step 2: pop a fan from the queue — 4. pop a fan from the Fan Queue
  22. Backend of Face2Face › Flow step 3: RTMP resource allocation & sharing — 5. acquire RTMP streams for the Idol and the Fan, 6. save fan and stream info, 7. notify the collaboration start and give the Idol’s RTMP stream URL
  23. Backend of Face2Face › Flow step 4: start Face2Face & finish — 8. stop Face2Face, 9. re-pop a fan from the Fan Queue; loop until the queue size is 0
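The flow on slides 20–23 maps naturally onto a queue. Below is a minimal sketch, assuming a Redis list as the Fan Queue and Node.js with ioredis; the talk does not name the backend language, data store or any of these functions, so every identifier here is hypothetical.

```javascript
// Minimal sketch of the Face2Face flow (steps 1-9 above), assuming a Redis
// list as the Fan Queue. All identifiers are illustrative, not LINE's code.
const Redis = require("ioredis");
const redis = new Redis();

const QUEUE_KEY = "face2face:queue"; // hypothetical key for the Fan Queue

// Slide 20: after ticket verification, push the fan into the queue.
async function enterFanQueue(fanId) {
  await redis.rpush(QUEUE_KEY, fanId);
}

// Slides 21-23: pop a fan, allocate two RTMP streams, persist the pairing,
// notify both sides, and repeat until the queue is empty.
async function runFace2Face(idolId) {
  while (true) {
    const fanId = await redis.lpop(QUEUE_KEY); // 4. pop a fan from the Fan Queue
    if (fanId === null) break;                 // loop until queue size is 0

    const streams = await allocateRtmpStreams(idolId, fanId); // 5. acquire streams
    await saveSessionInfo(idolId, fanId, streams);            // 6. save fan/stream info
    await notifyCollaborationStart(idolId, fanId, streams);   // 7. share RTMP URLs
    await waitUntilFinished(idolId, fanId);                   // 8. stop Face2Face
    // 9. the next loop iteration re-pops a fan from the Fan Queue
  }
}

// Hypothetical stand-ins for the media-server, DB and chat-server calls,
// which are not described in the talk.
async function allocateRtmpStreams(idolId, fanId) {
  return {
    idol: `rtmp://media.example.com/live/${idolId}`,
    fan: `rtmp://media.example.com/live/${fanId}`,
  };
}
async function saveSessionInfo() { /* e.g. INSERT into MySQL */ }
async function notifyCollaborationStart() { /* e.g. push over the Chat Server */ }
async function waitUntilFinished() { /* e.g. await a timer or a finish event */ }
```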
  24. Development of Face2Face › Utilized an existing feature, “Collaboration Broadcast” › “Task-force style” development › Developers joined planning more actively than usual
  25. LINE LIVE-VIEWING › Online ticket-based streaming › Users can watch a broadcast by pre-purchasing an online ticket
  26. Overview of LINE LIVE-VIEWING (diagram: 1. the user buys the ticket on the ticket sales site, 2. the API Servers authorize the user against MySQL, 3. the LINE LIVE App requests HLS files, 4. permission is checked) › The LINE user account is authorized by buying a ticket › Users can watch the video via LINE, the LINE LIVE App and LINE LIVE Web
  27. Traffic Spike Problem in LINE LIVE-VIEWING › On the ticket sales site, traffic spikes right after sales start because the number of tickets is limited › In broadcast playback, traffic spikes right after the broadcast starts
  28. High Volume Traffic Handling on the Ticket Site › Caching with Redis, an in-memory key-value data store › MySQL is rarely touched for reads during the user’s purchase flow (Home, Detail) › The approach referred to ISUCON8
  29. Cache for “Home” › A batch fetches the live list from MySQL every minute › The API reads it from the cache, stored as a “Sorted List”
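A rough sketch of that pattern: the slide only says the cache is a “Sorted List”, so the Redis sorted set, key name and SQL below are assumptions, written in Node.js with ioredis (and a mysql2-style client) for illustration.

```javascript
// Sketch of the "Home" cache: a batch refreshes the live list from MySQL
// once per minute; the API reads only from Redis on the hot path.
const Redis = require("ioredis");
const redis = new Redis();

const HOME_KEY = "ticket:home:lives"; // hypothetical cache key

// Batch side: run every minute (e.g. from a cron-like scheduler).
async function refreshHomeCache(mysql) {
  const [rows] = await mysql.query(
    "SELECT id, started_at FROM broadcasts WHERE status = 'live'"
  );
  const pipeline = redis.pipeline();
  pipeline.del(HOME_KEY);
  for (const row of rows) {
    // Score by start time so the Home screen can read a sorted list directly.
    pipeline.zadd(HOME_KEY, row.started_at, String(row.id));
  }
  await pipeline.exec();
}

// API side: the purchase-flow read path never touches MySQL.
async function getHomeBroadcastIds() {
  return redis.zrevrange(HOME_KEY, 0, 49); // newest 50 entries
}
```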
  30. High Volume Traffic Handling in Broadcast Playback › Caching and async writes › All authorized user IDs are cached on Redis before the broadcast finishes; the API checks permission there and leaves a footprint of the user when serving the broadcast URL › Ticket status is updated after the broadcast finishes: a batch fetches the footprints and updates the status › Kafka is not used because it would create a huge number of Kafka jobs
  31. High Volume Traffic Handling in Broadcast Playback (same slide as 30)
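A hedged sketch of the playback-side pattern described on slides 30–31, again using Node.js with ioredis; the Redis set layout, key names and SQL are assumptions for illustration.

```javascript
// "Check permission + leave a footprint" on the playback path; the ticket
// status write is deferred to a batch after the broadcast finishes.
const Redis = require("ioredis");
const redis = new Redis();

const authorizedKey = (broadcastId) => `lv:${broadcastId}:authorized`; // pre-loaded user IDs
const footprintKey  = (broadcastId) => `lv:${broadcastId}:footprints`; // who actually watched

// API side: serve the broadcast URL without writing to MySQL.
async function getBroadcastUrl(broadcastId, userId) {
  const allowed = await redis.sismember(authorizedKey(broadcastId), userId); // check permission
  if (!allowed) throw new Error("valid ticket required");
  await redis.sadd(footprintKey(broadcastId), userId); // leave footprint of the user
  return `https://cdn.example.com/live/${broadcastId}/playlist.m3u8`; // placeholder URL
}

// Batch side: after the broadcast finishes, fetch the footprints and update status.
async function updateTicketStatuses(broadcastId, mysql) {
  const userIds = await redis.smembers(footprintKey(broadcastId));
  for (const userId of userIds) {
    await mysql.query(
      "UPDATE tickets SET status = 'used' WHERE broadcast_id = ? AND user_id = ?",
      [broadcastId, userId]
    );
  }
}
```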
  32. Performance Testing of the Ticket Purchase Flow › Performed a ticket-purchase spike test › Scenario-based test: /home => /detail => /reserve => /purchase (GET /home, GET /detail, POST /reserve, …) › One API server can handle over 200 purchases per second
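The actual scenario script isn't shown in the talk; the sketch below expresses the /home => /detail => /reserve => /purchase flow as a k6 script (k6 is the tool introduced later on slide 51), with the base URL, IDs and payloads invented for illustration.

```javascript
// k6 sketch of the ticket-purchase spike scenario:
// /home => /detail => /reserve => /purchase.
import http from "k6/http";
import { check, sleep } from "k6";

export const options = {
  vus: 200,        // virtual users; tune to the spike being reproduced
  duration: "1m",
};

const BASE_URL = "https://ticket.example.com"; // placeholder host

export default function () {
  check(http.get(`${BASE_URL}/home`), { "home 200": (r) => r.status === 200 });
  check(http.get(`${BASE_URL}/detail/1`), { "detail 200": (r) => r.status === 200 });

  const reserve = http.post(`${BASE_URL}/reserve`, { broadcastId: "1" });
  check(reserve, { "reserve 200": (r) => r.status === 200 });

  const purchase = http.post(`${BASE_URL}/purchase`, { reservationId: "dummy" });
  check(purchase, { "purchase 200": (r) => r.status === 200 });

  sleep(1); // think time between iterations
}
```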
  33. Load Testing for Performance Visualization

  34. Phase Change of LINE LIVE › New feature LINE LIVE-VIEWING with its “pre-purchase” model › Traffic increase from service growth and the COVID-19 situation › A big incident caused by technical debt and a lack of performance measurement › Result: increasing demand for high reliability
  35. Incident in early 2020 › 1 hour of service downtime › Caused by a MySQL connection handling problem combined with a Redis network bandwidth limitation: the API server took a connection from the MySQL connection pool and did not return it until the API response was returned, while cache reads were blocked by the bandwidth limit › Not only normal testing but also load testing is needed
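The incident itself isn't shown as code; the sketch below only illustrates the failure mode the slide describes, assuming a mysql2-style connection pool and ioredis in Node.js.

```javascript
// Illustration (not LINE's code) of the slide-35 failure mode: a pooled MySQL
// connection is held while waiting on a bandwidth-limited Redis read, so slow
// cache reads keep connections checked out and eventually exhaust the pool.
async function getBroadcastProblematic(pool, redis, id) {
  const conn = await pool.getConnection();              // connection taken early
  try {
    const cached = await redis.get(`broadcast:${id}`);  // slow when the Redis link is saturated
    if (cached) return JSON.parse(cached);              // connection was held for nothing
    const [rows] = await conn.query("SELECT * FROM broadcasts WHERE id = ?", [id]);
    return rows[0];
  } finally {
    conn.release(); // only returned after the whole response is built
  }
}

// One possible mitigation: don't hold a connection across the cache read.
async function getBroadcastSafer(pool, redis, id) {
  const cached = await redis.get(`broadcast:${id}`);
  if (cached) return JSON.parse(cached);                // no MySQL connection needed at all
  const [rows] = await pool.query(                      // checked out and returned per query
    "SELECT * FROM broadcasts WHERE id = ?", [id]
  );
  return rows[0];
}
```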
  36. Why Load Testing › Goal: make the system handle large-scale broadcasts stably › Process to achieve the goal: apply a large workload → analyze results → find defects → fix defects
  37. Why Load Testing › Goal: make the system handle large-scale broadcasts stably › Daily access: insufficient load › Irregular spike access: long intervals between spikes › Artificial spikes by load testing: sufficient load with no interval
  38. Why Load Testing (same slide as 37)
  39. Why Load Testing (same slide as 37)
  40. Requirements for Load Testing › Repeatability: easy for everyone to re-execute tests under both the same and different conditions › Edit-ability: easy to edit, create and share scenarios › Understandability: easy to understand the result right after the test execution
  41. Requirements for Load Testing (same slide as 40)
  42. Requirements for Load Testing (same slide as 40)
  43. Requirements for Load Testing (same slide as 40)
  44. “.stress” for Load Testing (Repeatability) › Engineers can execute tests via Slack › Engineers can specify “load variables” such as the number of simultaneous viewers
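The bot's interface isn't shown in the talk; the sketch below is one way a “.stress” style command could pass a load variable such as the simultaneous-viewer count into a k6 run. Only k6's --env / __ENV mechanism is real k6 functionality; the command syntax and handler are assumptions.

```javascript
// Hypothetical handler for ".stress <scenarioId> viewers=30000" from Slack.
const { execFile } = require("node:child_process");

function handleStressCommand(text) {
  // e.g. text = "liveviewing-playback viewers=30000"
  const [scenarioId, ...pairs] = text.trim().split(/\s+/);
  const envArgs = pairs.flatMap((pair) => ["--env", pair.toUpperCase()]);

  // Run the synced scenario file with the requested load variables.
  execFile("k6", ["run", ...envArgs, `scenarios/${scenarioId}.js`], (err, stdout) => {
    if (err) console.error(err);
    else console.log(stdout); // the real bot would post this back to Slack
  });
}

// Inside the scenario file the variable would be read as, for example:
//   const viewers = Number(__ENV.VIEWERS || 1000);
```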
  45. Result Notification (Understandability) › The result is sent to Slack › It consists of 3 types of information
  46. Result Notification: Binary Result (Understandability) › A summary of the result is sent to Slack: Test Passed or Test Failed
  47. What is ‘Passed’? (Understandability) › Defined as: every HTTP status code is 200, and the 99th percentile of API response time is lower than a threshold
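That pass definition maps naturally onto k6 thresholds; the snippet below is a sketch with an assumed 500 ms limit and a placeholder endpoint, since the real threshold value and APIs aren't given in the talk.

```javascript
// "Passed" expressed as k6 thresholds: no failed requests (approximating
// "all HTTP status codes are 200") and p(99) latency under a threshold.
import http from "k6/http";

export const options = {
  thresholds: {
    http_req_failed: ["rate==0"],     // every request must succeed
    http_req_duration: ["p(99)<500"], // 99th percentile under an assumed 500 ms
  },
};

export default function () {
  http.get("https://api.example.com/broadcasts/1"); // placeholder endpoint
}
```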
  48. Result Notification: Client-side Metrics (Understandability) › Client-side metrics are reported: RPS and the response time of each API
  49. Result Notification: Server-side Metrics (Understandability) › Server-side metrics are reported as a Grafana dashboard: load average, CPU usage, active MySQL connection count, etc.
  50. Scenario Updating (Edit-ability) › Just push to the scenario repository; the scenarios are synced

  51. Open Source Load Testing Engine: k6 › CLI-based open source load testing tool › Scenarios are scripted in JavaScript ES2015/ES6 › https://k6.io/
  52. Load Testing Architecture (diagram): the Slack Bot API receives “.stress” and a scenario ID → the Load Test Manager syncs scenario files from GitHub and drives the Load Test Nodes → the nodes load the Target Servers (with exporters) → metrics flow to the datasource and Alert Manager; dashboard URLs are detected and their images are sent back to Slack
  53. Load Testing Architecture (same diagram as slide 52)
  54. Load Testing Architecture (same diagram as slide 52)
  55. Load Testing Architecture (same diagram as slide 52)
  56. 2 Types of Load Testing › Single API test: to visualize a single API’s performance; not every API, only frequently called ones › Scenario-based test: to visualize system performance against a particular scenario, e.g. a LIVE-VIEWING broadcast playback spike from app users (Broadcast Detail API, Authentication API, Channel API, Broadcast Detail API, …)
  57. Test Environment › Real environment: >200 servers; load test environment: 10% of real › 10% of the servers of each component › 100% would be too costly, yet a single instance can’t place enough stress on MySQL and Redis
  58. Example of performance degradation detection › N+1 query executions in a frequently called API (slide shows the load test result and the detected defect as pseudo code)
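The slide's own pseudo code isn't reproduced in the transcript; the sketch below shows the general shape of an N+1 defect and the single-query fix, with table and column names invented.

```javascript
// N+1: one query for the list, then one extra query per row.
async function listLiveBroadcastsNPlusOne(db) {
  const [broadcasts] = await db.query(
    "SELECT id, channel_id FROM broadcasts WHERE status = 'live'"
  );
  for (const b of broadcasts) {
    const [[channel]] = await db.query(
      "SELECT name FROM channels WHERE id = ?", [b.channel_id]
    );
    b.channelName = channel.name; // N extra round trips under spike load
  }
  return broadcasts;
}

// Fix: fetch everything with a single joined query.
async function listLiveBroadcastsJoined(db) {
  const [rows] = await db.query(
    `SELECT b.id, c.name AS channel_name
       FROM broadcasts b
       JOIN channels c ON c.id = b.channel_id
      WHERE b.status = 'live'`
  );
  return rows;
}
```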
  59. Example of performance degradation detection › The MySQL connection handling problem, again
  60. Summary › New features: LINE Face2Face and LINE LIVE-VIEWING › Load testing in LINE LIVE › Repeatability, Understandability and Edit-ability
  61. Thank you!