LINE Shop team's New Year Work

Slides presented at LINE Developer Meetup #63, held on 2020/5/22.
https://line.connpass.com/event/174730/

LINE Developers

May 22, 2020

Transcript

  1. LINE Shop team’s New Year work 2020/05/22 [Online] LINE Developer Meetup

    #63 ( https://line.connpass.com/event/174730/ ) LINE Fukuoka Corp, Development 1 Dept. Manabu Matsuzaki
  2. About me • Manabu Matsuzaki (@matsumana) • LINE Fukuoka Corp, Development 1 Dept. • SRE/Server-side Engineer • https://github.com/matsumana
  3. Agenda • Introduction to LINE Shop team • LINE Shop service architecture • LINE Shop’s network traffic on New Year • Preparation for New Year • Result of New Year's Day • Improvements for next year
  4. Introduction to LINE Shop team

  5. What is the LINE Shop team? • We develop Sticker/Emoji/Theme features (buy, use, etc.) for the LINE app and the Web • Sticker Shop and Theme Shop in the LINE app • Web store named LINE STORE (https://store.line.me/) • LINE wallet in the LINE app’s Wallet tab (not Sticker/Emoji/Theme, though)
  6. What is the LINE Shop team? • Team members (server-side engineers) • Tokyo: about 15 • Fukuoka: about 10
  7. What is the LINE Shop team? • We collaborate with other teams • Native app engineers (iOS, Android) • Frontend engineers (called UIT) • DBA • In-house platform engineers (Monitoring, Private Cloud, Data Analysis, etc.)
  8. How big is LINE Shop’s service scale? • Number of Stickers *1 • 8,550,000 LINE Sticker sets are on sale (as of Mar 2020) • 433,000,000 stickers sent per day on average (as of April 2019) • RPS (requests/sec) • Usual: ~50K RPS • Event/New Year: ~100K RPS
    *1 en: https://linecorp.com/en/pr/news/en/2020/3157 ja: https://linecorp.com/ja/pr/news/ja/2020/3127
  9. Custom Stickers • Stickers with customizable short text • en: https://linecorp.com/ja/pr/news/en/2019/2666 • ja: https://linecorp.com/ja/pr/news/ja/2019/2664
  10. Stickers Premium • Get unlimited access to over 3,000,000 creators' stickers • en: https://store.line.me/stickers-premium/landing/en • ja: https://store.line.me/stickers-premium/landing/ja • http://creator-mag.line.me/ja/archives/1075007192.html
  11. Message Stickers • Users can add personalized messages • en: https://linecorp.com/en/pr/news/en/2020/3157 • ja: https://linecorp.com/ja/pr/news/ja/2020/3127
  12. LINE Shop service architecture

  13. Overview (including LEGY and talk-server) • Diagram labels: LINE App, Web Browser, CWA (Web App in LINE), LEGY (LINE Event Delivery Gateway), talk-server, LINE Shop, LINE STORE (https://store.line.me), shop-proxy (Shop API gateway), shop-server, stickershop-server, search-fe, ownership-server, capability-server
  14. Overview (inside LINE Shop) • Diagram labels: LINE STORE (https://store.line.me), shop-proxy, shop-server, stickershop-server, search-fe, ownership-server, capability-server, Product, Ownership, Used as cache
  15. Overview (inside LINE Shop) • LINE STORE: Web store • shop-proxy: Shop API gateway • shop-server: Coordinates requests • search-fe: Search frontend • ownership-server: Ownership of products (sticker, emoji, theme) • capability-server: Capability check
  16. Overview (Framework, monitoring) • Diagram labels: Microservice A, Microservice B, Microservice C, Tracing, Logging, Metrics, IMON (in-house monitoring system) • Armeria (Microservice Framework written in Java) • REST/Thrift/gRPC, HTTP/2, Service Discovery, Circuit Breaker, etc.
  17. What do we monitor? • Prometheus+Grafana or IMON • Service latency (50th, 90th, 99th percentile, etc.) • Amount of logs (Warn, Error) • JVM (GC, Heap, etc.) • Server load (CPU, Memory, Network Traffic, etc.) • etc.
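As a rough illustration of the latency percentiles mentioned on this slide, the sketch below computes the 50th, 90th, and 99th percentiles with the nearest-rank method. The sample latencies are made up for the example, and `percentile` is a hypothetical helper; in practice these values would come from the monitoring stack (Prometheus or IMON), not from application code like this.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Made-up request latencies in milliseconds (illustrative only).
latencies_ms = [12, 15, 11, 14, 13, 250, 16, 12, 18, 900]

summary = {p: percentile(latencies_ms, p) for p in (50, 90, 99)}
print(summary)
```

The median stays low while the 90th and 99th percentiles expose the slow outliers, which is why tail percentiles are usually watched alongside the median.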
  18. What do we monitor? • Elasticsearch+Kibana • Raw log • Zipkin • Identify performance bottlenecks in microservices
  19. LINE Shop’s Network traffic on New Year

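  20. Many more requests than usual will come • New Year campaign • "For the first New Year of the Reiwa era, send your New Year greetings with LINE's 'Otoshidama New Year Stickers'. More than 800 kinds of stickers and emoji, including hugely popular characters, are now available!" https://linecorp.com/ja/pr/news/ja/2019/3038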
  21. Many more requests than usual will come • Messages from the LINE Sticker official account • As of Feb 2020, the account has 57,000,000 friends • Increasing continuously (YoY: x1.5) • The account sends messages to friends regularly
  22. Summary so far • The LINE Shop team develops Sticker/Emoji/Theme features • Service scale • 8,550,000 LINE Sticker sets are on sale (as of Mar 2020) • 433,000,000 stickers sent per day on average (as of April 2019) • RPS • Usual: ~50K RPS • Event/New Year: ~100K RPS • Introduced the LINE Shop service architecture
  23. Summary so far • New Year is one of the highest-traffic events for our service • New Year greetings • New Year campaign • Messages from the LINE Sticker official account
  24. Preparation for New Year

  25. Preparation for New Year • Measure capacity for each microservice • Estimate the number of servers and their spec based on measured capacity and past events • Last New Year • Other high-traffic events
  26. Measure capacity for each microservice • Measured using production traffic • Take existing servers out of service little by little while monitoring
  27. Measure capacity for each microservice • Measured using production traffic • Take existing servers out of service little by little while monitoring • RPS per server • API latency
  28. Measure capacity for each microservice • Errors/sec • CPU usage • Network traffic • JVM GC • etc.
  29. Measure capacity for each microservice • Improvement ideas for capacity measurement • Control the traffic ratio for one server (e.g. Weighted Round Robin) • Continuous load test (like a CI) • etc.
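The weighted round robin idea on this slide could be sketched as follows. This is a generic "smooth" weighted round robin, not the team's actual load balancer configuration; the server names and weights are hypothetical and only show how one server under measurement can be given a larger share of traffic.

```python
def smooth_weighted_round_robin(weights, n):
    """Return n picks. On every pick each server's running score grows by
    its weight, the highest score wins, and the winner's score is reduced
    by the total weight. Picks end up proportional to the weights."""
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be non-empty and positive")
    total = sum(weights.values())
    current = {name: 0 for name in weights}
    picks = []
    for _ in range(n):
        for name, weight in weights.items():
            current[name] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks


# Hypothetical setup: the server being measured gets 3x the traffic share.
weights = {"target": 3, "other-1": 1, "other-2": 1}
picks = smooth_weighted_round_robin(weights, 10)
print(picks)
```

Raising the weight of one server step by step increases its load without removing other servers from service, which is the appeal of this approach for capacity measurement.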
  30. Estimation based on past events’ metrics • Determine the server spec and number of servers • Based on API latency, server load, etc. • e.g. • Expected total RPS: 10K • Current capacity per app server: 1K • Result: 10K / 1K = 10 servers needed
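The arithmetic on this slide can be written as a small helper. The function name and the `headroom` parameter are assumptions added for illustration; the slide itself only divides the expected total RPS by the measured per-server capacity.

```python
import math


def servers_needed(total_rps, rps_per_server, headroom=0.0):
    """Number of app servers needed to handle total_rps.

    headroom is an optional safety margin (0.5 means plan for 50% more
    traffic than expected). It is an assumption of this sketch, not
    something stated in the slides.
    """
    if total_rps < 0 or rps_per_server <= 0 or headroom < 0:
        raise ValueError("invalid input")
    return math.ceil(total_rps * (1 + headroom) / rps_per_server)


# The example from the slide: 10K RPS expected, 1K RPS per server.
print(servers_needed(10_000, 1_000))

# Same estimate with an arbitrary 50% margin for unexpected growth.
print(servers_needed(10_000, 1_000, headroom=0.5))
```

Rounding up matters when the division is not exact: a fractional server still means one more machine.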
  31. Estimation based on past events’ metrics • We also need to take care of the DB servers’ load and settings • If we add many app servers, the DB servers may become a bottleneck • Connection errors might occur due to the DB’s max connections
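A back-of-the-envelope check for the connection limit described on this slide might look like the sketch below. It assumes every app server holds a fixed-size connection pool; the pool size, the DB limit, and the reserved connection count are all hypothetical numbers.

```python
def total_db_connections(app_servers, pool_size_per_server):
    """Worst-case connections opened against one DB when every app
    server fills its connection pool."""
    return app_servers * pool_size_per_server


def max_app_servers(db_max_connections, pool_size_per_server, reserved=0):
    """Largest app server count that stays within the DB connection limit.
    reserved covers connections kept for admin or monitoring use."""
    if pool_size_per_server <= 0:
        raise ValueError("pool_size_per_server must be positive")
    usable = max(db_max_connections - reserved, 0)
    return usable // pool_size_per_server


# Hypothetical values: DB limit 1000, 100 reserved, pool of 50 per server.
limit = max_app_servers(1000, 50, reserved=100)
print(limit)
print(total_db_connections(10, 50) <= 1000 - 100)  # 10 servers fit
print(total_db_connections(20, 50) <= 1000 - 100)  # 20 servers do not
```

Under these assumptions, scaling out past the limit means either shrinking the pools or using fewer, larger app servers, which is the trade-off the next slide discusses.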
  32. Estimation based on past events’ metrics • If necessary, replace existing app servers with higher-spec servers • Advantages: • Fewer DB connections • Faster server provisioning (we use Ansible) • Lower server operation cost
  33. Result of the New Year's day

  34. RPS (right after 2020/01/01 12:00am)

  35. RPS (right after 2020/01/01 12:00am) • In several microservices, RPS was higher than last year • YoY x1.7 to x4.9, depending on the microservice • shop-proxy: 124.1k RPS (YoY x1.7) • search-fe: 142.0k RPS (YoY x4.9)
  36. Service failure occurred • Unfortunately, a service failure occurred right after Jan 1st 12:00am • Users couldn’t edit text in, send, or receive Custom Stickers • We apologize for the inconvenience
  37. Service failure occurred • Caused by a lack of server resources • Requests for Custom Stickers were higher than we expected • There was no data for 2018 • Custom Stickers was released in 2019 • We urgently added app servers for Custom Stickers • However, this took a long time because of server provisioning
  38. Service failure occurred • Fortunately, the failure didn’t affect our whole service • It affected only Custom Stickers • This is one of the benefits of microservices • Well-designed microservices can mitigate the impact
  39. Improvements for next year

  40. Postmortem Culture in LINE • If a service failure occurs, we hold a postmortem meeting • We discuss with related teams how to prevent future failures
  41. Postmortem Culture in LINE • Items of a postmortem report • Failure detection time • Affected services • Cause • Timeline • How to prevent future failures • How to improve failure detection • How to improve failure handling
  42. Improvements for this service failure • Implement a rate limiter (throttling) for the Custom Sticker server • A rate limiter can reduce server load • Make server scale-out faster • More automation • Shorten Ansible execution time • Consider migrating to k8s
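To make the rate limiter idea concrete, here is a minimal single-process token bucket. It is a generic textbook technique and an assumption of this sketch, not the implementation used for the Custom Sticker server (the engineering blog article listed in the appendix describes a distributed rate limiter). The class name, parameters, and numbers are hypothetical.

```python
import time


class TokenBucket:
    """Allow up to `rate` requests per second on average, with bursts of
    at most `capacity` requests. Not thread-safe; a real server would
    need locking or one bucket per worker."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = rate
        self.capacity = capacity
        self._clock = clock  # injectable so tests can control time
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self, cost=1.0):
        """Return True if the request may proceed, False if throttled."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill tokens for the elapsed time, capped at the bucket size.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# Hypothetical limit: 100 requests/sec on average, bursts of up to 200.
limiter = TokenBucket(rate=100, capacity=200)
if limiter.allow():
    print("handle request")
else:
    print("reject request, e.g. with HTTP 429")
```

Rejecting excess requests early keeps the load on the servers behind the limiter bounded, at the cost of failing some requests during a spike.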
  43. Conclusion

  44. Conclusion • New Year is one of the highest-traffic events for our service • Preparation for New Year • Capacity measurement • Consider scale-up or scale-out based on measured capacity and past events
  45. Conclusion • In several microservices, RPS was higher than last year • A service failure occurred in part of our service • However, thanks to microservices, the failure didn't affect the whole service • We hold postmortem meetings with related teams
  46. Appendix

  47. Related LINE Engineering Blog article (2020 New Year campaign) • High-throughput distributed rate limiter https://engineering.linecorp.com/en/blog/high-throughput-distributed-rate-limiter/
  48. Related sessions at LINE DEV DAY 2019 (Custom Stickers) • How Custom Sticker Development Works https://linedevday.linecorp.com/jp/2019/sessions/S1-02 • Custom Elements in Custom Stickers https://linedevday.linecorp.com/jp/2019/sessions/S2-10
  49. Related sessions at LINE DEV DAY 2019 (Architecture) • Long road to microservices architecture at LINE messaging platform https://linedevday.linecorp.com/jp/2019/sessions/D1-6 • Armeria: A Microservice Framework Well-suited Everywhere https://linedevday.linecorp.com/jp/2019/sessions/D2-2
  50. Related sessions at LINE DEV DAY 2019 (2019 New Year campaign) • LINE's New Year campaign - Taking Control of High Traffic https://linedevday.linecorp.com/jp/2019/sessions/S1-09
  51. Thank you for your attention