LINE MediaStorage at scale

LINE DevDay 2020

November 25, 2020

Transcript

  1. Agenda › Introduction to the LINE MediaPlatform › MediaStorage Forward

    Placement › Needs and Challenges › Benefits › Future works
  2. MediaPlatform OBS LIVE Vision-AI Delivery Processing Storage Live Streaming VOD

    OCR Adult Filter Object, QR Detection Functions Chat (OBject Storage)
  3. MediaPlatform OBS LIVE Vision-AI Delivery Processing Storage Live Streaming VOD

    OCR Adult Filter Object, QR Detection Functions Chat (OBject Storage)
  4. MediaPlatform OBS(OBject Storage) in numbers: Traffic, Storage Volume (2020 1Q)

    Traffic / Day: 500Gbps+; Storage Volume: 60PB+; Requests / Day: 7B+
  5. MediaPlatform OBS(OBject Storage) in numbers: Traffic of NewYear! Japan NewYear!

    (GMT +9) Taiwan NewYear! (GMT +8) 2x than usual 2.7x than usual
  6. MediaPlatform OBS(OBject Storage) in numbers: Servers, 2013 vs. 2020 (+7)

    Storage: 3PB (270EA+) to 60PB (2900EA+), volume x20, servers x10.7; Delivery: 10M+ requests/day (200EA+) to 7B+ requests/day (750EA+), requests x70, servers x3.75; Processing: 340EA+ to 250EA+ (x0.73)
  7. Storage Forward Placement OBS(OBject Storage) PoP(Point of Presence) ServiceBO Cache

    Auth Storage Processor ServiceBO Cache ServiceBO Cache ServiceBO Cache ServiceBO Cache Japan Korea Singapore US Germany Japan
  8. Private Network Storage Forward Placement Needs #1: Reduce response speed

    differences between users ServiceBO Users Storage Cache
  9. Private Network Storage Forward Placement Needs #1: Reduce response speed

    differences between users ServiceBO Users Storage Cache (1) Download
  10. Private Network Storage Forward Placement Needs #1: Reduce response speed

    differences between users ServiceBO Users Storage Cache (1) Download (2) Is the object cached?
  11. Private Network Storage Forward Placement Needs #1: Reduce response speed

    differences between users ServiceBO Users Storage Cache (1) Download (2) Is the object cached? (3-1) If Yes: The cache server responds with the cached object.
  12. Storage Forward Placement Needs #1: Reduce response speed differences between

    users ServiceBO Users Storage Cache (1) Download (2) Is the object cached? Private Network
  13. Storage Forward Placement Needs #1: Reduce response speed differences between

    users ServiceBO Users Storage Cache (1) Download (2) Is the object cached? (3-2) If No: The cache server responds after pulling the object from the storage. Cached! Private Network
  14. Storage Forward Placement Needs #1: Reduce response speed differences between

    users ServiceBO Users Storage Cache (1) Download (2) Is the object cached? Good (3-2) If No: The cache server responds after pulling the object from the storage. Cached! Private Network
  15. Storage Forward Placement Needs #1: Reduce response speed differences between

    users Cache 1:N Chat Users Cache Hit 1:1 1:N Chatroom <
  16. Storage Forward Placement Needs #1: Reduce response speed differences between

    users Cache Storage 1:1 Chat Users Good 1:N Chat Users Cache Hit 1:1 1:N Chatroom <
  17. Storage Forward Placement Needs #1: Reduce response speed differences between

    users Cache 1:N Chat Users B, C, D … Cache Hit 1:N Chat User A Storage Good Cache Miss
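The download flow in slides 8 to 17 can be sketched as a pull-through cache. This is a minimal illustration, not the OBS implementation: the class name `PullThroughCache`, the in-memory dictionaries, and the object ID `img-1` are all assumptions made for the example.

```python
class PullThroughCache:
    """Toy cache server: answer from cache, or pull from storage and keep a copy."""

    def __init__(self, storage):
        self._storage = storage   # dict standing in for the origin storage (assumption)
        self._cached = {}         # objects already held by this cache server
        self.hits = 0
        self.misses = 0

    def download(self, object_id):
        # (2) Is the object cached?
        if object_id in self._cached:
            # (3-1) If yes: respond with the cached object.
            self.hits += 1
            return self._cached[object_id]
        # (3-2) If no: pull the object from storage, cache it, then respond.
        self.misses += 1
        data = self._storage[object_id]
        self._cached[object_id] = data
        return data


# 1:N chatroom: user A triggers the cache miss, users B, C and D get cache hits.
cache = PullThroughCache({"img-1": b"image-bytes"})
for user in ["A", "B", "C", "D"]:
    cache.download("img-1")
```

In a 1:1 chat the same object is typically fetched once, so every download is a miss; in a 1:N chatroom only the first reader pays the round trip to storage.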
  18. Storage Forward Placement Needs #1: Reduce response speed differences between

    users Cache Storage Singapore Users Singapore Users Singapore Japan ? ?
  19. Storage Forward Placement Needs #1: Reduce response speed differences between

    users Cache Storage Singapore Users Singapore Users Singapore Singapore
  20. Storage Forward Placement Needs #2: Saving Inter-IDC Traffic Leased Line

    Cache Storage HitRatio: 70% Cache Miss: 30% TH Users
  21. Storage Forward Placement Needs #2: Saving Inter-IDC Traffic Leased Line

    Cache Storage HitRatio: 70% Cache Miss: 30% TH Users
  22. Storage Forward Placement Needs #2: Saving Inter-IDC Traffic - by

    Example Cache Storage TH Users HitRatio: 70% Cache Miss(30%) 10G
  23. Storage Forward Placement Needs #2: Saving Inter-IDC Traffic - by

    Example Cache Storage TH Users HitRatio: 70% Cache Miss(30%) 10G 50G 10G 20G 30G 30G 30G 10G 15G 5G 10G 20G
  24. Storage Forward Placement Needs #2: Saving Inter-IDC Traffic - by

    Example Cache Storage TH Users HitRatio: 70% Cache Miss(30%) 30G 50G 10G 20G 30G 30G 30G 10G 15G 5G 10G 20G
  25. Storage Forward Placement Needs #2: Saving Inter-IDC Traffic Storage TH

    Users Cache HitRatio: 70% Cache Miss: 30% Storage Leased Line
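The effect of the 70% hit ratio on the leased line is simple arithmetic: only cache misses cross the inter-IDC link. The sketch below assumes the hit ratio applies uniformly to traffic volume; the function name and the 100 Gbps input are illustrative values, not figures from the slides.

```python
def leased_line_traffic_gbps(user_traffic_gbps, hit_ratio_percent):
    """Traffic that still crosses the leased line when only cache misses reach storage."""
    miss_percent = 100 - hit_ratio_percent
    return user_traffic_gbps * miss_percent / 100


# With a 70% hit ratio, a hypothetical 100 Gbps of user traffic
# leaves 30 Gbps on the leased line.
remaining = leased_line_traffic_gbps(100, 70)
```

Placing storage next to the cache removes this remaining miss traffic from the leased line as well, which is the point of slide 25.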
  26. Storage Forward Placement Needs #3: Minimize Cache and Leased Line

    Dependencies Leased Line TH Users Cache Storage
  27. Storage Forward Placement Needs #3: Minimize Cache and Leased Line

    Dependencies Leased Line TH Users Cache Storage
  28. Storage Forward Placement Needs #3: Minimize Cache and Leased Line

    Dependencies Leased Line TH Users Cache Storage
  29. Storage Forward Placement Needs #3: Minimize Cache and Leased Line

    Dependencies Leased Line TH Users Cache Storage TW Users JP Users
  30. Storage Forward Placement Needs #3: Minimize Cache and Leased Line

    Dependencies Cache Hit Ratio Leased Line Traffic(SG-JP)
  31. Storage Forward Placement Needs #3: Minimize Cache and Leased Line

    Dependencies Leased Line TH Users Cache Storage
  32. Storage Forward Placement Needs #3: Minimize Cache and Leased Line

    Dependencies Leased Line TH Users Cache Storage
  33. Storage Forward Placement Needs #3: Minimize Cache and Leased Line

    Dependencies Leased Line TH Users Cache Storage TW Users JP Users Storage
  34. Storage Forward Placement Needs #3: Minimize Cache and Leased Line

    Dependencies Leased Line TH Users Cache Storage TW Users JP Users Storage
  35. Storage Forward Placement Needs #3: Minimize Cache and Leased Line

    Dependencies Leased Line TH Users Cache Storage TW Users JP Users Storage
  36. Storage Forward Placement Needs #4: Active-Active Clustered Storage TH Users

    Cache Storage Japan#1 ObjectId Region A Japan#1 B Japan#2 C Singapore Region Mapping Table Active DNS Update Japan#2
  37. Storage Forward Placement Needs #4: Active-Active Clustered Storage TH Users

    Cache Storage Japan#1 ObjectId Region A Japan#1 B Japan#2 C Singapore Region Mapping Table 35% 50% 15% Active DNS Update Japan#2
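A region mapping table like the one on slides 36 and 37 can be sketched as a lookup from object ID to region. The three table rows come from the slide; everything else is an assumption for illustration, in particular the function names and the way the 35% / 50% / 15% figures are attached to regions as placement weights. The DNS update step is not modeled.

```python
import random

# Rows shown on the slide: ObjectId -> Region.
REGION_MAPPING = {"A": "Japan#1", "B": "Japan#2", "C": "Singapore"}

# Assumption: the slide's percentages are placement shares, assigned here
# to regions in an arbitrary order purely for the example.
PLACEMENT_WEIGHTS = {"Japan#1": 35, "Japan#2": 50, "Singapore": 15}


def resolve_region(object_id, mapping=REGION_MAPPING):
    """Return the storage region that holds the given object."""
    return mapping[object_id]


def place_new_object(object_id, mapping, weights, rng):
    """Pick an active region by weight and record the choice in the mapping table."""
    regions = list(weights)
    region = rng.choices(regions, weights=[weights[r] for r in regions], k=1)[0]
    mapping[object_id] = region
    return region


table = dict(REGION_MAPPING)
chosen = place_new_object("D", table, PLACEMENT_WEIGHTS, random.Random(0))
```

Because every region in the table is active, a read only needs the mapping lookup to find its storage cluster.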
  38. MediaStorage MediaStorage Requirements › Resiliency › Scalability › Consistency ›

    Fault-Tolerant OBS Delivery Processing Storage › Cost-Efficiency
  39. MediaStorage Ceph, Which one to use?

    Object Storage (Rados Gateway): Object = key-value mappings; Amazon S3 RESTful API supported; graceful shutdown. File Storage (CephFS): File = file system; FUSE API supported; MDS required. Block Storage: Block = sequence of bytes; provides a block device image.
  40. MediaStorage Ceph Internals OBS FUSE REST API Ceph Cluster Key-Value

    DB Data MDS MetaPool DataPool RGW IndexPool DataPool POSIX Amazon S3 OSD Object Storage File Storage
  41. MediaStorage Ceph Internals OBS FUSE REST API Ceph Cluster Key-Value

    DB Data MDS MetaPool DataPool RGW IndexPool DataPool POSIX Amazon S3 OSD Object Storage File Storage
  42. MediaStorage Why Ceph Object Storage #1: S3 Compatibility RESTful API

    S3 API OBS OSD ServiceBO Rados Gateway Ceph RADOS
  43. MediaStorage Why Ceph Object Storage #1: S3 Compatibility RESTful API

    S3 API OBS S3 API OSD ServiceBO Rados Gateway Ceph RADOS
  44. MediaStorage Why Ceph Object Storage #2: Reusing common functions between

    OBS and Ceph Object Storage OSD Ceph RADOS Storage Dedicated Protocol Bucket-id Resolving Algorithm Storage I/O Thread Management Storage Class Transition Object Expiration … RPC OBS ServiceBO Commercial Storage Rados Gateway
  45. MediaStorage Why Ceph Object Storage #3: Storage Class Transition High

    Performance Storage Standard Storage High-Density Storage IOPS, Cost Capacity MediaStorage Users After 7 Days After 14 Days
  46. MediaStorage Why Ceph Object Storage #3: Storage Class Transition High

    Performance Storage Standard Storage High-Density Storage IOPS, Cost Capacity MediaStorage Users After 7 Days After 14 Days
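The storage class transition above can be sketched as a function of object age. This assumes both thresholds (7 and 14 days) are measured from upload time, which the slide does not state explicitly; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def storage_class_for(uploaded_at, now):
    """Pick the storage tier for an object based on its age."""
    age = now - uploaded_at
    if age >= timedelta(days=14):
        return "High-Density Storage"       # cheapest per byte, lowest IOPS
    if age >= timedelta(days=7):
        return "Standard Storage"
    return "High Performance Storage"       # freshly uploaded, most frequently read


now = datetime(2020, 11, 25, tzinfo=timezone.utc)
fresh = storage_class_for(now - timedelta(days=1), now)
week_old = storage_class_for(now - timedelta(days=7), now)
old = storage_class_for(now - timedelta(days=30), now)
```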
  47. MediaStorage Why Ceph Object Storage #4: CephFS MDS Active Promotion

    Active Standby MDS(Meta Data Server) Active Sync the active MDS states
  48. MediaStorage Why Ceph Object Storage #4: CephFS MDS Active Promotion

    Active Standby MDS(Meta Data Server) Active Sync the active MDS states (journal replaying: takes 3~4 seconds) Service BO read() write() read() write() read() write() Blocked!
  49. MediaStorage Architectural concerns on the OBS side › Object Immutability

    › Which services will be moved to Ceph? › Storage I/O Adapter Library OBS Delivery Processing Storage
  50. MediaStorage Enhanced Storage I/O Layer Design ServiceBO Commercial Storage Ceph

    Storage Library AWS S3 SDK Business Functions Common Functions Storage / Traffic Cost Optimization Storage I/O
  51. MediaStorage Enhanced Storage I/O Layer Design ServiceBO Commercial Storage Ceph

    Storage Library AWS S3 SDK RPC HTTP Business Functions Common Functions Storage / Traffic Cost Optimization Storage I/O
  52. MediaStorage Enhanced Storage I/O Layer Design ServiceBO Commercial Storage Ceph

    Storage Library AWS S3 SDK RPC HTTP Blocking I/O NonBlocking I/O Business Functions Common Functions Storage / Traffic Cost Optimization Storage I/O
  53. MediaStorage Enhanced Storage I/O Layer Design ServiceBO Commercial Storage Ceph

    Storage Library AWS S3 SDK RPC HTTP Blocking I/O NonBlocking I/O PUSH PULL Business Functions Common Functions Storage / Traffic Cost Optimization Storage I/O
  54. MediaStorage Enhanced Storage I/O Layer Design ServiceBO Commercial Storage Ceph

    Storage Library AWS S3 SDK RPC HTTP Blocking I/O NonBlocking I/O PUSH PULL Business Functions Common Functions Storage / Traffic Cost Optimization Storage I/O
  55. MediaStorage Enhanced Storage I/O Layer Design ServiceBO Commercial Storage Ceph

    Storage Library AWS S3 SDK RPC HTTP Blocking I/O NonBlocking I/O PUSH PULL Business Functions Common Functions Storage / Traffic Cost Optimization Storage I/O Video Processing Server Image Processing Server
  56. MediaStorage Enhanced Storage I/O Layer Design ServiceBO Business Functions Common

    Functions Storage / Traffic Cost Optimization Storage Abstraction Asynchronous Interface Storage I/O Adapter
  57. MediaStorage Enhanced Storage I/O Layer Design ServiceBO Business Functions Common

    Functions Storage / Traffic Cost Optimization Storage Abstraction Asynchronous Interface Storage I/O Adapter Commercial Storage Ceph RPC HTTP Blocking I/O NonBlocking I/O PUSH PULL Storage I/O Adapter Publisher Queue Subscriber request(n) Library AWS S3 Subscription onNext(data)
  58. MediaStorage Enhanced Storage I/O Layer Design ServiceBO Business Functions Common

    Functions Storage / Traffic Cost Optimization Storage Abstraction Asynchronous Interface Storage I/O Adapter Commercial Storage Ceph RPC HTTP Blocking I/O NonBlocking I/O PUSH PULL Storage I/O Adapter Publisher Queue Subscriber request(n) Library AWS S3 Subscription onNext(data) Image Processing Server Video Processing Server Storage I/O Adapter
  59. MediaStorage Enhanced Storage I/O Layer Design copy() OBS ServiceBO Commercial

    Storage Friend A Copy Friend B Copy Friend C Copy
  60. MediaStorage Enhanced Storage I/O Layer Design copy() OBS ServiceBO Commercial

    Storage Friend A Copy Friend B Copy Friend C Copy Ceph
  61. MediaStorage Enhanced Storage I/O Layer Design Commercial Storage Ceph Object

    Storage request(n) ServiceBO Storage Abstraction Asynchronous Interface Storage I/O Adapter CommercialPublisher CephSubscriber onNext(data) read() write()
  62. MediaStorage Enhanced Storage I/O Layer Design Commercial Storage Ceph Object

    Storage request(n) ServiceBO Storage Abstraction Asynchronous Interface Storage I/O Adapter CommercialPublisher CephSubscriber onNext(data) read() write() 100Mbps 50Mbps
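The `request(n)` / `onNext(data)` exchange on these slides follows the Reactive Streams pattern: the subscriber signals how many chunks it can accept, and the publisher never sends more than that. The sketch below is a simplified, single-threaded illustration of the idea; the class names echo the slide labels, but the bodies are assumptions, and lists stand in for the real `read()` and `write()` calls. The 100Mbps and 50Mbps labels presumably depict a source that is faster than the sink, which is the case this pattern protects against.

```python
from collections import deque


class Subscription:
    """Delivers chunks only while the subscriber has outstanding demand."""

    def __init__(self, chunks, subscriber):
        self._chunks = deque(chunks)
        self._subscriber = subscriber
        self._demand = 0
        self._draining = False
        self._done = False
        self.max_demand = 0          # highest outstanding demand observed

    def request(self, n):
        self._demand += n
        self.max_demand = max(self.max_demand, self._demand)
        if self._draining or self._done:
            return                   # a drain loop is already running (or finished)
        self._draining = True
        try:
            while self._demand > 0 and self._chunks:
                self._demand -= 1
                self._subscriber.on_next(self._chunks.popleft())
            if not self._chunks:
                self._done = True
                self._subscriber.on_complete()
        finally:
            self._draining = False


class CommercialPublisher:
    """Stand-in for reading an object from the commercial storage."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def subscribe(self, subscriber):
        subscriber.on_subscribe(Subscription(self._chunks, subscriber))


class CephSubscriber:
    """Stand-in for writing to Ceph; asks for one more chunk per chunk written."""

    def __init__(self, batch=2):
        self.batch = batch
        self.written = []
        self.completed = False
        self.subscription = None

    def on_subscribe(self, subscription):
        self.subscription = subscription
        subscription.request(self.batch)   # initial demand

    def on_next(self, data):
        self.written.append(data)          # placeholder for write()
        self.subscription.request(1)       # ready for one more chunk

    def on_complete(self):
        self.completed = True


chunks = [b"part-1", b"part-2", b"part-3", b"part-4", b"part-5"]
subscriber = CephSubscriber(batch=2)
CommercialPublisher(chunks).subscribe(subscriber)
```

Because demand is replenished only after a write finishes, the amount of data buffered between the two storages stays bounded by the batch size regardless of how fast the source can be read.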
  63. MediaStorage Resumable Upload - Multipart PUT User Packets OBS(Service BO)

    Ceph Object Storage Part #1 Part #2 Part #3 Part #4 offset 0 64K 128K 192K 256K Aggregation & Make each part Object #1 Object #2 Object #3 Object #4
  64. MediaStorage Resumable Upload - Multipart PUT User Packets OBS(Service BO)

    Ceph Object Storage Part #1 Part #2 Part #3 Part #4 offset 0 64K 128K 192K 256K Aggregation & Make each part Object #1 Object #2 Object #3 Object #4 Complete
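The aggregation step on these slides, where incoming user packets are buffered and cut into fixed-size parts at offsets 0, 64K, 128K and 192K, can be sketched as a generator. The function name and the 16 KiB packet size are assumptions for the example; the 64K part size follows the offsets on the slide.

```python
def aggregate_parts(packets, part_size=64 * 1024):
    """Buffer incoming packets and yield (part_number, offset, data) per full part."""
    buffer = bytearray()
    offset = 0
    part_number = 1
    for packet in packets:
        buffer.extend(packet)
        while len(buffer) >= part_size:
            yield part_number, offset, bytes(buffer[:part_size])
            del buffer[:part_size]
            offset += part_size
            part_number += 1
    if buffer:
        # Trailing data shorter than part_size becomes the last part.
        yield part_number, offset, bytes(buffer)


# Sixteen 16 KiB packets (256 KiB in total) become four 64 KiB parts.
packets = [b"x" * (16 * 1024)] * 16
parts = list(aggregate_parts(packets))
offsets = [offset for _, offset, _ in parts]
```

Since each part carries its own offset, an interrupted upload can resume from the first part that was not yet stored instead of starting over.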
  65. MediaStorage Resumable Upload - Multipart PUT RTT, OBS(Service BO) to Ceph Object Storage

    Initiate Multipart Upload Request; accept request and return UploadID; Upload Parts; Complete Multipart Upload Request; Make Object. RTT = Parts + 2 ?
  66. MediaStorage Resumable Upload - Improved Multipart PUT RTT, OBS(Service BO) to Ceph Object Storage

    Create UploadID and Upload with the 1st part; Upload Parts (2nd~last); Complete Multipart Upload Request; Make Object. RTT = Parts + 1
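The round-trip counts on the two multipart slides can be compared directly. The sketch assumes one round trip per request and that parts are sent one after another; the function names are hypothetical.

```python
def standard_multipart_rtt(parts):
    """Initiate (1) + one round trip per part + Complete (1)."""
    return parts + 2


def improved_multipart_rtt(parts):
    """The UploadID is created together with the 1st part, so Initiate costs nothing extra."""
    return parts + 1


# For the four-part upload shown earlier: 6 round trips versus 5.
before = standard_multipart_rtt(4)
after = improved_multipart_rtt(4)
```

The saving is a constant one round trip per upload, so it matters most for small objects with few parts.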
  67. MediaStorage Which services will be moved to Ceph?

    Services that are essential even in a disaster situation; services with a limited storage period; services that should have high locality
  68. Storage Forward Placement Future works API / Protocol Enhancement Find

    more Regions & Services Change the Message MediaStorage