Accelerating Video Segment Access via Quality-Aware Multi-Source Selection

Dominik Winecki

April 09, 2025
Transcript

  1. Accelerating Video Segment Access via Quality-Aware Multi-Source Selection

    Dominik Winecki (The Ohio State University, USA)
    Arnab Nandi (The Ohio State University, USA)
  2. Background: Managing the Quality / Cost Tradeoff

    Adaptive Bitrate Streaming
    • Multiple sources used to control bandwidth usage
    Optimized & Proxy Media
    • Efficient-to-edit (e.g., ALL-I) sources used to speed up editing at the cost of storage
  3. Use Case: Accessing and Editing Video from VOD/ABR Sources

    Consider a video streaming service with a search engine that generates video compilations in response to searches.
    • Many short segments need to be decoded quickly
    • Must use existing VOD/ABR sources from the CDN
    Which video source do you use? “Lowest bitrate 720p H.264 source”
  4. The Problem with Simple Source Selection

    “Just use the lowest bitrate 720p H.264 source”
    • Crude enforcement of a quality constraint
    • Likely the fastest source to access, but not always!
    • Sometimes a higher bitrate/resolution source is faster to access if it has GOPs better aligned with the required segment
    • Sometimes jumping between multiple sources is faster
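
The GOP-alignment point can be illustrated with a small sketch. It is not taken from the talk: the function name `frames_to_decode` and both keyframe layouts are hypothetical, and it assumes that decoding must begin at the last keyframe at or before the segment start and can stop at the segment end.

```python
from bisect import bisect_right


def frames_to_decode(keyframes, start, end):
    """Count frames decoded to obtain frames start..end (inclusive).

    `keyframes` is a sorted list of GOP start frames (hypothetical data).
    Assumption: decoding begins at the last keyframe <= start.
    """
    gop_start = keyframes[bisect_right(keyframes, start) - 1]
    return end - gop_start + 1


# Hypothetical layouts: a low-bitrate source with long GOPs, and a
# higher-bitrate source whose GOP happens to start at the segment start.
low_bitrate_keyframes = [0, 250, 500]
high_bitrate_keyframes = [0, 120, 240, 360, 480]

low = frames_to_decode(low_bitrate_keyframes, 240, 270)
high = frames_to_decode(high_bitrate_keyframes, 240, 270)
print(low, high)
```

Under these made-up layouts the low-bitrate source has to decode far more frames for the same 31-frame segment, which is why per-frame decode cost alone does not decide which source is fastest.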
  5. VOD/ABR Video Sources

    [Figure: GOP layout (index), quality (VMAF), and decode time (s) for each source, plotted against frame number]
  6. Problem Statement

    Given a set of existing video sources and a required segment range, find the most efficient access plan to decode the needed segment.
    “Use any/all sources, ensure VMAF > 85, minimize latency”
  7. MSSOpt

    1) Identify all source GOPs in range
    2) Remove low-quality GOPs
    3) Construct a frame transition graph
    4) Find the shortest path
    5) Use the shortest path as the decode plan
    [Figure: example frame transition graph over frames 0 to 4 of sources A and B]
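
The five steps can be sketched as a shortest-path search. This is a simplified illustration, not the authors' implementation: it treats each GOP as one atomic edge with a single assumed cost (`cost_ms`), and the `Gop` class, `decode_plan` function, and all sample values are hypothetical.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Gop:
    source: str
    start: int    # first frame, inclusive
    end: int      # last frame, inclusive
    vmaf: float   # per-GOP quality statistic
    cost_ms: int  # assumed cost of decoding the whole GOP


def decode_plan(gops, first, last, min_vmaf):
    # Steps 1-2: keep GOPs that overlap the range and meet the quality bar.
    usable = [g for g in gops
              if g.end >= first and g.start <= last and g.vmaf > min_vmaf]
    # Steps 3-4: Dijkstra over states "next frame still needed".
    tie = count()  # tie-breaker so the heap never compares Gop objects
    best = {first: 0}
    heap = [(0, next(tie), first, ())]
    while heap:
        cost, _, frame, path = heapq.heappop(heap)
        if frame > last:
            return cost, list(path)  # Step 5: the path is the decode plan
        if cost > best.get(frame, float("inf")):
            continue  # stale heap entry
        for g in usable:
            if g.start <= frame <= g.end:
                nxt, new_cost = g.end + 1, cost + g.cost_ms
                if new_cost < best.get(nxt, float("inf")):
                    best[nxt] = new_cost
                    heapq.heappush(heap, (new_cost, next(tie), nxt, path + (g,)))
    return None  # the range cannot be covered at this quality


# Hypothetical sources: A and B meet the quality bar, C does not.
gops = [
    Gop("A", 0, 99, 92.0, 2000), Gop("A", 100, 129, 92.0, 600),
    Gop("B", 90, 109, 95.0, 400), Gop("B", 110, 199, 95.0, 1800),
    Gop("C", 0, 199, 60.0, 100),
]
cost, steps = decode_plan(gops, 95, 125, min_vmaf=85)
print(cost, [(g.source, g.start, g.end) for g in steps])
```

In this made-up example the cheapest plan starts in source B and switches to source A, which beats both single-source plans; the cheap source C is excluded by the quality filter.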
  8. MSSOpt – Runtime

    O(S · (R + log N) · M²)
    S – Number of sources
    R – Length of needed range (frames)
    N – Length of source video (frames)
    M – Max length of a GOP
  9. MSSOpt – Graph Pruning

    [Figure: GOP layout of each source against frame number]
    • Most frames can never be transition points!
    • Most edges are never in the shortest path!
  10. MSSOpt – Graph Pruning

    [Figure: pruned frame transition graph from Start through frames F0 to F4 across GOPs A1, A2, B1, and B2]
    A frame needs to be considered only under five conditions:
    • The frame before the start of the range
    • The last frame of the range
    • The frame prior to the start of a GOP
    • The start of a GOP
    • The end of a GOP
    All other frames are omitted when constructing the frame transition graph.
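
The five conditions can be written as a short filter. This is a minimal sketch under assumptions: `candidate_frames` is a hypothetical helper, GOPs are given as inclusive `(start, end)` frame pairs, and the caller is assumed to pass only GOPs that overlap the range.

```python
def candidate_frames(gops, first, last):
    """Return the frames kept as nodes after pruning.

    gops: iterable of (start, end) inclusive frame ranges from any source,
    assumed to already be restricted to GOPs overlapping first..last.
    """
    keep = {first - 1, last}  # frame before the range, last frame of the range
    for start, end in gops:
        # frame prior to the GOP start, the GOP start, the GOP end
        keep.update((start - 1, start, end))
    return sorted(keep)


# Hypothetical GOPs from several sources around the range 90..110.
sample_gops = [(60, 119), (80, 99), (100, 139)]
nodes = candidate_frames(sample_gops, 90, 110)
print(len(nodes), nodes)
```

For this sample, 10 frames remain as graph nodes out of the 81 frames the three GOPs span, so the graph size depends on the number of GOP boundaries rather than the number of frames.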
  11. MSSOpt – Limitations

    • MSSOpt assumes decoding a GOP is an atomic operation
    • MSSOpt uses per-GOP quality statistics
    • Using multiple sources results in scattered random I/O
    • However, this is considered by the optimizer via the cost heuristic
  12. Evaluation

    Dataset:
    • 100 documentaries from PBS’s FRONTLINE & NOVA series
    • All sources downloaded from YouTube (via yt-dlp)
    • Encoded streams kept unmodified
    • Each video around 55 minutes @ 29.97 FPS
    • 256x144 to 1920x1080
    • Average of 29.8 sources per video, almost all H.264 or VP8
    Metric:
    • Comparing Multi-Source to optimal Single-Source decode latency
  13. Results – Segment Decoding

    [Figure: decode time (s) against quality (VMAF) for the Multi-Source and Single-Source methods]
    • As quality constraints are lowered, decode time decreases
    • Using multiple sources results in lower latency than using the optimal single source
  14. Results – Segment Length

    [Figure: relative decode time (percent) against segment length (s)]

    Segment Length   Decode Time   MS ≠ SS   Opt. Time
    1s               98.8%         16.7%     1.00 ms
    2s               98.0%         29.7%     1.19 ms
    5s               94.9%         60.7%     1.38 ms
    10s              90.9%         82.5%     1.59 ms
    15s              87.8%         89.2%     1.73 ms
    30s              83.1%         95.6%     2.03 ms
    60s              76.8%         97.5%     2.97 ms

    • Shorter decoded ranges saw less speedup, as many were naturally single-source optimal
    • MSSOpt executes in single-digit milliseconds
  15. Evaluation – Video Editing

    MSSOpt integrated into V2V, an optimizing declarative video editor
    [Figure: first page and system architecture diagram (Fig. 1) of the V2V paper, “V2V: Efficiently Synthesizing Video Results for Video Queries” (ICDE’24)]
    [Figure: V2V optimizer pipelines from Unoptimized Plan to Optimized Plan, shown for the Baseline and With MSSOpt, where the latter adds MssOpt Plan and Multi-Input MssOpt Plan stages]
  16. Results – Video Editing

    Tasks:
    • E1: Clip
    • E2: Clip & Splice 4 segments
    • E3: Compose 4 videos into a grid
    • E4: Apply a filter (blur)
    E1-E4 are 5-second; E5-E8 are 1-minute
    [Figure: execution time (s) per edit task E1 to E8, Baseline vs. Multi-Source]
    [Figure: relative execution time (percent) against quality (VMAF) for edit tasks E1 to E8]
  17. Takeaways

    • VOD/ABR streaming services store multiple video sources, which you can use like proxy/optimized media for faster seeking
    • Using multiple video sources simultaneously for video segment access has a better cost/quality tradeoff
    • MSSOpt solves the generalized form of Multi-Source Selection
  18. Thank you!

    This material is based upon work supported by the National Science Foundation under Grant No. 1910356 and the NSF OAC 2118240 Imageomics Institute award.