Last N: Relevance-Based Selectivity for Forwarding Video in Multimedia Conferences

Boris Grozev, Jitsi.org, University of Strasbourg
Lyubomir Marinov, Jitsi.org
Varun Singh, Aalto University, Finland
Emil Ivov, Jitsi.org
20/03/15

SCALING VIDEO CONFERENCES
•  Goals:
   –  Bigger conferences
   –  More conferences on one server
•  What kind of conferences:
   –  Centralized
   –  WebRTC
   –  RTP
   –  Meetings, presentations
   –  Dynamic, interactive

[Diagram: endpoints A, B and C connected to Jitsi Videobridge; panels: Forwarding (SFU) and Mixing (MCU)]

JITSI VIDEOBRIDGE
[Diagram: endpoints A, B and C connected to Jitsi Videobridge (SFU)]
•  WebRTC-compatible video router
•  ICE; DTLS-SRTP; SRTP; SCTP
•  RTCP termination
   •  RR, REMB
   •  SR
•  SRTP termination
   •  (Relatively) expensive

PROBLEMS (1): ON THE CLIENTS
For a conference of 100:
1.  Bandwidth (down): 99 streams * 1 Mbps
2.  CPU usage: live decoding of 99 streams
3.  User interface: no space for 99 video elements

PROBLEMS (2): ON THE BRIDGE
For a conference of 100:
1.  Downstream bandwidth
    •  Proportional to K
    •  100 endpoints = 100 Mbps
2.  Upstream bandwidth
    •  Proportional to K²
    •  100 endpoints = 9900 streams = 9.9 Gbps
3.  CPU
    •  Proportional to the total bitrate
    •  Proportional to K²

LAST N
[Diagram: Jitsi Videobridge forwarding video with N = 2]

LAST N: EFFECT
K = 100; N = 5
Clients:
•  Receive/decode/render 5 streams (was 99)
SFU:
•  Downstream: still 100 streams (100 Mbps)
•  Upstream: K * N = 500 Mbps (was 9.9 Gbps)
•  N is constant: linear with K (was K²)
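The arithmetic on the two preceding slides can be checked with a short sketch. It assumes 1 Mbps per video stream, as on the clients slide, and follows the slides in counting K * N outbound streams with Last N. The function name `bridge_streams` and the constant `STREAM_MBPS` are illustrative, not part of Jitsi Videobridge.

```python
STREAM_MBPS = 1  # per-stream bitrate assumed on the slides


def bridge_streams(k, n=None):
    """Return (inbound, outbound) video stream counts at the bridge.

    k: number of endpoints in the conference.
    n: Last N value, or None when every stream is forwarded to everyone.
    """
    inbound = k  # every endpoint sends one stream to the bridge
    if n is None:
        outbound = k * (k - 1)       # each endpoint receives all the others
    else:
        outbound = k * min(n, k - 1)  # each endpoint receives at most N streams
    return inbound, outbound


for n in (None, 5):
    inbound, outbound = bridge_streams(100, n)
    label = "full forwarding" if n is None else f"Last N = {n}"
    print(f"{label}: in {inbound * STREAM_MBPS} Mbps, out {outbound * STREAM_MBPS} Mbps")
```

With K = 100 this gives 9900 outbound streams for full forwarding and 500 with N = 5, matching the 9.9 Gbps and 500 Mbps figures above.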

LAST N: PAUSING
[Diagram: endpoints connected to Jitsi Videobridge]
Clients:
•  No encoding, no upstream
SFU:
•  Downstream: N+1 instead of K
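One way to see why N+1 senders are enough is to keep the endpoints ordered by how recently each was the dominant speaker. The sketch below is a hypothetical illustration, not the Jitsi Videobridge implementation; the class `LastN` and its method names are assumptions made for this example.

```python
class LastN:
    """Keeps endpoints ordered by how recently each was the dominant speaker."""

    def __init__(self, n):
        self.n = n
        self.order = []  # most recent dominant speaker first

    def join(self, endpoint):
        # New endpoints start at the end of the list.
        self.order.append(endpoint)

    def on_dominant_speaker(self, endpoint):
        # Move the new dominant speaker to the front.
        self.order.remove(endpoint)
        self.order.insert(0, endpoint)

    def forwarded_to(self, receiver):
        """Video sources forwarded to `receiver`: the first N other endpoints."""
        return [e for e in self.order if e != receiver][: self.n]

    def must_send(self):
        """Endpoints whose video at least one receiver gets: the first N+1."""
        return set(self.order[: self.n + 1])

    def can_pause(self):
        """Endpoints whose video nobody receives, so they may stop sending."""
        return set(self.order[self.n + 1 :])


conf = LastN(n=2)
for name in "abcde":
    conf.join(name)
conf.on_dominant_speaker("c")
conf.on_dominant_speaker("a")
print(conf.forwarded_to("a"), conf.forwarded_to("d"))
print(sorted(conf.must_send()), sorted(conf.can_pause()))
```

An endpoint that is itself among the first N needs one more source to fill its N slots, which is why the first N+1 endpoints keep sending and the rest can pause.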

DOMINANT SPEAKER IDENTIFICATION
•  Requirement for Last N
•  The naïve approach doesn’t work
   –  Different microphones / configuration
   –  Different sound levels in the environment
   –  Coughs

DOMINANT SPEAKER IDENTIFICATION
•  SotA: I. Volfin and I. Cohen 2013 [1]
•  Maintains a dominant speaker (DS)
   –  Others compete
   –  Detects changes to the DS
•  Computes scores over intervals with different length
   –  Short, medium, long
   –  Thresholds at each interval
•  Works with audio in the frequency domain
   –  Requires decoding
[1] I. Volfin and I. Cohen, Dominant speaker identification for multipoint videoconferencing, Computer Speech and Language, Volume 27, Issue 4, June 2013

DOMINANT SPEAKER IDENTIFICATION
•  RFC 6464: Client-to-Mixer Audio Level Indication
   –  RTP header extension
   –  7 bits that indicate the level of the audio in an RTP packet
•  Adapt Volfin and Cohen 2013
   –  Same competition model
   –  Same intervals
      •  Short (20 ms), medium (100 ms) and long (1000 ms)
   –  Same division in sub-bands
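The idea of identifying the dominant speaker from RFC 6464 levels alone can be sketched roughly as follows. In RFC 6464 the one-byte payload carries a voice-activity flag in the top bit and a 7-bit level in -dBov (0 is loudest, 127 is silence). Everything else here is a simplified stand-in, not the Volfin and Cohen algorithm or the adaptation described on the slide: the activity threshold, the margins, the scoring rule and the names `Speaker` and `pick_dominant` are assumptions, and one RTP packet per 20 ms is assumed.

```python
from collections import deque


def parse_audio_level(ext_byte):
    """Split the one-byte RFC 6464 payload into (voice_flag, level in -dBov)."""
    return bool(ext_byte & 0x80), ext_byte & 0x7F


WINDOWS = (1, 5, 50)       # packets: 20 ms, 100 ms, 1000 ms at 20 ms per packet
ACTIVE_BELOW = 60          # hypothetical: levels under 60 (-dBov) count as speech
MARGINS = (0.0, 0.2, 0.2)  # hypothetical lead a challenger needs per window


class Speaker:
    def __init__(self):
        # Start with one second of silence.
        self.levels = deque([127] * WINDOWS[-1], maxlen=WINDOWS[-1])

    def push(self, level):
        self.levels.append(level)

    def scores(self):
        """Fraction of active packets in the short, medium and long window."""
        recent = list(self.levels)
        return tuple(sum(l < ACTIVE_BELOW for l in recent[-w:]) / w for w in WINDOWS)


def pick_dominant(current, speakers):
    """Keep `current` unless a challenger leads in all three windows."""
    base = speakers[current].scores() if current in speakers else (0.0, 0.0, 0.0)
    best, best_total = current, -1.0
    for name, spk in speakers.items():
        if name == current:
            continue
        s = spk.scores()
        if all(c > b + m for c, b, m in zip(s, base, MARGINS)) and sum(s) > best_total:
            best, best_total = name, sum(s)
    return best
```

Requiring a lead in the medium and long windows is what keeps a short burst such as a cough from displacing a speaker who has been talking for the last second.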

TESTBED
[Diagram labels: Jitsi Videobridge; Jitsi Hammer; recv only; quad-core Xeon 3.7 GHz; … K]

[Plot: CPU usage (%) vs. bitrate (Mbps); K = 10, 15, 20, 25, 29, 33]
Data points (bitrate, CPU usage):
(47.6 Mbps, 3.1%)
(110.3 Mbps, 5.1%)
(199.4 Mbps, 8.0%)
(314.7 Mbps, 11.7%)
(425.5 Mbps, 15.7%)
(550.4 Mbps, 20.3%)

[Plot: outbound bitrate (Mbps) vs. number of endpoints (K); series: n=3, n=5, n=8, n=-1]

Pausing
[Plot: bitrate (Mbps) vs. Last N; series: video not paused, video paused]

Pausing
[Plot: CPU usage (%) vs. Last N; series: video not paused, video paused]

CONCLUSION
•  Conferences with forwarding use a lot of bandwidth
•  The bottleneck at the server is either the CPU or the network
   –  In most cases the network
•  Cutting down the number of forwarded streams to a constant works as expected
•  DSI can be used to maintain interactivity in the conference
•  DSI can be performed without decoding audio
•  Future work
   –  Adaptive Last N

THANK YOU!
