
Florian
June 26, 2018

A Public Dataset for YouTube’s Mobile Streaming Client

Datasets are a valuable resource to analyze, model and optimize network traffic. This work describes a new public dataset for YouTube’s popular video streaming client on mobile devices. At the moment, we are providing 374 hours of time-synchronous measurements at the network, transport and application layer from two controlled environments in Europe.


Transcript

  1. A Public Dataset for YouTube’s Mobile Streaming Client

    Theodoros Karagkioules1,2, Dimitrios Tsilimantos1, Stefan Valentin1, Florian Wamser3, Bernd Zeidler3, Michael Seufert3, Frank Loh3 and Phuoc Tran-Gia3
    1 Paris Research Center, Huawei Technologies, France
    2 Telecom ParisTech, France
    3 University of Würzburg, Germany
  2. Motivation

    • Datasets are a valuable resource to analyze, model and optimize network traffic
    • HTTP Adaptive Streaming (HAS) accounted for 60% of all mobile data traffic in 2016
      • Predicted to increase to 78% by 2021 [1]
    • Some limitations of existing video streaming traffic datasets:
      • Most publicly available datasets don’t include application-layer information
      • QUIC has only recently been widely employed and thus not thoroughly studied
      • HAS traffic from mobile devices has only rarely been covered
      • Measurement environment sometimes not tightly regulated
      • Not always reproducible (tools not open sourced etc.)
    [1] Cisco, “Visual Networking Index: Forecast and Methodology, 2016-2021”, White Paper, Feb. 2017
  3. The dataset in a nutshell

    • We present a new public dataset for YouTube’s popular video streaming client that:
      • consists of > 370 hours of HAS traffic, transported over UDP/QUIC
      • was generated on mobile devices, with YouTube’s native Android application
      • was created under controlled mobile coverage scenarios
      • is fully reproducible (all tools either open sourced or publicly available)
    • Measurements were conducted:
      • at 2 locations: Paris, France and Würzburg, Germany
      • over the course of 5 months (from Sep. 2017 to Feb. 2018)
    • Recorded information:
      • Network (L3) and transport layer (L4) information captured via tcpdump
      • Application-level (L7) statistics and video progress info captured via our ‘Wrapper App’ (open sourced)
  4. Background

    • Mobile data traffic is dominated by Dynamic Adaptive Streaming over HTTP (DASH) [2]
    • An HAS video stream is available in multiple qualities, each divided into segments
    • Segments are downloaded using HTTP on top of TCP or UDP (QUIC)
    • The video client controls the bitrate by adapting (i) video quality and (ii) segment download time
    • The HAS policy is not standardized
    [Figure: a web server holds each segment in several qualities (low, medium, high); the streaming client requests a specific quality per segment depending on network conditions (bandwidth over time)]
    [2] ISO/IEC, “Dynamic adaptive streaming over HTTP (DASH),” International Standard DIS 23009-1.2, ISO/IEC, 2012
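    Since the adaptation policy is left to the client, the following is only a minimal sketch of one common family of policies (throughput-based selection), not YouTube's logic. The bitrate ladder, the safety factor and all function names are assumptions made for illustration.

```python
# Hypothetical bitrate ladder in kbit/s; these are NOT YouTube's actual encodings.
LADDER_KBPS = {"low": 300, "medium": 1000, "high": 3000}


def estimate_throughput_kbps(segment_bytes, download_seconds):
    """Throughput observed while downloading the previous segment."""
    return segment_bytes * 8 / 1000 / download_seconds


def select_quality(throughput_kbps, safety=0.8):
    """Pick the highest quality whose bitrate fits within a fraction of the
    measured throughput; fall back to the lowest quality otherwise."""
    budget = throughput_kbps * safety
    feasible = [(rate, name) for name, rate in LADDER_KBPS.items() if rate <= budget]
    if not feasible:
        return min(LADDER_KBPS, key=LADDER_KBPS.get)
    return max(feasible)[1]


if __name__ == "__main__":
    # A 250 kB segment downloaded in 2 s corresponds to 1000 kbit/s.
    rate = estimate_throughput_kbps(250_000, 2.0)
    print(rate, select_quality(rate))
```

    The safety factor keeps the chosen bitrate below the measured throughput so that the buffer can still grow; real clients typically combine this with buffer-level information.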
  5. Measurement platform

    • HAS client: YouTube Android application (v. 12.32.60)
    • Smartphones:
      • Paris: 2 x Huawei Nexus 6P (model: H1512)
      • Würzburg: 1 x Google Nexus 7 (model: marlin)
      • Android 7.1.1
    • Computer:
      • Linux Kernel 3.16.0-71-lowlatency
      • IEEE 802.11g WLAN access point
      • Traffic configuration via Linux tc
    • Packet logs were recorded:
      • with tcpdump 4.7.4 (libpcap 1.7.4)
      • simultaneously on the smartphones and the computer
      • unfiltered: all packets logged
  6. Wrapper App

    [Figure: YouTube’s Android App with “stats for nerds” enabled]
    • Remotely controls the YouTube application (ADB)
    • Emulates button clicks via UI automator
    • Extracts:
      • Debug info from YouTube’s “stats for nerds”
      • Status of video progress bar
    • Available at: https://github.com/lsinfo3/yomo-wrapperapp
    • Specified in detail in [3]
    [3] M. Seufert et al., “A wrapper for automatic measurements with YouTube’s native Android app,” in Proc. IEEE TMA, Jun. 2018.
  7. Measurement scenarios

    • 8 streaming profiles measured with diverse channel profiles
    • Configuration of throughput, packet error rate (PER) and delay
    • Designed in accordance with [4]
    [Table: HAS scenarios]
    [Figure: Example for Scenario 6: traffic configuration of rate (Mbit/s), delay τ (ms) and PER ϵ (%)]
    [4] DASH Industry Forum, “Guidelines for Implementation: DASH-AVC/264 Test Cases and Vectors,” Report, Jan. 2014.
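    The slides state that traffic was configured with Linux tc but do not give the exact commands. As a hedged sketch, one way to impose a rate, a delay and a loss rate is the netem queueing discipline; whether the authors used netem is an assumption, as are the interface name and the numeric values below (they are not taken from any scenario in the dataset).

```python
def netem_command(interface, rate_mbit, delay_ms, per_percent):
    """Build (but do not run) a tc/netem command line.

    netem's random `loss` is used here as a stand-in for the packet error
    rate; that mapping is an assumption of this sketch.
    """
    return [
        "tc", "qdisc", "replace", "dev", interface, "root", "netem",
        "rate", f"{rate_mbit}mbit",
        "delay", f"{delay_ms}ms",
        "loss", f"{per_percent}%",
    ]


if __name__ == "__main__":
    # Hypothetical values: 2 Mbit/s, 50 ms delay, 1% loss on interface wlan0.
    print(" ".join(netem_command("wlan0", 2, 50, 1)))
    # Applying the command requires root, e.g. via subprocess.run(cmd, check=True).
```

    Stepping through a list of such configurations at fixed times would reproduce a time-varying channel profile like the one plotted for Scenario 6.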
  8. Streaming content

    • 3 diverse videos (wide range of spatiotemporal correlation): Nature, TalkShow, ToS
      • https://www.youtube.com/watch?v=OHOpb2fS-cM
      • https://www.youtube.com/watch?v=N2sCbtodGMI
      • https://www.youtube.com/watch?v=2d1VrCvdzbY
    • Non-monetized in order to avoid interruptions
    • Encoded at 24, 25 and 30 fps
    • Available at resolutions from 144p to 1080p
    • Available in two representations:
      • As an H.264-encoded stream in an MP4 container
      • As a VP9-encoded stream in a WebM container
    [Table: Streaming content and video bit-rate statistics]
  9. The dataset

    • Dataset publicly available at [5]
    • The data were recorded in Paris, France and Würzburg, Germany, over the course of 5 months
    • Overall 2201 experiments (374 hours of streaming)
      • Of which 25% are labeled
    • Size of dataset: ~3 GB (compressed)
    • All data generated by the YouTube Android application
    • Protocols:
      • Video flows: UDP/QUIC
      • Signaling: TCP
    • Time-synchronous measurements at the ISO/OSI Layers 3, 4 and 7 in controlled scenarios
    [Table: Number of total experiments and number of labeled experiments in parentheses]
    [5] T. Karagkioules et al., “A public dataset for YouTube’s Mobile Streaming Client,” 2018. https://www3.informatik.uni-wuerzburg.de/qoecube
  10. Dataset structure

    • Measurement text files (UTF-8 encoding and csv-format):
      • tcpdump log (Smartphone and PC)
      • Event log (network traffic configuration or video quality)
      • Statistics log (YouTube statistics module a.k.a. stats for nerds)
      • DNS log
      • Video progress log
      • Buffer state labels
    • Filename fields:
      • Location
      • Capturing device
      • Type of data
      • Scenario index
      • Video ID
      • Iteration number
    • Filename example: Paris_PC_TCPdump_Scenario_3_Vid_N2sCbtodGMI_Iteration_1.log
    • Directory example: ./Paris/Scenario_1/Vid_2d1VrCvdzbY/Iteration_1
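    The filename fields listed on this slide can be recovered mechanically. The sketch below assumes that every file follows the pattern of the single example on the slide (underscore-separated fields, literal `Scenario`, `Vid` and `Iteration` keywords, `.log` suffix); other files in the dataset may deviate, and the function and field names are illustrative.

```python
import re

# Pattern inferred from the one filename shown on the slide (an assumption).
FILENAME_RE = re.compile(
    r"(?P<location>[^_]+)_(?P<device>[^_]+)_(?P<data_type>[^_]+)"
    r"_Scenario_(?P<scenario>\d+)"
    r"_Vid_(?P<video_id>.+)"          # video IDs may contain '-' or '_'
    r"_Iteration_(?P<iteration>\d+)\.log"
)


def parse_filename(name):
    """Split a dataset filename into its fields, or raise ValueError."""
    match = FILENAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected filename layout: {name!r}")
    fields = match.groupdict()
    fields["scenario"] = int(fields["scenario"])
    fields["iteration"] = int(fields["iteration"])
    return fields


if __name__ == "__main__":
    print(parse_filename("Paris_PC_TCPdump_Scenario_3_Vid_N2sCbtodGMI_Iteration_1.log"))
```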
  11. Network and transport-layer information

    • TCP, UDP and IP information is recorded, via tcpdump, on the control computer and the smartphones
    • No payload is logged, only metadata from the IP and UDP/TCP headers
    • Logs of all IP packets (incoming as well as outgoing)
    • YouTube’s Android client employed QUIC
    • Every record is time stamped
    [Table: Selection of recorded parameters relevant to HAS]
    Example of tcpdump output:
    1517489137.100561 IP (tos 0x0, ttl 63, id 16270, offset 0, flags [DF], proto UDP (17), length 67)
        192.168.10.200.41688 > 209.85.230.251.443: UDP, length 39
    1517489137.100583 IP (tos 0x0, ttl 54, id 0, offset 0, flags [DF], proto UDP (17), length 1378)
        209.85.230.251.443 > 192.168.10.200.41688: UDP, length 1350
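    The two records in the tcpdump example can be turned into structured data with a regular expression. This is a minimal sketch that only covers IPv4/UDP records shaped like the ones shown; the field names are illustrative, and the actual log files may contain other record types (for example TCP signaling) that it skips.

```python
import re

# Matches one verbose tcpdump record for IPv4/UDP, whether the address part
# is on the same line or on a continuation line.
UDP_RECORD_RE = re.compile(
    r"(?P<ts>\d+\.\d+) IP \(tos (?P<tos>0x[0-9a-f]+), ttl (?P<ttl>\d+), "
    r"id (?P<ip_id>\d+), offset (?P<offset>\d+), flags \[(?P<flags>[^\]]*)\], "
    r"proto UDP \(17\), length (?P<ip_len>\d+)\)\s+"
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): UDP, length (?P<udp_len>\d+)"
)


def parse_udp_records(text):
    """Return one dict per UDP record found in a tcpdump text log."""
    records = []
    for m in UDP_RECORD_RE.finditer(text):
        rec = m.groupdict()
        rec["ts"] = float(rec["ts"])
        for key in ("ttl", "ip_id", "offset", "ip_len", "sport", "dport", "udp_len"):
            rec[key] = int(rec[key])
        records.append(rec)
    return records


SAMPLE = """\
1517489137.100561 IP (tos 0x0, ttl 63, id 16270, offset 0, flags [DF], proto UDP (17), length 67)
    192.168.10.200.41688 > 209.85.230.251.443: UDP, length 39
1517489137.100583 IP (tos 0x0, ttl 54, id 0, offset 0, flags [DF], proto UDP (17), length 1378)
    209.85.230.251.443 > 192.168.10.200.41688: UDP, length 1350
"""

if __name__ == "__main__":
    for record in parse_udp_records(SAMPLE):
        print(record["ts"], record["src"], "->", record["dst"], record["udp_len"])
```

    In both sample records the IP length exceeds the UDP payload length by 28 bytes, which matches a 20-byte IPv4 header plus an 8-byte UDP header.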
  12. Application-layer information

    • Our Wrapper App is used to extract information from YouTube’s ‘stats for nerds’
    • Statistical application-layer information is generated locally at the client side
    • For each experiment, all control events are logged
    • Additionally:
      • DNS queries are recorded
      • Video progress
    [Figure: Example of data from Statistics and Events logfiles, under experiment scenario (s4).]
    Example of stats for nerds output:
    data="{"csdk":"25","c":"android","cbrand":"Huawei","cbrver":"12.32.60","cplayer":"ANDROID_EXOPLAYER","cplatform":"mobile","cmodel":"Nexus6P","cver":"12.32.60","cbr":"com.google.android.youtube","cosver":"7.1.1","cos":"Android","videoid":"2d1VrCvdzbY","fmt":"133","afmt":"139","bh":5658,"bwe":674671}
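    The stats for nerds sample is a JSON object, so the standard `json` module can read it once the surrounding `data="` wrapper is removed. The sketch below works on a repaired copy of the sample from this slide; how records are delimited in the actual Statistics log, and what `bh` and `bwe` measure, are not stated here, so the code passes those values through unchanged.

```python
import json

# Repaired copy of the slide's sample; the real log layout may differ.
SAMPLE = (
    '{"csdk":"25","c":"android","cbrand":"Huawei","cbrver":"12.32.60",'
    '"cplayer":"ANDROID_EXOPLAYER","cplatform":"mobile","cmodel":"Nexus6P",'
    '"cver":"12.32.60","cbr":"com.google.android.youtube","cosver":"7.1.1",'
    '"cos":"Android","videoid":"2d1VrCvdzbY","fmt":"133","afmt":"139",'
    '"bh":5658,"bwe":674671}'
)


def parse_stats(raw):
    """Extract the streaming-related fields from one stats-for-nerds record."""
    record = json.loads(raw)
    return {
        "video_id": record["videoid"],
        "video_format": int(record["fmt"]),   # format code of the video stream
        "audio_format": int(record["afmt"]),  # format code of the audio stream
        "bh": record["bh"],                   # passed through; semantics not given on the slide
        "bwe": record["bwe"],                 # passed through; semantics not given on the slide
        "app_version": record["cver"],
        "os_version": record["cosver"],
    }


if __name__ == "__main__":
    print(parse_stats(SAMPLE))
```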
  13. Buffer state labels

    • 4 different buffer states: filling, steady, depleting, unclear
    • We developed a GUI for manually labeling buffer states, which allows the user to:
      • plot accumulated data separately per packet flow
      • specify time intervals for which an independent label can be assigned
      • select a label to associate with a previously defined interval
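    Because labels are attached to time intervals, using them with packet logs requires mapping each timestamp to the interval that contains it. The sketch below shows one way to do that lookup; the `LabeledInterval` structure and the sample intervals are hypothetical, since the on-disk format of the label files is not described on this slide.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledInterval:
    start: float  # seconds, inclusive
    end: float    # seconds, exclusive
    label: str    # "filling", "steady", "depleting" or "unclear"


def label_at(intervals, t, default=None):
    """Return the label covering time t.

    `intervals` must be sorted by start time and must not overlap.
    Times outside every interval yield `default` (unlabeled).
    """
    starts = [iv.start for iv in intervals]
    i = bisect_right(starts, t) - 1
    if i >= 0 and intervals[i].start <= t < intervals[i].end:
        return intervals[i].label
    return default


# Hypothetical labels for one experiment, for illustration only.
EXAMPLE = [
    LabeledInterval(0.0, 10.0, "filling"),
    LabeledInterval(10.0, 60.0, "steady"),
    LabeledInterval(60.0, 75.0, "depleting"),
]

if __name__ == "__main__":
    for t in (5.0, 10.0, 70.0, 80.0):
        print(t, label_at(EXAMPLE, t))
```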
  14. Application: HAS client design and QoE estimation

    • Researchers can compare the performance of their own bitrate adaptation algorithms against YouTube
    • tcpdump logs can be used to study and to improve packet generation of adaptive streaming clients
    • Estimation of Quality of Experience (QoE) factors (initial buffering, adaptation smoothness and stability)
    • Our data can be used for QoE estimation via a model, for instance the ITU-T SG12 P-NATS specification
    [Figure: Traffic profiling. Inputs: time series (IAT, packet size, …); estimators (video bit-rate, waiting time, …) produce estimates of QoE factors, which together with non-estimated QoE factors feed a QoE model (P-NATS) that outputs a QoE estimate]
  15. Usage examples: streaming parameters estimation

    • Estimation of application-layer parameters and quality, based on packet flows [6,7]
    • Service classification based on packet arrival patterns [6]
    • Streaming-specific billing and traffic shaping (as in T-Mobile’s Binge On)
    • Custom-tailored admission control and resource allocation in cellular networks
    [Figures: Buffer state estimation; Streaming rate estimation; Coverage loss]
    [6] D. Tsilimantos et al., “Classifying flows and buffer state for YouTube’s HTTP adaptive streaming service in mobile networks,” in Proc. ACM MMSys, Jun. 2018.
    [7] D. Tsilimantos et al., “Traffic profiling for mobile video streaming,” in Proc. IEEE ICC, May 2017.
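    A common first step for flow-based estimation is to turn per-packet records into a rate time series. The sketch below bins packet sizes into fixed windows; it is a generic preprocessing step under assumed inputs (a list of `(timestamp, bytes)` pairs), not the estimation method of [6] or [7].

```python
from collections import defaultdict


def throughput_series(packets, window_s=1.0):
    """Aggregate (timestamp_seconds, size_bytes) pairs into bit/s per window.

    Windows are counted from the earliest timestamp; windows without
    packets are reported as 0.0.
    """
    if not packets:
        return []
    t0 = min(ts for ts, _ in packets)
    bytes_per_window = defaultdict(int)
    for ts, size in packets:
        bytes_per_window[int((ts - t0) // window_s)] += size
    last = max(bytes_per_window)
    return [bytes_per_window.get(i, 0) * 8 / window_s for i in range(last + 1)]


if __name__ == "__main__":
    # Hypothetical downlink packets: two in the first second, one in the third.
    demo = [(0.0, 1000), (0.5, 1000), (2.1, 500)]
    print(throughput_series(demo))
```

    Long runs of near-zero windows followed by bursts are the kind of pattern that separates a steady buffer from a filling one, which is why a binned rate series is a natural input for the classifiers mentioned above.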
  16. Summary

    • We described our public dataset for YouTube’s mobile streaming client and the methods to reproduce it
    • Our dataset offers 374 hours of synchronous measurements at the network, transport and application layer
    • Bonus: buffer state labels for YouTube’s adaptive streaming logic
    • The dataset has been used to analyze adaptive streaming traffic and to design traffic management functions for cellular networks
    • Our data and tools also enable the community to better understand how to model and control HAS traffic
    • Next steps: include more OTT services (e.g. Netflix, Dailymotion, Vimeo, Facebook)