NOW TV and Linear Streaming: The unpredictable scalability challenge - Devoxx UK 2015

NOW TV’s customer base has tripled year on year, and linear event streaming has become a greater and greater percentage of their viewing. This talk will take you through the lessons we learnt following one unfortunate event in 2014 which led NOW TV to catastrophic failure, and the improvements and scalability challenges we were faced with when preparing for the same event just 1 year later.

Hear how we turned our entire failure-case behaviour on its head, from always failing in favour of NOW TV to failing in favour of the customer. Learn how high availability of video assets on the CDN ensures continued playout irrespective of the state of our application servers. Take away some tips on how NOT to scale, and hear how we reversed our fortunes and delivered stability during the biggest event on NOW TV to date.

No cover-ups and no side-stepping of our previous faults: hear about the rough with the smooth, along with the customer sentiment that has made us a far better service today. Get an insight into NOW TV’s platform and architecture, from CDNs through our Java/Groovy/Scala application servers down to our Mongo databases, and learn how we will handle the load expected when we double our customer base in the coming year.

Tom Maule

June 19, 2015

Transcript

  1. Who am I?

    •  Tom Maule
       –  Solution Architect at NOW TV, Sky
       –  Previously Senior Java Developer on NOW TV Platform team (since project inception in early 2012)
    I have also previously worked in the defence and telecoms industries
    [email protected] linkedin.com/in/tommaule @tommaule
  2. Abstract

    •  NOW TV Introduction
    •  Linear streaming challenges
    •  7th April 2014
    •  Fixes and improvements
    •  13th April 2015
    •  Future work and next steps
  3. Introduction - Overview

    •  NOW TV is the online no-contract TV streaming service from Sky
    •  Available on over 60 devices including the award-winning NOW TV Box
    •  NOW TV offers movies and entertainment VOD and linear content, and for the first time in the UK, pay-as-you-go Sports linear content
  4. Introduction - NOW TV Architecture

    Diagram components: Content, Content Metadata, Account Data, VOD Transcoding, Linear Transcoding, CDN, Video Assets, NOW TV Platform, Load Balancers, Services, User device
    Diagram flows: Asset upload, Stream upload, Live video stream, Manifest and video chunks, Content metadata, User services
    Monitoring & alerting: Logs, Splunk, MMS, Icinga, New Relic
  5. Video On Demand (VOD)

    •  Video content, available on demand, whenever users want it.
    •  Platform load is predictable, just ask any of Netflix, Amazon Instant Video, YouTube, etc
  6. Video On Demand (VOD)

    •  Even weekend load, though busier during the day, remains predictable
  7. Linear Streaming

    •  Unlike other OTT (Over-the-Top) Providers, NOW TV offers streaming of live channels
    •  This is typically NOT predictable
    •  Load is driven by live events, not by time of day
  13. What happened?

    •  High load stressed our database
    •  Retries only compounded the problem
    •  Observed issues:
       –  Customers couldn’t start new streams
       –  Existing streams were terminated
       –  Concurrency errors during and shortly after the outage
       –  Very high read and write queues in Mongo DB
       –  Entitlement and Viewing History APIs performed very slowly
       –  High proportion of time was spent updating indexes in Mongo DB
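Why retries compound an overload can be shown with some simple arithmetic. This is only an illustrative sketch: the retry count, failure rate and request rate below are hypothetical values, not figures from the talk, and the model assumes every attempt fails independently with the same probability.

```python
def offered_load(base_rps: float, failure_rate: float, max_retries: int) -> float:
    """Requests per second reaching the platform when failures are retried.

    Attempt k (k = 0 is the original request) is only made if the previous
    k attempts all failed, which happens with probability failure_rate ** k.
    """
    return base_rps * sum(failure_rate ** k for k in range(max_retries + 1))


# Hypothetical numbers: 1000 requests/s, clients retrying up to 3 times.
healthy = offered_load(1000, 0.0, 3)   # no failures, so no extra load
degraded = offered_load(1000, 0.5, 3)  # half of all attempts fail
outage = offered_load(1000, 1.0, 3)    # every attempt fails
```

Under these assumptions, a full outage multiplies the offered load by `max_retries + 1`, at exactly the moment the database can least absorb it.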
  14. Issues to Address

    •  Heartbeating resiliency
    •  Concurrency inaccuracies
    •  Entitlement checking
    •  Products storage
    •  Viewing History
    •  Indexes in Mongo DB
    •  Mongo DB write lock
  15. Heartbeating: Introduction

    •  After playout initiation, actual video chunks are served by the CDN, and don't touch our platform
    •  Lightweight heartbeats call back to our platform every 10 minutes to notify us of continued playout
    •  NOW TV use heartbeats to:
       –  Enforce concurrency rules
       –  Enforce entitlement
       –  Record bookmark positions (VOD only)
  16. Heartbeating: Previously

    •  Previously, a non-OK heartbeat response would terminate playout on the user’s device
    •  Fail in favour of NOW TV
       –  When NOW TV platform is unavailable, existing playouts are terminated on next heartbeat.
  19. Heartbeating: Today

    •  Today, playout continues unless a specific STOP heartbeat response is received
    •  Fail in favour of the customer
       –  Existing streams will NOT be terminated if NOW TV becomes unavailable
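The change in failure behaviour can be sketched as a pair of decision functions. This is a minimal illustration, not NOW TV's client code: the `HeartbeatResult` values and both function names are hypothetical, and the only behaviour taken from the slides is that playout used to stop on any non-OK response and now stops only on an explicit STOP.

```python
from enum import Enum
from typing import Optional


class HeartbeatResult(Enum):
    """Hypothetical heartbeat outcomes as seen by the player."""
    OK = "OK"
    STOP = "STOP"
    ERROR = "ERROR"  # e.g. an error response from an overloaded platform


def should_continue_playout(result: Optional[HeartbeatResult]) -> bool:
    """Today's rule: fail in favour of the customer.

    `None` models a heartbeat that received no response at all
    (timeout or platform unavailable). Only an explicit STOP ends the stream.
    """
    return result is not HeartbeatResult.STOP


def should_continue_playout_legacy(result: Optional[HeartbeatResult]) -> bool:
    """The previous rule, for contrast: anything other than OK ends the stream."""
    return result is HeartbeatResult.OK
```

With the newer rule, a platform outage leaves existing streams running because the video chunks themselves come from the CDN.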
  21. Heartbeating: Future

    •  Game of Thrones Linear customers produce ripple-effect heartbeating
       –  Due to heartbeats being fixed to a 10 minute period
    •  In future, we will randomise the first heartbeat period in an attempt to smooth out these ripples
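One way to randomise the first heartbeat period is sketched below. The slides do not say which distribution will be used; drawing the first interval uniformly from the full 10 minute period is an assumption, and `heartbeat_offsets` is a hypothetical helper name.

```python
import random
from typing import List

HEARTBEAT_PERIOD_SECONDS = 10 * 60  # fixed 10 minute interval from the slides


def heartbeat_offsets(count: int, rng: random.Random) -> List[float]:
    """Times (seconds after playout start) of the first `count` heartbeats.

    Assumption: the first interval is uniform in (0, period]; every later
    interval keeps the fixed period, so per-client load is unchanged.
    """
    # rng.random() is in [0.0, 1.0), so the first interval is in (0, period].
    first = HEARTBEAT_PERIOD_SECONDS * (1.0 - rng.random())
    return [first + i * HEARTBEAT_PERIOD_SECONDS for i in range(count)]
```

Clients that all start at the top of the hour would then spread their heartbeats across the whole period instead of arriving together every 10 minutes.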
  25. Concurrency: Introduction

    •  Concurrency of 2 streams is managed through the concept of Playout Slots
    •  A playout slot keeps track of a currently playing stream
    •  Slots are allocated on playout initiation

    Before any playout:
    { "playouts": [] }
    After the first Play:
    { "playouts": [ { "id": "ABC123", "heartbeat": "<timestamp>", "content": "<content_id>" } ] }
    After the second Play:
    { "playouts": [ { "id": "ABC123", "heartbeat": "<timestamp>", "content": "<content_id>" }, { "id": "DEF456", "heartbeat": "<timestamp>", "content": "<content_id>" } ] }
  28. Concurrency: Introduction

    •  Slots are updated on heartbeats to refresh the time stamp
    •  Slots are terminated on an END event

    Two slots in use:
    { "playouts": [ { "id": "ABC123", "heartbeat": "<timestamp>", "content": "<content_id>" }, { "id": "DEF456", "heartbeat": "<timestamp>", "content": "<content_id>" } ] }
    After END for ABC123:
    { "playouts": [ { "id": "DEF456", "heartbeat": "<timestamp>", "content": "<content_id>" } ] }
    After a new Play:
    { "playouts": [ { "id": "DEF456", "heartbeat": "<timestamp>", "content": "<content_id>" }, { "id": "CBF789", "heartbeat": "<timestamp>", "content": "<content_id>" } ] }
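The slot lifecycle described on these slides (allocate on Play, refresh on heartbeat, release on END) can be sketched with an in-memory stand-in for the per-account `playouts` document. The class and method names are hypothetical, and a real implementation would perform these steps as Mongo DB updates rather than dictionary operations.

```python
from typing import Dict

MAX_CONCURRENT_STREAMS = 2  # concurrency limit quoted on the slides


class AccountPlayouts:
    """In-memory stand-in for one account's `playouts` document."""

    def __init__(self) -> None:
        self.playouts: Dict[str, dict] = {}

    def play(self, playout_id: str, content_id: str, now: float) -> bool:
        """Allocate a slot on playout initiation; refuse when all slots are taken."""
        if len(self.playouts) >= MAX_CONCURRENT_STREAMS:
            return False
        self.playouts[playout_id] = {"heartbeat": now, "content": content_id}
        return True

    def heartbeat(self, playout_id: str, now: float) -> bool:
        """Refresh the slot's timestamp; False means the slot is unknown."""
        slot = self.playouts.get(playout_id)
        if slot is None:
            return False
        slot["heartbeat"] = now
        return True

    def end(self, playout_id: str) -> None:
        """Release the slot on an END event."""
        self.playouts.pop(playout_id, None)
```

Following the slide sequence, playouts `ABC123` and `DEF456` fill both slots, a third Play is refused, and ending `ABC123` frees a slot for `CBF789`.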
  29. Concurrency: Previously

    •  Failure to receive an END event (due to app crash or connectivity loss) blocked a slot until timeout
    •  Previously, this blocked subsequent playouts for up to 10 minutes
    •  “Concurrency limit reached” errors were seen after our service had been restored on GoT night

    { "playouts": [ { "id": "ABC123", "heartbeat": "<timestamp>", "content": "<content_id>" }, { "id": "DEF456", "heartbeat": "<timestamp>", "content": "<content_id>" } ] }
  34. Concurrency: Today

    •  Now, slots allocated to the same Device ID can be ‘reclaimed’
    •  No more “Concurrency limit reached” errors following app crashes or service outages

    Slots held by box1 and box2:
    { "playouts": [ { "id": "ABC123", "heartbeat": "<timestamp>", "content": "<content_id>", "deviceId": "box1" }, { "id": "DEF456", "heartbeat": "<timestamp>", "content": "<content_id>", "deviceId": "box2" } ] }
    After box1 issues a new Play:
    { "playouts": [ { "id": "FCE987", "heartbeat": "<timestamp>", "content": "<content_id>", "deviceId": "box1" }, { "id": "DEF456", "heartbeat": "<timestamp>", "content": "<content_id>", "deviceId": "box2" } ] }
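Reclaiming a slot by Device ID might look like the sketch below. The document shape follows the JSON on the slide; `allocate_slot` is a hypothetical function name, and replacing the stale slot in place is an assumption about how the reclaim is carried out.

```python
from typing import List

MAX_CONCURRENT_STREAMS = 2  # concurrency limit quoted on the slides


def allocate_slot(playouts: List[dict], playout_id: str, content_id: str,
                  device_id: str, now: float) -> bool:
    """Allocate a playout slot, reclaiming any slot already held by this device."""
    new_slot = {"id": playout_id, "heartbeat": now,
                "content": content_id, "deviceId": device_id}
    for index, slot in enumerate(playouts):
        if slot.get("deviceId") == device_id:
            # Same device: its old stream cannot still be playing, so reuse the slot.
            playouts[index] = new_slot
            return True
    if len(playouts) >= MAX_CONCURRENT_STREAMS:
        return False  # "Concurrency limit reached"
    playouts.append(new_slot)
    return True
```

A device that crashed without sending END can therefore start a new stream immediately, while a third device is still refused.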
  35. Entitlements: Introduction

    •  Entitlement is granted based upon the products purchased and the content being consumed
    •  Products and content are tagged with entitlement tags
    •  Tag intersection indicates entitlement to consume

    Example tags shown on the slide: sports, entertainment, movies
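Tag intersection reduces to a set operation. A minimal sketch, assuming tags are plain strings; the function name is hypothetical.

```python
from typing import Iterable


def is_entitled(product_tags: Iterable[str], content_tags: Iterable[str]) -> bool:
    """A customer may consume content if any product tag matches any content tag."""
    return not set(product_tags).isdisjoint(content_tags)


# A customer holding sports and entertainment products:
customer_tags = {"sports", "entertainment"}
can_watch_sport = is_entitled(customer_tags, {"sports"})
can_watch_movie = is_entitled(customer_tags, {"movies"})
```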
  36. Entitlements: Previously

    •  Entitlement checking was not efficient – checked by content ID
       –  /entitlement/movie/<id>
       –  /entitlement/episode/<id>
       –  /entitlement/stream/<id>
    •  Entitlement was checked on every details page before any call-to-action
    •  Content tags almost never changed
  37. Entitlements: Today

    •  Entitlement checking by tag(s) was introduced
       –  /entitlement/tags/movies
    •  Entitlement checking now only needed to occur once per collection or ‘section’ of the app
    •  Where entitlement checking by content ID is still necessary
       –  tags are cached in memory
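Because content tags almost never change, the content-ID path can look tags up once and serve later checks from memory. The sketch below uses Python's `functools.lru_cache` as the in-memory cache; the catalogue dictionary and function names are hypothetical stand-ins for the real metadata lookup, and a production cache would also need some expiry policy, which is not shown.

```python
from functools import lru_cache
from typing import FrozenSet, Iterable

# Hypothetical catalogue standing in for the content metadata store.
_CONTENT_TAGS = {
    "movie1": frozenset({"movies"}),
    "stream1": frozenset({"sports"}),
}


@lru_cache(maxsize=10_000)
def tags_for_content(content_id: str) -> FrozenSet[str]:
    """Fetch a content item's tags; repeat calls are served from memory."""
    return _CONTENT_TAGS.get(content_id, frozenset())


def is_entitled_to_content(entitled_tags: Iterable[str], content_id: str) -> bool:
    """Check entitlement by content ID using the cached tags."""
    return not tags_for_content(content_id).isdisjoint(entitled_tags)
```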
  38. Product Storage: Previously

    •  Every purchase and renewal of any product resulted in a new Product entity in Mongo DB

    Product entities shown on the slide: Entertainment – June 2015; Movies – August 2015; Sports – 20th July 2015; Entertainment – July 2015; Entertainment – August 2015; Movies – September 2015; Entertainment – September 2015; Sports – 12th September 2015; Movies – October 2015; Entertainment – October 2015; Entertainment – November 2015
  39. Product Storage: Today

    •  We store entitlement entities instead of products, updating on renewals rather than duplicating

    Entities shown on the slide: Entertainment – June 2015; Movies – August 2015; Sports – 20th July 2015; Entertainment – July 2015; Entertainment – August 2015; Movies – September 2015; Entertainment – September 2015; Sports – 12th September 2015; Movies – October 2015; Entertainment – October 2015; Movies – November 2015; Entertainment – November 2015
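The difference between the two storage models is append versus update-in-place. A minimal in-memory sketch, with hypothetical function and field names; keying entitlements by account and product type is an assumption, as the slides do not show the entity schema.

```python
from typing import Dict, List, Tuple


def record_purchase_previously(products: List[dict], account_id: str,
                               product: str, expires: str) -> None:
    """Old model: every purchase or renewal appends a new Product entity."""
    products.append({"accountId": account_id, "product": product, "expires": expires})


def record_purchase_today(entitlements: Dict[Tuple[str, str], dict],
                          account_id: str, product: str, expires: str) -> None:
    """New model: one entitlement per account and product, updated on renewal."""
    entitlements[(account_id, product)] = {"expires": expires}


products: List[dict] = []
entitlements: Dict[Tuple[str, str], dict] = {}
for month in ("2015-06", "2015-07", "2015-08"):
    record_purchase_previously(products, "account1", "Entertainment", month)
    record_purchase_today(entitlements, "account1", "Entertainment", month)
```

After three monthly renewals the old model holds three entities for the account while the new model holds one, so storage and index size no longer grow with every renewal.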
  40. Viewings & Bookmarks: Introduction

    •  Viewing a VOD asset => Viewing
    •  Heartbeating during a VOD asset => Bookmark
    •  Viewings and Bookmarks were stored separately
    •  No capping or archiving
  41. Viewings & Bookmarks: Previously

    •  Upon fetching a customer’s viewing history, multiple database queries were made:
       –  1 query to the viewings collection to fetch n viewings for the customer
       –  n queries to the bookmarks collection to fetch the bookmark position for each viewing
       –  TOTAL: n + 1 Mongo DB queries for a single request!
       –  Some customers had thousands of items in their viewing history!

    Viewings:
    { "_id": "abc123", "accountId": "account1", "contentId": "movie1", "timestamp": "<timestamp>" }
    { "_id": "bcd345", "accountId": "account1", "contentId": "movie2", "timestamp": "<timestamp>" }
    { "_id": "cde456", "accountId": "account1", "contentId": "episode1", "timestamp": "<timestamp>" }
    Bookmarks:
    { "_id": "fed987", "accountId": "account1", "contentId": "movie1", "position": 1187 }
    { "_id": "edc765", "accountId": "account1", "contentId": "movie2", "position": 2854 }
    { "_id": "dcb543", "accountId": "account1", "contentId": "episode1", "position": 3542 }
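The n + 1 pattern can be reproduced with a small in-memory stand-in that counts queries. `CountingCollection` and `fetch_viewing_history` are hypothetical names; the documents are the ones shown on the slide, with placeholder timestamps.

```python
from typing import List, Optional


class CountingCollection:
    """In-memory stand-in for a Mongo DB collection that counts find() calls."""

    def __init__(self, documents: List[dict]) -> None:
        self.documents = documents
        self.query_count = 0

    def find(self, **criteria: str) -> List[dict]:
        self.query_count += 1
        return [doc for doc in self.documents
                if all(doc.get(key) == value for key, value in criteria.items())]


def fetch_viewing_history(viewings: CountingCollection,
                          bookmarks: CountingCollection,
                          account_id: str) -> List[dict]:
    """Old approach: 1 query for the viewings, then 1 bookmark query per viewing."""
    history = []
    for viewing in viewings.find(accountId=account_id):
        marks = bookmarks.find(accountId=account_id, contentId=viewing["contentId"])
        position: Optional[int] = marks[0]["position"] if marks else None
        history.append({"contentId": viewing["contentId"],
                        "timestamp": viewing["timestamp"],
                        "position": position})
    return history


viewings = CountingCollection([
    {"_id": "abc123", "accountId": "account1", "contentId": "movie1", "timestamp": "t1"},
    {"_id": "bcd345", "accountId": "account1", "contentId": "movie2", "timestamp": "t2"},
    {"_id": "cde456", "accountId": "account1", "contentId": "episode1", "timestamp": "t3"},
])
bookmarks = CountingCollection([
    {"_id": "fed987", "accountId": "account1", "contentId": "movie1", "position": 1187},
    {"_id": "edc765", "accountId": "account1", "contentId": "movie2", "position": 2854},
    {"_id": "dcb543", "accountId": "account1", "contentId": "episode1", "position": 3542},
])
history = fetch_viewing_history(viewings, bookmarks, "account1")
total_queries = viewings.query_count + bookmarks.query_count
```

With three viewings, `total_queries` is four; with thousands of viewings, a single history request issues thousands of bookmark queries.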
  42. Viewings & Bookmarks: Today

    •  The original reason for keeping viewings and bookmarks separate was no longer apparent
    •  Now, viewings and bookmarks are merged
       –  Unnecessary document ID replaced with compound ID – improving indexing efficiency
       –  Shortened field names – reducing storage consumption and further improving indexing efficiency

    Viewing:
    { "_id": "abc123", "accountId": "account1", "contentId": "movie1", "timestamp": "<timestamp>" }
    Bookmark:
    { "_id": "fed987", "accountId": "account1", "contentId": "movie1", "position": 1187 }
    View History (merged, compound ID):
    { "_id": { "accountId": "account1", "contentId": "movie1" }, "position": 1187, "timestamp": "<timestamp>" }
    View History (shortened field names):
    { "_id": { "aid": "account1", "cid": "movie1" }, "pos": 1187, "ts": "<timestamp>" }
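The merge can be expressed as a small transformation from a viewing and its bookmark to one view-history document. The field names (`aid`, `cid`, `pos`, `ts`) come from the slide; the function name and the mismatch check are assumptions added for illustration.

```python
def to_view_history(viewing: dict, bookmark: dict) -> dict:
    """Merge a viewing and its bookmark into one document with a compound _id."""
    same_account = viewing["accountId"] == bookmark["accountId"]
    same_content = viewing["contentId"] == bookmark["contentId"]
    if not (same_account and same_content):
        raise ValueError("viewing and bookmark refer to different items")
    return {
        # Compound ID replaces the separate generated document IDs.
        "_id": {"aid": viewing["accountId"], "cid": viewing["contentId"]},
        "pos": bookmark["position"],
        "ts": viewing["timestamp"],
    }


merged = to_view_history(
    {"_id": "abc123", "accountId": "account1", "contentId": "movie1", "timestamp": "t1"},
    {"_id": "fed987", "accountId": "account1", "contentId": "movie1", "position": 1187},
)
```

Because the account and content identifiers live in `_id`, a customer's history for a given item is a single document, and fetching the history no longer needs a second collection.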
  43. Mongo Indexes

    Viewing:
    { "_id": "abc123", "accountId": "account1", "contentId": "movie1", "timestamp": "<timestamp>" }
    { "_id": "abc123", "accountId": "account1" }
    { "_id": "abc123", "accountId": "account1", "timestamp": "<timestamp>" }
    { "_id": "abc123", "accountId": "account1", "contentId": "movie1" }
    { "_id": "abc123" }
    Bookmark:
    { "_id": "fed987", "accountId": "account1", "contentId": "movie1", "position": 1187 }
    { "_id": "fed987", "accountId": "account1", "contentId": "movie1" }
    { "_id": "fed987" }
    View History:
    { "_id": { "aid": "account1", "cid": "movie1" }, "pos": 1187, "ts": "<timestamp>" }
    { "_id.aid": "account1", "ts": "<timestamp>" }
    { "_id": { "aid": "account1", "cid": "movie1" } }
  44. Mongo Write Locks: Previously

    Diagram: one Mongo instance. Database 1 holds Collection 1 and Collection 2; Database 2 holds Collection 3 and Collection 4. Each collection holds several documents.
  45. Mongo Write Locks: Today

    Diagram: one Mongo instance with four databases (Database 1 to Database 4) and four collections (Collection 1 to Collection 4), apparently one collection per database. Each collection holds several documents.
  46. NOW TV Customer Base 2014 - 2015

    •  Our customer base TRIPLED, again, in the year up to April 2015

    Chart axis: 2013, 2014, 2015
  47. What happened?

    •  Good platform availability throughout
    •  2.5x the load that affected us just one year earlier
    •  Twice the normal concurrency for a typical Monday night
  48. Recognition

    MongoDB Innovation Award 2015 recognises organisations that are creating ground-breaking applications. These projects represent the best and most innovative work in the industry over the last year.
    DTG Innovation Award 2015 recognises organisations which have driven innovation in a particular technology or sector
  49. What’s Next For NOW TV?

    •  Our growth is expected to continue along the same trajectory
    •  Moving to active-active datacentre architecture for increased resiliency
    •  Cloud-based ‘overflow’ scaling for high-load events
    •  Microservices
    •  Sub-system resiliency
  50. Credits

    •  The entire NOW TV Technology team are credited with our success
       –  Platform Software Engineers
       –  Platform Quality Assurance Engineers
       –  Dev-Ops Engineers
       –  App Developers & Testers
       –  Analysts, scrum masters and management
    •  Be a part of our future success, work for NOW TV at Sky
       –  www.workforsky.com
       –  @workforsky