Enhanced Media Metrics

Shane Tuohy
November 14, 2017

Shane Tuohy's talk at DevOpsDays Galway 2017 on Enhanced Media Metrics.

https://www.linkedin.com/in/shanetuohy/

Transcript

  1. Overview
    • What makes a good metric?
    • Intro to Cisco Spark
    • What metrics are important for media?
    • Architecture of a cloud calling system
    • Enhanced metrics
    • Customer triage examples
  2. Good Metrics
    • Get into the mind of the user
    • Percentages vs absolute numbers
    • Median vs mean vs percentiles
    • Scalable, proactive, time invariant
    • Worst 10%? Worst 1%?
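To make the "median vs mean vs percentiles" point concrete, here is a minimal sketch. The loss figures are invented for illustration and are not from the talk; the nearest-rank percentile is one common definition, not necessarily the one the speaker's tooling uses.

```python
import statistics

# Hypothetical per-call packet loss (%): most calls are fine, two are terrible.
loss_pct = [0.1, 0.2, 0.0, 0.3, 0.1, 0.2, 0.1, 0.0, 12.0, 25.0]

def nearest_rank(values, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, -(-p * len(ordered) // 100))  # integer ceil(p * n / 100)
    return ordered[rank - 1]

mean_loss = statistics.mean(loss_pct)      # pulled up by the two bad calls
median_loss = statistics.median(loss_pct)  # hides the bad calls entirely
p90_loss = nearest_rank(loss_pct, 90)      # where the worst 10% of calls begin
p99_loss = nearest_rank(loss_pct, 99)      # where the worst 1% of calls begin

print(f"mean={mean_loss:.2f} median={median_loss:.2f} p90={p90_loss} p99={p99_loss}")
```

With this data the mean suggests every call is mildly degraded and the median suggests nothing is wrong; only the high percentiles describe what the unhappiest users experienced.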
  3. What metrics are important for media?
    I have no idea how to represent network jitter in picture form
  4. Cloud Calling Architecture
    [Diagram: media flow across Customer Network, Public Internet, PAAS, Third Party Providers and Trunking Providers, each marked with a "?"]
  5. Great, we’re done right?
    • We’re still not in the mind of the customer.
    • Three separate sources reporting media statistics
    • Triaging individual calls is not sustainable at scale
    • Not everyone is a Kibana ninja
    • Need more insightful data
  6. But I’m lazy..
    • Monitoring dashboards isn’t fun
    • Let’s set up alarming on our enhanced metrics
    • Define what we consider ‘a problem’
        • Overall 90th percentile packet loss increases above threshold
        • Certain percentage of our customers exhibiting poor packet loss
        • Particular segment experiencing poor packet loss
    • Page on these and never have to look at the dashboard again
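The three "problem" definitions on this slide could be sketched as alarm rules like the following. This is an illustration only: the thresholds, the `(customer, segment, loss_pct)` sample shape, and the function names are assumptions, and a real deployment would more likely express these rules in its monitoring system than in application code.

```python
from collections import defaultdict

# Hypothetical thresholds; the talk does not give actual values.
P90_LOSS_THRESHOLD = 5.0          # overall p90 packet loss (%) that pages
POOR_LOSS_THRESHOLD = 3.0         # p90 loss (%) that counts as "poor"
MAX_POOR_CUSTOMER_FRACTION = 0.2  # share of customers allowed to be "poor"

def nearest_rank(values, p):
    """Nearest-rank percentile over a non-empty list."""
    ordered = sorted(values)
    rank = max(1, -(-p * len(ordered) // 100))  # integer ceil(p * n / 100)
    return ordered[rank - 1]

def evaluate_alarms(samples):
    """samples: iterable of (customer, segment, loss_pct). Returns alarm names."""
    samples = list(samples)
    if not samples:
        return []
    alarms = []

    # Rule 1: overall 90th percentile packet loss above threshold.
    if nearest_rank([loss for _, _, loss in samples], 90) > P90_LOSS_THRESHOLD:
        alarms.append("overall_p90")

    by_customer, by_segment = defaultdict(list), defaultdict(list)
    for customer, segment, loss in samples:
        by_customer[customer].append(loss)
        by_segment[segment].append(loss)

    # Rule 2: too large a share of customers exhibiting poor packet loss.
    poor = [c for c, v in by_customer.items()
            if nearest_rank(v, 90) > POOR_LOSS_THRESHOLD]
    if len(poor) / len(by_customer) > MAX_POOR_CUSTOMER_FRACTION:
        alarms.append("poor_customers")

    # Rule 3: a particular segment experiencing poor packet loss.
    for segment, v in sorted(by_segment.items()):
        if nearest_rank(v, 90) > POOR_LOSS_THRESHOLD:
            alarms.append(f"segment:{segment}")
    return alarms
```

Anything returned by `evaluate_alarms` would be routed to the pager; an empty list means nobody needs to look at the dashboard.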
  7. Triage examples
    • Customer reports poor audio quality
    • Check overall dashboard for system wide issues
    • Check customer aggregations
    • Where is the loss happening for this customer?
    • Help them to triage their internal network issues
    • Total time taken to isolate problem: <5 minutes
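The "where is the loss happening for this customer?" step amounts to aggregating one customer's samples by network segment and picking the worst. A minimal sketch, assuming each sample is tagged with a segment label such as those on the architecture slide; the label strings and the `worst_segment` helper are hypothetical.

```python
from collections import defaultdict
from statistics import median

def worst_segment(samples, customer):
    """Return (segment, median loss %) for the customer's lossiest segment, or None.

    samples: iterable of (customer, segment, loss_pct) tuples.
    """
    per_segment = defaultdict(list)
    for cust, segment, loss in samples:
        if cust == customer:
            per_segment[segment].append(loss)
    if not per_segment:
        return None  # no data for this customer
    return max(((seg, median(v)) for seg, v in per_segment.items()),
               key=lambda pair: pair[1])

# Hypothetical data: loss concentrated inside the customer's own network.
samples = [
    ("acme", "customer_network", 6.0),
    ("acme", "customer_network", 8.0),
    ("acme", "public_internet", 0.1),
    ("acme", "public_internet", 0.3),
    ("acme", "paas", 0.0),
    ("acme", "paas", 0.2),
]
print(worst_segment(samples, "acme"))
```

The median is used here so that one bad call does not mislabel a segment; a high percentile would be the better choice when the question is about worst-case experience.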
  8. Triage examples
    • Paged because overall packet loss spikes in system
    • A subset of customers is suddenly seeing issues
    • Many other customers are just fine
    • Scratch head…
    • …where are the customers located?
    • Find out about Comcast issue on the east coast of US
    • Total time taken to isolate problem: 10-15 minutes
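The "scratch head" step is a search for an attribute that the affected customers share and the healthy ones do not. One way to sketch it, assuming customer records carry attributes such as region or ISP; the record shape, attribute names, and `affected_share_by` are invented for illustration.

```python
from collections import Counter

def affected_share_by(customers, affected, key):
    """For each value of `key`, return (value, affected_count, total_count),
    sorted with the most heavily affected values first.

    customers: dict of name -> attribute dict; affected: set of customer names.
    """
    totals = Counter(attrs[key] for attrs in customers.values())
    hits = Counter(attrs[key] for name, attrs in customers.items()
                   if name in affected)
    rows = [(value, hits[value], total) for value, total in totals.items()]
    return sorted(rows, key=lambda r: (-r[1] / r[2], r[0]))

# Hypothetical customer records.
customers = {
    "a": {"isp": "isp_x", "region": "us-east"},
    "b": {"isp": "isp_x", "region": "us-east"},
    "c": {"isp": "isp_y", "region": "us-east"},
    "d": {"isp": "isp_y", "region": "eu-west"},
}
affected = {"a", "b"}

for key in ("isp", "region"):
    print(key, affected_share_by(customers, affected, key))
```

In this made-up data every affected customer is on the same ISP while only two of three customers in the region are affected, which points at the ISP rather than the region.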
  9. Triage examples
    • Large customer suddenly spikes packet loss
    • See that loss appears to be in customer network
    • Reach out to customer
    • Customer had fallen over to a backup network link for a day
    • Customer impressed that we were so on top of things
    • Happy customer trusts our cloud system more