Enhanced Media Metrics

Shane Tuohy
November 14, 2017

Shane Tuohy's talk at DevOpsDays Galway 2017 on Enhanced Media Metrics.

https://www.linkedin.com/in/shanetuohy/


Transcript

  1. Overview
     • What makes a good metric?
     • Intro to Cisco Spark
     • What metrics are important for media?
     • Architecture of a cloud calling system
     • Enhanced metrics
     • Customer triage examples
  2. Good Metrics
     • Get into the mind of the user
     • Percentages vs absolute numbers
     • Median vs mean vs percentiles
     • Scalable, proactive, time invariant
     Worst 10%? Worst 1%?
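
To make the "median vs mean vs percentiles" point concrete, here is a minimal sketch. The packet loss samples are invented for illustration and the nearest-rank `percentile` helper is an assumption, not something taken from the talk.

```python
import math
from statistics import mean, median


def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical per-call packet loss (%): most calls are fine, two are terrible.
loss = [0.0, 0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 12.0, 25.0]

summary = {
    "mean": mean(loss),            # pulled upward by the two bad calls
    "median": median(loss),        # describes the typical call, hides the bad ones
    "p90": percentile(loss, 90),   # the "worst 10%" view
    "p99": percentile(loss, 99),   # the "worst 1%" view
}
print(summary)
```

With data like this the median looks healthy and the mean describes no real call, while the high percentiles show what the unhappiest users experience.
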
  3. What metrics are important for media?
     I have no idea how to represent network jitter in picture form
  4. Cloud Calling Architecture
     [Diagram labels: Trunking Providers, Media Flow, Customer Network, Public Internet, PAAS, Third Party Providers; four "?" markers]
  5. Great, we’re done, right?
     • We’re still not in the mind of the customer.
     • Three separate sources reporting media statistics
     • Triaging individual calls is not sustainable at scale
     • Not everyone is a Kibana ninja
     • Need more insightful data
  6. But I’m lazy...
     • Monitoring dashboards isn’t fun
     • Let’s set up alarming on our enhanced metrics
     • Define what we consider ‘a problem’
         • Overall 90th percentile packet loss increases above a threshold
         • A certain percentage of our customers exhibiting poor packet loss
         • A particular segment experiencing poor packet loss
     • Page on these and never have to look at the dashboard again
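
The three "problem" definitions above could be evaluated roughly as follows. This is a sketch under stated assumptions: the thresholds, the record fields (`customer`, `segment`, `loss_pct`) and the reading of "segment" as a network segment are all hypothetical, and the talk does not describe the actual alarming implementation.

```python
import math
from collections import defaultdict

# Hypothetical thresholds; real values would be tuned to the system.
P90_LOSS_THRESHOLD = 2.0           # overall 90th percentile packet loss, in %
POOR_LOSS = 5.0                    # loss (%) considered "poor" for a customer or segment
MAX_POOR_CUSTOMER_FRACTION = 0.10  # page if more than 10% of customers are poor


def p90(values):
    """Nearest-rank 90th percentile."""
    ordered = sorted(values)
    rank = max(1, math.ceil(90 * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate_alarms(calls):
    """Return the names of the alarms that fire for a batch of call records."""
    if not calls:
        return []
    alarms = []
    by_customer = defaultdict(list)
    by_segment = defaultdict(list)
    for call in calls:
        by_customer[call["customer"]].append(call["loss_pct"])
        by_segment[call["segment"]].append(call["loss_pct"])

    # 1. Overall 90th percentile packet loss above threshold.
    if p90([call["loss_pct"] for call in calls]) > P90_LOSS_THRESHOLD:
        alarms.append("overall_p90")

    # 2. Too large a share of customers with poor packet loss.
    poor = [name for name, losses in by_customer.items() if p90(losses) > POOR_LOSS]
    if len(poor) / len(by_customer) > MAX_POOR_CUSTOMER_FRACTION:
        alarms.append("poor_customer_fraction")

    # 3. Any single segment with poor packet loss.
    for segment, losses in sorted(by_segment.items()):
        if p90(losses) > POOR_LOSS:
            alarms.append(f"segment:{segment}")
    return alarms


# Invented sample data for illustration.
calls = [
    {"customer": "acme", "segment": "customer_network", "loss_pct": 8.0},
    {"customer": "acme", "segment": "customer_network", "loss_pct": 9.0},
    {"customer": "globex", "segment": "public_internet", "loss_pct": 0.1},
    {"customer": "globex", "segment": "public_internet", "loss_pct": 0.2},
]
alarms = evaluate_alarms(calls)
print(alarms)
```
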
  7. Triage examples
     • Customer reports poor audio quality
     • Check overall dashboard for system wide issues
     • Check customer aggregations
     • Where is the loss happening for this customer?
     • Help them to triage their internal network issues
     • Total time taken to isolate problem - <5 minutes
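
One way to answer "where is the loss happening for this customer?" is to aggregate that customer's reports per network segment and pick the worst one. The sketch below assumes hypothetical report fields (`customer`, `segment`, `loss_pct`) and uses the median per segment; the talk does not say how the real aggregation is computed.

```python
from collections import defaultdict
from statistics import median


def worst_segment(reports, customer):
    """Return (segment, per-segment median loss) for one customer, or None if no data."""
    by_segment = defaultdict(list)
    for report in reports:
        if report["customer"] == customer:
            by_segment[report["segment"]].append(report["loss_pct"])
    if not by_segment:
        return None
    medians = {segment: median(losses) for segment, losses in by_segment.items()}
    return max(medians, key=medians.get), medians


# Invented reports for illustration.
reports = [
    {"customer": "acme", "segment": "customer_network", "loss_pct": 6.0},
    {"customer": "acme", "segment": "customer_network", "loss_pct": 7.0},
    {"customer": "acme", "segment": "customer_network", "loss_pct": 8.0},
    {"customer": "acme", "segment": "public_internet", "loss_pct": 0.1},
    {"customer": "acme", "segment": "public_internet", "loss_pct": 0.2},
    {"customer": "acme", "segment": "public_internet", "loss_pct": 0.3},
    {"customer": "globex", "segment": "public_internet", "loss_pct": 9.0},
]
result = worst_segment(reports, "acme")
print(result)
```
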
  8. Triage examples
     • Paged because overall packet loss spikes in system
     • A subset of customers is suddenly seeing issues
     • Many other customers are just fine
     • Scratch head…
     • …where are the customers located?
     • Find out about Comcast issue on the east coast of the US
     • Total time taken to isolate problem – 10-15 minutes
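
The "where are the customers located?" step amounts to looking for an attribute that the affected customers share and the healthy ones do not. A minimal sketch, assuming each customer record carries hypothetical `isp`, `region` and `affected` fields (the data and names below are invented):

```python
from collections import Counter


def affected_share_by(customers, key):
    """For each value of `key`, return the fraction of customers that are affected."""
    totals = Counter()
    affected = Counter()
    for customer in customers:
        value = customer[key]
        totals[value] += 1
        if customer["affected"]:
            affected[value] += 1
    return {value: affected[value] / totals[value] for value in totals}


# Invented customer records for illustration.
customers = [
    {"name": "a", "isp": "isp_x", "region": "us-east", "affected": True},
    {"name": "b", "isp": "isp_x", "region": "us-east", "affected": True},
    {"name": "c", "isp": "isp_y", "region": "us-east", "affected": False},
    {"name": "d", "isp": "isp_y", "region": "eu-west", "affected": False},
]
by_isp = affected_share_by(customers, "isp")
by_region = affected_share_by(customers, "region")
print(by_isp, by_region)
```

A value whose share is close to 1.0 while the others stay near 0.0 is a candidate common cause, here the hypothetical `isp_x`.
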
  9. Triage examples
     • Large customer’s packet loss suddenly spikes
     • See that loss appears to be in customer network
     • Reach out to customer
     • Customer had failed over to a backup network link for a day
     • Customer impressed that we were so on top of things
     • Happy customer trusts our cloud system more