Aphorisms of Monitoring

Chris Christensen

March 22, 2015

Transcript

  1. Landscape
    Limelight Networks
    • Top tier CDN. We have a video platform & object storage on top
    • 10 Tbps egress with our own international backbone
    • Global footprint with 100+ data centers
    • Five-figure system count
    Environment
    • Technology company, 24/7 engineering presence
    • Diverse, talented teams with a variety of skills - systems & software
      ◦ Massive FreeBSD Edge
      ◦ Several flavors of *nix
  2. Aphorisms
    aph·o·rism - noun
    1. a pithy observation that contains a general truth, such as, “if it ain't broke, don't fix it”.
    2. a concise statement of a scientific principle, typically by an ancient classical author.
    • Monitoring is metrics.
    • Numbers.
    • Monitoring Sucks.
      ◦ #monitoringsucks
      ◦ #monitoringlove
    • It would be awesome if...
  3. Anscombe's quartet
    All four sets are identical when examined using simple summary statistics, but vary considerably when graphed
    … in defense of time-series graphs
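The point of the quartet can be checked numerically. The sketch below uses the published y-values of Anscombe's four data sets (the x-values are omitted for brevity) and shows that the mean is the same for all four, while a simple look at the spread of the values already hints that the sets differ. The variable and function names are illustrative only.

```python
from statistics import mean

# y-values of Anscombe's quartet (x-values omitted in this sketch).
ANSCOMBE_Y = {
    "I":   [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II":  [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV":  [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}


def summarize(values):
    """Return (mean, range) rounded to two decimals."""
    return round(mean(values), 2), round(max(values) - min(values), 2)


summaries = {name: summarize(ys) for name, ys in ANSCOMBE_Y.items()}

for name, (avg, spread) in summaries.items():
    print(f"set {name}: mean={avg} range={spread}")
```

A dashboard that only shows the mean would report four identical series here; a graph of the raw points would not.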
  4. Metrics "API"
    key: value @ time
    • Graphite: metric_path value timestamp
      ◦ foo.bar.baz 42 74857843
    • OpenTSDB: metric timestamp value tagk1=tagv1
      ◦ sys.cpu.user 1356998400 42.5 host=webserver01 cpu=0
    • Zabbix: metric timestamp value
      ◦ hw.serial.number 1287872261 SQ4321ASDF
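To make the "key: value @ time" idea concrete, here is a minimal sketch that renders one sample in each of the three field orders listed on the slide. The function names are hypothetical, and the output follows the slide's examples only; it does not attempt to reproduce any tool's full wire protocol.

```python
def graphite_line(metric_path, value, timestamp):
    """Graphite order from the slide: metric_path value timestamp."""
    return f"{metric_path} {value} {timestamp}"


def opentsdb_line(metric, timestamp, value, tags):
    """OpenTSDB order from the slide: metric timestamp value tagk=tagv ..."""
    tag_text = " ".join(f"{key}={val}" for key, val in tags.items())
    return f"{metric} {timestamp} {value} {tag_text}"


def zabbix_line(metric, timestamp, value):
    """Zabbix order from the slide: metric timestamp value."""
    return f"{metric} {timestamp} {value}"


print(graphite_line("foo.bar.baz", 42, 74857843))
print(opentsdb_line("sys.cpu.user", 1356998400, 42.5, {"host": "webserver01", "cpu": 0}))
print(zabbix_line("hw.serial.number", 1287872261, "SQ4321ASDF"))
```

All three carry the same triple; they differ in field order, in whether context travels as tags, and in whether the value may be text.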
  5. Metrics "API"
    Metadata
    • Datatype
    • Context
    • Units
    • Description
    • Eagerness (safe polling interval)
    • ...
    Storage
    • MIB
    • DB Schema (e.g. Zabbix)
    • In-band?
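One way to picture the metadata list is as a small record kept alongside each metric. The sketch below is an assumption for illustration: the class name, field names, and example values are hypothetical and do not come from any particular monitoring system's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricMetadata:
    """Hypothetical per-metric metadata record mirroring the slide's list."""
    name: str
    datatype: str           # e.g. "integer", "float", "text"
    context: str            # where the metric comes from
    units: str
    description: str
    eagerness_seconds: int  # safe polling interval


def can_poll(meta, seconds_since_last_poll):
    """True when polling again would respect the safe polling interval."""
    return seconds_since_last_poll >= meta.eagerness_seconds


cpu_user = MetricMetadata(
    name="sys.cpu.user",
    datatype="float",
    context="host=webserver01 cpu=0",
    units="percent",
    description="CPU time spent in user space",
    eagerness_seconds=60,  # assumed value for the example
)

print(can_poll(cpu_user, 30), can_poll(cpu_user, 60))
```

Whether such a record lives in a MIB, a database schema, or in-band with the samples is the storage question the slide raises.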
  6. Understanding where computation is happening
    "Back of the envelope"
    • Number of values
    • Frequency (per second)
    • Size (integer, float, text)
    Looks like: Analysis of algorithms → O(n)
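The three inputs above multiply into a rough load estimate. The sketch below works one such estimate; every number in it (10,000 hosts, 100 metrics per host, a 60-second interval, 8 bytes per value) is an assumption chosen for the example, not a figure from the talk.

```python
SECONDS_PER_DAY = 86_400


def values_per_second(hosts, metrics_per_host, interval_seconds):
    """Number of values times frequency: samples arriving each second."""
    return hosts * metrics_per_host / interval_seconds


def bytes_per_day(hosts, metrics_per_host, interval_seconds, bytes_per_value):
    """Raw payload per day, ignoring names, timestamps, and storage overhead."""
    samples_per_day = hosts * metrics_per_host * SECONDS_PER_DAY // interval_seconds
    return samples_per_day * bytes_per_value


# Assumed inputs for illustration only.
rate = values_per_second(hosts=10_000, metrics_per_host=100, interval_seconds=60)
daily = bytes_per_day(hosts=10_000, metrics_per_host=100,
                      interval_seconds=60, bytes_per_value=8)

print(f"{rate:.0f} values/s, {daily / 1e9:.2f} GB/day of raw values")
```

The estimate grows linearly in each input, which is the O(n) intuition on the slide: doubling hosts, metrics, or polling frequency doubles the work wherever the computation happens.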
  7. Thresholds / Alerting
    • Analyzing metrics
      ◦ in real time…
      ◦ in report / summary time...
    Notable writeups:
    • Netflix: Atlas
    • Airbnb / LinkedIn
      ◦ Kafka
      ◦ Samza
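The two analysis modes can be contrasted on the same samples. This is a minimal sketch, assuming a fixed threshold and an in-memory list of (timestamp, value) pairs; the function names and data are hypothetical and unrelated to how Atlas, Kafka, or Samza work.

```python
from statistics import mean


def realtime_alerts(samples, threshold):
    """Real time: inspect each sample as it arrives, yield those over the threshold."""
    for timestamp, value in samples:
        if value > threshold:
            yield timestamp, value


def summary_report(samples):
    """Report time: look at the whole window after the fact."""
    values = [value for _, value in samples]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }


# Made-up samples: (seconds since start, percent utilisation).
samples = [(0, 10.0), (60, 95.0), (120, 20.0), (180, 99.0)]

alerts = list(realtime_alerts(samples, threshold=90.0))
report = summary_report(samples)
print(alerts)
print(report)
```

The real-time path needs only the current sample and can fire immediately; the report path needs the full window but can answer questions nobody thought to alert on.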
  8. Building on the shoulders of ...
    • Work with upstream
      ◦ "Waiting for Godot"
    • Precedence
    • Valuing schemas
  9. Thoughts and takeaways
    • Take some time to learn from others
      ◦ Why? "They're just graph servers and stuff"
    • Monitoring software often has NIH syndrome
  10. Scale
    ... can mean more than just raw processing / machinery
    • Teams / collaboration
    • Situational awareness
    • The right tool for the job
  11. Q/A

  12. Tools
    Interesting looking:
    • Bosun
    • Sensu
    • OpenNMS
    • OMD
    • ELK Stack
    • Librato
    • Circonus
    Experience / can provide feedback:
    • Zabbix
    • OpenTSDB
    • Grafana
    • Splunk
    • Jut