Aphorisms of Monitoring

Chris Christensen

March 22, 2015

Transcript

  1. Landscape
    Limelight Networks
    • Top tier CDN. We have a video platform & object storage on top
    • 10 Tbps egress with our own international backbone
    • Global footprint with 100+ data centers
    • Five-figure system magnitude
    Environment
    • Technology company, 24/7 engineering presence
    • Diverse, talented teams with a variety of skills - systems & software
      ◦ Massive FreeBSD Edge
      ◦ Several flavors of *nix
  2. Aphorisms
    aph·o·rism - noun
    1. a pithy observation that contains a general truth, such as, “if it ain't broke, don't fix it”.
    2. a concise statement of a scientific principle, typically by an ancient classical author.
    • Monitoring is metrics.
    • Numbers.
    • Monitoring Sucks.
      ◦ #monitoringsucks
      ◦ #monitoringlove
    • It would be awesome if...
  3. Anscombe's quartet
    All four sets are identical when examined using simple summary statistics, but vary considerably when graphed
    … in defense of time-series graphs
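The point of slide 3 can be checked numerically. The sketch below is not from the talk: it uses the published Anscombe (1973) data set and a hand-rolled Pearson correlation, and the names `ANSCOMBE`, `pearson`, and `summarize` are illustrative only.

```python
from statistics import mean, variance

# Anscombe's quartet (Anscombe, 1973). Sets I-III share the same x values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs)
    sy = sum((y - my) ** 2 for y in ys)
    return cov / (sx * sy) ** 0.5


def summarize(xs, ys):
    """The kind of summary a dashboard might show instead of a graph."""
    return {
        "mean_x": mean(xs),
        "var_x": variance(xs),
        "mean_y": mean(ys),
        "corr": pearson(xs, ys),
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in ANSCOMBE.items()}

for name, s in summaries.items():
    print(name, {k: round(v, 2) for k, v in s.items()})
```

The four summaries agree closely even though the underlying series look nothing alike when plotted, which is the argument for keeping the graph next to the number.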
  4. Metrics "API"
    key: value @ time
    • Graphite: metric_path value timestamp
      ◦ foo.bar.baz 42 74857843
    • OpenTSDB: metric timestamp value tagk1=tagv1
      ◦ sys.cpu.user 1356998400 42.5 host=webserver01 cpu=0
    • Zabbix: metric timestamp value
      ◦ hw.serial.number 1287872261 SQ4321ASDF
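The three line shapes on slide 4 all reduce to the same "key: value @ time" record. The sketch below parses them exactly as the slide writes them; it does not claim to implement any tool's full wire protocol, and `Sample` and the `parse_*` helpers are hypothetical names. Values are kept as strings because the Zabbix example carries text rather than a number.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    """Common shape behind all three formats: key, value, time (+ optional tags)."""
    key: str
    value: str
    timestamp: int
    tags: dict


def parse_graphite(line: str) -> Sample:
    # Slide form: metric_path value timestamp
    path, value, ts = line.split()
    return Sample(path, value, int(ts), {})


def parse_opentsdb(line: str) -> Sample:
    # Slide form: metric timestamp value tagk1=tagv1 [tagk2=tagv2 ...]
    metric, ts, value, *tag_fields = line.split()
    tags = dict(field.split("=", 1) for field in tag_fields)
    return Sample(metric, value, int(ts), tags)


def parse_zabbix(line: str) -> Sample:
    # Slide form: metric timestamp value (value may be free text)
    metric, ts, value = line.split(maxsplit=2)
    return Sample(metric, value, int(ts), {})


if __name__ == "__main__":
    print(parse_graphite("foo.bar.baz 42 74857843"))
    print(parse_opentsdb("sys.cpu.user 1356998400 42.5 host=webserver01 cpu=0"))
    print(parse_zabbix("hw.serial.number 1287872261 SQ4321ASDF"))
```

Once everything is a `Sample`, the only real differences left between the formats are field order and whether context travels as tags or inside the metric name.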
  5. Metrics "API"
    Metadata
    • Datatype
    • Context
    • Units
    • Description
    • Eagerness (safe polling interval)
    • ...
    Storage
    • MIB
    • DB Schema (e.g. Zabbix)
    • In-band?
  6. Understanding where computation is happening
    "Back of the envelope"
    • Number of values
    • Frequency (per second)
    • Size (integer, float, text)
    Looks like: Analysis of algorithms → O(n)
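The back-of-the-envelope estimate on slide 6 is a product of three factors: number of values, frequency, and size. The sketch below multiplies them out. The byte sizes in `BYTES_PER_VALUE` and the fleet figures in the usage example are assumptions chosen for illustration, not numbers from the talk.

```python
SECONDS_PER_DAY = 86_400

# Assumed payload sizes per value, in bytes (illustrative only).
BYTES_PER_VALUE = {"integer": 8, "float": 8, "text": 64}


def estimate(num_values: int, interval_seconds: float, kind: str) -> dict:
    """Estimate ingest rate for `num_values` metrics, each polled every
    `interval_seconds`, each carrying one value of type `kind`."""
    values_per_second = num_values / interval_seconds
    bytes_per_second = values_per_second * BYTES_PER_VALUE[kind]
    return {
        "values_per_second": values_per_second,
        "bytes_per_second": bytes_per_second,
        "bytes_per_day": bytes_per_second * SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    # Hypothetical fleet: 10,000 hosts x 100 float metrics, polled once a minute.
    result = estimate(num_values=10_000 * 100, interval_seconds=60, kind="float")
    for name, value in result.items():
        print(f"{name}: {value:,.0f}")
```

The cost grows linearly in the number of values, which is the O(n) the slide points at: doubling the hosts or the metrics per host doubles every figure, and halving the polling interval does the same.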
  7. Thresholds / Alerting
    • Analyzing metrics
      ◦ in real time...
      ◦ in report / summary time...
    Notable writeups:
    • Netflix: Atlas
    • Airbnb / LinkedIn
      ◦ Kafka
      ◦ Samza
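The two analysis modes on slide 7 can be contrasted with one threshold applied two ways. This is a minimal sketch under assumed inputs: samples are `(timestamp, value)` pairs, and the threshold, window size, and function names are hypothetical. It does not describe how Atlas, Kafka, or Samza work.

```python
from statistics import mean


def alert_realtime(samples, threshold):
    """Real time: check every sample as it arrives, yield each breach at once."""
    for ts, value in samples:
        if value > threshold:
            yield ts, value


def alert_summary(samples, threshold, window):
    """Report time: group stored samples into `window`-second buckets and
    return the start of each bucket whose mean exceeds the threshold."""
    buckets = {}
    for ts, value in samples:
        buckets.setdefault(ts - ts % window, []).append(value)
    return [start for start, vals in sorted(buckets.items()) if mean(vals) > threshold]


if __name__ == "__main__":
    samples = [(0, 50.0), (60, 95.0), (120, 40.0),
               (300, 92.0), (360, 96.0), (420, 94.0)]
    print(list(alert_realtime(samples, threshold=90.0)))
    print(alert_summary(samples, threshold=90.0, window=300))
```

The per-sample check fires on the single spike at t=60, while the windowed check only flags the sustained breach in the second window. Which behavior is wanted decides where the computation has to happen.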
  8. Building on the shoulders of ...
    • Work with upstream
      ◦ "Waiting for Godot"
    • Precedence
    • Valuing schemas
  9. Thoughts and takeaways
    • Take some time to learn from others
      ◦ Why? "They're just graph servers and stuff"
    • Monitoring software often has NIH syndrome
  10. Scale
    ...Can mean more than just raw processing / machinery
    • Teams / collaboration
    • Situational awareness
    • The right tool for the job
  11. Q/A

  12. Tools
    Interesting looking:
    • Bosun
    • Sensu
    • OpenNMS
    • OMD
    • ELK Stack
    • Librato
    • Circonus
    Experience / can provide feedback:
    • Zabbix
    • OpenTSDB
    • Grafana
    • Splunk
    • Jut