Metrics Done Right

A discussion of the problems with using averages and other simple statistical reductions (such as percentiles) on rich data. An exposé of histograms and a brief discussion of the challenges of timing low-latency system behavior with 100% sampling.

Theo Schlossnagle

October 27, 2016

Transcript

  1. PERFORMANCE MONITORING. AND NOW FOR SOMETHING ENTIRELY DIFFERENT. @postwait

  2. None
  3. PERFORMANCE IMPACTS PEOPLE. REMEMBER WHY YOU DO THIS.

  4. CONSIDER A GOAL: AVERAGE PERFORMANCE

  5. CONSIDER A GOAL: 99TH PERCENTILE AT 1500MS

  6. QUICK TL;DR ON PERCENTILES: THEY AREN’T HARD TO UNDERSTAND, JUST DECEPTIVE AT TIMES.

    • 99th percentile: q(0.99)
    • 99% of the samples are lower
    • 1% of the samples are higher

    q(0.99) = 149μs
    q(1) = 63ms
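
The percentile definitions on slide 6 can be sketched in a few lines. This is a minimal illustration using the nearest-rank method, not anything taken from the talk: the `quantile` helper and the sample data are hypothetical, with values chosen only to echo the slide's q(0.99) = 149μs and q(1) = 63ms.

```python
import math

def quantile(samples, q):
    """Nearest-rank quantile: the smallest sample such that at least
    a fraction q of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

# Hypothetical latencies in microseconds: 990 fast requests, 10 slow ones.
latencies_us = [149] * 990 + [63_000] * 10

p99 = quantile(latencies_us, 0.99)    # what a 99th-percentile goal looks at
worst = quantile(latencies_us, 1.0)   # q(1) is simply the maximum
mean = sum(latencies_us) / len(latencies_us)

# The requests a q(0.99) goal never sees.
slower_than_p99 = sum(1 for v in latencies_us if v > p99)
```

With this made-up data the 99th percentile reports the fast path, while ten requests took more than 400 times longer. Neither the mean nor q(0.99) alone says how many requests were slow or how slow they were.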
  7. OCCUPY! PERFORMANCE OF THE 99%

  8. None
  9. NOW CONSIDER THE PEOPLE. YOU’RE BLIND. [chart labels: 1266ms, 860]

  10. PERCENTAGES ARE NOT PEOPLE

  11. None
  12. COMPARE YOUR SLA: TWO MOMENTARY VIOLATIONS VS. AN EPIC OUTAGE

  13. WITH THE ACTUAL TRAGEDY

  14. I KNOW IT SOUNDS CRAZY, BUT WHAT IF I TOLD YOU IT WAS OKAY TO CARE?
  15. SYSTEMS: THEY’RE FASTER THAN USERS

  16. PROBE EFFECT: IT’S REAL

    important_op()

    st := hrtime()
    important_op()
    fn := hrtime()
    log(fn-st)
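
The probe on slide 16 can be made runnable as a rough sketch. Everything here is an assumption for illustration: `important_op` is a hypothetical stand-in for the real operation, `time.perf_counter_ns` plays the role of `hrtime()`, and `timer_overhead_ns` is an invented helper that estimates what the probe itself costs by reading the clock twice with nothing in between.

```python
import time

def important_op():
    # Hypothetical stand-in for the operation being measured.
    return sum(range(100))

def timed_call(op):
    """Wrap op() in two clock reads, as on the slide, and return
    the result together with the elapsed nanoseconds."""
    st = time.perf_counter_ns()
    result = op()
    fn = time.perf_counter_ns()
    return result, fn - st

def timer_overhead_ns(rounds=10_000):
    """Estimate the cost of the probe alone: the smallest gap seen
    between two back-to-back clock reads."""
    best = None
    for _ in range(rounds):
        st = time.perf_counter_ns()
        fn = time.perf_counter_ns()
        delta = fn - st
        if best is None or delta < best:
            best = delta
    return best

result, elapsed_ns = timed_call(important_op)
overhead_ns = timer_overhead_ns()
```

When `elapsed_ns` is not much larger than `overhead_ns`, the measurement mostly describes the probe rather than the operation, which is the probe effect the slide refers to.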
  17. RULES OF PROBING

    • fixed O(1) operations
    • no latency bubbles
    • no allocations
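
One way to picture the rules on slide 17 is a recorder whose storage is allocated once and whose `record` call does a constant amount of work. The sketch below is hypothetical (`FixedRecorder` and its power-of-two bucketing are not from the talk), and Python only shows the shape of the idea: the interpreter still allocates behind the scenes, so a real low-latency probe would be written in a language such as C.

```python
from array import array

NUM_BUCKETS = 64  # bucket i holds values whose bit length is i

class FixedRecorder:
    """Counts latencies in power-of-two buckets held in a preallocated array."""

    def __init__(self):
        # Allocated once up front; record() never grows or resizes it.
        self.counts = array("Q", [0] * (NUM_BUCKETS + 1))

    def record(self, latency_ns):
        # Constant work per sample: clamp, compute one index, bump one counter.
        if latency_ns < 0:
            latency_ns = 0
        idx = latency_ns.bit_length()
        if idx > NUM_BUCKETS:
            idx = NUM_BUCKETS
        self.counts[idx] += 1

recorder = FixedRecorder()
for sample in (0, 1, 1000, 1023, 1024):
    recorder.record(sample)
```

Bucket 10 now holds the two samples in the range 512 to 1023, and the array has the same length it had before any sample arrived.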
  18. TIME IS AN ILLUSION, LUNCHTIME DOUBLY SO - Douglas Adams

  19. MTEV_TIME: TIME, BUT FASTER. https://github.com/circonus-labs/libmtev

  20. LIBCIRCMETRICS: FAST & CORRECT. https://github.com/circonus-labs/libcircmetrics

  21. RAYS OF HOPE: LIBCIRCMETRICS HIGHLIGHTS

    • Inspired by stuff we saw in Go
    • ability to observe memory
    • run prep-functions
    • gauges, counters, strings, and log-linear histograms
    • performance focused:
      • CPU-fanout counters & histograms
      • 10ns fixed histogram logging
    • JSON output, simple API
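
To give a feel for the log-linear histograms mentioned on slide 21, here is a small sketch of one possible bucketing scheme: keep the two leading decimal digits of each value and remember the power of ten. This is an assumption made for illustration, not libcircmetrics code or its API; `bucket`, `LogLinearHistogram`, and `quantile_lower_bound` are hypothetical names.

```python
import math
from collections import Counter

def bucket(value):
    """Map a positive integer to (mantissa, exponent), keeping at most two
    leading digits, so that
    mantissa * 10**exponent <= value < (mantissa + 1) * 10**exponent."""
    if value <= 0:
        raise ValueError("value must be positive")
    mantissa, exponent = value, 0
    while mantissa >= 100:
        mantissa //= 10
        exponent += 1
    return mantissa, exponent

class LogLinearHistogram:
    """Keeps every sample as a bucket count instead of reducing to one number."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def record(self, value):
        self.counts[bucket(value)] += 1
        self.total += 1

    def quantile_lower_bound(self, q):
        """Lower edge of the bucket that contains the q-th quantile."""
        if self.total == 0:
            raise ValueError("no samples")
        target = max(1, math.ceil(q * self.total))
        seen = 0
        for m, e in sorted(self.counts, key=lambda b: b[0] * 10 ** b[1]):
            seen += self.counts[(m, e)]
            if seen >= target:
                return m * 10 ** e

hist = LogLinearHistogram()
for latency_us in (7, 149, 1266, 63_000):
    hist.record(latency_us)
```

Buckets are linear within each power of ten and grow by a factor of ten between them, so relative error stays bounded across a very wide range while the full distribution remains available for any quantile asked of it later.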
  22. THE TECHNOLOGY OCTOPUS

  23. FIGHT THE OCTOPUS. GET OUT THERE. - @postwait

  24. None