Engineering Manager, Observability Team @ Lyft
Logging, TSDBs, performance profiling, low-level infrastructure
Previously from the world of embedded systems
“C is the best programming language”
Go
Common base libraries for each language
Hundreds of microservices, no monorepo*
Deploys frequently, sometimes “never”
Common “base” deploy, Salt (masterless), AWS
second, with aggregation/sampling
This is only ~200k metrics per second, thanks to rollups
Per-instance cardinality limits
Opt-in mechanisms for per-host and per-second data
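The rollup and cardinality-limit ideas can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Lyft's pipeline: the `(name, tags, value)` point format, the `rollup` and `CardinalityLimiter` names, and the choice to drop only the `host` tag are all hypothetical. Rollups sum counters across hosts unless a metric has opted into per-host data, and the limiter caps how many distinct series one instance may emit.

```python
from collections import defaultdict
from dataclasses import dataclass, field


def rollup(points, per_host_opt_in=frozenset()):
    """Sum counter points, dropping the 'host' tag unless the metric opted in.

    points: iterable of (name, tags_dict, value); a hypothetical format.
    """
    totals = defaultdict(float)
    for name, tags, value in points:
        if name not in per_host_opt_in:
            # Collapse per-host series into one series per remaining tag set.
            tags = {k: v for k, v in tags.items() if k != "host"}
        totals[(name, tuple(sorted(tags.items())))] += value
    return dict(totals)


@dataclass
class CardinalityLimiter:
    """Hypothetical per-instance limit on distinct (name, tags) series."""

    max_series: int
    _seen: set = field(default_factory=set)
    dropped: int = 0

    def admit(self, name: str, tags: dict) -> bool:
        # A series is identified by its name plus its sorted tag pairs.
        key = (name, tuple(sorted(tags.items())))
        if key in self._seen:
            return True
        if len(self._seen) >= self.max_series:
            # New series beyond the budget are dropped, not forwarded.
            self.dropped += 1
            return False
        self._seen.add(key)
        return True
```

For example, two `req` points from hosts `a` and `b` with the same `svc` tag roll up into a single series with the summed value, which is how rollups keep the aggregate rate low.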
Provisions all remote resources a service needs during deploy
Interacts with PagerDuty, makes sure a PD service is created
Makes sure an on-call schedule is associated
Otherwise blocks production deploy
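The provisioning gate above might look something like this sketch. It deliberately avoids the real PagerDuty API: `InMemoryPagerDuty`, `ensure_service`, `provision_and_gate`, and `DeployBlocked` are hypothetical stand-ins that only model the control flow (create the PD service idempotently, then block a production deploy when no on-call schedule is associated).

```python
class InMemoryPagerDuty:
    """Hypothetical stand-in for a PagerDuty client; not the real API."""

    def __init__(self):
        self.services = {}  # service name -> list of on-call schedule ids

    def ensure_service(self, name):
        # Idempotent: repeated deploys reuse the existing PD service.
        return self.services.setdefault(name, [])


class DeployBlocked(Exception):
    """Raised to stop a deploy that fails a provisioning check."""


def provision_and_gate(pd, service, environment):
    # Every deploy makes sure the PD service exists...
    schedules = pd.ensure_service(service)
    # ...but only production deploys require an on-call schedule.
    if environment == "production" and not schedules:
        raise DeployBlocked(f"{service}: no on-call schedule associated")
```

Gating only production keeps staging iteration unblocked while still guaranteeing that nothing reaches production without someone on call.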
the only source for some data
CloudWatch
“Hold on, let me log in”
“My MFA token doesn’t work, can someone else log in?”
Using different systems is distracting and delays debugging
Git monorepo
Ties in with our Salt infrastructure
Dashboards defined as Salt states, deployed like a service
Iterate against a staging Grafana
Manages Grafana, Wavefront, and other services
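Treating dashboards as deployable code could be sketched like this. It is an assumption-laden illustration, not the actual Salt states: the spec format, the `render_dashboard` and `deploy_target` helpers, the per-panel `query` field, and the example hostnames are hypothetical. The payload shape loosely follows Grafana's dashboard JSON (`dashboard`, `panels`, `targets`, `overwrite`) and `/api/dashboards/db` is Grafana's create/update endpoint, but the field set here is simplified.

```python
import json

# Hypothetical environment -> Grafana base URL mapping.
GRAFANA_URLS = {
    "staging": "https://grafana-staging.example.com",
    "production": "https://grafana.example.com",
}


def render_dashboard(spec: dict) -> str:
    """Render a declarative dashboard spec into a simplified JSON payload."""
    panels = [
        {
            "title": p["title"],
            "type": "graph",
            # 'query' is an assumed field name, not tied to any datasource.
            "targets": [{"query": q} for q in p["queries"]],
        }
        for p in spec["panels"]
    ]
    payload = {"dashboard": {"title": spec["title"], "panels": panels}, "overwrite": True}
    return json.dumps(payload, sort_keys=True)


def deploy_target(environment: str) -> str:
    # Same spec, different target: iterate on staging, then promote.
    return GRAFANA_URLS[environment] + "/api/dashboards/db"
```

Because the rendered payload is deterministic (`sort_keys=True`), a reviewer can diff it in the monorepo just like any other service change.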
query parser and rewriter (plug for pyPEG2)
Parses queries and transforms them into alternate forms
Generates “deploy” (canary vs. production) dashboards
Automatic staging environment dashboards
Best-practices lint
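Parsing a query into a structure and rewriting it is what makes canary, staging, and lint variants cheap. The talk uses pyPEG2; this sketch swaps in plain regular expressions over a deliberately tiny, hypothetical grammar (`ts(metric.name, key=value, ...)`), and the `env` tag, `Query`, `parse`, `with_tag`, and `lint` are illustrative names, not the real tool.

```python
import re
from dataclasses import dataclass, replace

# Simplified, hypothetical grammar: ts(metric.name, key=value, key=value)
_QUERY = re.compile(r"^ts\(\s*([\w.]+)\s*((?:,\s*\w+=[\w.-]+\s*)*)\)$")
_TAG = re.compile(r"(\w+)=([\w.-]+)")


@dataclass(frozen=True)
class Query:
    metric: str
    tags: tuple  # sorted (key, value) pairs

    def render(self) -> str:
        parts = [self.metric] + [f"{k}={v}" for k, v in self.tags]
        return "ts(" + ", ".join(parts) + ")"


def parse(text: str) -> Query:
    m = _QUERY.match(text.strip())
    if m is None:
        raise ValueError(f"unparseable query: {text!r}")
    return Query(m.group(1), tuple(sorted(_TAG.findall(m.group(2)))))


def with_tag(q: Query, key: str, value: str) -> Query:
    """Return a copy of q with `key` set to `value`, replacing any old value."""
    tags = dict(q.tags)
    tags[key] = value
    return replace(q, tags=tuple(sorted(tags.items())))


def lint(q: Query) -> list:
    """Example (hypothetical) best-practice rule: queries should pin an env tag."""
    if any(k == "env" for k, _ in q.tags):
        return []
    return [f"{q.metric}: missing env tag"]
```

For instance, `with_tag(parse("ts(rides.requests, env=production)"), "env", "canary").render()` yields `ts(rides.requests, env=canary)`; the same rewrite with `"staging"` would produce a staging panel from a production definition.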