platform, focus on Metrics today
- Tracing: Yuri’s talk about Jaeger, Monitorama 2017
- Used for all manner of things
  - Capacity Planning using System Metrics (e.g. Load Average)
  - Real-time Alerting using Application metrics (e.g. p99 response time for ride requests)
  - Tracking business metrics (e.g. number of UberX riders in Portland)
  - … and plenty more …
Book: Latency, traffic, errors, and saturation
- USE Method: Utilisation, saturation, and errors
- RED Method: Rate, errors, and duration
- Shout out for Baron Schwartz’s work: video
owners:
- Dashboard panel template = f(serviceName)
- Ensure library emits metrics following given template
Application devs:
- Service: uses library
- Provide “serviceName” at time of generation
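The "panel template = f(serviceName)" idea can be sketched as a small function. This is only an illustration under stated assumptions: the `panel_template` function, the `<service>.<signal>` metric naming scheme, and the panel dictionary layout are hypothetical and are not taken from the talk.

```python
from typing import Dict, List


def panel_template(service_name: str) -> List[Dict[str, str]]:
    """Return dashboard panels for one service (hypothetical layout)."""
    if not service_name:
        raise ValueError("service_name must be non-empty")
    # Assumed convention: the shared library emits metrics named
    # "<service>.<signal>", so panels can be derived from the name alone.
    signals = ["requests", "errors", "latency_p99"]
    return [
        {"title": f"{service_name} {signal}", "query": f"{service_name}.{signal}"}
        for signal in signals
    ]
```

With such a convention, an application developer would only supply the service name, and every service using the library would get the same set of panels.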
arbitrary timestamp precision datapoints at any resolution for any retention
- Optimized file-system storage with no need for compactions
- Replicated with zone/rack aware layout and configurable replication factor
- Strongly consistent cluster membership backed by etcd
- Fast streaming for node add/replace/remove by selecting best peer for a series while also repairing any mismatching series at time of streaming
follow in a blog, for the curious: https://github.com/m3db/m3db/tree/master/src/dbnode/encoding/m3tsz

M3TSZ Overview

                                TSZ     M3TSZ   Improvement
Number of bytes / datapoint     2.42    1.45    40%
Compression ratio               6.56x   11x     40%
Encoding time (ns) / datapoint  338     298     12%
Decoding time (ns) / datapoint  347     300     14%

These results apply the two different algorithms on Uber’s production data
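For intuition on why TSZ-style encodings need so few bytes per datapoint, here is a minimal sketch of delta-of-delta timestamp encoding, one of the ideas behind the TSZ scheme. It works on plain integers and does no bit packing, so it illustrates the principle only and is not the M3TSZ implementation; the function names are hypothetical.

```python
from typing import List


def delta_of_delta(timestamps: List[int]) -> List[int]:
    """Encode as: first timestamp, then the change in delta at each step."""
    if not timestamps:
        return []
    out = [timestamps[0]]
    prev_delta = 0
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        # Regularly spaced points produce 0 here, which packs into very few bits.
        out.append(delta - prev_delta)
        prev_delta = delta
    return out


def decode(encoded: List[int]) -> List[int]:
    """Invert delta_of_delta by accumulating deltas back into timestamps."""
    if not encoded:
        return []
    timestamps = [encoded[0]]
    delta = 0
    for dod in encoded[1:]:
        delta += dod
        timestamps.append(timestamps[-1] + delta)
    return timestamps
```

For a series scraped every 10 seconds, almost every encoded value after the second is 0, which is what a bit-level encoder can exploit.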
in memory in compressed ‘n’-hour blocks,
  ◦ Data is appended to commit log on disk (think WAL),
• We periodically write the compressed blocks to disk as immutable fileset files (think Snapshot file)
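The write path described above can be sketched with a toy in-memory model. Everything here is an assumption made for illustration: the `ToyNode` class, the 2-hour block size, and the use of Python lists and tuples in place of compressed blocks and on-disk files. It shows only the shape of the flow (buffer per block, append to a log, flush as an immutable snapshot), not how M3DB implements it.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE_SECONDS = 2 * 60 * 60  # assumed value for 'n' hours

Point = Tuple[int, float]


class ToyNode:
    def __init__(self) -> None:
        # Stand-in for the on-disk commit log (WAL).
        self.commit_log: List[Tuple[str, int, float]] = []
        # block start -> series id -> points buffered in memory.
        self.blocks: Dict[int, Dict[str, List[Point]]] = {}
        # block start -> immutable snapshot (stand-in for fileset files).
        self.filesets: Dict[int, Dict[str, Tuple[Point, ...]]] = {}

    def write(self, series_id: str, ts: int, value: float) -> None:
        # Every write is appended to the log and buffered in its block.
        self.commit_log.append((series_id, ts, value))
        block_start = ts - ts % BLOCK_SIZE_SECONDS
        self.blocks.setdefault(block_start, {}).setdefault(series_id, []).append((ts, value))

    def flush(self, block_start: int) -> None:
        # Snapshot one block; tuples model the immutability of fileset files.
        block = self.blocks.get(block_start, {})
        self.filesets[block_start] = {sid: tuple(pts) for sid, pts in block.items()}
```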
every ‘n’ hours as block filesets
• Two flavours:
  ◦ Data fileset blocks contain compressed time-series data (m3tsz)
  ◦ Index fileset blocks contain compressed reverse-indexing data (FSTs/Postings Lists/etc)
• Expired block filesets are periodically cleaned up in the background
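Because data is stored in fixed-size time blocks, deciding which filesets have expired reduces to simple arithmetic on block start times. The sketch below assumes a block is expired once its whole time range falls outside the retention window; that rule, the `expired_blocks` name, and the parameters are illustrative assumptions rather than M3DB's actual cleanup logic.

```python
from typing import Iterable, List


def expired_blocks(
    block_starts: Iterable[int], now: int, retention: int, block_size: int
) -> List[int]:
    """Return block starts whose entire range is older than the retention window."""
    cutoff = now - retention
    # A block covers [start, start + block_size); it is expired when its end
    # is at or before the cutoff (assumed rule).
    return sorted(s for s in block_starts if s + block_size <= cutoff)
```

A background task could call such a function periodically and delete the matching fileset files.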
synchronous quorum writes and reads
- Configurable consistency level
- No hinted hand-off
- Nodes bootstrap from peers at startup/topology-change
Topology & Consistency
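How a configurable consistency level interacts with the replication factor can be sketched as follows. The level names and helper functions are assumptions chosen for illustration; the sketch shows the usual majority-quorum arithmetic, not M3DB's client code.

```python
from enum import Enum


class Consistency(Enum):
    ONE = "one"
    MAJORITY = "majority"
    ALL = "all"


def required_acks(level: Consistency, replication_factor: int) -> int:
    """Number of replica acknowledgements needed for an operation to succeed."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")
    if level is Consistency.ONE:
        return 1
    if level is Consistency.MAJORITY:
        # Strict majority of replicas, e.g. 2 of 3.
        return replication_factor // 2 + 1
    return replication_factor


def succeeds(level: Consistency, replication_factor: int, acks: int) -> bool:
    """True if enough replicas acknowledged the synchronous write or read."""
    return acks >= required_acks(level, replication_factor)
```

With a replication factor of 3 and majority consistency, a write still succeeds with one replica down. Since there is no hinted hand-off, a replica that missed writes has to catch up some other way, for example by bootstrapping from peers as the last bullet describes.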
work to use with Multiple M3DB Cluster deployments (like Uber’s production usage)
- Index Read Performance Improvements
Caveat Emptor
Index
Coordinator
post to drop in July
- Ability to backfill data
- Index Performance + Multi-clustered Index
- Graphite Support for M3Coordinator
- … and plenty more …
- Aggregator: github.com/m3db/m3aggregator
- Packaging, Documentation, etc.
- Query Engine (and Query Language)
- … and plenty more …