Slide 1

Metrics at Uber
Prateek Rungta (@prateekrungta), Engineer, M3 Team
Learnings, a few neat Observability Patterns, and our OSS metrics platform

Slide 2

Small Technology company

Slide 3

Uber’s Architecture & Metrics
- ~4K microservices
- Central Observability platform; focus on Metrics today
  - Tracing: Yuri’s talk about Jaeger, Monitorama 2017
- Used for all manner of things:
  - Capacity Planning using System Metrics (e.g. Load Average)
  - Real-time Alerting using Application metrics (e.g. p99 response time for ride requests)
  - Tracking business metrics (e.g. number of UberX riders in Portland)
  - … and plenty more …

Slide 4

Developers! Developers! Developers!

func myRPCHandler(param int, m MetricScope) {
	…
	t := m.Timer("latency").Start()
	responseCode := client.Call(param)
	t.Stop()
	m.Tagged(map[string]string{"code": responseCode}).Counter("response").Inc(1)
}

Slide 5

Queries

Slide 6

Discoverability

Slide 7

“Golden Signals”
Usually you want the same telemetry:
- SRE Book: Latency, traffic, errors, and saturation
- USE Method: Utilisation, saturation, and errors
- RED Method: Rate, errors, and duration
- Shout out to Baron Schwartz’s work: video

Slide 8

Dynamic Dashboards

Slide 9

No content

Slide 10

No content

Slide 11

No content

Slide 12

Service

Slide 13

End User Code (“Biz logic”)
RPC
Storage (C*/Redis/…)
...

Slide 14

End User Code (“Biz logic”)
RPC
Storage (C*/Redis/…)
...

Library owners:
- Dashboard panel template = f(serviceName) (see the sketch below)
- Ensure library emits metrics following the given template

Application devs:
- Service: uses library
- Provide “serviceName” at time of generation
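
A minimal sketch of that idea in Go (mine, not from the talk), assuming a hypothetical Panel type and a Prometheus-style query: the RPC library owner writes the panel template once as a function of serviceName, and application teams only supply their service name when dashboards are generated.

package main

import "fmt"

// Panel is a hypothetical stand-in for a dashboard panel definition.
type Panel struct {
	Title string
	Query string
}

// rpcLatencyPanel is the "dashboard panel template = f(serviceName)" idea:
// the RPC library owner defines the panel once, and it works for any service
// that emits metrics following the library's naming convention.
func rpcLatencyPanel(serviceName string) Panel {
	return Panel{
		Title: fmt.Sprintf("%s: RPC p99 latency by endpoint", serviceName),
		Query: fmt.Sprintf(
			`histogram_quantile(0.99, sum(rate(rpc_latency_bucket{service=%q}[1m])) by (le, endpoint))`,
			serviceName),
	}
}

func main() {
	// Application devs only provide the service name at generation time.
	fmt.Printf("%+v\n", rpcLatencyPanel("driver-dispatch"))
}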

Slide 15

Auto Alerting
Jamie Wilkinson, Monitorama 2018

Slide 16

Auto Alerting
- Grafana : Dynamic Dashboard :: Manually Configured Alerts : ?
- E.g. detect anomalies in latency per RPC endpoint (sketch below)
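
A minimal sketch of that analogy in Go (mine, not Uber's implementation), assuming a hypothetical Alert type and an illustrative pseudo-query syntax: one alert template is stamped out per discovered RPC endpoint, just as dashboard panels are stamped out per service.

package main

import "fmt"

// Alert is a hypothetical alert definition; the query syntax is illustrative only.
type Alert struct {
	Name      string
	Query     string
	Threshold float64
}

// latencyAlert applies one alert template to any service/endpoint pair,
// mirroring the dashboard-template idea from the earlier slides.
func latencyAlert(serviceName, endpoint string) Alert {
	return Alert{
		Name:      fmt.Sprintf("%s/%s high p99 latency", serviceName, endpoint),
		Query:     fmt.Sprintf(`p99(rpc_latency{service=%q, endpoint=%q})`, serviceName, endpoint),
		Threshold: 0.5, // seconds; a real system would derive this bound from anomaly detection
	}
}

func main() {
	// Endpoints would be discovered from the emitted metrics themselves.
	for _, ep := range []string{"GetRide", "CancelRide"} {
		fmt.Printf("%+v\n", latencyAlert("driver-dispatch", ep))
	}
}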

Slide 17

Alert Storms

Slide 18

Alert Storms
- Grouped & dependent alerts
- Single contextual notification

Slide 19

Auto Rollback and Remediation
- Remediate alerts around deployment/configuration changes

Slide 20

What’s so hard about that?

Slide 21

Scale - Ingress
400-600M pre-aggregated metrics/s (~130 Gbits/sec)
(random week when I was making these slides)

Slide 22

Scale - Ingress
~20M metrics stored/s (~50 Gbits/sec)
(random week when I was making these slides)

Slide 23

Scale - Ingress
~6B unique metric IDs
(random week when I was making these slides)

Slide 24

Scale - Egress
~2.2K queries per second (9K Grafana dashboards, 150K realtime alerts)
(random week when I was making these slides)

Slide 25

Scale - Egress
~30B datapoints per second (~20 Gbits/sec)
(random week when I was making these slides)

Slide 26

Constantly growing
- Persisted Metrics: 20% uptick in the last quarter
- Unique IDs: 50% uptick in the last half year
- QPS: 100% uptick in the last year
- Ingress Traffic: 900x in the last 3 years

Slide 27

A brief history of M3
- 2014-2015: Graphite
  - No replication, operations were ‘cumbersome’
- 2015-2016: Cassandra
  - 16x YoY growth
  - Expensive (>1500 Cassandra hosts): “Technology Telemetry company”
  - Compactions ⇒ RF=2 ⇒ repairs too slow
- 2016-Today: M3DB

Slide 28

M3DB
An open source distributed time series database:
- Store arbitrary timestamp precision datapoints at any resolution for any retention
- Optimized file-system storage with no need for compactions
- Replicated with zone/rack aware layout and configurable replication factor
- Strongly consistent cluster membership backed by etcd
- Fast streaming for node add/replace/remove by selecting the best peer for a series, while also repairing any mismatching series at time of streaming

Slide 29

TSZ Timestamp Compression (Gorilla)
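
A rough illustration of the Gorilla-style idea (a simplified sketch in Go; not the actual TSZ/M3TSZ bitstream): timestamps arrive at nearly regular intervals, so storing the delta-of-delta, which is almost always zero, compresses far better than storing raw timestamps.

package main

import "fmt"

// deltaOfDeltas returns the second-order differences of a timestamp series.
// Gorilla-style TSZ encoders then bit-pack these values with variable-length
// codes; regular collection intervals make most of them zero, which is where
// the compression comes from. This sketch stops at computing the deltas.
func deltaOfDeltas(ts []int64) []int64 {
	if len(ts) < 3 {
		return nil
	}
	out := make([]int64, 0, len(ts)-2)
	prevDelta := ts[1] - ts[0]
	for i := 2; i < len(ts); i++ {
		delta := ts[i] - ts[i-1]
		out = append(out, delta-prevDelta)
		prevDelta = delta
	}
	return out
}

func main() {
	// Datapoints every 10s, with one slightly late arrival.
	timestamps := []int64{1000, 1010, 1020, 1031, 1041}
	fmt.Println(deltaOfDeltas(timestamps)) // [0 1 -1]
}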

Slide 30

M3TSZ Overview
- m3tsz = tsz + improvements
- More details to follow in a blog; for the curious: https://github.com/m3db/m3db/tree/master/src/dbnode/encoding/m3tsz

                                 TSZ      M3TSZ    Improvement
Number of bytes / datapoint      2.42     1.45     40%
Compression ratio                6.56x    11x      40%
Encoding time (ns) / datapoint   338      298      12%
Decoding time (ns) / datapoint   347      300      14%

These results apply the two different algorithms to Uber’s production data.

Slide 31

M3TSZ Impact
- Data volumes at time of migration (end of 2016):
  ○ Disk usage ~1.4PB for Cassandra at RF=2
  ○ Disk usage ~200TB for M3DB at RF=3

Slide 32

M3DB Logical Constructs

Slide 33

M3DB Architecture

Slide 34

Persistence
● For each incoming write:
  ○ Data is stored in memory in compressed ‘n’-hour blocks
  ○ Data is appended to a commit log on disk (think WAL)
● We periodically write the compressed blocks to disk as immutable fileset files (think snapshot file); see the sketch below
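
A minimal sketch of that write path in Go, under loose assumptions (hypothetical types; the real M3DB is far more involved): each write is appended to the commit log and buffered into the in-memory block that owns its time window, and blocks are periodically flushed as immutable fileset files.

package main

import (
	"fmt"
	"time"
)

// datapoint and db are hypothetical stand-ins, not M3DB's real types.
type datapoint struct {
	seriesID string
	ts       time.Time
	value    float64
}

type db struct {
	blockSize    time.Duration          // the 'n'-hour block size
	commitLog    []datapoint            // append-only WAL (on disk in the real system)
	activeBlocks map[string][]datapoint // in-memory blocks (compressed in the real system)
}

// write appends the point to the commit log and buffers it into the block
// covering its time window.
func (d *db) write(dp datapoint) {
	d.commitLog = append(d.commitLog, dp)
	blockStart := dp.ts.Truncate(d.blockSize)
	key := fmt.Sprintf("%s@%d", dp.seriesID, blockStart.Unix())
	d.activeBlocks[key] = append(d.activeBlocks[key], dp)
}

// flush persists blocks as immutable fileset files so the covered commit log
// entries can be discarded (the real system only flushes blocks whose time
// window has closed).
func (d *db) flush(persist func(blockKey string, pts []datapoint)) {
	for key, pts := range d.activeBlocks {
		persist(key, pts)
		delete(d.activeBlocks, key)
	}
	d.commitLog = d.commitLog[:0]
}

func main() {
	d := &db{blockSize: 2 * time.Hour, activeBlocks: map[string][]datapoint{}}
	d.write(datapoint{seriesID: "cpu.load", ts: time.Now(), value: 1.2})
	d.flush(func(key string, pts []datapoint) {
		fmt.Printf("flushing block %s with %d datapoints\n", key, len(pts))
	})
}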

Slide 35

Layout on Disk
(diagram of the on-disk directory layout over time)
- /var/lib/m3db/commitlogs/ : a sequence of commit log files
- /var/lib/m3db/data/namespace-a/shard-0 : fileset file blocks
- /var/lib/m3db/index/namespace-a : index fileset file blocks

Slide 36

Fileset Files
● Data is flushed from memory to disk every ‘n’ hours as block filesets
● Two flavours:
  ○ Data fileset blocks contain compressed time-series data (m3tsz)
  ○ Index fileset blocks contain compressed reverse-indexing data (FSTs/Postings Lists/etc)
● Expired block filesets are periodically cleaned up in the background

Slide 37

Commit Log
● Uncompressed
● Supports sync and async writes
  ○ Async for performance: buffer in memory & periodically flush batches (see the sketch below)
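
A minimal sketch of the async flavour in Go, assuming a hypothetical commitLogWriter (not M3DB's actual implementation): callers return as soon as an entry is buffered, and a background goroutine flushes batches on a ticker, trading a small durability window for throughput.

package main

import (
	"fmt"
	"sync"
	"time"
)

// commitLogWriter buffers async writes in memory; a background loop flushes
// batches periodically (to disk in a real system).
type commitLogWriter struct {
	mu     sync.Mutex
	buffer [][]byte
}

// WriteAsync returns once the entry is buffered, not once it is durable.
func (w *commitLogWriter) WriteAsync(entry []byte) {
	w.mu.Lock()
	w.buffer = append(w.buffer, entry)
	w.mu.Unlock()
}

// flushLoop drains the buffer on every tick and persists the batch.
func (w *commitLogWriter) flushLoop(interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			w.mu.Lock()
			batch := w.buffer
			w.buffer = nil
			w.mu.Unlock()
			if len(batch) > 0 {
				fmt.Printf("flushed batch of %d entries\n", len(batch))
			}
		case <-stop:
			return
		}
	}
}

func main() {
	w := &commitLogWriter{}
	stop := make(chan struct{})
	go w.flushLoop(100*time.Millisecond, stop)
	for i := 0; i < 5; i++ {
		w.WriteAsync([]byte(fmt.Sprintf("write %d", i)))
	}
	time.Sleep(300 * time.Millisecond)
	close(stop)
}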

Slide 38

Topology & Consistency
- Strongly consistent topology (using etcd)
- Consistency managed via synchronous quorum writes and reads
  - Configurable consistency level (see the sketch below)
- No hinted hand-off
- Nodes bootstrap from peers at startup/topology change
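
A small illustration in Go (mine, not from the talk) of what a majority-quorum consistency level means for a replicated write or read: with replication factor RF, the operation succeeds once floor(RF/2) + 1 replicas acknowledge.

package main

import "fmt"

// quorum returns the minimum number of acknowledgements required for a
// majority quorum at the given replication factor.
func quorum(replicationFactor int) int {
	return replicationFactor/2 + 1
}

// succeeds reports whether a quorum-level write or read is satisfied given
// how many replicas acknowledged it.
func succeeds(replicationFactor, acks int) bool {
	return acks >= quorum(replicationFactor)
}

func main() {
	// At RF=3, one replica can be down or lagging and quorum operations still succeed.
	fmt.Println(quorum(3))      // 2
	fmt.Println(succeeds(3, 2)) // true
	fmt.Println(succeeds(3, 1)) // false
}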

Slide 39

M3DB Impact
● Increased replication
  ○ 2x -> 3x replication factor
● Read performance improvements (p50/p95/p99):
  - C*: 8ms / 270ms / 500ms
  - M3DB: 0.2ms / 0.35ms / 5ms
● Cheaper(!)

Slide 40

What’s production look like today?
(per-region architecture diagram: hosts running collectors and clients, an ingester, an aggregation tier, an indexer, M3DB clusters, an ES 5.x index, a query service, and read/write caches)

Slide 41

OSS

Slide 42

OSS: Why?

Slide 43

OSS & Prometheus Integration
(diagram: Prometheus, Grafana, AlertManager, Coordinator, M3DB, M3DB Index, etcd)

Slide 44

Caveat Emptor (Index & Coordinator)
- Coordinator & Index used in smaller deployments
- Feature work to use with multiple M3DB cluster deployments (like Uber’s production usage)
- Index read performance improvements

Slide 45

Where
- All development on: github.com/m3db/m3db
- Apache v2
- Contributions welcome!
- Documentation: http://bit.ly/m3db-docs
- Reach us via: http://bit.ly/m3db-forums

Slide 46

What’s to come
- M3DB:
  - Look out for a blog post to drop in July
  - Ability to backfill data
  - Index Performance + Multi-clustered Index
  - Graphite Support for M3Coordinator
  - … and plenty more …
- Aggregator: github.com/m3db/m3aggregator
- Packaging, Documentation, etc.
- Query Engine (and Query Language)
- … and plenty more …

Slide 47

Work of a Team

Slide 48

Thank you!
@prateekrungta
- Code: github.com/m3db/m3db
- Docs: http://bit.ly/m3db-docs
- Forum: http://bit.ly/m3db-forums
- Slides: http://bit.ly/m3db-monitorama2018