Slide 1

Slide 1 text

CentralDogma: Flash’s configuration backbone
Tran Tuan Linh - 2020/10

Slide 2

Slide 2 text

About me
• I am Linh, from the Observability Team
• github.com/linxGnu
• My team deals with:
  • Traces, Logs, and Metrics

Slide 3

Slide 3 text

• Impact Measurement
• Alerting
• Trending
• Root Cause Analysis
• Anomaly Detection

cpu [host=10.58.15.27] {
  timestamp: 1600649643,
  value: 0.95 /* 95% */
}
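The sample data point on this slide could be modeled in Go roughly as follows. This is only a sketch: the `DataPoint` type and its field names are assumptions made for illustration, not Flash's actual data model.

```go
package main

import (
	"fmt"
	"time"
)

// DataPoint is a hypothetical in-memory form of one sample of a metric.
type DataPoint struct {
	Metric    string            // metric name, e.g. "cpu"
	Tags      map[string]string // dimensions, e.g. host
	Timestamp int64             // Unix time in seconds
	Value     float64           // 0.95 means 95%
}

// Time converts the Unix timestamp to a time.Time in UTC.
func (p DataPoint) Time() time.Time {
	return time.Unix(p.Timestamp, 0).UTC()
}

func main() {
	p := DataPoint{
		Metric:    "cpu",
		Tags:      map[string]string{"host": "10.58.15.27"},
		Timestamp: 1600649643,
		Value:     0.95,
	}
	fmt.Printf("%s[host=%s] at %s = %.0f%%\n",
		p.Metric, p.Tags["host"], p.Time().Format(time.RFC3339), p.Value*100)
}
```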

Slide 4

Slide 4 text

Challenges
• “Millions” of metrics, “billions” of data points per day
• Resolution variety
  • An application/server may export a metric every 60s/30s/15s/10s
    • 60s -> 1 point per minute
    • 30s -> 2 points per minute
    • 15s -> 4 points per minute
    • 10s -> 6 points per minute
  • So, higher resolution (shorter interval) -> more points
  • But it gives us more detail about what’s going on with servers/services/apps
• Metrics storage must be fast for realtime usage (graphing, alerting)
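The arithmetic behind these numbers can be sketched as below. The metric count of one million is an assumed round figure standing in for “millions”, not a measured value.

```go
package main

import "fmt"

// pointsPerMinute returns how many data points one metric produces per minute
// when it is exported every intervalSec seconds.
func pointsPerMinute(intervalSec int) int {
	return 60 / intervalSec
}

// pointsPerDay returns the total data points per day for the given number of
// metrics, all exported every intervalSec seconds.
func pointsPerDay(intervalSec int, metrics int64) int64 {
	return int64(86400/intervalSec) * metrics
}

func main() {
	const metrics = 1_000_000 // assumed metric count, for illustration only
	for _, s := range []int{60, 30, 15, 10} {
		fmt.Printf("%2ds: %d points/min per metric, %d points/day in total\n",
			s, pointsPerMinute(s), pointsPerDay(s, metrics))
	}
}
```

Even at the coarsest 60s interval, one million metrics already yield 1,440,000,000 points per day, which is where the “billions” comes from.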

Slide 5

Slide 5 text

Journey
• For various reasons, in 2018 we decided to take a ride
• Build an in-house distributed metrics storage from scratch
• Named “Flash”, after the superhero

Slide 6

Slide 6 text

Flash’s Concern
• Configuration management is another challenge
• A successful distributed software system needs effective configuration management
  • Planning, identifying, tracking and verifying changes in configurations
  • Maintaining configuration integrity

Slide 7

Slide 7 text

Flash’s Concern
• What if configuration “looks” like “git”?
  • Changes by “Pull Request”
  • Workflow through “Review”
  • Applying changes by “Merge”
  • Tracking with “Commits”
• Great team collaboration

Slide 8

Slide 8 text

CentralDogma to the rescue
• An open-source, highly available (HA), version-controlled service configuration repository based on Git, ZooKeeper and HTTP/2
• Features:
  • Stores configuration files (such as .json, .xml, .yaml) in “Git”
  • Changes go through a PR, get reviewed, and are applied on merge
  • Realtime update notification
    • Once a PR is merged, services/apps are notified; no reboot needed
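The merge-then-notify idea can be illustrated with a toy in-memory repository. This sketch uses only the Go standard library and does not call the CentralDogma client; `Repo`, `Change`, `Watch` and `Merge` are hypothetical names invented for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// Change describes one committed revision of a configuration file.
type Change struct {
	Revision int
	Content  string
}

// Repo is a toy, in-memory stand-in for a versioned configuration repository.
type Repo struct {
	mu       sync.Mutex
	revision int
	watchers []chan Change
}

// Watch registers a watcher and returns a channel that receives later changes.
func (r *Repo) Watch() <-chan Change {
	r.mu.Lock()
	defer r.mu.Unlock()
	ch := make(chan Change, 8) // buffered so Merge does not block in this small demo
	r.watchers = append(r.watchers, ch)
	return ch
}

// Merge commits content as the next revision and notifies every watcher.
func (r *Repo) Merge(content string) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.revision++
	for _, ch := range r.watchers {
		ch <- Change{Revision: r.revision, Content: content}
	}
	return r.revision
}

func main() {
	repo := &Repo{}
	updates := repo.Watch()

	// Merging a change plays the role of a reviewed PR being merged.
	repo.Merge(`{"rateLimit": 1000}`)

	c := <-updates // the running service learns about it without a restart
	fmt.Printf("revision %d: %s\n", c.Revision, c.Content)
}
```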

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

Flash - CentralDogma: Cluster Topology
[Diagram: Raft consensus, shard]

Slide 13

Slide 13 text

[Diagram: Shard 1, Shard 2, …, Shard N; cluster topology; Raft configs]

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

[Diagram: Metric Input, Ingestor, Querier, Downsampler; save/query/dump]

Slide 16

Slide 16 text

Flash - CentralDogma: Provisioning
• A Git PR does it all
• Scalability:
  • Add/remove a shard
  • Add/remove a node in a shard
• User provisioning: authorization, query scope, etc.
• On the fly, no server reboot
• Realtime feature adjustment:
  • Enable/disable input source(s)
  • Enable/disable a module
  • Rate limit adjustment
  • …
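One common way to apply such changes on the fly is to swap the whole settings object atomically whenever a new version arrives. The sketch below assumes a JSON document with made-up fields (`rateLimit`, `enabledInputs`); it is not Flash's real configuration schema, and `Settings` and `Live` are hypothetical types.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync/atomic"
)

// Settings is a hypothetical runtime configuration; field names are illustrative.
type Settings struct {
	RateLimit     int             `json:"rateLimit"`
	EnabledInputs map[string]bool `json:"enabledInputs"`
}

// Live holds the current Settings so readers see updates without a restart.
type Live struct {
	v atomic.Value // always stores *Settings
}

// Apply parses a new JSON document and swaps it in atomically.
// On a parse error the previous settings stay in effect.
func (l *Live) Apply(raw []byte) error {
	var s Settings
	if err := json.Unmarshal(raw, &s); err != nil {
		return err
	}
	l.v.Store(&s)
	return nil
}

// Get returns the current settings, or nil if none have been applied yet.
func (l *Live) Get() *Settings {
	s, _ := l.v.Load().(*Settings)
	return s
}

func main() {
	var live Live
	_ = live.Apply([]byte(`{"rateLimit": 1000, "enabledInputs": {"inputA": true}}`))
	fmt.Println("rate limit:", live.Get().RateLimit)

	// A later merged change lowers the limit; readers pick it up on the next Get.
	_ = live.Apply([]byte(`{"rateLimit": 500, "enabledInputs": {"inputA": false}}`))
	fmt.Println("rate limit:", live.Get().RateLimit)
}
```

Rejecting a malformed document while keeping the last good settings is a deliberate choice here: a bad change should not take a running service down.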

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

Flash - CentralDogma: Other configurations
• Also stored in a Git repo, managed through CentralDogma
• Flash deployment package:
  • Binaries only
  • No config files!

Slide 19

Slide 19 text

Finally, with the help of CentralDogma

Slide 20

Slide 20 text

Keynotes
• CentralDogma provides native, pure Java and Go client libraries
  • And more in the future
  • Awesome, since most Flash modules are written in Go :)
• Zero runtime overhead
• Clean and dead simple API
• Reliability, stability
• Cost reduction

Slide 21

Slide 21 text

Keynotes
• CentralDogma authors
  • They are friendly, supportive and professional
  • They put themselves in the user’s shoes
  • Problems are often solved quickly
  • Up to date with patches and detailed guidelines

Slide 22

Slide 22 text

I bet you will love it
“Why not give CentralDogma a try?”

Slide 23

Slide 23 text

Thank you for listening