Slide 1

Slide 1 text

the best way to build and ship software
Brubeck: a high-performance statsd-compatible aggregation daemon

Slide 2

Slide 2 text

It’s all about timing…

Slide 3

Slide 3 text

Measure Everything

Slide 4

Slide 4 text

Measure Everything
“It’s not shipped until it’s fast.” - Ted Nyman

Slide 5

Slide 5 text

How did we make Brubeck fast?

Slide 6

Slide 6 text

How did we make Brubeck fast?
“Brubeck’s feature is that it has no features. It is a very simple C daemon which uses state-of-the-art technologies such as "a while loop" and "more than one thread".” - Vicent Martí

Slide 7

Slide 7 text

What can it do?

Slide 8

Slide 8 text

Write data to Carbon
Brubeck consumes metrics via statsd, samples the metrics, and writes the results to Carbon.

Slide 9

Slide 9 text

Carbon Backend
PROTOCOLS
• Plaintext
• Pickle
SHARDING
• Consistent hashing to multiple backends
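The two ideas on this slide can be sketched in a few lines of Python. The plaintext format (`path value timestamp`, newline-terminated) is Carbon's documented line protocol. The ring below is only an illustration of consistent hashing: the class name `HashRing`, the use of MD5, and the 64 virtual nodes per backend are assumptions, not Brubeck's actual sharding code.

```python
import bisect
import hashlib


def carbon_plaintext_line(path: str, value: float, timestamp: int) -> str:
    """Format one sample for Carbon's plaintext protocol."""
    return f"{path} {value} {timestamp}\n"


class HashRing:
    """Toy consistent-hash ring (hash function and vnode count are assumptions)."""

    def __init__(self, backends, vnodes=64):
        # Each backend is placed on the ring at several pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{backend}#{i}"), backend)
            for backend in backends
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def backend_for(self, metric: str) -> str:
        # Pick the first ring point after the metric's hash, wrapping around.
        idx = bisect.bisect_right(self._points, self._hash(metric))
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    # Hypothetical backend addresses, for illustration only.
    ring = HashRing(["carbon-a:2003", "carbon-b:2003", "carbon-c:2003"])
    line = carbon_plaintext_line("brubeck.test", 1.5, 1434492038)
    print(ring.backend_for("brubeck.test"), repr(line))
```

A given metric name always maps to the same backend, and adding or removing a backend only moves the metrics whose ring points change owner.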

Slide 10

Slide 10 text

Rich Logging
Designed to emit detailed logging information as key=value pairs.

Slide 11

Slide 11 text

Rich Logging
jssjr@metrics-collector1-cp1-prd:~$ sudo tail -F /var/log/application.log
2015-06-16T15:00:35-07:00 metrics-collector1-cp1-prd -- [config.catchall] [38047]: instance=brubeck.metrics-collector1-cp1-prd.catchall sampler=statsd event=bad_key key='team.resque.payload_size.job.Jobs.DeliverNotification.:1|c' from=184.73.32.39
2015-06-16T15:00:35-07:00 metrics-collector1-cp1-prd -- [config.catchall] [38047]: instance=brubeck.metrics-collector1-cp1-prd.catchall sampler=statsd event=bad_key key='team.resque.payload_size.job.Jobs.DeliverNotification.:1|c' from=184.73.32.39
2015-06-16T15:00:38-07:00 metrics-collector1-cp1-prd -- [config.secure] [38045]: instance=brubeck.metrics-collector1-cp1-prd.secure sampler=statsd-secure event=fail_delayed now=1434492038 timestamp=1434492033 drift=5
2015-06-16T15:00:53-07:00 metrics-collector1-cp1-prd -- [config.secure] [38045]: instance=brubeck.metrics-collector1-cp1-prd.secure sampler=statsd-secure event=fail_delayed now=1434492053 timestamp=1434492048 drift=5
2015-06-16T15:01:00-07:00 metrics-collector1-cp1-prd -- [config.secure] [38045]: instance=brubeck.metrics-collector1-cp1-prd.secure sampler=statsd-secure event=fail_future now=1434492060 timestamp=1434492061
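Because every entry is a run of key=value pairs, the log is easy to consume from scripts. A minimal sketch in Python follows, assuming the layout seen in the output above: a syslog-style prefix ending in `[pid]: `, then space-separated pairs with optional single quotes around values. The function name `parse_fields` is hypothetical.

```python
import shlex


def parse_fields(line: str) -> dict:
    """Return the key=value pairs of one log line as a dict of strings."""
    # Drop the "timestamp host -- [config] [pid]: " prefix (assumed layout).
    _, _, payload = line.partition("]: ")
    fields = {}
    # shlex honours the single quotes around values such as key='...'.
    for token in shlex.split(payload):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


if __name__ == "__main__":
    sample = (
        "2015-06-16T15:00:38-07:00 metrics-collector1-cp1-prd -- "
        "[config.secure] [38045]: "
        "instance=brubeck.metrics-collector1-cp1-prd.secure "
        "sampler=statsd-secure event=fail_delayed "
        "now=1434492038 timestamp=1434492033 drift=5"
    )
    fields = parse_fields(sample)
    print(fields["event"], fields["drift"])
```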

Slide 12

Slide 12 text

Statsd Sampler
Read data from statsd clients and store it in memory.

Slide 13

Slide 13 text

Statsd Sampler
CONFIGURE
• Number of worker threads
• Enable SO_REUSEPORT
• Enable recvmmsg(2)
TYPES
• Gauges, meters, counters, histograms, and timers
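For readers unfamiliar with the wire format, here is a rough Python sketch of parsing a single statsd line of the form `key:value|type`. The mapping of type codes to the five metric types is an assumption based on my reading of Brubeck's README and should be checked against it; `Sample` and `parse_statsd` are hypothetical names, and the real daemon does this in C.

```python
from typing import NamedTuple

# Assumed type codes; verify against the Brubeck README before relying on them.
TYPE_NAMES = {
    "g": "gauge",
    "c": "meter",
    "C": "counter",
    "h": "histogram",
    "ms": "timer",
}


class Sample(NamedTuple):
    key: str
    value: float
    kind: str


def parse_statsd(message: str) -> Sample:
    """Parse one 'key:value|type' message or raise ValueError."""
    key, sep, rest = message.rpartition(":")
    if not sep or not key:
        raise ValueError(f"missing key or ':' in {message!r}")
    value_text, sep, type_code = rest.partition("|")
    if not sep or type_code not in TYPE_NAMES:
        # Anything after the type (for example a '|@0.1' sample rate) lands here.
        raise ValueError(f"unsupported type in {message!r}")
    return Sample(key, float(value_text), TYPE_NAMES[type_code])


if __name__ == "__main__":
    print(parse_statsd("github.request.time:12|ms"))
```

In this sketch a message carrying a sample rate is rejected outright rather than scaled.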

Slide 14

Slide 14 text

Statsd-secure Sampler
Just like the statsd sampler, but tamper-proof.

Slide 15

Slide 15 text

Statsd-secure Sampler
SECURITY
• Prepends the plaintext message with a timestamp and a nonce
• Generates an HMAC-SHA256 signature using a shared key
• Prepends the payload with the signature
PREVENTS
• Tampering
• Replay attacks
• Impersonation
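The three signing steps can be sketched with Python's standard library. The byte layout here (an 8-byte big-endian timestamp followed by a 4-byte random nonce) is an assumption chosen for illustration; it is not claimed to be wire-compatible with Brubeck or statsd-ruby, and `sign` and `verify` are hypothetical helper names.

```python
import hashlib
import hmac
import os
import struct
import time


def sign(message: bytes, key: bytes, now=None, nonce=None) -> bytes:
    """Return signature + timestamp + nonce + message (assumed layout)."""
    timestamp = int(time.time()) if now is None else now
    nonce = os.urandom(4) if nonce is None else nonce
    # Step 1: prepend the plaintext message with a timestamp and a nonce.
    payload = struct.pack(">Q", timestamp) + nonce + message
    # Step 2: HMAC-SHA256 over the payload using the shared key.
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    # Step 3: prepend the payload with the 32-byte signature.
    return signature + payload


def verify(packet: bytes, key: bytes):
    """Check the signature and return (timestamp, nonce, message)."""
    signature, payload = packet[:32], packet[32:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("bad signature")
    (timestamp,) = struct.unpack(">Q", payload[:8])
    return timestamp, payload[8:12], payload[12:]


if __name__ == "__main__":
    packet = sign(b"deploys:1|c", key=b"shared-secret")
    print(verify(packet, key=b"shared-secret"))
```

The signature covers the timestamp and nonce as well as the message, so a receiver can reject altered packets (tampering), packets signed with the wrong key (impersonation), and, by tracking timestamps and nonces, repeated packets (replay).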

Slide 16

Slide 16 text

Statsd-secure Sampler
[Diagram labels: Payload, Message]
https://github.com/github/statsd-ruby

Slide 17

Slide 17 text

Statsd-secure Sampler
CONFIGURE
• Drift window
• Replay length
CAVEATS
• Doesn’t encrypt messages
  • Use a tunnel if this is a requirement
• Has a 0.1% probability of dropping a valid message because of the bloom filter implementation
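A rough Python sketch of how a receiver might apply the drift window and reject replays. The event names `fail_future` and `fail_delayed` appear in the log output on slide 11; the exact comparisons, the `fail_replayed` name, and the `ReplayGuard` class are assumptions. For clarity the sketch remembers nonces in an exact set rather than the bloom filter the slide mentions, and it does not model the replay length setting.

```python
class ReplayGuard:
    """Accept a (timestamp, nonce) pair at most once, within a drift window."""

    def __init__(self, drift_window: int):
        self.drift_window = drift_window  # seconds a message may lag behind
        self._seen = set()                # exact set; Brubeck uses a bloom filter

    def check(self, now: int, timestamp: int, nonce: bytes) -> str:
        if timestamp > now:
            return "fail_future"
        if now - timestamp > self.drift_window:
            return "fail_delayed"
        if (timestamp, nonce) in self._seen:
            return "fail_replayed"  # hypothetical event name
        self._seen.add((timestamp, nonce))
        return "ok"


if __name__ == "__main__":
    guard = ReplayGuard(drift_window=3)
    print(guard.check(now=1434492038, timestamp=1434492033, nonce=b"n1"))
```

An exact set never rejects a message it has not seen, but it grows with traffic. A bloom filter keeps memory bounded at the cost of occasional false positives, which is where the small probability of dropping a valid message comes from.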

Slide 18

Slide 18 text

Metrics Expiration
Configure the number of seconds before a metric “expires”. If no new data point arrives for a metric within the expiration window, the metric is removed from memory and its samples are no longer flushed to the backend.
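The expiration rule can be illustrated with a small Python sketch that tracks when each metric was last seen. This is a simplified model, not Brubeck's implementation; `MetricTable`, `record`, and `sweep` are hypothetical names.

```python
class MetricTable:
    """Track last-seen times and drop metrics that have gone quiet."""

    def __init__(self, expire_after: int):
        self.expire_after = expire_after  # seconds without data before expiry
        self._last_seen = {}

    def record(self, key: str, now: int) -> None:
        # Any new data point resets the metric's expiration clock.
        self._last_seen[key] = now

    def sweep(self, now: int) -> list:
        """Remove and return the metrics whose window has elapsed."""
        expired = [
            key for key, seen in self._last_seen.items()
            if now - seen >= self.expire_after
        ]
        for key in expired:
            del self._last_seen[key]
        return expired

    def active(self) -> list:
        return sorted(self._last_seen)


if __name__ == "__main__":
    table = MetricTable(expire_after=20)
    table.record("app.requests", now=0)
    table.record("app.errors", now=15)
    print(table.sweep(now=25), table.active())
```

Only metrics still in the table after a sweep would be flushed to the backend on the next interval.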

Slide 19

Slide 19 text

Metrics Dump
Signal Brubeck with USR2 and it will record all of the metrics in its table to disk, along with each metric’s type.

Slide 20

Slide 20 text

HTTP API
Interact with Brubeck from other tools and services.

Slide 21

Slide 21 text

HTTP API
GET /ping
• Returns a short JSON payload with the current status of the daemon (just to check it's up)
GET /stats
• Returns a large JSON payload with full statistics, including active endpoints and throughputs

Slide 22

Slide 22 text

HTTP API
GET /metric/{{metric_name}}
• Gets the current status of a metric, if it's being aggregated
POST /expire/{{metric_name}}
• Expires a metric that is no longer being reported, so it stops being aggregated to the backend
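A small client for these four endpoints might look like the Python sketch below. The base URL and port are placeholders, the lowercase paths are an assumption (the slides render them in capitals), and the response bodies are returned as raw text because only the ping and stats endpoints are described as JSON. `endpoint` and `call` are hypothetical helper names.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:8080"  # placeholder; use your daemon's HTTP address


def endpoint(base: str, *parts: str) -> str:
    """Join path segments onto the base URL, escaping each segment."""
    return base.rstrip("/") + "/" + "/".join(quote(p, safe="") for p in parts)


def call(base: str, method: str, *parts: str, timeout: float = 2.0) -> str:
    """Send one request and return the response body as text."""
    request = Request(endpoint(base, *parts), method=method)
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Requires a running daemon at BASE_URL.
    print(json.loads(call(BASE_URL, "GET", "ping")))
    print(call(BASE_URL, "GET", "metric", "app.requests"))
    print(call(BASE_URL, "POST", "expire", "app.requests"))
```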

Slide 23

Slide 23 text

Proclines
Because the command-line is a beautiful place.

Slide 24

Slide 24 text

Proclines
jssjr@metrics-collector1-cp1-prd:~$ ps auxwww | grep [b]rubeck
brubeck 38043 395 0.0 802596 80092 ? Ssl Jun04 71997:07 brubeck -- [config.github] [ ↑ #1 539.5gb, #2 548.0gb, #3 531.2gb, #4 551.0gb, #5 545.0gb ] [ ↓ :8226 501593/s ]
brubeck 38045 2.9 0.0 542148 10176 ? Ssl Jun04 535:13 brubeck -- [config.secure] [ ↑ #1 67.1gb, #2 71.0gb, #3 64.1gb, #4 69.8gb, #5 67.1gb ] [ ↓ :9126 1131/s ]
brubeck 38047 112 0.0 793308 94252 ? Ssl Jun04 20540:24 brubeck -- [config.catchall] [ ↑ #1 232.0gb, #2 234.2gb, #3 232.2gb, #4 234.2gb, #5 235.5gb ] [ ↓ :8126 47292/s ]

Slide 25

Slide 25 text

Limitations
Not everything is actually compatible. Sorry.

Slide 26

Slide 26 text

Limitations
DOESN’T SUPPORT
• Multi-line messages
• Statsd sample timing (foo:1|ms|@0.1)
• The set type
DIFFERENCES
• No stats and stats_count namespace split
• Histograms are slightly different, let me show you…

Slide 27

Slide 27 text

Histogram Differences
[Figure: histogram comparison, Etsy Statsd vs. GitHub Brubeck]

Slide 28

Slide 28 text

Thanks!
Scott Sanders
Infrastructure Engineer @ GitHub
https://github.com/jssjr
https://twitter.com/scott_sanders