Slide 1

Slide 1 text

Ines Sombra, Director of Engineering, presents
The burden of a successful feature: Scaling our real-time logging platform

Slide 2

Slide 2 text

Today’s Agenda
A delightful demo & context
A deep dive into logging
Challenges & future

Slide 3

Slide 3 text

But first… A bit of context

Slide 4

Slide 4 text

Observability tl;dr

Slide 5

Slide 5 text

Fresh from Altitude NYC
"Observability is an umbrella term. There are different techniques to achieve observability in a system." (Peter Bourgon, Altitude NYC)
https://vimeo.com/267641392

Slide 6

Slide 6 text

Peter’s classification of Observability *
[Diagram: TECHNIQUES, SYSTEMS]
* Lovingly stolen from Peter Bourgon

Slide 7

Slide 7 text

Peter’s classification of Observability *
[Diagram: TECHNIQUES, SYSTEMS, with a TODAY marker]
* Lovingly stolen from Peter Bourgon

Slide 8

Slide 8 text

STOP Demo Time!

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

But Why?
This pipeline is one of the oldest systems at Fastly
Born out of our dissatisfaction with the status quo
We wanted something that would send you logs extremely fast (stream them in near real-time) to anywhere you want (many endpoints)

Slide 13

Slide 13 text

Log Streaming at Fastly

Slide 14

Slide 14 text

Logging @ Fastly
[Diagram: Caches, Aggregators, Endpoints (s3, syslog, gcs, sumologic, bigquery, ftp, papertrail, …)]

Slide 15

Slide 15 text

Logging @ Fastly
[Diagram: Caches, Aggregators, Endpoints (s3, syslog, gcs, sumologic, bigquery, ftp, papertrail, …)]

Slide 16

Slide 16 text

Logging @ Fastly
[Diagram: Caches, Aggregators, Endpoints (s3, syslog, gcs, sumologic, bigquery, ftp, papertrail, …)]

Slide 17

Slide 17 text

Logging @ Fastly
[Diagram: Caches, Aggregators, Endpoints (s3, syslog, gcs, sumologic, bigquery, ftp, papertrail, …)]

Slide 18

Slide 18 text

Logging @ Fastly
[Diagram: Caches, Aggregators, Endpoints (s3, syslog, gcs, sumologic, bigquery, ftp, papertrail, …)]

Slide 19

Slide 19 text

Logging @ Fastly
[Diagram: Caches, Aggregators, Endpoints (s3, syslog, gcs, sumologic, bigquery, ftp, papertrail, …)]

Slide 20

Slide 20 text

Logging @ Fastly
[Diagram: Caches, Aggregators, Endpoints (s3, syslog, gcs, sumologic, bigquery, ftp, papertrail, …)]

Slide 21

Slide 21 text

Logging @ Fastly
[Diagram: Caches, Aggregators, Endpoints (s3, syslog, gcs, sumologic, bigquery, ftp, papertrail, …)]

Slide 22

Slide 22 text

Logging pipeline is Stateless
We don’t batch your logs
We don’t store your logs
We stream your logs in near real-time to your defined endpoints
We really don’t want your logs on disk

Slide 23

Slide 23 text

Logging @ Fastly
[Diagram: Caches + Senders (Varnish, Varnish, Varnish, Varnish), Aggregators]

Slide 24

Slide 24 text

Logging @ Fastly
[Diagram: Caches + Senders (Varnish, Varnish, Varnish, Varnish), Aggregators]

Slide 25

Slide 25 text

Logging @ Fastly
[Diagram: Caches + Senders (Varnish, Varnish, Varnish, Varnish), Aggregators]

Slide 26

Slide 26 text

Logging @ Fastly
[Diagram: Caches + Senders (Varnish, Varnish, Varnish, Varnish), Aggregators]

Slide 27

Slide 27 text

Logging pipeline is Best Effort
We try our best to send logs to your defined endpoint
Your endpoint must be up & healthy in order for us to be able to send data to it
We have minimal buffering
Pipeline optimized for log streaming speed

Slide 28

Slide 28 text

Logging Endpoints
We don’t limit the number of endpoints or log lines per request
~8.6K active endpoints
Ecosystem of endpoints in different stages of evolution
[Diagram: Aggregators, Endpoints (s3, syslog, gcs, sumologic, bigquery, ftp, papertrail, …)]

Slide 29

Slide 29 text

Logging Streams data
File-based endpoints (time ranged): s3, gcs, ftp, sftp
Streaming endpoints (protocol or http-requests): syslog, sumologic, bigquery, logentries, papertrail, splunk, scalyr, honeycomb

Slide 30

Slide 30 text

Logging Growth (2014-2015)
~430K LPS
~1.2K endpoints
~2GBps

Slide 31

Slide 31 text

Logging Growth (2014-2015)
~430K LPS
~1.2K endpoints
~2GBps

Slide 32

Slide 32 text

Logging Growth (2017-2018)
~3M LPS
~8.6K endpoints
~4GBps

Slide 33

Slide 33 text

Logging Growth (2017-2018)
~3M LPS
~8.6K endpoints
~4GBps

Slide 34

Slide 34 text

Logging Growth (8X!!)
~3M LPS
~8.6K endpoints
~4GBps

Slide 35

Slide 35 text

Logging Endpoints

Slide 36

Slide 36 text

Logging Endpoints
We send a lot of data continuously to our supported endpoints
Syslog continues to be our most popular endpoint, but S3 & GCS have the highest volume
The '70s are still alive with a very respectable 13 MBps to ftp and 74 kBps to sftp*
* for the non-millennials

Slide 37

Slide 37 text

Challenges & Lessons learned

Slide 38

Slide 38 text

Logging @ Fastly
[Diagram: Caches, Aggregators, Endpoints (s3, syslog, gcs, sumologic, bigquery, ftp, papertrail, …)]

Slide 39

Slide 39 text

Volume Challenges
There are no hard limits on what you can log, which can be challenging
System is multi-tenant. Noisy neighbors can affect delivery
Consider sampling for high-volume logging
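
The sampling advice above could look roughly like this. This is a minimal Python sketch for illustration only, not how Fastly or VCL implements sampling; the function name `should_log`, the `req-N` request IDs, and the 10% rate are all hypothetical.

```python
import hashlib


def should_log(request_id: str, sample_rate: float) -> bool:
    """Return True for roughly `sample_rate` of all request IDs.

    Hashing the ID (instead of drawing a random number) makes the
    decision deterministic: a given request is always kept or always dropped.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Treat the first 8 bytes of the hash as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Keep the line if its bucket falls below the sampled share of the range.
    return bucket < int(sample_rate * 2**64)


if __name__ == "__main__":
    # Hypothetical request IDs; keep about 10% of the log lines.
    kept = sum(should_log(f"req-{i}", 0.10) for i in range(10_000))
    print(f"kept {kept} of 10000 log lines")
```

Sampling by a stable key rather than at random means all lines for one request are kept or dropped together, which tends to make the sampled logs easier to reason about.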

Slide 40

Slide 40 text

Burden of many endpoints
Classic integration challenges (each endpoint is a downstream dependency)
Standard endpoint clients often don’t meet our needs
Having our own clients affords us extra optimizations

Slide 41

Slide 41 text

Endpoints & Health
Some endpoints have known limitations (infamous examples: S3, BigQuery, GCS)
Difficult to infer if an endpoint is working or not (setup is hard to test too)
Structured logging (JSON via VCL) is challenging
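
The slide does not say why structured logging is challenging. One plausible reason, stated here as an assumption, is that JSON assembled by string formatting (as a log format string would do) breaks when a value contains a quote or backslash. The Python sketch below illustrates that general pitfall; it is not VCL, and the function names and field names are hypothetical.

```python
import json


def naive_json_line(url: str, status: int) -> str:
    # Hand-assembled JSON: values are pasted in without any escaping.
    return '{"url": "%s", "status": %d}' % (url, status)


def safe_json_line(url: str, status: int) -> str:
    # A JSON encoder escapes quotes, backslashes and control characters.
    return json.dumps({"url": url, "status": status})


def is_valid_json(line: str) -> bool:
    # True if the line parses as JSON, False otherwise.
    try:
        json.loads(line)
        return True
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    tricky_url = '/search?q="cats"'  # hypothetical request URL containing quotes
    print(is_valid_json(naive_json_line(tricky_url, 200)))
    print(is_valid_json(safe_json_line(tricky_url, 200)))
```

Where no encoder is available, every string field has to be escaped by hand before it is placed in the log line.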

Slide 42

Slide 42 text

Service Isolation
Prioritize delivery of content over log retention
An aggregator discards the oldest logs it has when it can’t deliver them fast enough
In a cache node we are our own customers, so senders do the same when they can’t reach aggregators fast enough
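
The "discard the oldest logs" behaviour described above can be sketched as a bounded buffer. This is a minimal Python illustration under assumed names (`DropOldestBuffer`, `push`, `drain`, and the capacity of 3 are hypothetical); the real aggregators and senders are not shown in the slides.

```python
from collections import deque
from typing import List


class DropOldestBuffer:
    """Fixed-size buffer that evicts the oldest line when it is full."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # A deque with maxlen drops items from the left (oldest) on append.
        self._lines = deque(maxlen=capacity)
        self.dropped = 0  # how many lines were discarded so far

    def push(self, line: str) -> None:
        if len(self._lines) == self._lines.maxlen:
            self.dropped += 1  # the append below evicts the oldest line
        self._lines.append(line)

    def drain(self, max_lines: int) -> List[str]:
        """Remove and return up to `max_lines` lines, oldest first."""
        out: List[str] = []
        while self._lines and len(out) < max_lines:
            out.append(self._lines.popleft())
        return out


if __name__ == "__main__":
    buf = DropOldestBuffer(capacity=3)
    for line in ["a", "b", "c", "d", "e"]:  # producer outpaces delivery
        buf.push(line)
    print(buf.drain(10), "dropped:", buf.dropped)
```

Because the buffer is bounded, memory use stays fixed when an endpoint is slow, so log delivery cannot starve the content-delivery work on the same node. The cost is that the oldest lines are lost.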

Slide 43

Slide 43 text

Expectation Mismatch
The burden of a system that works so well is that it makes you believe you have strong guarantees
Design constraints determine the SLA of the pipeline
General advice: Understand the design choices of the systems you use, because they limit what is possible to guarantee

Slide 44

Slide 44 text

The Future of Logging

Slide 45

Slide 45 text

The team have been busy bees
H1: Platform performance & addressing the challenges of individual endpoints
H2: We are getting fancy!

Slide 46

Slide 46 text

Platform Performance
Reducing lock contention & CPU usage
Smarter memory allocation & management
Overhauling all endpoints
Halving the time it takes for a log line to be processed (from sender read to aggregator line preparation)

Slide 47

Slide 47 text

Getting fancy
BigQuery improvements
New endpoints: Kafka
More integrations with cloud services
Make endpoints easier to debug

Slide 48

Slide 48 text

No content

Slide 49

Slide 49 text

Want More?

Slide 50

Slide 50 text

Want More?
Want more endpoints?
Want metrics?
Want easier structured logging?
Want VCL counters + secondly aggregation + a higher SLA?

Slide 51

Slide 51 text

Want More?
Want more endpoints?
Want metrics?
Want easier structured logging?
Want VCL counters + secondly aggregation + a higher SLA?
Dom Fee

Slide 52

Slide 52 text

Want More?
Want more endpoints?
Want metrics?
Want easier structured logging?
Want VCL counters + secondly aggregation + a higher SLA?
Dom Fee

Slide 53

Slide 53 text

tl;dr
LOGGING
Fastly lets you extend the visibility of your system to the edge & gain meaningful insights in near real-time
Is a pipeline with very specific constraints & guarantees
Exciting things are coming!

Slide 54

Slide 54 text

(l,d)ogs of Fastly
https://github.com/Randommood/Altitude2018