Ines Sombra
Director of Engineering
presents
The burden of a successful feature:
Scaling our real-time logging platform
Today’s Agenda
A delightful demo & context
A deep dive into logging
Challenges & future
But first… A bit of context
Observability
tl;dr
Fresh from Altitude NYC
“Observability is an umbrella term. There are different techniques to achieve observability in a system.”
Peter Bourgon, Altitude NYC (https://vimeo.com/267641392)
Peter’s classification of Observability
TECHNIQUES SYSTEMS
* Lovingly stolen from Peter Bourgon
Peter’s classification of Observability (revisited)
TECHNIQUES SYSTEMS
TODAY
STOP
Demo
Time!
But Why?
This pipeline is one of the oldest systems at Fastly
Born out of our dissatisfaction with the status quo
We wanted something that would send you logs extremely fast (streamed in near real-time) to anywhere you want (many endpoints)
The logging pipeline is stateless
We don’t batch your logs
We don’t store your logs
We stream your logs in near real-time to your defined endpoints
We really don’t want your logs on disk
The logging pipeline is best effort
We try our best to send logs to your defined endpoints
Your endpoint must be up & healthy for us to be able to send data to it
We have minimal buffering
The pipeline is optimized for log streaming speed
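To make "stateless, best effort" concrete, here is a minimal Python sketch under stated assumptions: `stream_best_effort` and its `send` callback are hypothetical stand-ins, not Fastly's code. Each line is forwarded the moment it is read, nothing is batched or written to disk, and a line the endpoint cannot accept is simply counted as dropped.

```python
from typing import Callable, Dict, Iterable


def stream_best_effort(lines: Iterable[str], send: Callable[[str], bool]) -> Dict[str, int]:
    """Hypothetical sketch: forward each log line immediately, never persist it.

    `send` stands in for an endpoint client. It returns True when the line
    was delivered and False when the endpoint is down or unhealthy.
    """
    stats = {"sent": 0, "dropped": 0}
    for line in lines:
        # No batching: every line goes out as soon as it is read.
        if send(line):
            stats["sent"] += 1
        else:
            # Best effort: no disk spool and no retry queue, so the line is lost.
            stats["dropped"] += 1
    return stats
```

For example, with an endpoint that rejects the line "b", `stream_best_effort(["a", "b", "c"], lambda line: line != "b")` returns `{'sent': 2, 'dropped': 1}`. A real sender would keep a small in-memory buffer rather than none at all; the sketch leaves it out to keep the trade-off visible.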
Logging Endpoints
We don’t limit the number of endpoints or log lines per request
~8.6K active endpoints
An ecosystem of endpoints in different stages of evolution
Aggregators → Endpoints (S3, syslog, GCS, Sumo Logic, BigQuery, FTP, Papertrail, …)
We continuously send a lot of data to our supported endpoints
Syslog continues to be our most popular endpoint, but S3 & GCS carry the highest volume
The ’70s are still alive, with a very respectable 13 MBps to FTP and 74 kBps to SFTP*
* for the non-millennials
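To put those rates in perspective, a back-of-the-envelope sketch. It assumes MBps and kBps mean decimal megabytes and kilobytes per second and that each rate is sustained around the clock; the slide does not say whether these figures are averages or peaks.

```python
# Assumptions: decimal units (1 MB = 10**6 bytes) and a rate sustained all day.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def bytes_per_day(bytes_per_second: float) -> float:
    """Total bytes moved in one day at a constant rate."""
    return bytes_per_second * SECONDS_PER_DAY


ftp_daily = bytes_per_day(13e6)   # 13 MBps to FTP
sftp_daily = bytes_per_day(74e3)  # 74 kBps to SFTP

print(f"FTP:  {ftp_daily / 1e12:.2f} TB/day")   # FTP:  1.12 TB/day
print(f"SFTP: {sftp_daily / 1e9:.2f} GB/day")   # SFTP: 6.39 GB/day
```

Under those assumptions, FTP alone would receive over a terabyte of logs per day.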
Volume Challenges
There are no hard limits on what you can log, which can be challenging
The system is multi-tenant, so noisy neighbors can affect delivery
Consider sampling for high-volume logging
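Sampling can be as simple as keeping a fixed fraction of lines. A minimal sketch, assuming each log line carries some request identifier; the `keep_line` name and its percent-based interface are hypothetical. Hashing the ID rather than calling a random number generator keeps the decision consistent for every line belonging to the same request.

```python
import hashlib


def keep_line(request_id: str, sample_percent: float) -> bool:
    """Hypothetical sketch: decide whether to log a request, keyed on its ID.

    The same request_id always yields the same decision, so all lines for
    one request are either kept together or dropped together.
    """
    if not 0 <= sample_percent <= 100:
        raise ValueError("sample_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash onto 10,000 buckets (0.01% resolution).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < sample_percent * 100


if __name__ == "__main__":
    kept = sum(keep_line(f"req-{i}", 1.0) for i in range(100_000))
    print(f"kept {kept} of 100000 lines (~1% expected)")
```

Deterministic sampling trades a little flexibility for coherent per-request logs; purely random sampling is simpler but can keep half of one request’s lines and drop the rest.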
Burden of many endpoints
Classic integration challenges (each endpoint is a downstream dependency)
Standard endpoint clients often don’t meet our needs
Having our own clients affords us extra optimizations
Endpoints & Health
Some endpoints have known limitations (infamous examples: S3, BigQuery, GCS)
It is difficult to tell whether an endpoint is working (and hard to test the setup, too)
Structured logging (JSON via VCL) is challenging
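One reason hand-built structured logs are fragile: when a log format is assembled by string concatenation, any value containing a quote or backslash yields invalid JSON unless it is escaped. The sketch below is Python, not VCL, and both function names are hypothetical; it contrasts naive concatenation with letting a JSON encoder do the escaping.

```python
import json


def naive_json_line(path: str, user_agent: str) -> str:
    """Builds JSON by pasting values into a template, with no escaping."""
    return '{"path":"' + path + '","ua":"' + user_agent + '"}'


def safe_json_line(path: str, user_agent: str) -> str:
    """Lets a JSON encoder quote and escape every value."""
    return json.dumps({"path": path, "ua": user_agent}, separators=(",", ":"))


if __name__ == "__main__":
    ua = 'curl "quoted" \\ agent'
    print(safe_json_line("/", ua))
    try:
        json.loads(naive_json_line("/", ua))
    except json.JSONDecodeError as err:
        print(f"naive line is not valid JSON: {err}")
```

With a plain user agent both functions produce parseable output; the naive version only breaks on the unlucky request, which is what makes this class of bug hard to spot in testing.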
Service Isolation
We prioritize delivery of content over log retention
An aggregator discards the oldest logs it holds when it can’t deliver them fast enough
On a cache node we are our own customer, so senders do the same when they can’t reach aggregators fast enough
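The drop-oldest policy maps neatly onto a bounded double-ended queue. A minimal Python sketch, assuming a single-threaded buffer and a hypothetical `send` callback standing in for the downstream aggregator or endpoint; it illustrates the policy, not the production implementation.

```python
from collections import deque
from typing import Callable


class DropOldestBuffer:
    """Hypothetical sketch of a bounded log buffer that sheds its oldest lines."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        # A deque with maxlen evicts from the left end when appending to a full deque.
        self._lines = deque(maxlen=capacity)
        self.dropped = 0

    def push(self, line: str) -> None:
        """Accept a new line; if full, the oldest buffered line is discarded."""
        if len(self._lines) == self._lines.maxlen:
            self.dropped += 1
        self._lines.append(line)

    def flush(self, send: Callable[[str], bool]) -> int:
        """Deliver lines oldest-first until `send` refuses one; return how many went out."""
        delivered = 0
        while self._lines and send(self._lines[0]):
            self._lines.popleft()
            delivered += 1
        return delivered

    def __len__(self) -> int:
        return len(self._lines)
```

Pushing five lines into a buffer with capacity 3 keeps only the newest three and records two drops. Shedding old lines instead of blocking keeps log delivery from competing with content delivery.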
Expectation Mismatch
The burden of a system that works this well is that it makes you believe you have strong guarantees
Design constraints determine the SLA of the pipeline
General advice: understand the design choices of the systems you use, because they limit what can be guaranteed *
The Future of Logging
The team has been busy bees
H1: Platform performance & addressing the challenges of individual endpoints
H2: We are getting fancy!
Platform Performance
Reducing lock contention & CPU usage
Smarter memory allocation & management
Overhauling all endpoints
Halving the time it takes to process a log line (from sender read to aggregator line preparation)
Getting fancy
BigQuery improvements
New endpoints: Kafka
More integrations with cloud services
Making endpoints easier to debug
Want more endpoints?
Want metrics?
Want easier structured logging?
Want VCL counters, per-second aggregation & a higher SLA?
Dom Fee
Want More?
tl;dr LOGGING
Fastly lets you extend the visibility of your system to the edge & gain meaningful insights in near real-time
It is a pipeline with very specific constraints & guarantees
Exciting things are coming!
(l,d)ogs of Fastly
https://github.com/Randommood/Altitude2018