Altitude 2018

The burden of a successful feature: Scaling our real time logging platform

Observability is a hot topic in the computing world: we’ve all dealt with systems that are difficult to reason about because we have no visibility into what they’re doing. Fastly’s real-time logging gives you immediate visibility into your application’s behavior at the edge. It streams millions of request logs per second and can ship data to customer-defined infrastructure, including third-party cloud services. In this talk we’ll give you a peek into the logging platform and share some challenges we found along the way and the lessons we learned from them. More importantly, we’ll talk about what we are doing to evolve this platform into the future.

https://github.com/Randommood/altitude2018

Ines Sombra

May 22, 2018

Transcript

  1. Ines Sombra, Director of Engineering, presents
    The burden of a successful feature: Scaling our real time logging platform

  2. Today’s Agenda
    A delightful demo & context
    A deep dive into logging
    Challenges & future

  3. But first… A bit of context

  4. Observability
    tl;dr

  5. Fresh from Altitude NYC
    "Observability is an umbrella term. There are different techniques to achieve observability in a system."
    Peter Bourgon, Altitude NYC
    https://vimeo.com/267641392

  6. Peter’s classification of Observability
    TECHNIQUES SYSTEMS
    * Lovingly stolen from Peter Bourgon

  7. Peter’s classification of Observability
    TECHNIQUES SYSTEMS
    TODAY
    * Lovingly stolen from Peter Bourgon

  8. STOP
    Demo Time!

  9. But Why?
    This pipeline is one of the oldest systems at Fastly
    Born out of our dissatisfaction with the status quo
    We wanted something that would send you logs extremely fast (streamed in near real time) to anywhere you want (many endpoints)

  10. Log Streaming at Fastly

  11. Logging @ Fastly
    Caches -> Aggregators -> Endpoints
    Endpoints: s3, syslog, gcs, sumologic, bigquery, ftp, papertrail

  19. Logging pipeline is Stateless
    We don’t batch your logs
    We don’t store your logs
    We stream your logs in near real-time to your defined endpoints
    We really don’t want your logs on disk

  20. Logging @ Fastly
    Caches + Senders -> Aggregators
    Varnish (four instances shown)

  24. Logging pipeline is Best Effort
    We try our best to send logs to your defined endpoint
    Your endpoint must be up & healthy in order for us to be able to send data to it
    We have minimal buffering
    Pipeline optimized for log streaming speed
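
A minimal sketch of what best-effort delivery can look like: one attempt per log line, no retry queue, nothing written to disk. The `BestEffortSender` class, the `make_endpoint` helper and the health callback are hypothetical names used only for illustration; this is not Fastly's implementation.

```python
from typing import Callable, List


class BestEffortSender:
    """Hypothetical sender: one delivery attempt per line, no retry, no disk."""

    def __init__(self, deliver: Callable[[str], None]) -> None:
        self.deliver = deliver
        self.sent = 0
        self.dropped = 0

    def send(self, line: str) -> bool:
        """Try to deliver once; count the line as dropped if the endpoint fails."""
        try:
            self.deliver(line)
        except ConnectionError:
            self.dropped += 1
            return False
        self.sent += 1
        return True


def make_endpoint(received: List[str], healthy: Callable[[], bool]) -> Callable[[str], None]:
    """Build a fake endpoint that only accepts lines while healthy() is true."""

    def deliver(line: str) -> None:
        if not healthy():
            raise ConnectionError("endpoint unavailable")
        received.append(line)

    return deliver
```

Lines sent while the endpoint is down are simply lost in this sketch, which is the trade-off the slide describes: speed over guaranteed delivery.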

  25. Logging Endpoints
    We don’t limit the number of endpoints or log lines per request
    ~8.6K active endpoints
    Ecosystem of endpoints in different stages of evolution
    Aggregators -> Endpoints: s3, syslog, gcs, sumologic, bigquery, ftp, papertrail

  26. Logging Streams data
    File-based endpoints (time ranged): s3, gcs, ftp, sftp
    Streaming endpoints (protocol or HTTP requests): syslog, sumologic, bigquery, logentries, papertrail, splunk, scalyr, honeycomb
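
One way to read "time ranged": each log line lands in a file named for the time window it falls into, whereas a streaming endpoint receives each line as it arrives. The sketch below illustrates only the file-based half. The one-hour period, the `logs-<timestamp>.log` naming scheme and both function names are assumptions made for the example, not details from the talk.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple

PERIOD_SECONDS = 3600  # hypothetical window length


def object_name(ts: float, period: int = PERIOD_SECONDS) -> str:
    """Name of the file covering the time window that contains timestamp ts."""
    start = int(ts) - int(ts) % period
    stamp = datetime.fromtimestamp(start, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"logs-{stamp}.log"


def group_by_window(lines: Iterable[Tuple[float, str]]) -> Dict[str, List[str]]:
    """Assign (timestamp, line) pairs to the file for their time window."""
    files: Dict[str, List[str]] = defaultdict(list)
    for ts, line in lines:
        files[object_name(ts)].append(line)
    return dict(files)
```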

  27. Logging Growth (2014-2015)
    ~430K LPS ~1.2K endpoints ~2GBps

  29. Logging Growth (2017-2018)
    ~3M LPS ~8.6K endpoints ~4GBps

  31. Logging Growth (8X!!)
    ~3M LPS ~8.6K endpoints ~4GBps
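
As a rough check, the growth factors implied by the rounded figures on the 2014-2015 and 2017-2018 slides can be computed directly. The dictionary keys are illustrative names. The rounded numbers give about 7x for log lines per second and for endpoints, and 2x for throughput; the 8X headline presumably rests on unrounded data that is not in the transcript.

```python
# Rounded figures as written on the slides (2014-2015 vs 2017-2018).
before = {"lines_per_second": 430_000, "endpoints": 1_200, "throughput_GBps": 2}
after = {"lines_per_second": 3_000_000, "endpoints": 8_600, "throughput_GBps": 4}

# Growth factor per metric: later value divided by earlier value.
growth = {name: after[name] / before[name] for name in before}

for name, factor in growth.items():
    print(f"{name}: {factor:.1f}x")
```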

  32. Logging Endpoints

  33. Logging Endpoints
    We send a lot of data continuously to our supported endpoints
    Syslog continues to be our most popular endpoint but S3 & GCS have the highest volume
    The ’70s are still alive with a very respectable 13 MBps to ftp and 74 kBps to sftp*
    * for the non-millennials

  34. Challenges & Lessons learned

  36. Volume Challenges
    No hard limits on what you can log; this can be challenging
    System is multi-tenant. Noisy neighbors can affect delivery
    Consider sampling for high volume logging
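
A hedged sketch of sampling for high-volume logging, written in Python rather than VCL. It keeps roughly one request in `one_in` by hashing a request identifier, so every log line belonging to a kept request is kept together. The `keep` and `sample` helpers and the idea of a per-request ID are assumptions for illustration.

```python
import zlib
from typing import Iterable, List, Tuple


def keep(request_id: str, one_in: int) -> bool:
    """Deterministically keep roughly 1 out of every `one_in` requests."""
    return zlib.crc32(request_id.encode("utf-8")) % one_in == 0


def sample(lines: Iterable[Tuple[str, str]], one_in: int) -> List[Tuple[str, str]]:
    """Filter (request_id, log_line) pairs down to the sampled requests."""
    return [(rid, line) for rid, line in lines if keep(rid, one_in)]
```

Hashing instead of drawing a random number means the same request always gets the same decision, which helps when a request emits several log lines.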

  37. Burden of many endpoints
    Classic integration challenges (each endpoint is a downstream dependency)
    Standard endpoint clients often don’t meet our needs
    Having our own clients affords us extra optimizations

  38. Endpoints & Health
    Some endpoints have known limitations (infamous examples: S3, BigQuery, GCS)
    Difficult to infer if an endpoint is working or not (hard to test setup too)
    Structured logging (JSON via VCL) is challenging
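
The slide does not say why structured logging is hard. One common difficulty when JSON is assembled from a format string is escaping: a value containing a quote breaks the output. The Python sketch below shows that failure mode under this assumption; the field names and the sample value are hypothetical.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


user_agent = 'Mozilla/5.0 "quoted" agent'  # hypothetical request value

# Naive approach: paste the raw value into a JSON-looking template.
naive = '{"user_agent": "%s", "status": 200}' % user_agent

# Safer approach: let a JSON encoder escape the value.
encoded = json.dumps({"user_agent": user_agent, "status": 200})
```

Here `is_valid_json(naive)` is false because of the unescaped quotes, while `is_valid_json(encoded)` is true.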

  39. Service Isolation
    Prioritize delivery of content over log retention
    An aggregator discards the oldest logs it has when it can’t deliver them fast enough
    In a cache node we are our own customers so senders do the same when they can’t reach aggregators fast enough
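
The drop-oldest behavior described above can be sketched with a bounded in-memory queue. `DropOldestBuffer` and its capacity are hypothetical; the talk does not describe the real data structure.

```python
from collections import deque
from typing import Optional


class DropOldestBuffer:
    """Bounded in-memory buffer that evicts the oldest line when full."""

    def __init__(self, capacity: int) -> None:
        self._lines = deque(maxlen=capacity)
        self.dropped = 0

    def push(self, line: str) -> None:
        """Add a line; if the buffer is full, the oldest line is discarded."""
        if len(self._lines) == self._lines.maxlen:
            self.dropped += 1  # deque evicts the oldest entry on append
        self._lines.append(line)

    def pop(self) -> Optional[str]:
        """Return the oldest buffered line, or None if the buffer is empty."""
        return self._lines.popleft() if self._lines else None


buf = DropOldestBuffer(capacity=3)
for n in range(5):
    buf.push(f"line {n}")
```

After the loop the buffer holds lines 2 to 4 and `buf.dropped` is 2: a slow consumer loses the oldest data instead of stalling the producer.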

  40. Expectation Mismatch
    Burden of a system that works so well is that it makes you believe you have strong guarantees
    Design constraints determine the SLA of the pipeline
    General advice: Understand the design choices of the systems you use because they limit what is possible to guarantee *

  41. The Future of Logging

  42. The team have been Busy bees
    H1: Platform performance & addressing the challenges of individual endpoints
    H2: We are getting fancy!

  43. Platform Performance
    Reducing lock contention & CPU usage
    Smarter memory allocation & management
    Overhauling all endpoints
    Halving the time it takes for a log line to be processed (from sender read to aggregator line preparation)

  44. Getting fancy
    BigQuery improvements
    New endpoints: Kafka
    More integrations with cloud services
    Make endpoints easier to debug

  45. Want More?
    Want more endpoints?
    Want metrics?
    Want easier structured logging?
    Want VCL counters + per-second aggregation + a higher SLA?

  46. Want More?
    Want more endpoints?
    Want metrics?
    Want easier structured logging?
    Want VCL counters + per-second aggregation + a higher SLA?
    Dom Fee

  48. tl;dr LOGGING
    Fastly lets you extend the visibility of your system to the edge & gain meaningful insights in near real-time
    Is a pipeline with very specific constraints & guarantees
    Exciting things are coming!

  49. (l,d)ogs of Fastly
    https://github.com/Randommood/Altitude2018
