SLOs and You, or: How We Learned to Stop Worrying and Love the Queue Length

RabbitMQ and the services that use it serve as the backdrop for how FreshBooks came up with some of its first service level objectives. Why are you taking away this thing we care about? How else can we know if there are connectivity issues to RabbitMQ? How else can we know if the consumer doesn’t have enough capacity? What does queue length tell us, and what doesn’t it tell us? How do we let go of single-metric “something could be wrong” indicators in favor of more direct indications that something IS wrong?

Join us to hear the four-part tale of how we helped service owners let go of RabbitMQ queue length and take a customer-centric approach to curating their services’ SLOs, or: how we learned to stop worrying and love the queue length.

Lisa Seelye

May 17, 2018

Transcript

  1. Lisa Seelye
    @thedoh
    FreshBooks
    SLO Creation and You
    Or: How We Learned to Stop Worrying and Love the Queue Length

  2. About Me
    FreshBooks.com
    Mission: Build a world class online
    accounting application to help small
    businesses better manage their
    finances.
    By @burst at https://www.pexels.com/photo/architecture-bridge-building-business-374754/ CC0

  3. About Our Story
    Empty Theater Seats by @jamie-fernandez-201894 at
    https://www.pexels.com/photo/empty-theater-seats-758976/ / CC0 / cropped

  4. Our Cast of Characters
    ● Service Level Objectives (SLOs)
    ○ Service Level Agreements (SLAs) as the evil twin
    ● RabbitMQ
    ● Service Ownership
    ● New Service Owners (aka Developers and Product Owners)

  5. Act I: We Made an SLO

  6. 99.95% Monthly Availability*
    So… What do you think?
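
To make the 99.95% figure concrete, here is a minimal Python sketch that converts a monthly availability objective into an error budget. It assumes availability is measured purely as time over the month (the slide's asterisk suggests FreshBooks qualified its definition, which is not shown here), and the helper name is hypothetical.

```python
def error_budget_minutes(slo: float, days: int = 30) -> float:
    """Return the minutes of unavailability an SLO permits over `days` days.

    Hypothetical helper; assumes a purely time-based availability measure.
    """
    total_minutes = days * 24 * 60      # 43,200 minutes in a 30-day month
    return total_minutes * (1.0 - slo)  # the unavailable fraction is the budget


if __name__ == "__main__":
    for days in (30, 31):
        print(f"{days}-day month: {error_budget_minutes(0.9995, days):.1f} min")
```

Under these assumptions, 99.95% leaves roughly 21.6 minutes of downtime in a 30-day month. A request-based SLO would instead count failed requests against the same 0.05% budget.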

  7. Conspicuously Absent Objectives
    ● Queue length
    ● Queue consumer count

  8. The Reaction
    “We think you should have queue length as an SLO.”

  9. Act II: All About Queue Length

  10. Causes of long queues - Capacity
    Rush Hour by msvg at https://www.flickr.com/photos/msvg/4476789745/ CC-BY 2.0 / cropped

  11. Causes of long queues - Usage Spike
    Image from NewRelic.com for FreshBooks (used with permission)

  12. Causes of long queues - Buggy Deploy
    Bugs by searleb at https://www.flickr.com/photos/searleb/3122477836/ CC-BY 2.0 / cropped

  13. Causes of long queues - Normal Growth
    Landscape Photography of Pavement Road by @jc-estrada-341132 at
    https://www.pexels.com/photo/landscape-photography-of-pavement-road-1046606/ CC0 / cropped

  14. Causes of long queues - RabbitMQ Did It!
    Images from FreshBooks

  15. It’s So Broad, What Value Do You Get?
    ● We might not have enough capacity
    ● There might be a problem with the workers
    ● Is RabbitMQ well-provisioned?
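
To see why queue length alone is such a broad signal, here is a toy Python model of a queue's depth tick by tick. It is an illustration only, not FreshBooks' tooling; the function name and the arrival and consumption numbers are made up.

```python
def simulate_depth(arrivals, capacity, start=0):
    """Track queue depth per tick: it grows by arrivals and shrinks by consumer capacity."""
    depth = start
    history = []
    for arrived, consumed in zip(arrivals, capacity):
        depth = max(0, depth + arrived - consumed)  # a queue can't go below empty
        history.append(depth)
    return history


# Scenario A: steady traffic but under-provisioned consumers (a capacity problem).
under_provisioned = simulate_depth([100] * 5, [80] * 5)
# Scenario B: adequately provisioned consumers absorbing a one-tick usage spike.
usage_spike = simulate_depth([100, 100, 200, 100, 100], [120] * 5)

print(under_provisioned)  # [20, 40, 60, 80, 100]
print(usage_spike)        # [0, 0, 80, 60, 40]
```

Both queues hold 80 messages at some tick, but one keeps growing while the other drains on its own. A single queue-length threshold would page for both without telling them apart.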

  16. Act III: Asking Questions

  17. Key Goals
    ● Quicker pager response
    ● Easier capacity planning
    ● Clear service expectations
    ● Holding ourselves accountable

  18. Ask Direct Questions
    ● Specific questions have specific answers
    ● Usually service-specific
    ● “Why is that important?”

  19. Ok, What To Ask? - Is My Service Healthy?
    No more use... by smithser at https://www.flickr.com/photos/smithser/3434266313 CC-BY 2.0 / cropped

  20. Ok, What To Ask? - Service Misuse?
    DSC_1607 by justinbaeder at https://www.flickr.com/photos/justinbaeder/5317820857 CC-BY-2.0 / cropped

  21. Ok, What To Ask? - Customer Wait Time?
    White Pocket Watch With Gold-colored Frame on Brown Wooden Board by @iseeghoststoo at
    https://www.pexels.com/photo/white-pocket-watch-with-gold-colored-frame-on-brown-wooden-board-1010513/ CC0 / cropped
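
One way to answer the customer wait time question directly is to measure how long each message sat in the queue before a consumer picked it up. The sketch below assumes publishers stamp each message with a publish time (AMQP's optional `timestamp` message property is one place to carry it) and that consumers record when they start work. The function names, the 2-second threshold, and the sample data are hypothetical, and clock skew between hosts would distort real measurements.

```python
import math


def wait_times(pairs):
    """Seconds each message waited, from (published_at, consumed_at) epoch pairs."""
    return [consumed - published for published, consumed in pairs]


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile, e.g. pct=99 for p99."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def wait_time_sli(waits, threshold_s):
    """Fraction of messages picked up within threshold_s seconds."""
    good = sum(1 for w in waits if w <= threshold_s)
    return good / len(waits)


samples = [(100.0, 100.25), (101.0, 101.5), (102.0, 103.0), (103.0, 104.5), (104.0, 116.0)]
waits = wait_times(samples)
print(waits)                                # [0.25, 0.5, 1.0, 1.5, 12.0]
print(nearest_rank_percentile(waits, 80))   # 1.5
print(wait_time_sli(waits, threshold_s=2))  # 0.8
```

An objective phrased this way ("95% of messages are picked up within 2 seconds") describes what a customer experiences, whatever the queue length happens to be.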

  22. Ok, What To Ask? - Consumer Throughput?
    The big queue at an ATM in Masalli, Azerbaijan by Ds02006 at https://commons.wikimedia.org/wiki/File:ATM_Masalli.jpg / Public Domain / cropped
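
Consumer throughput is what turns a queue length into something customers feel. Little's law (L = λW) says that in a steady state the average time a message spends in the queue equals the average queue length divided by the rate messages flow through it. The sketch below uses that idea loosely, dividing the current depth by the observed consume rate to get a rough drain time; the function name and numbers are hypothetical.

```python
import math


def estimated_wait_seconds(depth, consume_rate_per_s):
    """Rough time to drain `depth` messages at the observed consume rate."""
    if consume_rate_per_s <= 0:
        return math.inf  # no consumers making progress: the wait is unbounded
    return depth / consume_rate_per_s


# A long queue with fast consumers...
print(estimated_wait_seconds(5000, 500))  # 10.0
# ...can be healthier than a short queue with slow ones.
print(estimated_wait_seconds(500, 5))     # 100.0
```

This is an approximation that assumes a steady consume rate, but it shows why the same queue length can mean a 10-second wait or a 100-second one.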

  23. Ok, What To Ask? - Violating Queue Limits?
    King's Highway 12 - Ontario by dougtone at https://www.flickr.com/photos/dougtone/9190014238/ / CC-SA 2.0 / cropped
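
Queue length does earn a place when the question is about a hard limit. RabbitMQ lets a queue be capped (for example with the `x-max-length` queue argument or a `max-length` policy), and once the cap is reached the broker discards or rejects messages according to the queue's overflow setting. Below is a minimal sketch of turning periodic depth samples into an SLO-style ratio; the 80% headroom threshold, the function name, and the samples are hypothetical.

```python
def fraction_within_limit(depth_samples, max_length, headroom=0.8):
    """Share of samples where depth stayed below headroom * max_length."""
    threshold = headroom * max_length
    ok = sum(1 for depth in depth_samples if depth < threshold)
    return ok / len(depth_samples)


# One depth sample per minute for a queue capped at 10,000 messages (made-up data).
samples = [100, 2000, 7900, 8500, 300]
print(fraction_within_limit(samples, max_length=10_000))  # 0.8
```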

  24. Wait, Did You Just Suggest Queue Length??

  25. We Have Our Questions … Now What?
    Stealthy Cosmo by pargon at
    https://www.flickr.com/photos/pargon/2381366401 / CC-BY 2.0 / cropped

  26. Act IV: Historic Pitfalls

  27. A Look Back
    ● Unfamiliar with RabbitMQ instrumentation
    ● Correlated queue length with problems
    ● Pager fatigue :(

  28. The Encore

  29. Wrapping Up

  30. The End
    SLOs sound cool? Learn More in Google’s SRE Book (Ch. 4)
    https://landing.google.com/sre/book.html
