SLO Creation and You: Or, How We Learned to Stop Worrying and Love the Queue Length

Lisa Seelye
October 24, 2018

Similar material to DevOps Days, but updated for ExploreTech 2018

Transcript

  1. Lisa Seelye
    @thedoh
    SLO Creation and You
    Or: How We Learned to Stop Worrying and Love the Queue Length

  2. About Me
    By @burst at https://www.pexels.com/photo/architecture-bridge-building-business-374754/ CC0

  3. About Our Story
    Empty Theater Seats by @jamie-fernandez-201894 at https://www.pexels.com/photo/empty-theater-seats-758976/ / CC0 / cropped

  4. Our Cast of Characters
    ● Service Level Objectives (SLOs)
    ○ Service Level Agreements (SLAs) as the evil twin
    ● RabbitMQ
    ● Service Ownership
    ● New Service Owners (aka Developers and Product Owners)

  5. Act I: We Made an SLO

  6. Creating The SLO
    Group Hand Fist Bump by @rawpixel.com at https://www.pexels.com/photo/group-hand-fist-bump-1068523/

  7. 99.95% Monthly Availability
    So… What do you think?
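
To make the 99.95% figure concrete, it helps to convert it into the downtime it permits. The sketch below is only an illustration: the function name and the 30-day month are assumptions, not something stated in the talk.

```python
def allowed_downtime_minutes(availability_pct: float, days_in_month: int = 30) -> float:
    """Minutes of downtime that a monthly availability target permits.

    Assumes a fixed-length month (30 days by default); this is an
    illustrative simplification, not the talk's definition.
    """
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


budget = allowed_downtime_minutes(99.95)
print(f"{budget:.1f} minutes of downtime per 30-day month")
```

Under these assumptions a 99.95% monthly target leaves roughly 21.6 minutes of unavailability per month.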

  8. Conspicuously Absent Objectives
    ● Queue length
    ● Queue consumer count
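
Both of these numbers can be read from the RabbitMQ management plugin's HTTP API, which describes a queue at `/api/queues/{vhost}/{name}` with fields such as `messages` and `consumers`. The sketch below only builds the URL and parses a response body. The host, queue name, and sample values are hypothetical, and the talk does not say how FreshBooks collected these metrics.

```python
import json
from urllib.parse import quote


def queue_url(base_url: str, vhost: str, queue: str) -> str:
    """Build the management API URL for one queue.

    The vhost must be percent-encoded, so the default vhost "/" becomes "%2F".
    """
    return f"{base_url}/api/queues/{quote(vhost, safe='')}/{quote(queue, safe='')}"


def summarize_queue(payload: str) -> dict:
    """Extract queue length and consumer count from a queue JSON document."""
    data = json.loads(payload)
    return {
        "name": data["name"],
        "length": data.get("messages", 0),      # ready + unacknowledged messages
        "consumers": data.get("consumers", 0),  # consumers attached to the queue
    }


# Hypothetical response body, shaped like the management API's queue object.
sample = json.dumps({
    "name": "invoice-emails",
    "vhost": "/",
    "messages": 1200,
    "messages_ready": 1150,
    "messages_unacknowledged": 50,
    "consumers": 4,
})

url = queue_url("http://localhost:15672", "/", "invoice-emails")
summary = summarize_queue(sample)
print(url)
print(summary)
```

Fetching the URL would additionally need management-user credentials, which are left out here.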

  9. The Reaction
    “We think you should have queue length as an SLO.”

  10. Act II: All About Queue Length

  11. Causes of long queues - Capacity
    Rush Hour by msvg at https://www.flickr.com/photos/msvg/4476789745/ CC-BY 2.0 / cropped

  12. Causes of long queues - Usage Spike
    Image from NewRelic.com for FreshBooks (used with permission)

  13. Causes of long queues - Buggy Deploy
    Bugs by searleb at https://www.flickr.com/photos/searleb/3122477836/ CC-BY 2.0 / cropped

  14. Causes of long queues - Normal Growth
    Landscape Photography of Pavement Road by @jc-estrada-341132 at https://www.pexels.com/photo/landscape-photography-of-pavement-road-1046606/ CC0 / cropped

  15. Causes of long queues - RabbitMQ Did It!
    Images from FreshBooks

  16. The Value of Queue Length
    ● We might not have enough capacity
    ● There might be a problem with the queue services
    ● Is RabbitMQ well-provisioned?

  17. Act III: Asking Questions

  18. Key Goals
    ● Quicker on-call alert handling
    ● Easier capacity planning
    ● Set service expectations
    ● Hold ourselves accountable

  19. Ask Direct Questions
    ● Specific questions have specific answers
    ● Usually service-specific
    ● “Why is that important?”

  20. What To Ask? - Is My Service Healthy?
    No more use... by smithser at https://www.flickr.com/photos/smithser/3434266313 CC-BY 2.0 / cropped

  21. What To Ask? - Service Misuse?
    DSC_1607 by justinbaeder at https://www.flickr.com/photos/justinbaeder/5317820857 CC-BY-2.0 / cropped

  22. What To Ask? - Customer Wait Time?
    White Pocket Watch With Gold-colored Frame on Brown Wooden Board by @iseeghoststoo at https://www.pexels.com/photo/white-pocket-watch-with-gold-colored-frame-on-brown-wooden-board-1010513/ CC0 / cropped

  23. What To Ask? - Consumer Throughput?
    The big queue at an ATM in Masalli, Azerbaijan by Ds02006 at https://commons.wikimedia.org/wiki/File:ATM_Masalli.jpg / Public Domain / cropped
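
Customer wait time and consumer throughput are linked: a message at the back of a queue waits roughly as long as it takes the consumers to work through everything ahead of it. The sketch below is a back-of-the-envelope estimate under the assumption of steady per-consumer throughput. The function name and the numbers are hypothetical and do not come from the talk.

```python
def estimated_wait_seconds(queue_length: int, consumers: int,
                           msgs_per_consumer_per_sec: float) -> float:
    """Rough time for a newly enqueued message to reach a consumer.

    Assumes every consumer processes messages at the same steady rate and
    that no new consumers are added while the backlog drains.
    """
    throughput = consumers * msgs_per_consumer_per_sec
    if throughput <= 0:
        # No consumers (or a zero rate): the backlog never drains.
        return float("inf")
    return queue_length / throughput


# Hypothetical numbers: 1200 queued messages, 4 consumers at 5 msg/s each.
wait = estimated_wait_seconds(1200, 4, 5.0)
print(f"about {wait:.0f} seconds of wait")
```

This framing shows why the same queue length can be harmless for one service and a problem for another: what matters to the customer is the length divided by the drain rate.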

  24. What To Ask? - Violating Queue Limits?
    King's Highway 12 - Ontario by dougtone at https://www.flickr.com/photos/dougtone/9190014238/ / CC-SA 2.0 / cropped

  25. Wait, Did You Just Suggest Queue Length??

  26. We Have Our Questions … Now What?
    Stealthy Cosmo by pargon at https://www.flickr.com/photos/pargon/2381366401 / CC-BY 2.0 / cropped

  27. Epilogue

  28. Looking Back
    ● Unfamiliar with RabbitMQ instrumentation
    ● Correlated queue length with problem
    ● Alert fatigue :(

  29. Key Lessons
    ● Old ways weren’t working
    ● Be open to a discussion with people

  30. One Last Thing...
    ● SLOs aren’t just for software - Think Customer Support

  31. The End
    SLOs sound cool? Learn More in Google’s SRE Book (Ch. 4)
    https://landing.google.com/sre/book.html
    Special thank you to FreshBooks for granting permission to give this talk.