SLO Creation and You: Or, How We Learned to Stop Worrying and Love the Queue Length

Lisa Seelye
October 24, 2018

Similar material to DevOps Days, but updated for ExploreTech 2018

Transcript

  1. Lisa Seelye
    @thedoh
    SLO Creation and You
    Or: How We Learned to Stop Worrying and Love the Queue Length

  2. About Me
    By @burst at https://www.pexels.com/photo/architecture-bridge-building-business-374754/ CC0

  3. About Our Story
    Empty Theater Seats by @jamie-fernandez-201894 at
    https://www.pexels.com/photo/empty-theater-seats-758976/ / CC0 / cropped

  4. Our Cast of Characters
    ● Service Level Objectives (SLOs)
    ○ Service Level Agreements (SLAs) as the evil twin
    ● RabbitMQ
    ● Service Ownership
    ● New Service Owners (aka Developers and Product Owners)

  5. Act I: We Made an SLO

  6. Creating The SLO
    Group Hand Fist Bump by @rawpixel.com at https://www.pexels.com/photo/group-hand-fist-bump-1068523/

  7. 99.95% Monthly Availability
    So… What do you think?
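
The 99.95% monthly availability objective on slide 7 can be turned into a concrete downtime allowance. The sketch below is an illustration and not part of the talk: the function name `error_budget_minutes` and the 30-day month are assumptions made only for this example.

```python
def error_budget_minutes(slo_percent: float, days_in_month: int = 30) -> float:
    """Return the minutes of unavailability a monthly availability SLO allows.

    Assumes availability is measured over wall-clock minutes in the month
    (an assumption for this sketch, not something the slides state).
    """
    total_minutes = days_in_month * 24 * 60
    allowed_fraction = (100.0 - slo_percent) / 100.0
    return total_minutes * allowed_fraction


budget = error_budget_minutes(99.95)
print(f"Allowed downtime in a 30-day month: {budget:.1f} minutes")
```

Under these assumptions, 99.95% over a 30-day month leaves roughly 21.6 minutes of downtime.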

  8. Conspicuously Absent Objectives
    ● Queue length
    ● Queue consumer count

  9. The Reaction
    “We think you should have queue length as an SLO.”

  10. Act II: All About Queue Length

  11. Causes of long queues - Capacity
    Rush Hour by msvg at https://www.flickr.com/photos/msvg/4476789745/ CC-BY 2.0 / cropped

  12. Causes of long queues - Usage Spike
    Image from NewRelic.com for FreshBooks (used with permission)

  13. Causes of long queues - Buggy Deploy
    Bugs by searleb at https://www.flickr.com/photos/searleb/3122477836/ CC-BY 2.0 / cropped

  14. Causes of long queues - Normal Growth
    Landscape Photography of Pavement Road by @jc-estrada-341132 at
    https://www.pexels.com/photo/landscape-photography-of-pavement-road-1046606/ CC0 / cropped

  15. Causes of long queues - RabbitMQ Did It!
    Images from FreshBooks

  16. The Value of Queue Length
    ● We might not have enough capacity
    ● There might be a problem with the queue services
    ● Is RabbitMQ well-provisioned?

  17. Act III: Asking Questions

  18. Key Goals
    ● Quicker on-call alert handling
    ● Easier capacity planning
    ● Sets service expectations
    ● Hold ourselves accountable

  19. Ask Direct Questions
    ● Specific questions have specific answers
    ● Usually service specific
    ● “Why is that important?”

  20. What To Ask? - Is My Service Healthy?
    No more use... by smithser at https://www.flickr.com/photos/smithser/3434266313 CC-BY 2.0 / cropped

  21. What To Ask? - Service Misuse?
    DSC_1607 by justinbaeder at https://www.flickr.com/photos/justinbaeder/5317820857 CC-BY-2.0 / cropped

  22. What To Ask? - Customer Wait Time?
    White Pocket Watch With Gold-colored Frame on Brown Wooden Board by @iseeghoststoo at
    https://www.pexels.com/photo/white-pocket-watch-with-gold-colored-frame-on-brown-wooden-board-1010513/ CC0 / cropped

  23. What To Ask? - Consumer Throughput?
    The big queue at an ATM in Masalli, Azerbaijan by Ds02006 at https://commons.wikimedia.org/wiki/File:ATM_Masalli.jpg / Public Domain / cropped
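
Slides 22 and 23 ask about customer wait time and consumer throughput. One common back-of-the-envelope way to connect the two is to divide the current queue length by the aggregate rate at which consumers drain it. The sketch below is an illustration under that assumption, not the speaker's method: the function name, its parameters, and the sample numbers are all hypothetical.

```python
def estimated_wait_seconds(queue_length: int,
                           consumer_count: int,
                           msgs_per_consumer_per_sec: float) -> float:
    """Estimate how long a newly enqueued message waits before it is processed.

    Assumes a FIFO queue drained at a steady rate by identical consumers
    (a simplification made for this sketch).
    """
    throughput = consumer_count * msgs_per_consumer_per_sec
    if throughput <= 0:
        # No consumers, or consumers making no progress: the queue never drains.
        return float("inf")
    return queue_length / throughput


# Hypothetical numbers: 1200 queued messages, 4 consumers at 5 messages/second each.
wait = estimated_wait_seconds(1200, 4, 5.0)
print(f"Estimated wait: {wait:.0f} seconds")
```

With these made-up numbers the estimate is about 60 seconds. With zero consumers the estimate is unbounded, which is one reason consumer count matters alongside queue length.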

  24. What To Ask? - Violating Queue Limits?
    King's Highway 12 - Ontario by dougtone at https://www.flickr.com/photos/dougtone/9190014238/ / CC-SA 2.0 / cropped

  25. Wait, Did You Just Suggest Queue Length??

  26. We Have Our Questions … Now What?
    Stealthy Cosmo by pargon at
    https://www.flickr.com/photos/pargon/2381366401 / CC-BY 2.0 / cropped

  27. Epilogue

  28. Looking Back
    ● Unfamiliar with RabbitMQ instrumentation
    ● Correlated queue length with problem
    ● Alert fatigue :(

  29. Key Lessons
    ● Old ways weren’t working
    ● Be open to a discussion with people

  30. One Last Thing...
    ● SLOs aren’t just for software - Think Customer Support

  31. The End
    SLOs sound cool? Learn More in Google’s SRE Book (Ch. 4)
    https://landing.google.com/sre/book.html
    Special thank you to FreshBooks for granting permission to give this talk.