
Zero to Capacity Planning, There and back again

Ines Sombra
December 13, 2016


Transcript

  1. From Zero To Capacity Planning 2016 Edition!

  2. @Randommood Ines Sombra

  3. Globally distributed and Highly available

  4. NOT AN INFRA person

  5. Why care about capacity planning? ✨ ✨
      - Instrument
      - Monitor & alert
      - Plan & predict
  6. Capacity planning 101

  7. Defining capacity planning
      - Measuring, planning, & managing system growth
      - Determines what your system needs & when, from the observation of actual traffic
      - Use current performance as baseline for predictions
      - Must happen regardless of what you might optimize in the future
  8. a Fastly POP

  9. Meet Catharine ("I Rule!")
      - Evaluates global POP performance weekly & makes projections
      - Publishes a weekly capacity performance report
      - Plans for our physical capacity & transit capacity
  10. Planning Our Capacity
      Contextual metrics:
      - Network Capacity (Gb)
      - Ordered Network Capability (Gb)
      - Planned Network Capacity (Gb)
      - RPS Capacity (k)
      - Network peak (Gb)
      - RPS peak (k)
      - Site CPU Peak (%)
      - Network Utilization (%)

      Over 30%: flagged. Over 70%: red status.
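
The flagging rule on slide 10 can be sketched in a few lines of Python. This is only an illustration: the slide does not say how utilization is computed, so the sketch assumes it is network peak as a percentage of network capacity, and the `PopMetrics` record and its field names are hypothetical.

```python
from dataclasses import dataclass

# Thresholds taken from the slide: over 30% is flagged, over 70% is red.
FLAG_THRESHOLD_PCT = 30.0
RED_THRESHOLD_PCT = 70.0


@dataclass
class PopMetrics:
    """Hypothetical per-POP record; field names are illustrative only."""
    name: str
    network_capacity_gb: float
    network_peak_gb: float


def network_utilization(pop: PopMetrics) -> float:
    """Assumed definition: network peak as a percentage of network capacity."""
    return 100.0 * pop.network_peak_gb / pop.network_capacity_gb


def status(utilization_pct: float) -> str:
    """Map a utilization percentage to the slide's status levels."""
    if utilization_pct > RED_THRESHOLD_PCT:
        return "red"
    if utilization_pct > FLAG_THRESHOLD_PCT:
        return "flagged"
    return "ok"
```

A weekly report could then loop over all POPs and list those whose status is not "ok".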
  11. Fastly Insights
      - Our ability to correctly plan for capacity is critical to our bottom line
      - Capacity doesn’t just involve hardware; software & transfer optimizations matter
      - People affect capacity
  12. Allspaw’s Admiration Society

  13. Allspaw's Wisdom (flowchart from The Art of Capacity Planning)
      - We have to be this fast & reliable: X per second & Y% uptime
      - Measure how fast/reliable we are
      - Are we right now? Yes! / No!
      - Change / add / remove: hardware, software, architecture
      - Figure out how to stay fast/reliable enough
  14. Allspaw's Wisdom
      - System’s ceiling: critical level of a resource that cannot be crossed without failure. Find yours
      - Another form of capacity planning: controlled load testing
      - Predictions = ceilings + historical data
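
One way to read "Predictions = ceilings + historical data" is as a simple projection: fit a trend to past peaks and estimate when it reaches the ceiling. The sketch below assumes evenly spaced weekly peaks and a straight-line trend, neither of which comes from the talk; the function names are hypothetical, and any such projection is a guess rather than a forecast to rely on.

```python
from typing import Optional, Sequence, Tuple


def fit_line(ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for ys observed at x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def weeks_until_ceiling(weekly_peaks: Sequence[float],
                        ceiling: float) -> Optional[float]:
    """Weeks after the latest observation until the fitted trend hits the ceiling.

    Returns None when the trend is flat or falling (no projected crossing),
    and 0.0 when the fitted line is already at or above the ceiling.
    """
    slope, intercept = fit_line(weekly_peaks)
    if slope <= 0:
        return None
    crossing_week = (ceiling - intercept) / slope
    latest_week = len(weekly_peaks) - 1
    return max(0.0, crossing_week - latest_week)
```

For example, peaks of 10, 12, 14, and 16 against a ceiling of 20 grow by 2 per week, so the fitted line reaches the ceiling two weeks after the last observation.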
  15. Allspaw's Wisdom
      - System architecture can affect your ability to add capacity
      - Identify & track your application’s metrics
      - Tying metrics to user behavior is helpful
      - If you don’t have ways to measure your current capacity you can’t plan
  16. Findings & putting things in practice

  17. Unexpected Challenges
      - The goal when adding capacity is no service disruption
      - Localhost is the goddamn devil
      - Gap from metric/graph to insight can be huge
      - Slowness is the nemesis of distributed systems
  18. More Insights
      - Capacity tied to murky organizational structure is both good & bad (but mostly bad)
      - Mind your system dependencies: practice defensive system design & architecture
      - New SLAs can be tricky
      - (Diagram: capacity planning, alerting, monitoring)
  19. More Insights
      - Possible to have plenty of capacity and a slow site nonetheless
      - Projections & curve fitting are guesses
      - Keep track of API calls & their rates
      - Always gonna be spikes & hiccups. Take the bad with the good & plan for it
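
Keeping track of API calls and their rates can start very small. The sketch below is one possible shape, not anything shown in the talk: a hypothetical `CallRateTracker` that keeps recent call timestamps per endpoint and reports calls per second over a sliding window. It assumes timestamps are recorded in non-decreasing order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class CallRateTracker:
    """Hypothetical sliding-window counter of API calls per endpoint."""

    def __init__(self, window_seconds: float = 60.0) -> None:
        self.window_seconds = window_seconds
        self._calls: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, endpoint: str, timestamp: float) -> None:
        """Note one call; timestamps are assumed to arrive in order."""
        self._calls[endpoint].append(timestamp)

    def rate(self, endpoint: str, now: float) -> float:
        """Calls per second over the window ending at `now`."""
        calls = self._calls[endpoint]
        cutoff = now - self.window_seconds
        # Drop calls that have aged out of the window.
        while calls and calls[0] <= cutoff:
            calls.popleft()
        return len(calls) / self.window_seconds
```

Feeding these rates into the same graphs as the capacity metrics makes it easier to tie spikes to specific callers.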
  20. TL;DR
      - Capacity planning
        - Is a process, not a one-time event
        - Pushes you to better understand your system, its capacity & its boundaries - that is good!
        - Proactivity is best
      - Distributed systems
        - Request lifecycle gets tricky
        - System boundaries, dependencies & SLAs must be discussed
        - Your system’s capacity may bound other systems’ capacity
  21. Thank you!
      - github.com/Randommood/ZerotoCapacityPlanning
      - Special thanks to: Catharine Strauss, Alan Kasindorf, Matt Whiteley, Caitie McCaffrey, Thom Mahoney, Mike O’Neill, Devon O’Dell, Katherine Daniels, Nathan Taylor, Bruce Spang, and Greg Bako