Zero to Capacity Planning, There and back again

Ines Sombra
December 13, 2016

Transcript

  1. Defining Capacity Planning
    - Measuring, planning, & managing system growth
    - Determines what your system needs & when
    - From the observation of actual traffic
    - Use current performance as baseline for predictions
    - Must happen regardless of what you might optimize in the future
  2. I Rule!
    - Evaluates global POP performance weekly & makes projections
    - Publishes a weekly capacity performance report
    - Plans for our physical capacity & transit capacity
    - Meet Catharine
  3. Planning Our Capacity
    Contextual metrics:
    - Network Capacity (Gb)
    - Ordered Network Capability (Gb)
    - Planned Network Capacity (Gb)
    - RPS Capacity (k)
    - Network peak (Gb)
    - RPS peak (k)
    - Site CPU Peak (%)
    - Network Utilization (%): over 30% flagged, over 70% red status
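The utilization thresholds on this slide can be sketched in a few lines of Python. This is only an illustration: the slide gives the 30% and 70% cut-offs, but the formula (network peak divided by network capacity), the status names, and the POP names and numbers below are assumptions, not something the deck specifies.

```python
# Thresholds come from the slide; everything else here is illustrative.
FLAG_THRESHOLD_PCT = 30.0
RED_THRESHOLD_PCT = 70.0


def network_utilization_pct(peak_gb: float, capacity_gb: float) -> float:
    """Utilization as a percentage, assumed here to be peak / capacity."""
    if capacity_gb <= 0:
        raise ValueError("capacity_gb must be positive")
    return 100.0 * peak_gb / capacity_gb


def utilization_status(utilization_pct: float) -> str:
    """Map a utilization percentage to a (hypothetical) status label."""
    if utilization_pct > RED_THRESHOLD_PCT:
        return "red"
    if utilization_pct > FLAG_THRESHOLD_PCT:
        return "flagged"
    return "ok"


# Hypothetical POPs: (network peak in Gb, network capacity in Gb).
pops = {"pop-a": (12.0, 100.0), "pop-b": (45.0, 100.0), "pop-c": (80.0, 100.0)}
for name, (peak, capacity) in pops.items():
    pct = network_utilization_pct(peak, capacity)
    print(f"{name}: {pct:.0f}% -> {utilization_status(pct)}")
```

Keeping the thresholds as named constants makes it easy to adjust them per metric if, say, CPU peak needs different cut-offs than network utilization.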
  4. Fastly Insights
    - Our ability to correctly plan for capacity is critical to our bottom line
    - Capacity doesn’t just involve hardware; software & transfer optimizations matter
    - People affect capacity
  5. Allspaw's Wisdom (from The Art of Capacity Planning)
    [Flowchart; recoverable labels:]
    - We have to be this fast & reliable: X per second & Y% uptime
    - Measure how fast/reliable we are
    - Are we right now? Yes! / No!
    - Hardware, software, architecture: change / add / remove
    - Figure out how to stay fast/reliable enough
  6. Allspaw's Wisdom
    - System’s ceiling: critical level of a resource that cannot be crossed without failure. Find yours
    - Another form of capacity planning: controlled load testing
    - Predictions = ceilings + historical data
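One way to read "Predictions = ceilings + historical data" is to fit a trend to past peaks and ask when it reaches the ceiling. The sketch below assumes a straight-line trend over evenly spaced weekly samples, which is a simplification chosen for illustration and not a method the deck prescribes; the function names and sample numbers are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def linear_fit(ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for ys observed at x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def weeks_until_ceiling(weekly_peaks: Sequence[float], ceiling: float) -> Optional[float]:
    """Weeks after the latest sample until the fitted line reaches the ceiling.

    Returns None when the trend is flat or falling (no projected crossing),
    and 0.0 when the fitted line is already at or above the ceiling.
    """
    slope, intercept = linear_fit(weekly_peaks)
    if slope <= 0:
        return None
    last_week = len(weekly_peaks) - 1
    crossing_week = (ceiling - intercept) / slope
    return max(0.0, crossing_week - last_week)


# Hypothetical weekly network peaks (Gb) against a ceiling found by load testing.
print(weeks_until_ceiling([10.0, 12.0, 14.0, 16.0], ceiling=20.0))
```

The ceiling itself still has to come from observation or controlled load testing; the fit only extrapolates the history toward it and should be treated as a rough estimate.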
  7. Allspaw's Wisdom
    - System architecture can affect your ability to add capacity
    - Identify & track your application’s metrics
    - Tying metrics to user behavior is helpful
    - If you don’t have ways to measure your current capacity you can’t plan
  8. Unexpected Challenges
    - The goal when adding capacity is no service disruption
    - Localhost is the goddamn devil
    - Gap from metric/graph to insight can be huge
    - Slowness is the nemesis of distributed systems
  9. More Insights
    - Capacity tied to murky organizational structure is both good & bad (but mostly bad)
    - Mind your system dependencies: practice defensive system design & architecture
    - New SLAs can be tricky
    [Diagram labels: Capacity planning, Alerting, Monitoring]
  10. More Insights
    - Possible to have plenty of capacity and a slow site nonetheless
    - Projections & curve fitting are guesses
    - Keep track of API calls & their rates
    - Always gonna be spikes & hiccups. Take the bad with the good & plan for it
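As a rough illustration of "keep track of API calls & their rates", here is a minimal sliding-window counter. It is a sketch under stated assumptions: the class name, the window size, and the endpoint string are hypothetical, timestamps are assumed to arrive in non-decreasing order, and a production system would more likely rely on its existing metrics pipeline.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class CallRateTracker:
    """Tracks calls per second per endpoint over a sliding time window."""

    def __init__(self, window_seconds: float = 60.0) -> None:
        if window_seconds <= 0:
            raise ValueError("window_seconds must be positive")
        self.window_seconds = window_seconds
        self._calls: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, endpoint: str, timestamp: float) -> None:
        """Record one call; timestamps are assumed to be non-decreasing."""
        self._calls[endpoint].append(timestamp)

    def rate(self, endpoint: str, now: float) -> float:
        """Average calls per second over the window ending at `now`."""
        calls = self._calls[endpoint]
        cutoff = now - self.window_seconds
        # Drop calls that have aged out of the window.
        while calls and calls[0] <= cutoff:
            calls.popleft()
        return len(calls) / self.window_seconds


# Hypothetical usage with explicit timestamps (seconds).
tracker = CallRateTracker(window_seconds=10.0)
for t in (0.0, 5.0, 9.0):
    tracker.record("GET /example", t)
print(tracker.rate("GET /example", now=10.0))
```

Passing timestamps in explicitly, instead of reading the clock inside the class, keeps the tracker easy to test and lets it replay historical call logs.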
  11. TL;DR
    Capacity planning:
    - Is a process, not a one-time event
    - Pushes you to better understand your system, its capacity & its boundaries - that is good!
    - Proactivity is best
    Distributed systems:
    - Request lifecycle gets tricky
    - System boundaries, dependencies & SLAs must be discussed
    - Your system’s capacity may bound other systems’ capacity
  12. github.com/Randommood/ZerotoCapacityPlanning
    Special thanks to: Catharine Strauss, Alan Kasindorf, Matt Whiteley, Caitie McCaffrey, Thom Mahoney, Mike O’Neill, Devon O’Dell, Katherine Daniels, Nathan Taylor, Bruce Spang, and Greg Bako
    Thank you!