Zero to Capacity Planning, There and back again

Ines Sombra
December 13, 2016

Transcript

  1. From Zero To Capacity Planning 2016 Edition!

  2. Ines Sombra (@Randommood)

  3. Globally distributed and Highly available

  4. Not an infra person

  5. Why care about capacity planning? ✨
    - Instrument
    - Monitor & alert
    - Plan & predict
  6. Capacity planning 101

  7. Defining Capacity Planning
    - Measuring, planning, & managing system growth
    - Determines what your system needs & when
    - Works from the observation of actual traffic
    - Use current performance as the baseline for predictions
    - Must happen regardless of what you might optimize in the future
  8. a Fastly POP

  9. Meet Catharine ("I rule!")
    - Evaluates global POP performance weekly & makes projections
    - Publishes a weekly capacity performance report
    - Plans for our physical capacity & transit capacity
  10. Planning Our Capacity
    Contextual metrics:
    - Network Capacity (Gb)
    - Ordered Network Capability (Gb)
    - Planned Network Capacity (Gb)
    - RPS Capacity (k)
    - Network peak (Gb)
    - RPS peak (k)
    - Site CPU Peak (%)
    - Network Utilization (%)
    Over 30%: flagged. Over 70%: red status.
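The thresholds on this slide can be sketched as a small status check. This is an illustration only, not Fastly's tooling: it assumes that network utilization is the network peak divided by the network capacity and that the 30% and 70% thresholds apply to that figure, neither of which the slide states. The function names are hypothetical.

```python
def utilization_pct(peak_gb, capacity_gb):
    """Network utilization as a percentage (assumed: peak / capacity)."""
    if capacity_gb <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * peak_gb / capacity_gb


def status(util_pct, flag_at=30.0, red_at=70.0):
    """Map a utilization percentage to the status levels named on the slide."""
    if util_pct > red_at:
        return "red"
    if util_pct > flag_at:
        return "flagged"
    return "ok"
```

Because the slide says "over", the comparisons are strict: exactly 30% is still "ok" and exactly 70% is still "flagged".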
  11. Fastly Insights
    - Our ability to correctly plan for capacity is critical to our bottom line
    - Capacity doesn’t just involve hardware; software & transfer optimizations matter
    - People affect capacity
  12. Allspaw’s Admiration Society

  13. Allspaw's Wisdom (from The Art of Capacity Planning)
    Flowchart:
    - We have to be this fast & reliable: X per second & Y% uptime
    - Measure how fast/reliable we are
    - Are we right now?
    - Yes! Figure out how to stay fast/reliable enough
    - No! Change / add / remove hardware, software, architecture
  14. Allspaw's Wisdom
    - System’s ceiling: the critical level of a resource that cannot be crossed without failure. Find yours
    - Another form of capacity planning: controlled load testing
    - Predictions = ceilings + historical data
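"Predictions = ceilings + historical data" can be made concrete with a rough sketch: fit a straight line to past weekly peaks and estimate how long until the trend reaches a known ceiling. The linear model, the weekly sampling, and the function names are assumptions made for illustration; the talk does not prescribe a fitting method, and any such projection is an estimate rather than a guarantee.

```python
def fit_line(ys):
    """Least-squares slope and intercept for ys observed at x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def weeks_until_ceiling(weekly_peaks, ceiling):
    """Weeks after the last observation until the fitted trend hits the ceiling.

    Returns None when the trend is flat or shrinking, and 0.0 when the
    fitted line has already reached the ceiling.
    """
    slope, intercept = fit_line(weekly_peaks)
    if slope <= 0:
        return None
    last_x = len(weekly_peaks) - 1
    x_cross = (ceiling - intercept) / slope
    return max(0.0, x_cross - last_x)
```

For example, peaks of 10, 20, 30, 40 against a ceiling of 100 grow by 10 per week, so the trend reaches the ceiling 6 weeks after the last sample. The ceiling itself still has to come from measurement, for instance controlled load testing.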
  15. Allspaw's Wisdom
    - System architecture can affect your ability to add capacity
    - Identify & track your application’s metrics
    - Tying metrics to user behavior is helpful
    - If you don’t have ways to measure your current capacity, you can’t plan
  16. Putting things in practice & Findings

  17. Unexpected Challenges
    - The goal when adding capacity is no service disruption
    - Localhost is the goddamn devil
    - Gap from metric/graph to insight can be huge
    - Slowness is the nemesis of distributed systems
  18. More Insights
    - Capacity tied to murky organizational structure is both good & bad (but mostly bad)
    - Mind your system dependencies: practice defensive system design & architecture
    - New SLAs can be tricky
    (Diagram labels: capacity planning, alerting, monitoring)
  19. More Insights
    - Possible to have plenty of capacity and a slow site nonetheless
    - Projections & curve fitting are guesses
    - Keep track of API calls & their rates
    - Always gonna be spikes & hiccups. Take the bad with the good & plan for it
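One way to "keep track of API calls & their rates" is a sliding-window counter per endpoint. The sketch below is an assumption about how that could look, not something shown in the talk; the class name, the window length, and the endpoint strings are hypothetical.

```python
from collections import defaultdict, deque


class CallRateTracker:
    """Tracks per-endpoint call rates over a sliding time window."""

    def __init__(self, window_seconds=60.0):
        self.window = window_seconds
        self.calls = defaultdict(deque)  # endpoint -> timestamps, oldest first

    def record(self, endpoint, timestamp):
        """Record one call; timestamps are expected in non-decreasing order."""
        self.calls[endpoint].append(timestamp)

    def rate(self, endpoint, now):
        """Calls per second for the endpoint over the window ending at `now`."""
        q = self.calls[endpoint]
        # Drop calls that have aged out of the window.
        while q and q[0] <= now - self.window:
            q.popleft()
        return len(q) / self.window
```

Passing timestamps in explicitly (instead of reading the clock inside the class) keeps the tracker easy to test and lets it replay recorded traffic.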
  20. TL;DR
    Capacity planning:
    - Is a process, not a one-time event
    - Pushes you to better understand your system, its capacity & its boundaries - that is good!
    - Proactivity is best
    Distributed systems:
    - Request lifecycle gets tricky
    - System boundaries, dependencies & SLAs must be discussed
    - Your system’s capacity may bound other systems’ capacity
  21. Thank you!
    github.com/Randommood/ZerotoCapacityPlanning
    Special thanks to: Catharine Strauss, Alan Kasindorf, Matt Whiteley, Caitie McCaffrey, Thom Mahoney, Mike O’Neill, Devon O’Dell, Katherine Daniels, Nathan Taylor, Bruce Spang, and Greg Bako