
Future-proofing Production Systems.

kavya
October 20, 2017


How does your system perform under load? What’re the bottlenecks, and how does it fail at its limits?
How do you stay ahead as your system evolves and its workload grows?

In this talk, we’ll explore strategies to prepare systems for flux and scale. From Facebook’s Kraken that provides shadow traffic, to the custom load simulator we built at Samsara, we’ll discuss how to go about understanding your systems as they run today, and planning for how they will tomorrow.



Transcript

  1. Future-proofing
    Production Systems
    @kavya719


  2. kavya


  3. analyzing
    the performance
    of systems

  4. performance
    capacity
    • What’s the additional load the system can support,
      without degrading response time?
    • What’re the system utilization bottlenecks?
    • What’s the impact of a change on response time,
      maximum throughput?
    • How many additional servers to support 10x load?
    • Is the system over-provisioned?

  5. more robust, performant, scalable
    …use prod to make prod better.


  6. more robust, performant, scalable
    …use prod to make prod better.
    A/ B testing, canaries and ramped deploys.
    chaos engineering.
    stressing the system.
    empirically determine
    performance characteristics, bottlenecks.


  7. Kraken
    a fancy load “simulator”
    utilization law, the USL
    OrgSim etc.
    a standard load simulator
    Little’s law
    stepping back
    the sweet middle ground

  8. Kraken


  9. • Facebook’s load “simulator”.
      In use in production since ~2013.
    • Used to determine a system’s capacity.
    • And to identify and resolve utilization bottlenecks.
    • Allowed them to increase Facebook’s capacity
      by over 20% using the same hardware!
    kraken

  10. • Facebook’s load “simulator”.
      In use in production since ~2013.
    • Used to determine a system’s capacity.
    • And to identify and resolve utilization bottlenecks.
    • Allowed them to increase Facebook’s capacity
      by over 20% using the same hardware!
    kraken
    maximum throughput,
    subject to a response time constraint.


  13. the model
    stateless servers
    that serve requests without using sticky sessions/ server affinity.
    load can be controlled by re-routing requests
    for example, this does not apply to a global message queue.
    downstream services respond to upstream service load shifts
    for example, a web server querying a database.


  14. load generation
    need a representative workload…use live traffic!
    traffic shifting:
    increase the fraction of traffic to a region, cluster, server,
    by adjusting the weights that control load balancing.
    monitoring
    need reliable metrics that track the health of the system.
    user experience:
    p99 response time
    HTTP error rate
    safety:
    CPU utilization
    connections, queue length
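Traffic shifting, as described here, comes down to adjusting load-balancer weights. The sketch below is not Kraken's implementation; the server names, weight values, and `pick_server` helper are all invented for illustration:

```python
import random

random.seed(42)  # deterministic for this example

def pick_server(weights):
    """Route one request according to load-balancer weights.

    `weights` maps server -> relative weight; raising one server's
    weight shifts a larger fraction of live traffic onto it.
    """
    servers = list(weights)
    return random.choices(servers, weights=[weights[s] for s in servers])[0]

# Doubling web-2's weight roughly doubles its share of traffic.
weights = {"web-1": 1.0, "web-2": 2.0, "web-3": 1.0}
counts = {s: 0 for s in weights}
for _ in range(10_000):
    counts[pick_server(weights)] += 1
```

The same idea applies at each tier: edge weights, cluster weights, and per-server weights each control what fraction of traffic flows downward at that level.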

  15. [graph: response time vs. load, with a response time threshold; capacity = throughput at the threshold]
    …is this good or is there a bottleneck?
    let’s run it!

  16. interlude: performance modeling
    Step I: single server capacity
    model a web server as a queueing system.
    response time = queueing delay + service time
    assume no upstream saturation, so service time is constant i.e.
    response time ∝ queueing delay.
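The relation response time = queueing delay + service time can be made concrete with the simplest queueing model. The talk doesn't commit to a specific one; this sketch assumes an M/M/1 queue purely to show how queueing delay comes to dominate as the server gets busier:

```python
def mm1_response_time(throughput, service_time):
    """Expected response time of a single M/M/1 server.

    utilization (rho) = throughput * service_time; expected
    response time S / (1 - rho) blows up as rho approaches 1.
    """
    rho = throughput * service_time
    if rho >= 1:
        raise ValueError("server is saturated")
    return service_time / (1 - rho)

# 10 ms service time: response time grows non-linearly with load.
low  = mm1_response_time(throughput=50, service_time=0.010)  # rho = 0.5 -> 20 ms
high = mm1_response_time(throughput=90, service_time=0.010)  # rho = 0.9 -> 100 ms
```

Going from 50 to 90 requests/sec (1.8x the load) quintuples the expected response time, which is exactly the non-linear behavior the next slide describes.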

  17. utilization = throughput * service time (Utilization Law)
    throughput increases,
    so utilization (“busyness”) increases,
    so queueing delay increases (non-linearly),
    and so does response time.
    [graph: response time vs. throughput]
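Inverting the same single-server model yields capacity in the sense used earlier: maximum throughput subject to a response time constraint. Still an M/M/1 sketch with invented numbers, not Facebook's method:

```python
def max_throughput(service_time, response_time_limit):
    """Max throughput keeping expected response time under the limit.

    From R = S / (1 - rho) and the Utilization Law rho = X * S:
    rho_max = 1 - S / R_limit, so X_max = rho_max / S.
    """
    rho_max = 1 - service_time / response_time_limit
    return rho_max / service_time

# 10 ms service time, 50 ms response time threshold:
# rho_max = 1 - 0.010/0.050 = 0.8, so capacity = 80 requests/sec.
capacity = max_throughput(service_time=0.010, response_time_limit=0.050)
```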


  19. Step II: cluster capacity
    Iff linear scaling,
    cluster of N servers’ capacity = single server capacity * N
    … but systems don’t scale linearly.
    Universal Scalability Law (USL):
    • contention penalty
      due to queueing for shared resources
    • consistency penalty
      due to increase in service time
    target cluster capacity should account for this.
    [graph: throughput vs. concurrency; theoretical cluster capacity]
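The USL has a standard closed form (Gunther): throughput of N servers is λN / (1 + σ(N−1) + κN(N−1)), where σ is the contention penalty and κ the consistency penalty. The parameter values below are made up for illustration only:

```python
def usl_throughput(n, lam, sigma, kappa):
    """Universal Scalability Law throughput for n servers.

    lam:   single-server throughput (the linear-scaling slope)
    sigma: contention penalty (queueing for shared resources)
    kappa: consistency penalty (service time grows with n)
    """
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

linear = usl_throughput(10, lam=100, sigma=0.0, kappa=0.0)     # ideal: 1000
actual = usl_throughput(10, lam=100, sigma=0.03, kappa=0.001)  # sub-linear
target = 0.93 * linear  # a 93%-of-theoretical target, as Facebook sets
```

With σ = κ = 0 the law collapses to linear scaling; even small penalties bend the throughput curve below it, which is why the target cluster capacity is set below the theoretical figure.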

  20. Facebook sets target cluster capacity = 93% of theoretical.
    …is this good or is there a bottleneck?


  21. cluster capacity is ~90% of theoretical,
    so there’s a bottleneck to fix!
    Facebook sets target cluster capacity = 93% of theoretical.


  22. bottlenecks uncovered
    • cache bottleneck
    • network saturation
    • poor load balancing
    • misconfiguration
    Also, insufficient capacity
    i.e. no bottlenecks per se, but organic growth.
    …so, can we have it too?

  23. OrgSim etc.


  24. load generation
    Run a configurable number of virtual clients.
    A virtual client sends/receives in a loop.
    Use synthetic workloads.
    OrgSim’s load profile is based on historical data.
    monitoring
    external to the load simulator system.
    We use Datadog alerts on metric thresholds.
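The load-generation loop above can be sketched with threads. This is not Samsara's OrgSim code; it's just the minimal shape of "N virtual clients, each sending and receiving in a loop," with a synthetic workload standing in for real requests:

```python
import threading
import time

def virtual_client(send_request, stop, latencies):
    """One virtual client: send a request, record latency, repeat."""
    while not stop.is_set():
        start = time.monotonic()
        send_request()
        latencies.append(time.monotonic() - start)

def run_load(n_clients, send_request, duration):
    """Run n_clients concurrent virtual clients for `duration` seconds."""
    stop, latencies = threading.Event(), []
    threads = [
        threading.Thread(target=virtual_client, args=(send_request, stop, latencies))
        for _ in range(n_clients)
    ]
    for t in threads:
        t.start()
    time.sleep(duration)
    stop.set()
    for t in threads:
        t.join()
    return latencies

# stand-in for a real request: a 1 ms synthetic workload
latencies = run_load(n_clients=5, send_request=lambda: time.sleep(0.001), duration=0.2)
```

Monitoring stays external, as the slide says: the generator only records per-request latencies; health metrics and alert thresholds live in the monitoring system.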

  25. gotchas
    • synthetic workloads may not be representative of actual traffic.

  26. number of virtual clients (N) = 1, …, 100
    wrong shape for the response time curve!
    [graphs: response time vs. concurrency (N), measured vs. what it should be (from the USL)]
    … load simulator hit a bottleneck!

  27. gotchas
    • synthetic workloads may not be representative of actual traffic.
    • load simulator may hit a bottleneck!
    Little’s Law:
    concurrency = throughput * response time
    = number of virtual clients actually running
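Little's Law makes this gotcha checkable: if measured throughput × measured response time is far below the number of virtual clients you configured, most of them aren't actually driving load, and the load generator itself is the bottleneck. The numbers below are illustrative:

```python
def effective_concurrency(throughput, response_time):
    """Little's Law: L = X * R, the number of requests in flight."""
    return throughput * response_time

configured_clients = 100
# measured: 400 req/s at 50 ms mean response time
in_flight = effective_concurrency(throughput=400, response_time=0.050)

# only ~20 of the 100 configured virtual clients are actually running,
# so suspect the generator, not the system under test
generator_bottlenecked = in_flight < 0.5 * configured_clients
```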

  28. stepping back


  29. …performance testing or modeling?
    yes.
    Case for performance testing in production:
    empiricism is queen.
    Case for performance modeling:
    expectations are better than no expectations.

  30. @kavya719
    speakerdeck.com/kavya719/future-proofing-production-systems
    Special thanks to Eben Freeman for reading drafts of this.
    Kraken
    https://research.fb.com/publications/kraken-leveraging-live-traffic-tests-to-identify-and-resolve-resource-utilization-bottlenecks-in-large-scale-web-services/
    Performance modeling
    Performance Modeling and Design of Computer Systems, Mor Harchol-Balter
    How to Quantify Scalability, Neil Gunther:
    http://www.perfdynamics.com/Manifesto/USLscalability.html


  32. [graphs: latency vs. throughput (non-linear responses to load); throughput vs. concurrency (non-linear scaling)]

  33. [graphs: latency vs. throughput (non-linear responses to load); throughput vs. concurrency (non-linear scaling)]
    microservices: systems are complex
    continuous deploys: systems are in flux

  34. load generation
    need a representative workload.
    …use live traffic.
    traffic shifting
    profile (read, write requests)
    arrival pattern including traffic bursts
    capture and replay


  35. traffic shifting
    adjust weights that control load balancing,
    to increase the fraction of traffic to a cluster, region, server.
    edge weight, cluster weight, server weight

  36. monitoring
    need reliable metrics that track the health of the system.
    user experience:
    p99 response time
    HTTP error rate
    safety:
    CPU utilization
    memory utilization
    connections, queue length

  37. let’s run it!
    kraken

  38. [diagram: Samsara: devices at the industry site, the cloud (AWS), and the web dashboard in the user’s browser]

  39. [diagram: Samsara: devices at the industry site send data via hubs to the cloud (AWS): frontend servers, data processors, storage; the web dashboard in the user’s browser connects over a websocket, with sticky sessions]