Slide 1

Slide 1 text

Future-proofing Production Systems @kavya719

Slide 2

Slide 2 text

kavya

Slide 3

Slide 3 text

analyzing the performance of systems

Slide 4

Slide 4 text

performance capacity
• What’s the additional load the system can support without degrading response time?
• What are the system’s utilization bottlenecks?
• What’s the impact of a change on response time and maximum throughput?
• How many additional servers are needed to support 10x load?
• Is the system over-provisioned?

Slide 5

Slide 5 text

more robust, performant, scalable …use prod to make prod better.

Slide 6

Slide 6 text

more robust, performant, scalable …use prod to make prod better.
A/B testing, canaries, and ramped deploys. chaos engineering.
stressing the system: empirically determine performance characteristics and bottlenecks.

Slide 7

Slide 7 text

Kraken: a fancy load “simulator” (utilization law, the USL)
OrgSim etc.: a standard load simulator (Little’s law)
stepping back: the sweet middle ground

Slide 8

Slide 8 text

Kraken

Slide 9

Slide 9 text

kraken
• Facebook’s load “simulator”. In use in production since ~2013.
• Used to determine a system’s capacity.
• And to identify and resolve utilization bottlenecks.
• Allowed them to increase Facebook’s capacity by over 20% using the same hardware!

Slide 10

Slide 10 text

kraken
• Facebook’s load “simulator”. In use in production since ~2013.
• Used to determine a system’s capacity: maximum throughput, subject to a response time constraint.
• And to identify and resolve utilization bottlenecks.
• Allowed them to increase Facebook’s capacity by over 20% using the same hardware!

Slide 13

Slide 13 text

the model
• stateless servers that serve requests without using sticky sessions / server affinity, so load can be controlled by re-routing requests. For example, this does not apply to a global message queue.
• downstream services respond to upstream service load shifts. For example, a web server querying a database.

Slide 14

Slide 14 text

load generation
need a representative workload …use live traffic!
traffic shifting: increase the fraction of traffic to a region, cluster, or server by adjusting the weights that control load balancing.

monitoring
need reliable metrics that track the health of the system.
user experience: p99 response time, HTTP error rate
safety: CPU utilization, connections, queue length

Slide 15

Slide 15 text

[Graph: response time vs. load, with a response time threshold; capacity is the throughput at which the threshold is reached.]
…is this good, or is there a bottleneck? let’s run it!

Slide 16

Slide 16 text

Step I: single server capacity
interlude: performance modeling
model a web server as a queueing system.
response time = queueing delay + service time
assume no upstream saturation, so service time is constant; i.e. response time ∝ queueing delay.

Slide 17

Slide 17 text

utilization = throughput * service time (Utilization Law)
as throughput increases, utilization (“busyness”) increases; queueing delay increases (non-linearly), and so does response time.
[Graphs: utilization vs. throughput; response time vs. throughput.]
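To make the non-linear growth concrete, here is a small sketch combining the Utilization Law with the M/M/1 queueing formula for response time; the M/M/1 model and the numbers are illustrative assumptions, not something the talk commits to:

```python
# Utilization Law: utilization = throughput * service_time.
# For illustration, assume an M/M/1 queue, where
#   response_time = service_time / (1 - utilization).
def utilization(throughput_rps: float, service_time_s: float) -> float:
    return throughput_rps * service_time_s

def mm1_response_time(throughput_rps: float, service_time_s: float) -> float:
    u = utilization(throughput_rps, service_time_s)
    if u >= 1.0:
        raise ValueError("server is saturated")
    return service_time_s / (1.0 - u)

# With a 1 ms service time, response time blows up as utilization nears 1:
for rps in (100, 500, 800, 900, 950):
    print(rps, round(mm1_response_time(rps, 0.001) * 1000, 2), "ms")
```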

Slide 18

Slide 18 text

utilization = throughput * service time (Utilization Law)
as throughput increases, utilization (“busyness”) increases; queueing delay increases (non-linearly), and so does response time.
[Graphs: utilization vs. throughput; response time vs. throughput.]

Slide 19

Slide 19 text

Step II: cluster capacity
iff linear scaling, a cluster of N servers’ capacity = single server capacity * N (the theoretical cluster capacity).
…but systems don’t scale linearly.
Universal Scalability Law (USL): throughput as a function of concurrency, with
• a contention penalty, due to queueing for shared resources
• a consistency penalty, due to an increase in service time
target cluster capacity should account for this.
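Gunther’s USL (cited at the end of the deck) makes this concrete. A small sketch of the formula, with made-up coefficient values for the contention (σ) and consistency (κ) penalties:

```python
# Universal Scalability Law (Gunther):
#   throughput(N) = lambda_ * N / (1 + sigma*(N - 1) + kappa*N*(N - 1))
# lambda_ = single-server throughput, sigma = contention penalty,
# kappa = consistency (crosstalk) penalty. Coefficient values below are made up.
def usl_throughput(n: int, lambda_: float, sigma: float, kappa: float) -> float:
    return lambda_ * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

single_server_rps = 1000.0  # assumed single-server capacity
for n in (1, 10, 50, 100):
    linear = single_server_rps * n
    actual = usl_throughput(n, single_server_rps, sigma=0.03, kappa=0.0001)
    print(f"N={n}: linear={linear:.0f} rps, USL={actual:.0f} rps "
          f"({actual / linear:.0%} of linear)")
```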

Slide 20

Slide 20 text

Facebook sets target cluster capacity = 93% of theoretical. …is this good or is there a bottleneck?

Slide 21

Slide 21 text

Facebook sets target cluster capacity = 93% of theoretical. Measured cluster capacity is ~90% of theoretical, so there’s a bottleneck to fix!

Slide 22

Slide 22 text

bottlenecks uncovered
• cache bottleneck
• network saturation
• poor load balancing
• misconfiguration
Also, insufficient capacity, i.e. no bottleneck per se, just organic growth.
…so, can we have it too?

Slide 23

Slide 23 text

OrgSim etc.

Slide 24

Slide 24 text

load generation
Run a configurable number of virtual clients. A virtual client sends/receives in a loop.
Use synthetic workloads. OrgSim’s load profile is based on historical data.

monitoring
External to the load simulator system. We use Datadog alerts on metric thresholds.
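A minimal sketch of the virtual-client idea described above: N clients each send and receive in a loop until told to stop. The endpoint and the plain-HTTP request shape are hypothetical stand-ins; OrgSim’s actual workload and transport aren’t shown in the talk.

```python
# Minimal load-generator sketch: run N virtual clients, each sending/receiving
# in a loop and recording response times. The URL and request shape are
# hypothetical stand-ins for OrgSim's synthetic workload.
import threading
import time
import urllib.request

def virtual_client(url: str, stop: threading.Event, results: list) -> None:
    while not stop.is_set():
        start = time.monotonic()
        try:
            urllib.request.urlopen(url, timeout=5).read()
            results.append(time.monotonic() - start)  # response time in seconds
        except OSError:
            results.append(None)                      # record an error
        time.sleep(0.01)                              # simple pacing between requests

def run_load(url: str, n_clients: int, duration_s: float) -> list:
    stop, results = threading.Event(), []
    threads = [threading.Thread(target=virtual_client, args=(url, stop, results))
               for _ in range(n_clients)]
    for t in threads:
        t.start()
    time.sleep(duration_s)
    stop.set()
    for t in threads:
        t.join()
    return results

# Example: run_load("http://localhost:8080/health", n_clients=50, duration_s=60)
```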

Slide 25

Slide 25 text

gotchas
• synthetic workloads may not be representative of actual traffic.

Slide 26

Slide 26 text

number of virtual clients (N) = 1, …, 100
[Graphs: response time vs. concurrency (N): wrong shape for the response time curve! What it should look like, from the USL: response time vs. concurrency (N).]
… the load simulator hit a bottleneck!

Slide 27

Slide 27 text

gotchas
• synthetic workloads may not be representative of actual traffic.
• the load simulator may hit a bottleneck!
Little’s Law: concurrency = throughput * response time = the number of virtual clients actually running.
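Little’s Law gives a quick sanity check that the simulator itself isn’t the bottleneck: compute the effective concurrency from measured throughput and response time and compare it against the number of virtual clients you configured. A small worked sketch with made-up numbers:

```python
# Little's Law: concurrency = throughput * response_time.
# If the effective concurrency is well below the number of virtual clients
# configured, the load generator itself is the likely bottleneck.
def effective_concurrency(throughput_rps: float, response_time_s: float) -> float:
    return throughput_rps * response_time_s

configured_clients = 100      # virtual clients we asked the simulator to run
throughput_rps = 400.0        # measured at the target service (made-up number)
mean_response_time_s = 0.050  # measured mean response time (made-up number)

n_effective = effective_concurrency(throughput_rps, mean_response_time_s)
print(n_effective)            # 20.0: only ~20 clients' worth of load is arriving,
                              # so most of the 100 virtual clients aren't really running
```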

Slide 28

Slide 28 text

stepping back

Slide 29

Slide 29 text

…performance testing or modeling? yes.
The case for performance testing in production: empiricism is queen.
The case for performance modeling: expectations are better than no expectations.

Slide 30

Slide 30 text

@kavya719
speakerdeck.com/kavya719/future-proofing-production-systems
Special thanks to Eben Freeman for reading drafts of this.

Kraken: https://research.fb.com/publications/kraken-leveraging-live-traffic-tests-to-identify-and-resolve-resource-utilization-bottlenecks-in-large-scale-web-services/
Performance modeling: Performance Modeling and Design of Computer Systems, Mor Harchol-Balter
How to Quantify Scalability, Neil Gunther: http://www.perfdynamics.com/Manifesto/USLscalability.html

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

[Graphs: latency vs. throughput, non-linear responses to load; throughput vs. concurrency, non-linear scaling.]

Slide 33

Slide 33 text

[Graphs: latency vs. throughput, non-linear responses to load; throughput vs. concurrency, non-linear scaling.]
microservices: systems are complex. continuous deploys: systems are in flux.

Slide 34

Slide 34 text

load generation
need a representative workload: profile (read, write requests), arrival pattern including traffic bursts.
…use live traffic: traffic shifting, or capture and replay.

Slide 35

Slide 35 text

traffic shifting
adjust the weights that control load balancing (edge weight, cluster weight, server weight) to increase the fraction of traffic to a region, cluster, or server.
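A minimal sketch of weight-based routing at one of these levels (cluster weights), using a generic weighted-random choice; this illustrates the idea, and is not Facebook’s actual edge/cluster/server routing code:

```python
# Generic weighted-routing sketch: raising one cluster's weight shifts a larger
# fraction of traffic to it. Cluster names and weights are illustrative.
import random

cluster_weights = {"cluster-a": 1.0, "cluster-b": 1.0, "cluster-c": 1.0}

def pick_cluster(weights: dict) -> str:
    names, w = zip(*weights.items())
    return random.choices(names, weights=w, k=1)[0]

# Shift more traffic to cluster-a for a load test: with weight 3.0 it now
# receives 3 / (3 + 1 + 1) = 60% of requests instead of ~33%.
cluster_weights["cluster-a"] = 3.0
```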

Slide 36

Slide 36 text

monitoring
need reliable metrics that track the health of the system.
user experience: p99 response time, HTTP error rate
safety: CPU utilization, memory utilization, connections, queue length

Slide 37

Slide 37 text

let’s run it! kraken

Slide 38

Slide 38 text

samsara
[Architecture diagram: devices at an industry site → the cloud (AWS) → web dashboard in the user’s browser.]

Slide 39

Slide 39 text

samsara
[Architecture diagram: devices and hubs at an industry site → data processors, storage, and frontend servers in the cloud (AWS) → web dashboard in the user’s browser, over a websocket with sticky sessions.]