
Geoff Gerrietts - Diving into the Wreck: a postmortem look at real-world performance

As a young engineer interested in performance, much of the advice I saw on performance management focused on algorithms and rules of thumb. It’s good advice, but it doesn’t address the most common problems. This talk will cover a handful of the most common performance problems I’ve encountered in my career. We will talk about how to recognize them, what causes them, and how to resolve them.

https://us.pycon.org/2016/schedule/presentation/2032/

PyCon 2016

May 29, 2016

Transcript

  1. Geoff Gerrietts @ggerrietts Development Manager @AppNeta 16 years of Python

    Rehabilitated Poet Geoff Gerrietts @ggerrietts • Geoff Gerrietts • Development Manager at AppNeta • Pythonista for 16 years; developer for ~20 • Manager, devops, programmer, webmaster, technical writer • Before that I studied English • I like to make not-really-joking jokes about how writing poetry and programming are similar.
  2. Geoff Gerrietts @ggerrietts First having read the book of myths,

    and loaded the camera, and checked the edge of the knife-blade, I put on the body-armor of black rubber the absurd flippers the grave and awkward mask. — Adrienne Rich, “Diving into the Wreck” • Title from a poem • Adrienne Rich • Metaphor of salvage diving • Reflection, analysis • The profound weight of the waters of time • Shining beams of light into tattered evidence
  3. Geoff Gerrietts @ggerrietts Performant Incantations and Rules of Thumb •

    When we talk about performance, conversations often start with folk wisdom
  4. Geoff Gerrietts @ggerrietts Concatenate & Minify •

    Like, concatenate and minify your JavaScript and CSS files.
  5. Geoff Gerrietts @ggerrietts Serve Assets Separately •

    And don’t serve assets through your application, arrange to publish them separately. • Maybe use a CDN.
  6. Geoff Gerrietts @ggerrietts Hardware is Cheap, People are Expensive

    • Throw money at the problem • Not just true in the web world
  7. Geoff Gerrietts @ggerrietts “”.join([“abc”, “def”, “ghi”]) not “abc” + “def”

    + “ghi” Source: https://wiki.python.org/moin/PythonSpeed/PerformanceTips • Python specific performance hacks • Join lists rather than concatenate • Which may be dated advice?
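    A minimal sketch of checking this yourself with timeit; the gap mostly shows up when many pieces are involved, and recent CPython versions optimize in-place string concatenation, which is why the advice may be dated:

    import timeit

    pieces = ["abc"] * 1000

    def with_join():
        # builds the result in a single pass
        return "".join(pieces)

    def with_concat():
        # may copy the growing string on each +=
        out = ""
        for p in pieces:
            out += p
        return out

    print(timeit.timeit(with_join, number=10000))
    print(timeit.timeit(with_concat, number=10000))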
  8. Geoff Gerrietts @ggerrietts
     def spam(eggs, randrange=random.randrange):
         while True:
             yield eggs[randrange(len(eggs))]

    Source: https://wiki.python.org/moin/PythonSpeed/PerformanceTips • Bind local names to prevent lookups, especially in tight loops
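    A minimal sketch of measuring that trick, using only the standard library; the difference is small and only matters in tight loops:

    import random
    import timeit

    def pick_global(eggs, n=10000):
        # random.randrange is re-looked-up through the module on every iteration
        return [eggs[random.randrange(len(eggs))] for _ in range(n)]

    def pick_bound(eggs, n=10000, randrange=random.randrange):
        # randrange was bound once, at definition time, as a fast local
        return [eggs[randrange(len(eggs))] for _ in range(n)]

    eggs = list(range(100))
    print(timeit.timeit(lambda: pick_global(eggs), number=100))
    print(timeit.timeit(lambda: pick_bound(eggs), number=100))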
  9. Geoff Gerrietts @ggerrietts None of this is wrong. • All

    good advice, and we should all be aware of what we’re doing.
  10. Geoff Gerrietts @ggerrietts None of this is wrong. But it

    rarely addresses the real issue. • But it’s all targeted at solving very specific problems • And most problems aren’t those problems • Dave Grothe story
  11. Geoff Gerrietts @ggerrietts nginx gunicorn Flask nginx gunicorn Flask Postgres

    web.wreck.tlys.us aux.wreck.tlys.us db.wreck.tlys.us • This is the architecture that I used for all the sample code in this talk • It’s a fairly typical architecture, though there are lots of variations • (Walk through) • When we talk about performance, we talk about latency • At a high level, the latency we care about is how long it takes from our request, to seeing the response • If that takes 100ms (which is not bad), that 100ms accrues as the request traverses the architecture and becomes a response
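    A minimal sketch of the application tier in that stack; the module name, route, and gunicorn flags are illustrative, not the talk's actual demo code:

    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/ping")
    def ping():
        # nginx proxies to gunicorn, gunicorn runs this Flask app,
        # and the app talks to Postgres on db.wreck.tlys.us
        return jsonify(ok=True)

    # e.g. on web.wreck.tlys.us, behind nginx:
    #   gunicorn --workers 4 --bind 127.0.0.1:8000 wreck:app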
  12. Geoff Gerrietts @ggerrietts • L1 cache reference: 1ns • Branch

    mispredict: 3ns • Mutex lock/unlock: 17ns • Main memory reference: 100ns • SSD random read: 16,000ns (16µs) • Read 1M bytes from SSD: 123,000ns (123µs) • Round-trip in data center: 500,000ns (500µs) • Read 1M bytes from disk: 1Mns (1ms) • Disk seek: 3Mns (3ms) • Round-trip, CA to Amsterdam: 150Mns (150ms) Geoff Gerrietts @ggerrietts Source: http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html • A key tool in our kit is knowing where latency comes from. • This is our razor, the thing we use to guide our suspicions. • The “Jeff Dean” numbers really originate out of Berkeley. • They have an interactive graph where you can see the numbers change over time; shout-out to the speed of light
  13. Geoff Gerrietts @ggerrietts • L1 cache reference: 1ns • Branch

    mispredict: 3ns • Mutex lock/unlock: 17ns • Main memory reference: 100ns • SSD random read: 16,000ns (16µs) • Read 1M bytes from SSD: 123,000ns (123µs) • Round-trip in data center: 500,000ns (500µs) • Read 1M bytes from disk: 1Mns (1ms) • Disk seek: 3Mns (3ms) • Round-trip, CA to Amsterdam: 150Mns (150ms) Geoff Gerrietts @ggerrietts Source: http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html • Python scribbles all over that yellow area • But that’s also where most of the optimization guidelines help • For some applications, those kind of optimizations help a lot. • Not for most applications though, particularly not in the web domain.
  14. Geoff Gerrietts @ggerrietts CPU < RAM <

    SSD < LAN < HDD < WAN • Roughly, CPU is faster than memory is faster than SSD is faster than LAN is faster than disk is faster than Internet. • Knowing the magnitude on each of those less-thans matters a lot (sometimes 4, sometimes 100) • But still: disk & network incur high latency
  15. Geoff Gerrietts @ggerrietts Monitoring is Everything. • One last thing.

    • Last year’s talk, won’t rehash • I’ll be showing a lot of TraceView, because TraceView • Lots of good choices • If you can’t measure it, you can’t know what’s wrong
  16. Geoff Gerrietts @ggerrietts SPOILER ALERT • Usually the database •

    Network + disk • Single bottleneck for all requests • If this were a Victorian mystery novel, the butler would be named Postgres
  17. So Let’s Dive • So we have reviewed the book

    of myths. • Checked our knife and camera • Let’s get wet.
  18. Geoff Gerrietts @ggerrietts • Or if you’re really doing this

    by hand, like this • These aren’t actionable. • You have to identify the problem you want to fix
  19. Geoff Gerrietts @ggerrietts • I like to use freq x

    duration, which also might be called cumulative time or total time • Captures really slow outliers • Also captures relatively fast stuff that’s being run a lot • Freq and duration separately can be useful
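    A minimal sketch of the idea, assuming you can export (endpoint, seconds) samples from whatever monitoring you use:

    from collections import defaultdict

    samples = [("/games", 0.12), ("/report", 4.20), ("/games", 0.15), ("/games", 0.11)]

    totals = defaultdict(float)   # cumulative (frequency x duration) per endpoint
    counts = defaultdict(int)
    for endpoint, seconds in samples:
        totals[endpoint] += seconds
        counts[endpoint] += 1

    for endpoint in sorted(totals, key=totals.get, reverse=True):
        print(endpoint, counts[endpoint], round(totals[endpoint], 2))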
  20. The Shipwreck-in-a-Bottle Collection • For our dives today, I have

    selected a handful of wrecks • These are not the exact shipwrecks whose gunwales I leaped from as they sank • But rather dioramas that I have built
  21. Geoff Gerrietts @ggerrietts • This is a bad response time.

    • Now, averages can be deceiving. • But they’re not right now
  22. Geoff Gerrietts @ggerrietts • So we know this endpoint is

    slow. • Which part is slow? • If you have a tool that will split things out by layer, you can look at that.
  23. Geoff Gerrietts @ggerrietts • Since the database is our main

    suspect, we can also use a slow query log
  24. Geoff Gerrietts @ggerrietts • The slow query log shows us

    that one of our queries is taking more than 14 seconds. • Both of these indicators point to a problem with queries
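    A minimal sketch of checking the threshold behind that log, assuming psycopg2 and made-up connection details; log_min_duration_statement in postgresql.conf controls which statements Postgres logs as slow:

    import psycopg2

    conn = psycopg2.connect(host="db.wreck.tlys.us", dbname="wreck", user="wreck")
    with conn.cursor() as cur:
        # e.g. '1s' logs any statement slower than one second; -1 disables the log
        cur.execute("SHOW log_min_duration_statement;")
        print(cur.fetchone()[0])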
  25. Geoff Gerrietts @ggerrietts • OK so here’s the code. •

    Kind of hard to see maybe, but can you see where it’s bad?
  26. Geoff Gerrietts @ggerrietts • Even a LIKE on an unindexed

    field • This query is exaggeratedly bad • But typical incarnations will do one or more of these things • Maybe 1 second instead of 14. • Can fix these with a better query
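    A sketch of the general shape of the fix; Player, its columns, and the session object stand in for the demo's real models rather than the actual code:

    # Slow: a leading-wildcard LIKE on an unindexed column forces a sequential scan
    players = session.query(Player).filter(Player.name.like("%smith%")).all()

    # Better: anchor the pattern so an index on Player.name can help, and fetch
    # only the columns you need (in Postgres, a pg_trgm index can also rescue
    # unanchored LIKEs)
    players = (
        session.query(Player.id, Player.name)
        .filter(Player.name.like("smith%"))
        .all()
    )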
  27. Geoff Gerrietts @ggerrietts • So 4.2s is better than our

    14s average in the previous example • But it’s not good, especially for stuff that’s this policy-light
  28. Geoff Gerrietts @ggerrietts • First thing that jumps out is

    holy crap that’s a lot of queries, two orders of magnitude
  29. Geoff Gerrietts @ggerrietts • If we look at a profile

    or a graph of a transaction • A lot of red
  30. Geoff Gerrietts @ggerrietts • OK so here’s the code for

    this one. • See the problem? • No? • Where are all those queries coming from?
  31. Geoff Gerrietts @ggerrietts • Right there. • SQLAlchemy does lazy

    traversal of relationships. • Django too. • So the Rolls and Players are being queried separately — once for each game! • Can maybe fix this by “eagerly joining” relationships • Or maybe reconsider the scalability of the design • Another variation on this problem: same query, many times in single transaction
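    A sketch of the eager-join fix in SQLAlchemy; Game, Game.rolls, and Game.players mirror the slide's description, not the demo's actual models:

    from sqlalchemy.orm import joinedload

    games = (
        session.query(Game)
        .options(joinedload(Game.rolls), joinedload(Game.players))
        .all()
    )
    # One round-trip with joins instead of 1 + 2N lazy queries for N games.
    # Django's equivalent knobs are select_related() and prefetch_related().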
  32. Geoff Gerrietts @ggerrietts • Sometimes looking at a heatmap helps

    show how averages are being skewed by outliers • Here everything’s pretty tightly grouped though • One hint: after the break, the cluster is a lot looser
  33. Geoff Gerrietts @ggerrietts • Top is a detailed view into

    a fast trace • Bottom is a detailed view into a slow trace • Could do with profiling instead, have done that • Nginx olive, wsgi cyan, sqlalchemy red
  34. Geoff Gerrietts @ggerrietts • Basically the same except a couple

    sqlalchemy queries taking longer • 18.11 for first, 53.53 for second — more than 2x time • So, probably the database. Surprised?
  35. Geoff Gerrietts @ggerrietts • CloudWatch CPU on the DB for the same

    period. • 20% CPU usage pre-spike • 100% at spike
  36. Geoff Gerrietts @ggerrietts • Now, there’s nothing to see in

    the code, but I thought I’d show it. • Pretty straightforward • Nothing looks too suspicious, nothing looks like it should change based on time
  37. Geoff Gerrietts @ggerrietts • So let’s go see what’s going

    on in the DB • Now, our webserver is 172.31.63.25... • So who’s 172.31.61.237? • Won’t necessarily be identifiable by IP -- might need to hunt by query instead • Sometimes you look at nginx logs instead
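    A minimal sketch of asking Postgres directly, assuming psycopg2 and made-up connection details; an unfamiliar client_addr stands out quickly:

    import psycopg2

    conn = psycopg2.connect(host="db.wreck.tlys.us", dbname="wreck", user="wreck")
    with conn.cursor() as cur:
        cur.execute("""
            SELECT client_addr, state, query
            FROM pg_stat_activity
            WHERE state <> 'idle'
            ORDER BY client_addr
        """)
        for addr, state, query in cur.fetchall():
            print(addr, state, query[:60])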
  38. Geoff Gerrietts @ggerrietts • Here’s the code. It might look

    familiar. • Once it started up, it beat the snot out of the DB
  39. Geoff Gerrietts @ggerrietts • So, this is a wild-looking graph

    • More peaks and valleys than the Rockies
  40. Geoff Gerrietts @ggerrietts • This olive area is nginx. •

    If you believe this graph, requests are spending ~30s in nginx!
  41. Geoff Gerrietts @ggerrietts • Here’s a similar graph from a

    different test run. • Totally different characteristics! • Only similarity is the weird spike in nginx
  42. Geoff Gerrietts @ggerrietts • Here’s a clue: error rate. •

    This graph matches our first test run: request rate falls off, but errors go away.
  43. Geoff Gerrietts @ggerrietts • The second test run • Errors

    throughout • One notable gap where nginx spiked
  44. Geoff Gerrietts @ggerrietts • Okay enough drama • Top graph

    is from our first test. • Middle and bottom graphs are from our second test. • Both reveal a shifting memory profile • Memory tops out a couple of times
  45. Geoff Gerrietts @ggerrietts • If we look in the server

    log • OOM killer has been killing processes • Segfaults have been killing processes • OMG!
  46. Geoff Gerrietts @ggerrietts • Here’s the code. • What’s wrong

    with what it does? • It seems ok? • But what happens when you pass in a large count?
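    A hedged reconstruction of the kind of view the slide describes; the route matches the /grenade/<count> URLs shown later, but the body is illustrative, not the actual demo code:

    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/grenade/<int:count>")
    def grenade(count):
        # count chunks of ~1MB each: /grenade/512 briefly holds ~512MB in this
        # one worker before the list becomes garbage
        chunks = [bytearray(2 ** 20) for _ in range(count)]
        return jsonify(allocated_mb=len(chunks))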
  47. Geoff Gerrietts @ggerrietts Memory fragmentation • Grossly oversimplifying, Python manages

    memory in various pools • I’ve drawn the same pool at three different points in its lifecycle. • At first it’s of modest size • Then grows to accommodate new objects • Then those objects are collected • But the pool is not shrunk.
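    A minimal sketch of the effect on a Linux host (it reads /proc/self/status; exact numbers depend on the allocator and Python version):

    import re

    def rss_kb():
        with open("/proc/self/status") as f:
            return int(re.search(r"VmRSS:\s+(\d+) kB", f.read()).group(1))

    print("start:", rss_kb())
    pile = [{"i": i} for i in range(2000000)]   # lots of small, interleaved objects
    print("allocated:", rss_kb())
    del pile                                    # the dicts are collected...
    print("released:", rss_kb())                # ...but RSS usually stays well above the start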
  48. Geoff Gerrietts @ggerrietts
     http://web.wreck.tlys.us/grenade/8 ⟶ 1.1% (46M)
     http://web.wreck.tlys.us/grenade/64 ⟶ 2.6% (105M)
     http://web.wreck.tlys.us/grenade/128 ⟶ 3.8% (152M)
     http://web.wreck.tlys.us/grenade/256 ⟶ 6.1% (247M)
     http://web.wreck.tlys.us/grenade/512 ⟶ 12.1% (488M)
     • Watching this on top is pretty interesting • Each request gobbles up a bunch of RAM • Then settles at the new level • 10 workers, 4GB main memory