Slide 1

Slide 1 text

Beyond top(1): Command-Line Monitoring on the JVM
Colin Jones, @trptcolin, 8th Light

Slide 3

Slide 3 text

What to expect

Slide 4

Slide 4 text

command-line tooling

Slide 5

Slide 5 text

on the JVM

Slide 6

Slide 6 text

introspection & serviceability

Slide 7

Slide 7 text

--all-flags=false

Slide 8

Slide 8 text

war stories

Slide 9

Slide 9 text

real-life usage (well, re-enacted anyway)

Slide 10

Slide 10 text

A long time ago in a software shop far, far away…

Slide 11

Slide 11 text

Things are going pretty well

Slide 12

Slide 12 text

What does this thing look like?
[Diagram: app architecture. Components shown: End users (native mobile app), Admin users (desktop browsers), Load Balancer, Web / API Application Server, Periodic Job Application Server, Postgres, 3rd-party Service A, 3rd-party Service B, Monitored email account]

Slide 13

Slide 13 text

But strange things are afoot

Slide 14

Slide 14 text

the server sometimes gets really slow

Slide 15

Slide 15 text

the team has to manually restart the application server

Slide 16

Slide 16 text

incident response time is ~5 minutes

Slide 17

Slide 17 text

Yes, strange things are afoot

Slide 18

Slide 18 text

Pain, frustration, anger

Slide 19

Slide 19 text

Just the facts

Slide 20

Slide 20 text

sometimes, things get slow

Slide 21

Slide 21 text

all requests seem to be affected

Slide 22

Slide 22 text

the JVM stays up

Slide 23

Slide 23 text

restart the JVM and everything is fine

Slide 24

Slide 24 text

What could it be?

Slide 25

Slide 25 text

Demo

Slide 26

Slide 26 text

More facts, please!

Slide 27

Slide 27 text

constant full GCs

Slide 28

Slide 28 text

what’s in the heap

Slide 29

Slide 29 text

what application code was running

Slide 30

Slide 30 text

The right tools for the job

Slide 31

Slide 31 text

vmstat: system-level CPU, memory, disk, context switching

Slide 32

Slide 32 text

top: per-process CPU & memory

Slide 33

Slide 33 text

jps: what’s our PID?
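
One way to script the PID lookup, as a rough sketch: `jps -l` prints one line per JVM in the form "<pid> <main class or jar path>", so a small filter can pick out the application. The helper name `pid_for` and the pattern `myapp` are made up for illustration.

```shell
# Print the PID(s) of JVMs whose main class or jar path matches a pattern.
# Reads `jps -l` style lines ("<pid> <main class or jar path>") on stdin.
# pid_for is a hypothetical helper name, not part of the JDK.
pid_for() {
  awk -v pat="$1" '$2 ~ pat { print $1 }'
}

# Usage, assuming the JDK bin directory is on PATH and "myapp" is a
# placeholder for your own jar or main class:
#   PID=$(jps -l | pid_for myapp)
```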

Slide 34

Slide 34 text

jstack: status of all threads (right now-ish!)
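
A thread dump can be long, so a quick summary helps before reading individual stacks. The sketch below counts threads per state by matching the "java.lang.Thread.State:" lines that jstack prints; `thread_states` is a hypothetical helper name, and `PID` is assumed to hold the JVM's process id.

```shell
# Count threads per state in a jstack dump read on stdin.
# thread_states is a hypothetical helper name, not part of the JDK.
thread_states() {
  grep 'java.lang.Thread.State:' | awk '{ print $2 }' | sort | uniq -c | sort -rn
}

# Usage, assuming PID holds the JVM's process id:
#   jstack "$PID" | thread_states
# Since a dump is only a snapshot, taking a few dumps some seconds apart
# and comparing them usually tells more than a single one.
```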

Slide 35

Slide 35 text

jcmd: what can’t it do?!
jcmd [PID] help
(sorry, JVM 6 users: see jinfo/jmap/jstack)
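
For the "what's in the heap" question, one jcmd command worth knowing is GC.class_histogram, which lists classes ordered by bytes used. The sketch below trims that output to the first few rows; `top_classes` is a hypothetical helper name, and the exact columns can vary slightly between JDK versions.

```shell
# Print "<bytes> <class name>" for the first N rows of a class histogram
# read on stdin (rows look like "   1:   <instances>   <bytes>   <class>").
# top_classes is a hypothetical helper name; N defaults to 10.
top_classes() {
  awk -v n="${1:-10}" '
    NF >= 4 && $1 ~ /^[0-9]+:$/ { print $3, $4; if (++count >= n) exit }
  '
}

# Usage, assuming PID holds the JVM's process id:
#   jcmd "$PID" GC.class_histogram | top_classes 5
# Other commands that exist on JDK 7+: Thread.print, GC.heap_dump, VM.flags.
```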

Slide 36

Slide 36 text

jstat: GC, classloader, compiler
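
To check for something like constant full GCs, `jstat -gcutil` can sample the collector at a fixed interval, and its FGC column is the running count of full collections. The sketch below reports how much that count grew across the samples; `fgc_delta` is a hypothetical helper name, and it assumes the header is printed once, which is jstat's default.

```shell
# Report how many full GCs happened between the first and last sample of
# `jstat -gcutil` output read on stdin. Finds the FGC column by header name
# so it does not depend on a particular JDK's column order.
# fgc_delta is a hypothetical helper name, not part of the JDK.
fgc_delta() {
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "FGC") col = i; next }
    col && !seen { first = $col; seen = 1 }
    col { last = $col }
    END { if (seen) print last - first }
  '
}

# Usage, assuming PID holds the JVM's process id
# (sample every 1000 ms, 10 samples):
#   jstat -gcutil "$PID" 1000 10 | fgc_delta
```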

Slide 37

Slide 37 text

Mystery solved!

Slide 38

Slide 38 text

Now “just” fix it

Slide 39

Slide 39 text

idea 1: eliminate the leak

Slide 40

Slide 40 text

idea 2: eliminate the cache altogether?

Slide 41

Slide 41 text

idea 3: delete the feature

Slide 42

Slide 42 text

idea 4: full-text search engine

Slide 43

Slide 43 text

So we’re good now… until the next incident

Slide 44

Slide 44 text

Lessons

Slide 45

Slide 45 text

“it’s slow” could mean lots of things

Slide 46

Slide 46 text

“high CPU” could mean lots of things

Slide 47

Slide 47 text

collecting data is crucial in a crisis

Slide 48

Slide 48 text

reproducing the issue helps me sleep at night

Slide 49

Slide 49 text

Other “right tools for the job”

Slide 50

Slide 50 text

Heap analyzers

Slide 51

Slide 51 text

Profilers

Slide 52

Slide 52 text

Constant monitoring & alerting

Slide 53

Slide 53 text

Dynamic tracing

Slide 54

Slide 54 text

Learning more

Slide 55

Slide 55 text

Books

Slide 56

Slide 56 text

operators are standing by!
man jstat
man jstack
jcmd [PID] help [COMMAND]
etc.

Slide 57

Slide 57 text

Thank you!
Colin Jones, @trptcolin, 8th Light