Tracing & profiling services in production
Kaushik Srenevasan
[email protected]
@ksrenev
Who am I?
• Current (at Twitter)
• VM and Diagnostics: Ruby (Kiji), Hotspot JVM, Scala
• Past (at Microsoft)
• Authored the 64-bit optimizing compiler in the Chakra JavaScript runtime
• Common Language Runtime (CLR) performance
Twitter.com from ten thousand feet
• Service Oriented Architecture
• Platform
• CentOS Linux
• OpenJDK JVM
• Languages
• Java/Scala, C/C++, Ruby (Kiji) and Python
[Diagram: data store]
JVM @ Twitter
• Customized OpenJDK distribution
• Dedicated team to support and maintain releases
• Regular internal release cycle
• Ship JDK 7(u) (now) and 8 (future)
• Bundle useful tools / JVMTI agents
• Twitter University talk: Twitter scale computing with the OpenJDK
JVM @ Twitter
• Why do we exist?
• Low latency garbage collection on dedicated hardware and Mesos
• Scala-specific optimizations
• Tools
• Contrail
• The Twitter Diagnostics Runtime
Observability vs Diagnostics
Diagnostics
Diagnostics in production
• Global
• Performant
• Dynamic
State of the art
• Global, dynamic, arbitrary-context kernel- and user-mode instrumentation
• An extremely low-overhead, scalable mechanism for aggregating event data
• The ability to execute arbitrary user actions when events occur.
Guiding principles
• Twitter owns the entire stack
• Integrate well with standard platform tools
• Do not reinvent the wheel!
perf
• Linux profiler
• Ships in the kernel tree
• Abstraction over the CPU's performance counters (sketched below)
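
Not from the talk, but to make "abstraction over the performance counters" concrete: perf is built on the perf_event_open(2) system call, and a minimal cycle counter looks roughly like the sketch below (the busy loop is a stand-in for real work).

/* Sketch: count CPU cycles for this process via perf_event_open(2),
 * the kernel interface the perf tool is built on.
 * Build: gcc -o cycles cycles.c */
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* glibc provides no wrapper for perf_event_open; invoke it via syscall(2). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                            int group_fd, unsigned long flags) {
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;          /* generic hardware counter ...    */
    attr.config = PERF_COUNT_HW_CPU_CYCLES;  /* ... counting CPU cycles         */
    attr.size = sizeof(attr);
    attr.disabled = 1;                       /* start stopped; enable explicitly */
    attr.exclude_kernel = 1;                 /* user-mode cycles only            */
    attr.exclude_hv = 1;

    /* pid = 0, cpu = -1: this process, on any CPU. */
    int fd = perf_event_open(&attr, 0, -1, -1, 0);
    if (fd == -1) { perror("perf_event_open"); return 1; }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    for (volatile long i = 0; i < 10 * 1000 * 1000; i++)
        ;                                    /* placeholder for the work being measured */

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t cycles = 0;
    read(fd, &cycles, sizeof(cycles));       /* with no read_format flags: a single u64 */
    printf("cycles: %llu\n", (unsigned long long)cycles);
    close(fd);
    return 0;
}

In practice you simply point perf stat or perf record at the service; the point is that perf is a thin, scriptable layer over this one syscall.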
Why perf?
• Simple
• No setup required
• Lightweight
• Powerful
perf for Managed Code
• Traditional managed code (Java) profilers
• ThreadMXBean.getThreadInfo
• JVMTI: GetAllStackTraces
• Undocumented AsyncGetCallTrace
• Our approach: Make Java look like native code (sketched below)
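
The slides do not show how "make Java look like native code" is implemented, but the usual trick (used by open-source agents such as perf-map-agent) is a small JVMTI agent that writes the address and name of every JIT-compiled method into the /tmp/perf-<pid>.map symbol file that perf consults for otherwise-unresolved addresses. A heavily abridged sketch, not Twitter's actual tool:

/* Sketch of a JVMTI agent that exports JIT-compiled method addresses in the
 * /tmp/perf-<pid>.map format ("START SIZE symbol" per line) that perf resolves.
 * Error handling, method unload, and inlining records are omitted.
 * Build against $JAVA_HOME/include and load with -agentpath:./libperfmap.so */
#include <jvmti.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static FILE *map_file;

/* Called by the JVM each time the JIT emits code for a method. */
static void JNICALL
compiled_method_load(jvmtiEnv *jvmti, jmethodID method, jint code_size,
                     const void *code_addr, jint map_length,
                     const jvmtiAddrLocationMap *map, const void *compile_info) {
    char *name = NULL, *sig = NULL;
    if ((*jvmti)->GetMethodName(jvmti, method, &name, &sig, NULL) == JVMTI_ERROR_NONE) {
        fprintf(map_file, "%lx %x %s%s\n",
                (unsigned long)(uintptr_t)code_addr, code_size, name, sig);
        fflush(map_file);
        (*jvmti)->Deallocate(jvmti, (unsigned char *)name);
        (*jvmti)->Deallocate(jvmti, (unsigned char *)sig);
    }
}

JNIEXPORT jint JNICALL Agent_OnLoad(JavaVM *vm, char *options, void *reserved) {
    char path[64];
    snprintf(path, sizeof(path), "/tmp/perf-%d.map", getpid());
    map_file = fopen(path, "w");

    jvmtiEnv *jvmti;
    (*vm)->GetEnv(vm, (void **)&jvmti, JVMTI_VERSION_1_2);

    jvmtiCapabilities caps;
    memset(&caps, 0, sizeof(caps));
    caps.can_generate_compiled_method_load_events = 1;
    (*jvmti)->AddCapabilities(jvmti, &caps);

    jvmtiEventCallbacks cb;
    memset(&cb, 0, sizeof(cb));
    cb.CompiledMethodLoad = &compiled_method_load;
    (*jvmti)->SetEventCallbacks(jvmti, &cb, sizeof(cb));
    (*jvmti)->SetEventNotificationMode(jvmti, JVMTI_ENABLE,
                                       JVMTI_EVENT_COMPILED_METHOD_LOAD, NULL);
    return JNI_OK;
}

Symbol resolution is only half the story: useful mixed-mode call stacks also need the JIT to preserve frame pointers (or emit unwind data), which is one of the things a custom JDK build can arrange.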
Demo I
perf and tooling
Tracing
• Scope
• System wide
• Process specific
• Application specific?
• Export richer, context-specific data
• Unified event bus
Tracing in Linux
• Function tracing
• Tracepoint support
• kprobes
• uprobes
• Covers NFS, RPC, Filesystem, Devices, Network, Power, Kernel, Virtualization, etc.
uprobes
• Extension of the kprobes infrastructure to support user-mode tracepoints
• Support for predicates
• No support for DTrace-style arbitrary user actions
• No support for managed code
Tracing in native code
• Use the SystemTap probe format (example after this list)
• Large number of pre-existing probes
• Source level compatibility with DTrace probes
• Add support in perf to understand SystemTap probe definitions
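
For reference, a SystemTap-format static probe (USDT) in native code is a single macro from <sys/sdt.h>; the provider and probe names below are made up for illustration.

/* Sketch: a DTrace-compatible static probe (USDT/SDT) in native code.
 * <sys/sdt.h> comes from the SystemTap SDT headers (systemtap-sdt-devel).
 * The probe compiles to a nop at the call site plus an ELF note describing
 * the probe and its arguments; tracers arm it on demand. */
#include <sys/sdt.h>

static void handle_request(const char *path, long latency_us) {
    /* Fires only when a tracer (SystemTap, or perf with SDT support) has
     * armed the probe; otherwise it costs a single nop. */
    DTRACE_PROBE2(webserver, request__done, path, latency_us);
}

int main(void) {
    handle_request("/1.1/statuses/home_timeline.json", 4200);
    return 0;
}

Because the macro is source-compatible with DTrace's, existing DTrace probe definitions can be reused unchanged, which is what the compatibility bullet above refers to.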
Tracing in managed code
• VM level tracing
• Existing support for DTrace probes
• Very heavyweight (not sampled)
• Java level tracing
Demo II
Tracing
Open sourcing ...
• Understand user interest
• Upstream vs. publish on GitHub
• Please get in touch