Slide 1

Slide 1 text

Tracing & profiling services in production
Kaushik Srenevasan
[email protected] @ksrenev
Monday, July 28, 14

Slide 2

Slide 2 text

Who am I?
• Current (at Twitter)
  • VM and Diagnostics: Ruby (Kiji), HotSpot JVM, Scala
• Past (at Microsoft)
  • Authored the 64-bit optimizing compiler in the Chakra JavaScript runtime
  • Common Language Runtime (CLR) performance

Slide 3

Slide 3 text

Twitter.com from ten thousand feet
• Service Oriented Architecture
• Platform
  • CentOS Linux
  • OpenJDK JVM
• Languages
  • Java/Scala, C/C++, Ruby (Kiji) and Python

Slide 4

Slide 4 text

Data store

Slide 5

Slide 5 text

JVM @ Twitter
• Customized OpenJDK distribution
• Dedicated team to support and maintain releases
• Regular internal release cycle
• Ship JDK 7(u) (now) and 8 (future)
• Bundle useful tools / JVMTI agents
• Twitter University talk: Twitter scale computing with the OpenJDK

Slide 6

Slide 6 text

JVM @ Twitter
• Why do we exist?
  • Low latency garbage collection on dedicated hardware and Mesos
  • Scala-specific optimizations
  • Tools
    • Contrail
    • The Twitter Diagnostics Runtime

Slide 7

Slide 7 text

Observability vs Diagnostics

Slide 8

Slide 8 text

Diagnostics

Slide 9

Slide 9 text

Diagnostics in production
• Global
• Performant
• Dynamic

Slide 10

Slide 10 text

State of the art
• Global, dynamic, arbitrary context kernel and user mode instrumentation.
• An extremely low overhead, scalable mechanism for aggregating event data.
• The ability to execute arbitrary user actions when events occur.

Slide 11

Slide 11 text

Guiding principles
• Twitter owns the entire stack
• Integrate well with standard platform tools
• Do not reinvent the wheel!

Slide 12

Slide 12 text

perf
• Linux profiler
• Ships in the kernel tree
• Abstraction over CPU’s performance counters

Slide 13

Slide 13 text

Why perf?
• Simple
• No setup required
• Lightweight
• Powerful

Slide 14

Slide 14 text

Why perf?
[Chart labels: Benchmark (baseline); Sampling (perf); Sampling (perf, Yourkit)]

Slide 15

Slide 15 text

Why perf?
[Chart labels: Benchmark (baseline); Bytecode instrumentation (Heapster); Tracing Yourkit, JVM SystemTap; Sampling (perf); Sampling (perf, Yourkit)]

Slide 16

Slide 16 text

Why perf?
• Powerful
  • Mixed mode stacks.
  • CPU, Performance counters (cache, branch etc.), Scheduler latencies ...
  • Spawn, Attach and “top” modes.
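
The three modes correspond to standard perf command lines. As a rough sketch (not taken from the talk), the helper below only assembles those argument lists so they can be handed to a `ProcessBuilder`. The class name `PerfCommands` and the jar name `service.jar` are hypothetical; the `perf record -g`, `-p` and `perf top` invocations are stock perf usage.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds perf command lines for the three modes.
public class PerfCommands {
    /** Spawn mode: perf launches the command and records call graphs (-g) until it exits. */
    public static List<String> spawn(String... command) {
        List<String> argv = new ArrayList<>(Arrays.asList("perf", "record", "-g", "--"));
        argv.addAll(Arrays.asList(command));
        return argv;
    }

    /** Attach mode: record call graphs from an already running process. */
    public static List<String> attach(long pid) {
        return Arrays.asList("perf", "record", "-g", "-p", Long.toString(pid));
    }

    /** Top mode: live view of where CPU time is going. */
    public static List<String> top() {
        return Arrays.asList("perf", "top");
    }

    public static void main(String[] args) {
        // Printed rather than executed, since perf may not be installed here.
        System.out.println(String.join(" ", spawn("java", "-jar", "service.jar")));
        System.out.println(String.join(" ", attach(12345)));
        System.out.println(String.join(" ", top()));
    }
}
```

To actually run one of these, pass the list to `new ProcessBuilder(argv).inheritIO().start()` on a machine where perf is installed and the caller has permission to profile the target.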

Slide 17

Slide 17 text

perf for Managed Code
• Traditional managed code (Java) profilers
  • ThreadMXBean.getThreadInfo
  • JVMTI: GetAllStackTraces
  • Undocumented AsyncGetCallTrace
• Our approach: Make Java look like native code
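
To make the first "traditional" approach concrete, here is a minimal sketch of an in-process sampler built on `ThreadMXBean.getThreadInfo`. It illustrates that style of profiler only and is not the approach the talk proposes. The class name `StackSampler`, the sample count and the interval are illustrative assumptions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sampler: counts how often each method is the top Java frame.
public class StackSampler {
    public static Map<String, Integer> sampleTopFrames(int samples, long intervalMillis) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            long[] ids = threads.getAllThreadIds();
            // maxDepth = 1 asks for the top frame of each thread only.
            ThreadInfo[] infos = threads.getThreadInfo(ids, 1);
            for (ThreadInfo info : infos) {
                if (info == null) continue; // thread exited between the two calls
                StackTraceElement[] stack = info.getStackTrace();
                if (stack.length == 0) continue; // no Java frames available
                String frame = stack[0].getClassName() + "." + stack[0].getMethodName();
                counts.merge(frame, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop early
                break;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        sampleTopFrames(20, 10)
            .forEach((frame, n) -> System.out.println(n + "\t" + frame));
    }
}
```

A sampler like this reports Java frames only, with no native or kernel frames in the stack, which is the gap that treating Java like native code under perf is meant to close.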

Slide 18

Slide 18 text


Slide 19

Slide 19 text

Demo I
perf and tooling

Slide 20

Slide 20 text

Tracing
• Scope
  • System wide
  • Process specific
  • Application specific?
• Export richer, context specific data
• Unified event bus

Slide 21

Slide 21 text

Tracing in Linux
• Function tracing
• Tracepoint support
• kprobes
• uprobes
• Covers NFS, RPC, Filesystem, Devices, Network, Power, Kernel, Virtualization etc.

Slide 22

Slide 22 text

UProbes
• Extension of the KProbes infrastructure to support user mode tracepoints
• Support for predicates
• No support for arbitrary user actions (like DTrace)
• No support for managed code

Slide 23

Slide 23 text

Tracing in native code
• Use SystemTap probe format
• Large number of pre-existing probes
• Source level compatibility with DTrace probes
• Add support in perf to understand SystemTap probe definitions

Slide 24

Slide 24 text

Tracing in managed code
• VM level tracing
  • Existing support for DTrace probes
  • Very heavyweight (not sampled)
• Java level tracing
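
The slides do not show what a Java level tracepoint looks like, so the following is purely a hypothetical sketch of the general idea: a named probe that costs one volatile read while disabled and forwards events to registered sinks once enabled. The `Tracepoint` class and its methods are invented for illustration and are not Twitter's API.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical Java level tracepoint; not an API from the talk.
public final class Tracepoint {
    private final String name;
    private final List<Consumer<String>> sinks = new CopyOnWriteArrayList<>();
    private volatile boolean enabled; // false by default, so disabled probes stay cheap

    public Tracepoint(String name) {
        this.name = name;
    }

    /** Registers a sink and turns the tracepoint on. */
    public void enable(Consumer<String> sink) {
        sinks.add(sink);
        enabled = true;
    }

    /** Turns the tracepoint off and drops all sinks. */
    public void disable() {
        enabled = false;
        sinks.clear();
    }

    /** Emits an event to every sink; returns immediately while disabled. */
    public void fire(String payload) {
        if (!enabled) return;
        String event = name + ": " + payload;
        for (Consumer<String> sink : sinks) {
            sink.accept(event);
        }
    }

    public static void main(String[] args) {
        Tracepoint requestStart = new Tracepoint("request_start");
        requestStart.fire("dropped because the tracepoint is disabled");
        requestStart.enable(System.out::println);
        requestStart.fire("path=/status");
    }
}
```

Being able to switch a probe on and off at runtime is what the "Dynamic" requirement for production diagnostics asks for; a real implementation would publish events to the shared event bus rather than to in-process callbacks.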

Slide 25

Slide 25 text

Demo II
Tracing

Slide 26

Slide 26 text


Slide 27

Slide 27 text

Open sourcing ...
• Understand user interest
• Upstream vs Publish on Github
• Please get in touch

Slide 28

Slide 28 text

Questions?