Record and vPlay: Debugging Container App Crashes with "Partial Checkpoints"

Linux Plumbers Conference, Nov 1-4, 2016, Santa Fe, NM

Abstract: Loosely based on Dinesh Subhraveti’s PhD thesis, the vPlay system captures the minimal runtime state of a container such that, when restored, the application retraces its execution for a specified time interval. The key observation is that during the last moments before a crash, where the root cause typically lies, the application accesses only a small subset of its address space, and only those pages need to be saved. The technique, dubbed partial checkpointing, is combined with logging for use in debugging. Because all interactions of the application with the kernel are logged, the execution can be natively replayed even on BSD or Windows.

The kernel and user space implementations of the mechanism, along with the integration with GDB, were done as part of Dinesh’s thesis. The details can be found at https://systems.cs.columbia.edu/archive/pub/2012/01/record-and-vplay-problem-determination-with-virtual-replay-across-heterogeneous-systems.

Dinesh Subhraveti

November 02, 2016

Transcript

  1. Record and vPlay: Debugging Container Crashes with “Partial Checkpoints”
     Dinesh Subhraveti, CTO / Cofounder, appOrbit
  2. Problems with Problem Determination
     Report / Reproduce / Fix / Maintain
     1) Report-Problem
        ✤ Users don’t know what is relevant
        ✤ Overwhelm or mislead developer
        ✤ Persistence data: privacy concerns
  3. Problems with Problem Determination
     Report / Reproduce / Fix / Maintain
     Time spent on exchanges and setting up environment rather than fixing the problem
     1) Report-Problem
        ✤ Users don’t know what is relevant
        ✤ Overwhelm or mislead developer
        ✤ Persistence data: privacy concerns
     2) Reproduce-Problem
        ✤ Replicating environment: tedious and error-prone
        ✤ Complex config
        ✤ Nondeterminism: repeated testing
  4. Problems with Problem Determination
     Report / Reproduce / Fix / Maintain
     1) Report-Problem
        ✤ Users don’t know what is relevant
        ✤ Overwhelm or mislead developer
        ✤ Persistence data: privacy concerns
     2) Reproduce-Problem
        ✤ Replicating environment: tedious and error-prone
        ✤ Complex config
        ✤ Nondeterminism: repeated testing
     Containers address the problem to some extent
     Caveats
        * Container images are quite heavy
        * Not cross-platform
  5. Complete and Concise Recording
     Application consumes a variety of data as inputs while executing
     Application inputs include
        ✤ Data read from files, network sockets etc.
        ✤ Data returned by the OS via system calls
     [Timeline figure: system calls such as read(), gettimeofday() and recvfrom() along a time axis, ending at the fault]
  6. Complete and Concise Recording
     Application consumes a variety of data as inputs while executing
     Application inputs include
        ✤ Data read from files, network sockets etc.
        ✤ Data returned by the OS via system calls
        ✤ Bits of application and library code accessed by the application
        ✤ Memory pages read by the application
     [Timeline figure: events along a time axis ending at the fault, including a stack page read, memory reads, read(), gettimeofday(), recvfrom(), other system calls, and execution of the strcpy() function in libc at page 0xb75f3000]
  7. Partial Checkpointing
     [Timeline figure: the same events (stack page read, memory reads, read(), gettimeofday(), recvfrom(), other system calls, execution of strcpy in libc at page 0xb75f3000) along a time axis ending at the fault, annotated with "Application Error Propagation Distance"]
  8. Partial Checkpointing
     ▪ Checkpoint: complete intermediate state of a running application at one point of its execution
     ▪ Partial checkpoint: the part of its own state that an application accesses in a specified interval
     [Timeline figure: the same events along a time axis ending at the fault, annotated with "Recording interval" and "Application Error Propagation Distance"]
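The definition above can be illustrated with a small sketch. It is not the vPlay implementation: the trace format, the 4 KiB page size, and the names `page_of` and `partial_checkpoint_pages` are assumptions made for the example. It only shows that the set of pages to save is derived from the accesses that fall inside the recording interval.

```python
PAGE_SIZE = 4096  # assumed page size for this example


def page_of(addr: int) -> int:
    """Round an address down to the start of the page that contains it."""
    return addr & ~(PAGE_SIZE - 1)


def partial_checkpoint_pages(accesses, start, end):
    """Return the sorted pages touched by accesses with start <= time <= end."""
    return sorted({page_of(addr) for t, addr in accesses if start <= t <= end})


# Hypothetical trace of (timestamp, address) pairs.
trace = [
    (1, 0x08048010),  # before the recording interval, so not saved
    (7, 0xb75f3004),  # code in the libc page mentioned on the slide
    (8, 0xb75f3ff0),  # same page again, counted once
    (9, 0xbfffe120),  # a stack page
]

pages = partial_checkpoint_pages(trace, start=5, end=10)
print([hex(p) for p in pages])  # ['0xb75f3000', '0xbfffe000']
```

Pages touched only before the interval never enter the checkpoint, which is why its size can stay well below the full memory footprint.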
  9. State Composition
     ▪ Processor context: at the beginning of the recording interval
     ▪ System calls: results of system calls made by the application
     ▪ Virtual memory: memory pages accessed by the application
     ▪ Nondeterministic events: metadata necessary for deterministic replay: interleaved shared memory accesses, signals
     No kernel state
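As a rough illustration of the four components listed above, the following sketch models a partial checkpoint as a plain data structure. The class and field names are hypothetical and do not reflect the actual vPlay on-disk format. Note that there is deliberately no field for kernel state.

```python
from dataclasses import dataclass, field


@dataclass
class SyscallRecord:
    name: str          # e.g. "read"
    result: int        # value returned to the application
    data: bytes = b""  # bytes the kernel wrote into application buffers


@dataclass
class PartialCheckpoint:
    registers: dict                                # processor context at interval start
    syscalls: list = field(default_factory=list)   # SyscallRecord entries, in order
    pages: dict = field(default_factory=dict)      # page address -> page contents
    events: list = field(default_factory=list)     # signals, shared-memory access order

    def size_bytes(self) -> int:
        """Total size of the saved memory pages."""
        return sum(len(contents) for contents in self.pages.values())


cp = PartialCheckpoint(registers={"eip": 0x08048010, "esp": 0xbfffe120})
cp.pages[0xb75f3000] = bytes(4096)
cp.syscalls.append(SyscallRecord("read", 5, b"hello"))
cp.events.append(("signal", "SIGUSR1"))
print(cp.size_bytes())  # 4096
```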
  10. Tracking Memory Pages
     ▪ Complications due to threads and changes in memory geometry
        – Processes and threads are created and deleted
        – Memory regions are added and removed: mappings change over time
        – Shared memory regions persist even without being attached to a process
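One of the complications above, mappings that change over time, can be sketched as follows. This is a simplified model and not how vPlay tracks pages: the `PageTracker` class is hypothetical, only whole regions can be unmapped, and threads and shared memory are not modeled. The assumption illustrated is that a page saved under an old mapping has to be saved again if the same address is later mapped anew.

```python
PAGE_SIZE = 4096  # assumed page size for this example


class PageTracker:
    def __init__(self):
        self.regions = []   # (start, end) of current mappings, end exclusive
        self.seen = set()   # pages already saved under the current mappings
        self.saved = []     # (page, contents) in order of first access

    def mmap(self, start, length):
        self.regions.append((start, start + length))

    def munmap(self, start, length):
        end = start + length
        self.regions = [r for r in self.regions if r != (start, end)]
        # Forget pages in the removed range so a later mapping is captured again.
        self.seen = {p for p in self.seen if not (start <= p < end)}

    def access(self, addr, contents):
        page = addr & ~(PAGE_SIZE - 1)
        if not any(s <= page < e for s, e in self.regions):
            raise ValueError("access outside any mapped region")
        if page not in self.seen:
            self.seen.add(page)
            self.saved.append((page, contents))


t = PageTracker()
t.mmap(0x10000, 2 * PAGE_SIZE)
t.access(0x10004, b"old")
t.access(0x10008, b"old")   # same page, not saved again
t.munmap(0x10000, 2 * PAGE_SIZE)
t.mmap(0x10000, 2 * PAGE_SIZE)
t.access(0x10004, b"new")   # same address, new mapping: saved again
print(len(t.saved))  # 2
```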
  11. Virtual Replay
     ▪ Consists of two steps: load phase and replay phase
     ▪ Load phase: performed by a purpose-built binary loader
        1) Sets up the initial sparsely-populated application address space
        2) Recursively creates application threads
        3) Transfers control to the application code as per register context
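The three load-phase steps above can be mimicked in a toy model. The checkpoint layout used here (a dictionary with `pages` and a `root_thread` tree) is invented for illustration; the real loader works on actual address spaces and threads, which this sketch only simulates.

```python
def load(checkpoint):
    """Simulate the load phase on a hypothetical checkpoint dictionary."""
    # 1) Sparse address space: only the recorded pages exist.
    address_space = dict(checkpoint["pages"])

    # 2) Recreate threads recursively, parent before children.
    started = []

    def spawn(thread):
        started.append(thread["tid"])
        for child in thread.get("children", []):
            spawn(child)

    spawn(checkpoint["root_thread"])

    # 3) Control would be transferred to the recorded program counter.
    entry = checkpoint["root_thread"]["registers"]["eip"]
    return address_space, started, entry


checkpoint = {
    "pages": {0xb75f3000: bytes(4096), 0xbfffe000: bytes(4096)},
    "root_thread": {
        "tid": 1,
        "registers": {"eip": 0xb75f3004},
        "children": [
            {"tid": 2, "registers": {"eip": 0x08048100},
             "children": [{"tid": 3, "registers": {"eip": 0x08048200}}]},
        ],
    },
}

space, threads, entry = load(checkpoint)
print(sorted(hex(p) for p in space), threads, hex(entry))
```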
  12. Replay Phase
     ▪ Execute the instructions produced by the application
     ▪ Most instructions are executed natively
        – No privileged instructions
     ▪ Two types of instructions need emulation
        – Instructions referencing user-defined segment registers (fs, gs; modify_ldt())
        – Instructions that invoke a system call (int 0x80, sysenter)
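To make the system call case concrete, here is a minimal sketch of answering system calls from a recorded log instead of executing them. The `SyscallReplayer` class, the tuple log format, and the divergence check are assumptions for this example and are not taken from the vPlay source.

```python
class ReplayDivergence(Exception):
    """Raised when the replayed execution does not match the recorded log."""


class SyscallReplayer:
    def __init__(self, log):
        # log: iterable of (name, result, data) tuples in recorded order
        self._log = iter(log)

    def emulate(self, name):
        """Return the recorded result for the next system call, without running it."""
        try:
            rec_name, result, data = next(self._log)
        except StopIteration:
            raise ReplayDivergence("log exhausted") from None
        if rec_name != name:
            raise ReplayDivergence(f"expected {rec_name}, got {name}")
        return result, data


log = [("read", 5, b"hello"), ("gettimeofday", 0, b"\x00" * 8)]
replayer = SyscallReplayer(log)
first = replayer.emulate("read")
print(first)  # (5, b'hello')
```

Because the application only ever sees logged results, the replay host does not need to provide the same kernel interface as the recording host, which is consistent with the cross-platform replay described in the abstract.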
  13. Partial Checkpoint in a Debugger
     No different than debugging with a standard binary
  14. Evaluation
     ▪ Record-Replay on Debian: IBM HS-22 blade center
     ▪ Replay on Gentoo: VMware Fusion on MacBook Pro, different Linux kernel
     ▪ Replay on Windows: Lenovo T61p
  15. Runtime Performance
     ✤ Overhead below 3% for all except Squid (9%) and MySQL (17%)
     ✤ Replay is faster than recording
     [Bar chart: normalized performance (axis 0 to 1.2) for mysql, apache-t, apache-p, squid, bc, gzip, ncomp, mplayer; series: record-debian, replay-debian, replay-gentoo, replay-windows]
  16. Partial Checkpoint Size
     Largest partial checkpoint was ~5 MB, a fraction of the application memory footprint
     [Bar chart: partial checkpoint size in MB (axis 0 to 6) for mysql, apache-t, apache-p, squid, bc, gzip, ncomp, mplayer; recording intervals: 5 s, 10 s, 15 s]
  17. Log Size
     ✤ Grows with recording interval but not with application run time
     ✤ Largest log size for a 5 s interval was 59 MB
     [Bar chart: log size (axis ticks 1, 10, 100, 1000) for mysql, apache-t, apache-p, squid, bc, gzip, ncomp, mplayer; recording intervals: 5 s, 10 s, 15 s]
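The slide does not say how the log is kept bounded. One way to get a log that grows with the recording interval but not with run time is a sliding window that discards entries older than the interval; the sketch below assumes that approach, and the `IntervalLog` class is hypothetical.

```python
from collections import deque


class IntervalLog:
    """Keep only the entries recorded within the last `interval` time units."""

    def __init__(self, interval):
        self.interval = interval
        self._entries = deque()

    def append(self, timestamp, entry):
        self._entries.append((timestamp, entry))
        cutoff = timestamp - self.interval
        # Drop everything that fell out of the recording interval.
        while self._entries and self._entries[0][0] < cutoff:
            self._entries.popleft()

    def __len__(self):
        return len(self._entries)


log = IntervalLog(interval=5)
for t in range(100):          # a long-running application
    log.append(t, f"syscall-{t}")
print(len(log))  # 6: only timestamps 94..99 remain
```

Running the loop for 1000 steps instead of 100 leaves the same number of entries, while increasing `interval` increases it.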