Linux Plumbers Conference, Nov 1-4, 2016, Santa Fe, NM
Abstract: Loosely based on Dinesh Subhraveti’s PhD thesis, the vPlay system captures the minimal runtime state of a container such that, when restored, the application retraces its execution for a specified time interval. The key observation is that during the last moments before a crash, where the root cause typically lies, the application accesses only a small subset of its address space, so only those pages need to be saved. This technique, dubbed partial checkpointing, is combined with logging to support debugging. Because all interactions of the application with the kernel are logged, the execution can be natively replayed even on BSD or Windows.
The kernel and user space implementations of the mechanism, along with its integration with GDB, were done as part of Dinesh’s thesis. The details can be found at https://systems.cs.columbia.edu/archive/pub/2012/01/record-and-vplay-problem-determination-with-virtual-replay-across-heterogeneous-systems.
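To make the idea of partial checkpointing concrete, here is a minimal toy sketch in Python. It is not vPlay's implementation or API: the class `PartialCheckpointer`, its methods, and the dictionary-based memory model are all hypothetical names introduced for illustration. The sketch assumes that each page is captured on its first access within the interval of interest, with the contents it had before that access, and it does not model the logging of kernel interactions.

```python
PAGE_SIZE = 4096  # bytes per page in this toy model (an assumption)


class PartialCheckpointer:
    """Hypothetical illustration: save only pages touched in an interval."""

    def __init__(self, memory):
        self.memory = memory  # page number -> page contents (bytes)
        self.snapshot = {}    # pages captured for the current interval

    def begin_interval(self):
        """Start a new interval; earlier captures are discarded."""
        self.snapshot = {}

    def _capture(self, page):
        # Keep the contents the page had before its first access in the
        # interval, since that is the state a replay would have to start from.
        if page not in self.snapshot:
            self.snapshot[page] = self.memory.get(page, b"")

    def read(self, address):
        page = address // PAGE_SIZE
        self._capture(page)
        return self.memory.get(page, b"")

    def write(self, address, data):
        page = address // PAGE_SIZE
        self._capture(page)
        self.memory[page] = data


# A 1000-page address space of which the application touches two pages.
memory = {n: bytes([n % 256]) * 4 for n in range(1000)}
cp = PartialCheckpointer(memory)
cp.begin_interval()
cp.read(5 * PAGE_SIZE + 10)      # touches page 5
cp.write(7 * PAGE_SIZE, b"new")  # touches page 7 and overwrites it
cp.read(7 * PAGE_SIZE)           # page 7 is already captured

print(sorted(cp.snapshot))       # [5, 7]
```

In this toy run only 2 of the 1000 pages end up in the checkpoint, and page 7 is stored with its contents from before the write, not the value written during the interval.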