A mathematical model of userspace process-tree reconstruction from syscall sequences is built on a type-0 formal grammar and prototyped as a two-stage grammar analyser with three heuristics for shortening the grammar. The prototype was developed for comparison with profile-based techniques of syscall collection. An experimental comparison with two profile-based tools indicates that grammatical analysis can be a competitive way to reconstruct metadata in checkpoint-restore tools.
The report is aimed at specialists working at the intersection of discrete mathematics, operating systems, and virtualization technologies. It will interest developers and architects who use checkpoint-restore for processes, VMs and containers in their projects, as well as applied mathematicians working with mathematical models in computer science.
The audience will learn how to restore the process-tree state relatively quickly, without direct tree generation, profiling, or a large number of heuristics, and will find out the design and computational shortcomings of existing solutions. They will also learn problem-specific details: how to store a process tree compactly in a string, how a stack frame helps to parse such strings efficiently to restore the chains of syscalls, which tree configurations are the most trivial and which the most laborious, and at what point ptrace-based solutions no longer look so slow.
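The idea of storing a process tree in a string and parsing it with a stack can be illustrated with a minimal sketch. It is not the grammar used in the report: the bracket encoding `pid(child)(child)...`, the function names `serialize` and `parse`, and the use of a single fork-like step per edge are all assumptions made purely for illustration.

```python
# Illustrative sketch only: a hypothetical bracket encoding of a process tree,
# not the type-0 grammar described in the report.

def serialize(tree, pid):
    """Encode the subtree rooted at pid as 'pid(child)(child)...'."""
    children = tree.get(pid, [])
    return str(pid) + "".join("(" + serialize(tree, c) + ")" for c in children)


def parse(s):
    """Rebuild {parent: [children]} from the string with an explicit stack.

    Also returns the (parent, child) pairs in the order they are met, which
    stands in for the chain of fork-like calls needed to recreate the tree.
    """
    tree = {}
    forks = []
    stack = []  # pids of the ancestors whose bracket is still open
    i = 0
    while i < len(s):
        ch = s[i]
        if ch.isdigit():
            j = i
            while j < len(s) and s[j].isdigit():
                j += 1
            pid = int(s[i:j])
            tree.setdefault(pid, [])
            if stack:  # the top of the stack is the parent of this pid
                parent = stack[-1]
                tree[parent].append(pid)
                forks.append((parent, pid))
            stack.append(pid)
            i = j
        elif ch == "(":
            i += 1  # a child follows; its parent is already on the stack
        elif ch == ")":
            stack.pop()  # the child's subtree is complete
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tree, forks


if __name__ == "__main__":
    original = {1: [2, 3], 2: [4, 5]}
    encoded = serialize(original, 1)
    rebuilt, forks = parse(encoded)
    print(encoded)
    print(forks)
```

In this sketch the parser reads the string once and keeps only the currently open ancestors on the stack, so the tree is never generated up front; the order of the recovered pairs gives one valid order in which the processes could be recreated.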
Knowing the approaches to recovering and analyzing syscall sequences is an important step towards effectively addressing tasks in virtualization, checkpoint-restore, live migration, and software vulnerability detection.