Constructing the Formal Grammar of System Calls, Nikolay Efanov, MIPT, CEE-SECR 2017

CEE-SECR
October 21, 2017

A mathematical model of userspace-based process tree reconstruction via syscall sequences is constructed on the basis of a type-0 formal grammar and prototyped as a two-stage grammar analyser with three heuristics for grammar shortening. The prototype was developed for comparison with profile-based techniques of syscall collection. The results of an experimental comparison with two profile-based tools indicate that grammatical analysis can be applied competitively to metadata reconstruction in checkpoint-restore tools.

The report is aimed at specialists working at the intersection of discrete mathematics, operating systems, and virtualization technologies. It will interest developers and architects who use checkpoint-restore for processes, VMs and containers in their projects, as well as applied mathematicians working with mathematical models in computer science.

The audience will learn how to restore the process-tree state relatively quickly, without direct tree generation, profiling, or a large number of heuristics, and will find out the design and computational shortcomings of existing solutions. They will also learn problem-specific details: how to store a process tree compactly in a string, how a stack frame helps to parse such strings efficiently to restore the chains of syscalls, which tree configurations are the most trivial and which are the most laborious, and at what scale ptrace-based solutions stop looking so slow.

Knowing the approaches to recovering and analyzing syscall sequences is an important step towards effectively addressing various tasks in virtualization, checkpoint-restore, live migration and software vulnerability detection.

Transcript

  1. October 2017, St. Petersburg. Software Engineering Conference Russia.
    Constructing the formal grammar of system calls. Nikolay Efanov, MIPT
  2. Motivation
    1. Immediate process tree analysis in Checkpoint-Restore.
    2. Overhead/extra metadata minimization during Live-Migration.
    3. Ideally: process tree + useful context (memory, files etc.) only.
    [Diagram: (1) target instance state dump on Host 1; (2) target instance resuming on Host 2; in between, saving / transmitting the saved state.]
  3. The Statement and Restrictions
    1. Reconstruct the syscall sequences which lead to some snapshotted process tree.
    2. Restrictions:
      • Linux syscalls only.
      • Input is correct. Otherwise: "Parsing error".
      • Syscalls:
        - Fork: creates a child process
        - Setsid: creates a session with the current process as leader
        - Setpgid: sets a group as setpgid(0, pgid)
        - Exit: terminates a process -> reparenting is initiated
        - → Basis of process tree transformations
  4. Combinatorial Estimations
    • Different process trees using fork: direct generation/search is not the best idea!
    • Moreover …
  5. Suggestion #1: new tree notation
    The new process tree single-string notation:
    • Single process: "p g s [...]", where:
      - Numbers: p is the PID, g is the PGID, s is the SID.
      - '[', ']' are the terminal delimiters of the list containing the children.
      - The list for the root of any subtree contains all of its descendants (thus, fork is notation-based).
    • Example: the string "1 1 1 [ 2 1 1 […] ]" is equivalent to the tree shown on the slide. [Figure: process tree]
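The single-string notation above can be read with a small recursive-descent parser. The following is a minimal sketch, not the analyzer from the talk: the names `Proc`, `parse_tree` and `to_string` are hypothetical, and it assumes that an empty child list is written `[]` and that children are nested inside their parent's brackets.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Proc:
    """One process in the "p g s [...]" notation (hypothetical helper type)."""
    pid: int
    pgid: int
    sid: int
    children: List["Proc"] = field(default_factory=list)


def parse_tree(text: str) -> Proc:
    """Parse a "p g s [ ... ]" string into a Proc tree."""
    # Make brackets standalone tokens, then split on whitespace.
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    pos = 0

    def expect(tok: str) -> None:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != tok:
            raise ValueError(f"Parsing error: expected {tok!r} at token {pos}")
        pos += 1

    def number() -> int:
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdecimal():
            raise ValueError(f"Parsing error: expected a number at token {pos}")
        pos += 1
        return int(tokens[pos - 1])

    def proc() -> Proc:
        # Arguments are evaluated left to right: PID, then PGID, then SID.
        node = Proc(number(), number(), number())
        expect("[")
        while pos < len(tokens) and tokens[pos] != "]":
            node.children.append(proc())  # the call stack tracks nesting depth
        expect("]")
        return node

    root = proc()
    if pos != len(tokens):
        raise ValueError("Parsing error: trailing input")
    return root


def to_string(node: Proc) -> str:
    """Serialize a Proc tree back into the single-string notation."""
    inner = " ".join(to_string(c) for c in node.children)
    body = f" {inner} " if inner else ""
    return f"{node.pid} {node.pgid} {node.sid} [{body}]"


if __name__ == "__main__":
    tree = parse_tree("1 1 1 [ 2 1 1 [] ]")
    print(to_string(tree))
```

Malformed input raises `ValueError` with a "Parsing error" message, mirroring the restriction that the input is assumed to be correct.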
  6. Suggestion #2: syscall grammar
    The grammar rules for fork, setsid, setpgid, exit*:
    • Notation-based form:
      - fork(* * * [*]) --> * * * [* \2 \3 [] \4].
      - setsid(* * * [*]) --> \1 \1 \1 [\4].
      - {p p * [*], setpgid(p, * * \1 [*]) | setpgid(p, p * * [*])} --> {p p \1 [\2], \3 p \1 [\5] | p p \2 [\3]}, where '{', '}' mean that the configuration exists or existed there.
    • The grammar is context-sensitive (setpgid rule).
    • The setpgid rule is separable into independent context-sensitive and context-free cases.
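Read in the forward direction, the three rules above describe how each syscall rewrites the (PID, PGID, SID) triple of a process. The sketch below illustrates that reading under some assumptions: `Proc`, `find`, `fork`, `setsid` and `setpgid` are hypothetical Python helpers (not the real syscalls and not the analyzer from the talk), the new child is placed first in the list as in the fork rule, and the context-sensitive setpgid case only checks for a group leader that currently exists, ignoring the "or existed" part.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Proc:
    """One process: PID, PGID, SID and a list of children (hypothetical type)."""
    pid: int
    pgid: int
    sid: int
    children: List["Proc"] = field(default_factory=list)


def find(root: Proc, pid: int) -> Optional[Proc]:
    """Depth-first lookup of a process by PID."""
    if root.pid == pid:
        return root
    for child in root.children:
        hit = find(child, pid)
        if hit is not None:
            return hit
    return None


def fork(parent: Proc, new_pid: int) -> Proc:
    """fork rule: the child inherits PGID and SID and starts with an empty list."""
    child = Proc(new_pid, parent.pgid, parent.sid)
    parent.children.insert(0, child)  # the rule lists the new child first
    return child


def setsid(proc: Proc) -> None:
    """setsid rule: p g s [c] becomes p p p [c]."""
    proc.pgid = proc.sid = proc.pid


def setpgid(root: Proc, proc: Proc, pgid: int) -> None:
    """setpgid rule, split into its two cases."""
    if pgid == proc.pid:
        # Context-free case: the process becomes its own group leader.
        proc.pgid = proc.pid
        return
    # Context-sensitive case: a leader "p p s" must be present in the same session.
    leader = find(root, pgid)
    if leader is None or leader.pgid != leader.pid or leader.sid != proc.sid:
        raise ValueError("Parsing error: no group leader in the same session")
    proc.pgid = pgid


if __name__ == "__main__":
    init = Proc(1, 1, 1)
    shell = fork(init, 2)
    setsid(shell)                 # 2 2 2
    job = fork(shell, 3)          # 3 2 2
    setpgid(init, job, 3)         # 3 3 2
    print(init)
```

The split inside `setpgid` mirrors the remark that the rule separates into a context-free case (self-assignment) and a context-sensitive case (joining an existing group).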
  7. Grammar Shortening: exit
    The grammar rule for exit:
    • Notation-based form:
      - {1 * * [*], exit(* * * [*])} --> 1 \1 \2 [\3 \7].
    • The grammar is type-0 (exit is a shortening rule).
    • 3 heuristics (reverse reparenting):
      1) Is session(child) unique? --> create an 'exited' process with PID == session number, then --> reverse reparenting.
      2) Otherwise: no 'exited' processes in session(child)? --> attach the 'exited' process to the session leader, then --> reparenting via setpgid, setsid.
      3) Else: look up a suitable 'exited' process in session(child) --> attach. No suitable processes? --> use 2).
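To see why exit is a shortening rule, it helps to apply it in the forward direction: the exiting process disappears from the string and its children are appended to the list of process 1. The sketch below shows only that forward step, under assumptions; it does not implement the three reverse-reparenting heuristics. `Proc`, `exit_process` and `sessions_without_leader` are hypothetical names, and the last helper merely illustrates the kind of trace (a session whose number matches no live PID) that a reverse step would have to explain.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set


@dataclass
class Proc:
    """One process: PID, PGID, SID and a list of children (hypothetical type)."""
    pid: int
    pgid: int
    sid: int
    children: List["Proc"] = field(default_factory=list)


def walk(node: Proc) -> Iterator[Proc]:
    """Yield every process in the tree, parents before children."""
    yield node
    for child in node.children:
        yield from walk(child)


def exit_process(init: Proc, pid: int) -> Proc:
    """Forward exit rule: remove `pid` and reparent its children to `init`."""
    stack = [init]
    while stack:
        parent = stack.pop()
        for i, child in enumerate(parent.children):
            if child.pid == pid:
                del parent.children[i]
                # Children of the exited process go to the end of init's list.
                init.children.extend(child.children)
                return child
        stack.extend(parent.children)
    raise ValueError(f"Parsing error: no process with PID {pid}")


def sessions_without_leader(init: Proc) -> Set[int]:
    """Session IDs that do not match the PID of any live process."""
    procs = list(walk(init))
    live = {p.pid for p in procs}
    return {p.sid for p in procs if p.sid not in live}


if __name__ == "__main__":
    init = Proc(1, 1, 1, [Proc(2, 2, 2, [Proc(3, 2, 2)])])
    exit_process(init, 2)
    print(init)
    print(sessions_without_leader(init))
```

After the exit, the tree no longer records that process 2 ever existed, apart from the session number left on its former child; recovering it is the job of the heuristics listed above.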
  8. Grammar Analyzer
    • Two-stage analyzer: O(N log(N) log(S) log(P))
      - Stage 1: context-free analysis --> intermediate pre-analysed tree
      - Stage 2: context checking --> final restoring of syscall sequences
    • + AVL-based structures: p, g, s logging
  9. Experiment Setting
    • Analyzer vs profile-based techniques:
      - 'slow': strace-based
      - 'progressive': perf-based
    • Two profiles of tests:
      - "Simple-load": two simple cases without reparents
      - "Heuristic": a high rate of reparents
  10. Conclusions
    • The solution is feasible for restoring the simplest syscalls.
    • It is better than ptrace for fewer than 3000-4000 processes.
    • Complex rules should be handled accurately → drawback.
    • Ways to improve:
      - New syscalls/features (file descriptors, memory etc.)
      - Designing new methods