TMPA-2017: Distributed Analysis of the BMC Kind: Making It Fit the Tornado Supercomputer

TMPA-2017: Tools and Methods of Program Analysis
3-4 March, 2017, Hotel Holiday Inn Moscow Vinogradovo, Moscow

Distributed Analysis of the BMC Kind: Making It Fit the Tornado Supercomputer
Azat Abdullin, Daniil Stepanov, St. Petersburg Polytechnic University
Marat Akhin, JetBrains Research
For video follow the link: https://youtu.be/CPlPpwFtN7k

Would you like to know more?
Visit our website:
www.tmpaconf.org
www.exactprosystems.com/events/tmpa


Transcript

  1. Distributed Analysis of the BMC Kind: Making It Fit the Tornado Supercomputer
     Azat Abdullin, Daniil Stepanov, and Marat Akhin
     JetBrains Research
     March 3, 2017
  2. Static analysis
     Static program analysis is the analysis of computer software that is
     performed without actually executing the programs.
  3. Performance problem
     Most static analyses have big problems with performance.
     Our bounded model checking tool Borealis is no exception.
     We decided to try scaling Borealis to multiple cores.
  4. Verification example
     Program:
        int x;
        int y = 8, z = 0, w = 0;
        if (x) z = y - 1;
        else   w = y + 1;
        assert(z == 7 || w == 9);
     Constraints:
        y = 8, z = x ? y - 1 : 0, w = x ? 0 : y + 1, z != 7, w != 9
     Result: UNSAT. The assert always holds.
  5. Verification example
     Program:
        int x;
        int y = 8, z = 0, w = 0;
        if (x) z = y - 1;
        else   w = y + 1;
        assert(z == 5 || w == 9);
     Constraints:
        y = 8, z = x ? y - 1 : 0, w = x ? 0 : y + 1, z != 5, w != 9
     Result: SAT. The program contains a bug.
     Counterexample: y = 8, x = 1, w = 0, z = 7
     (See the SMT sketch below.)
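     These constraints map directly onto an SMT query. Below is a minimal
     sketch of such a check, assuming the Z3 C++ API (z3++.h); it illustrates
     the BMC idea, not the actual Borealis encoding:

        #include <z3++.h>
        #include <iostream>

        int main() {
            z3::context c;
            z3::expr x = c.int_const("x");
            z3::expr y = c.int_const("y");
            z3::expr z = c.int_const("z");
            z3::expr w = c.int_const("w");

            z3::solver s(c);
            s.add(y == 8);                                     // int y = 8
            s.add(z == z3::ite(x != 0, y - 1, c.int_val(0)));  // then-branch
            s.add(w == z3::ite(x != 0, c.int_val(0), y + 1));  // else-branch
            s.add(z != 5 && w != 9);                           // negated assert

            if (s.check() == z3::sat)    // satisfiable: the assert can fail
                std::cout << "bug found, model:\n" << s.get_model() << "\n";
            else                         // unsatisfiable: the assert always holds
                std::cout << "assert holds\n";
            return 0;
        }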
  6. Problem
     A huge number of SMT queries is involved in BMC.
     We try to scale Borealis to multiple cores on our RSC Tornado.
  7. RSC Tornado supercomputer
     • 712 dual-processor nodes with 1424 Intel Xeon E5-2697 CPUs
     • 64 GB of DDR4 RAM and local 120 GB SSD storage per node
     • 1 PB Lustre storage
     • InfiniBand FDR, 56 Gb/s
  8. Lustre storage
     • Parallel distributed file system
     • Highly scalable
     • Terabytes per second of I/O throughput
     • Inefficient work with small files
  9. Distributed compilation
     There are several ways to distribute compilation:
     • Compilation on the Lustre storage
     • Distribution of the intermediate build tree to the processing nodes
     • Distribution of copies of the analyzed project
  10. Compilation on the Lustre storage
      • Each node accesses Lustre for the necessary files
      • Lustre is slow when dealing with multiple small files
  11. Distribution of the intermediate build tree
      • Reduces the CPU time
      • A build may contain several related compilation/linking phases
  12. Distribution of copies of the analyzed project
      • Compilation is done using standard build tools
      • We repeat the computations on every node
      • Does not increase the wall-clock time
  13. Distributed linking
      We distribute different SMT queries to different nodes/cores.
      Borealis performs its analysis on an LLVM IR module.
  14. Distributed linking
      Module level
      • Same as parallel make
      • Not really efficient
      Instruction level
      • Need to track dependencies between SMT calls
      • Too complex
      Function level
      • Medium efficiency
      • Simple implementation
  15. Distributed linking
      There are two ways to distribute functions between several processes:
      • Dynamic distribution
      • Static distribution
  16. Dynamic distribution
      • The master process distributes functions between several processes
      • Based on a single producer / multiple consumers scheme
      • If a process receives N functions, it also has to run the auxiliary
        LLVM passes N times
      (See the MPI sketch below.)
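      A rough sketch of that producer/consumer scheme with plain MPI calls;
      the task encoding and the analyzeFunction stub are assumptions made for
      illustration, not the actual Borealis implementation:

         #include <mpi.h>

         enum { TAG_TASK = 1, TAG_READY = 2 };

         // Stand-in for running the auxiliary LLVM passes and the SMT
         // queries for one function.
         void analyzeFunction(int functionId) { /* ... */ }

         int main(int argc, char** argv) {
             MPI_Init(&argc, &argv);
             int rank, size;
             MPI_Comm_rank(MPI_COMM_WORLD, &rank);
             MPI_Comm_size(MPI_COMM_WORLD, &size);

             const int numFunctions = 1000;  // assumed workload size
             if (rank == 0) {
                 // Master: hand out one function index per worker request.
                 int next = 0, stopped = 0;
                 while (stopped < size - 1) {
                     int dummy;
                     MPI_Status st;
                     MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, TAG_READY,
                              MPI_COMM_WORLD, &st);
                     int task = (next < numFunctions) ? next++ : -1;  // -1 = stop
                     if (task == -1) ++stopped;
                     MPI_Send(&task, 1, MPI_INT, st.MPI_SOURCE, TAG_TASK,
                              MPI_COMM_WORLD);
                 }
             } else {
                 // Worker: request work, analyze, repeat until told to stop.
                 int task = 0;
                 while (true) {
                     MPI_Send(&task, 1, MPI_INT, 0, TAG_READY, MPI_COMM_WORLD);
                     MPI_Recv(&task, 1, MPI_INT, 0, TAG_TASK, MPI_COMM_WORLD,
                              MPI_STATUS_IGNORE);
                     if (task == -1) break;
                     analyzeFunction(task);
                 }
             }
             MPI_Finalize();
             return 0;
         }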
  17. Static distribution
      Each process determines its set of functions based on its rank.
      We use the following two rank kinds:
      • global rank
      • local rank
      After some experiments we decided to use the static method
      (a minimal sketch follows below).
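      A minimal sketch of rank-based static distribution, assuming a simple
      round-robin assignment over the global rank; the actual Borealis scheme
      also uses the local (per-node) rank and is more elaborate:

         #include <mpi.h>
         #include <string>
         #include <vector>

         // Pick the functions this process will analyze: every size-th
         // function starting at the process's own global rank, so no
         // coordination is needed at analysis time.
         std::vector<std::string> pickFunctions(
                 const std::vector<std::string>& allFunctions) {
             int rank, size;
             MPI_Comm_rank(MPI_COMM_WORLD, &rank);
             MPI_Comm_size(MPI_COMM_WORLD, &size);

             std::vector<std::string> mine;
             for (std::size_t i = 0; i < allFunctions.size(); ++i)
                 if (static_cast<int>(i % size) == rank)
                     mine.push_back(allFunctions[i]);
             return mine;
         }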
  18. Improving static distribution
      • Need to balance the workload
      • We reinforce the method with a function complexity estimation
      Our estimation is based on the following properties:
      • Function size
      • Number of instructions that work with memory
      (A sketch of such an estimate follows below.)
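      A sketch of such an estimate over LLVM IR; the relative weights are
      made up for illustration, since the slide does not give the exact metric:

         #include "llvm/IR/Function.h"
         #include "llvm/IR/Instructions.h"

         // Estimate how expensive a function will be to analyze: count every
         // instruction (function size) and give extra weight to loads and
         // stores, since memory accesses blow up the SMT queries.
         unsigned estimateComplexity(const llvm::Function& F) {
             unsigned score = 0;
             for (const llvm::BasicBlock& BB : F)
                 for (const llvm::Instruction& I : BB) {
                     score += 1;
                     if (llvm::isa<llvm::LoadInst>(I) ||
                         llvm::isa<llvm::StoreInst>(I))
                         score += 4;  // assumed weight for memory work
                 }
             return score;
         }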
  19. PDD
      Borealis records the analysis results, so we don't re-analyze already
      processed functions.
      Persistent Defect Data (PDD) is used for recording the results.
      PDD contains:
      • Defect location
      • Defect type
      • SMT result
      Example entry:
         {
           "location": {
             "loc": { "col": 2, "line": 383 },
             "filename": "rarpd.c"
           },
           "type": "INI-03"
         }
  20. PDD synchronization problem
      • Transferring a full PDD takes a long time
      • We synchronize a reduced PDD (rPDD) instead
      • The rPDD is simply a list of already analyzed functions
  21. rPDD synchronization
      To perform the synchronization we use a two-stage approach:
      • Synchronize the rPDD between the processes on a single node
      • Synchronize the rPDD between the nodes
      (A rough MPI sketch of the two stages follows below.)
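      A rough MPI sketch of the two stages, under the simplifying assumption
      that the rPDD is a byte mask over function indices; the real rPDD is a
      list of analyzed functions and its merge logic is more involved:

         #include <mpi.h>

         void syncRPDD(unsigned char* analyzed, int numFunctions) {
             // Stage 1: merge within a node (processes sharing memory).
             MPI_Comm node;
             MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                                 MPI_INFO_NULL, &node);
             MPI_Allreduce(MPI_IN_PLACE, analyzed, numFunctions,
                           MPI_UNSIGNED_CHAR, MPI_BOR, node);

             // Stage 2: merge across nodes via one leader process per node.
             int nodeRank;
             MPI_Comm_rank(node, &nodeRank);
             MPI_Comm leaders;
             MPI_Comm_split(MPI_COMM_WORLD,
                            nodeRank == 0 ? 0 : MPI_UNDEFINED, 0, &leaders);
             if (leaders != MPI_COMM_NULL) {
                 MPI_Allreduce(MPI_IN_PLACE, analyzed, numFunctions,
                               MPI_UNSIGNED_CHAR, MPI_BOR, leaders);
                 MPI_Comm_free(&leaders);
             }

             // Propagate the merged result back to every process on the node.
             MPI_Bcast(analyzed, numFunctions, MPI_UNSIGNED_CHAR, 0, node);
             MPI_Comm_free(&node);
         }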
  22. Implementation
      • The Borealis HPC implementation is based on OpenMPI
      • We implemented an API to work with the library
      • HPC Borealis is implemented in the form of 3 LLVM passes
      (A hypothetical pass skeleton follows below.)
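      A hypothetical skeleton of one such pass in the legacy pass-manager
      style; the pass name, the attribute string, and the round-robin
      assignment are illustrative, not the actual Borealis passes:

         #include "llvm/Pass.h"
         #include "llvm/IR/Module.h"
         #include <mpi.h>

         namespace {
         // Marks the functions that do NOT belong to this MPI rank so that
         // the later analysis passes skip them.
         struct FunctionDistributor : public llvm::ModulePass {
             static char ID;
             FunctionDistributor() : llvm::ModulePass(ID) {}

             bool runOnModule(llvm::Module& M) override {
                 int rank = 0, size = 1;
                 MPI_Comm_rank(MPI_COMM_WORLD, &rank);
                 MPI_Comm_size(MPI_COMM_WORLD, &size);

                 unsigned idx = 0;
                 for (llvm::Function& F : M) {
                     if (F.isDeclaration()) continue;
                     if (static_cast<int>(idx++ % size) != rank)
                         F.addFnAttr("borealis.skip");  // made-up marker
                 }
                 return true;  // the module was modified
             }
         };
         } // namespace
         char FunctionDistributor::ID = 0;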
  23. Evaluation
      We tested the prototype in the following configurations:
      • One process on a local [1] machine
      • Eight processes on a local machine
      • On RSC Tornado using 1, 2, 4, 8, 16 and 32 nodes
      [1] a machine with an Intel Core i7-4790 3.6 GHz processor, 32 GB of RAM
          and Intel 535 SSD storage
  24. Evaluation projects
      Name        SLOC   Modules  Description
      git         340k   49       distributed revision control system
      longs       209k   1        URL shortener
      beanstalkd  7.5k   1        simple, fast work queue
      zstd        42k    3        fast lossless compression algorithm library
      reptyr      3.5k   1        utility for reattaching programs to new terminals
  25. Evaluation results
      Configuration       zstd     git     longs   beanstalkd  reptyr
      SCC 1 process                678:23          2:05        1:30
      SCC 1 node          2433:05  113:59  58:53   2:50        1:53
      SCC 2 nodes         2421:35  101:22  59:00   2:12        1:32
      SCC 4 nodes         2419:23  96:53   61:09   2:19        1:19
      SCC 8 nodes         2510:34  96:51   63:09   2:10        1:43
      SCC 16 nodes        2434:05  97:26   63:06   2:37        1:34
      SCC 32 nodes        2346:39  107:14  63:02   2:34        1:52
      Local 1 process     2450:02  281:11  205:05  0:36        0:08
      Local 8 processes   2848:55  103:21  93:14   0:30        0:06
  26. Conclusion
      Our main takeaways are as follows:
      • Several big functions can bottleneck the analysis
      • LLVM is not optimized for distributed scenarios
      • Single-core optimizations can create difficulties for HPC
      • Adding nodes can increase the time of analysis