Slide 1

Slide 1 text

Implementing the MetaVCG approach in the C-light system
Dmitry Kondratyev, Alexei Promsky
A.P. Ershov Institute of Informatics Systems

Slide 2

Slide 2 text

Team and Aims
A.P. Ershov Institute of Informatics Systems, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia.
Theoretical Programming Lab: Prof. Valery Nepomnyaschy, Prof. Nikolay Shilov, Igor Anureev, Alexey Promsky, Ilya Maryasov, Dmitry Kondratyev, ...
Studies of theoretical foundations of informatics that can be applied to practical tasks such as modeling of sequential and parallel processes; semantics and specification; program verification.
Verification of C programs is one of our high-priority goals.

Slide 3

Slide 3 text

Why the C language?
Still popular (according to the latest TIOBE index).
Basis for the kindred languages: (1), (3) and (4). Can we work with them if we are unable to verify C programs? Especially when not all problems of C program verification are solved.
The interest in C program verification is confirmed by research projects: VERISOFT, WHY (Frama-C), VCC.

Slide 4

Slide 4 text

C-light and C-kernel
Correct approaches/algorithms at every step.
What do the founders say? "Restrictions that contribute to provability are what make a programming language good." C.A.R. Hoare.
The C-light language covers a major part of C99 (C0 completely, MISRA C almost); fixes the evaluation order; avoids some low-level features.
The C-kernel language possesses a light and sound axiomatic semantics.
Easy addition of new/remaining constructions? This is exactly what our research is for!

Slide 5

Slide 5 text

The C-light Verification System: overview
Annotated C-light program (passed to the translator):
    /*@ requires \nothing;
        assigns e;
        ensures \result == \old(e) && e == \old(e) + 1; */
    e++
Annotated C-kernel program (produced by the translator, passed to the VCG):
    /*@ requires \nothing;
        assigns e;
        ensures \result == \old(e) && e == \old(e) + 1; */
    (q = &e, y = *q, *q = *q + 1, y)
Verification condition (generated by the VCG, validated by Simplify or Z3):
    MD1 = upd(MD0, MeM(q), MeM(e)) ⋀
    MD2 = upd(MD1, MeM(y), MD1(MeM(q))) ⋀
    MD = upd(MD2, MD2(MeM(q)), BinOpSem(+, MD2(MeM(q)), 1)) ⋀
    Val = MD(MeM(y))
    ⇒ Val = \old(MD(MeM(e))) ⋀ MD(MeM(e)) = \old(MD(MeM(e))) + 1
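The translation step above can be tried directly in C. The following is a minimal sketch, not the translator's actual output: the function names are hypothetical, and the variable e is passed by pointer so that both forms can be compared on the same kind of input. The auxiliary variables q and y play the role they have in the comma expression on the slide.

```c
/* Post-increment as it would be written in a C-light program:
 * returns the old value of *e and increments the stored value. */
static int post_increment(int *e) {
    return (*e)++;
}

/* The side-effect-explicit form from the slide:
 *     (q = &e, y = *q, *q = *q + 1, y)
 * Here the caller already supplies the address, so "q = &e" becomes "q = e".
 * The comma operator sequences the four steps left to right. */
static int post_increment_desugared(int *e) {
    int *q;
    int y;
    return (q = e, y = *q, *q = *q + 1, y);
}
```

Both functions satisfy the same contract: the result equals the old value and the stored value grows by one, which is exactly what the `ensures` clause states.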

Slide 6

Slide 6 text

The C-light Verification System: some details

Slide 7

Slide 7 text

The C-light Verification System: things done so far
Verification of some challenges from the well-known collections: aliasing, abrupt termination, side effects, function pointers.
Specification and verification of a subset of the Standard C Library:
    /*@ requires \valid_range(s1, 0, strlen(s2)) && valid_string(s2);
        assigns s1[0..strlen(s2)];
        ensures strcmp(s1, s2) == 0 && \result == s1;
        ensures \base_addr(\result) == \base_addr(s1); */
    char *strcpy(char *restrict s1, const char *restrict s2)
Experiments on self-applicability. Our translator from C-light into C-kernel is implemented using Clang (C++ API); however, a good part of its functionality is expressible in C-light.

Slide 8

Slide 8 text

The C-light Verification System: things done so far
    /*@ requires 1 <= id <= UINT_MAX;
        assigns id;
        behavior somewhere_in_the_middle:
          assumes 1 <= \old(id) < UINT_MAX;
          ensures id == \old(id) + 1 &&
                  strcmp(\result, strcat("BLOCK\0", ltoa(\old(id)))) == 0;
        behavior too_many_blocks:
          assumes \old(id) == UINT_MAX;
          ensures \result == NULL;
        complete behaviors somewhere_in_the_middle, too_many_blocks;
        disjoint behaviors somewhere_in_the_middle, too_many_blocks; */
    char* getBlockID()
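The slide gives only the contract of getBlockID, not its body. The following is one hypothetical implementation that is intended to fit the two behaviors, written as a sketch under assumptions: the counter `id` is taken to be a file-scope `unsigned int`, the result is heap-allocated, and `snprintf` stands in for the `strcat`/`ltoa` combination that the specification uses as logical functions.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed global counter; the contract requires 1 <= id <= UINT_MAX. */
static unsigned int id = 1;

/* Hypothetical body for the specified function.
 * too_many_blocks:        id == UINT_MAX  -> returns NULL, id unchanged.
 * somewhere_in_the_middle: otherwise      -> returns "BLOCK<old id>", id + 1.
 * Note: an allocation failure also returns NULL, which the contract does
 * not model; a verified version would have to rule that case out. */
char *getBlockID(void) {
    if (id == UINT_MAX) {
        return NULL;
    }
    /* "BLOCK" (5) + at most 10 decimal digits + terminating NUL */
    enum { NAME_SIZE = 16 };
    char *name = malloc(NAME_SIZE);
    if (name == NULL) {
        return NULL;
    }
    snprintf(name, NAME_SIZE, "BLOCK%u", id);
    id++;
    return name;
}
```

The caller owns the returned string and is expected to release it with `free`.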

Slide 9

Slide 9 text

Current task: addition of new axiomatic rules
In practice, the axiomatic semantics of a language takes the form of a Verification Condition Generator (VCG), thus reducing the question of program correctness to the truth of lemmas (verification conditions, VCs) in some applied theory.
We would like to add new axioms and rules easily and correctly to our Hoare systems and, consequently, to the VCG. The reasons:
The complete C language or a transition to C++. No doubts here.
Context-specific rules or even specialized VCGs. Not so obvious. Some examples are required.

Slide 10

Slide 10 text

Why new axiomatic rules: example 1
During the Library verification the following pattern was found:
    swap(x, y, buf) ≡ memcpy(buf, x, m); memcpy(x, y, m); memcpy(y, buf, m);
The general rule for the function call looks like
    {P} f(x) {Q},   P ⇒ (Pα ∧ Qγ ⇒ Qγβ)
    -----------------------------------
    {P} f(e); {Q}
The substitutions α, β model the argument passing, whereas γ renames the variables and, thus, is equivalent to quantification.
At the same time, we can enrich the Hoare system by the following axiom:
    {x = x0 ∧ y = y0} swap(x, y, buf) {x = y0 ∧ y = x0}
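The swap pattern above can be written as an ordinary C function. This is a minimal sketch; the name `swap_buf` and the explicit length parameter `m` are choices made for the illustration, and the three regions are assumed not to overlap (as `memcpy` requires).

```c
#include <string.h>

/* swap(x, y, buf): exchange m bytes between x and y through the scratch
 * area buf. Assumes x, y and buf are pairwise non-overlapping and each at
 * least m bytes long. */
static void swap_buf(void *x, void *y, void *buf, size_t m) {
    memcpy(buf, x, m);  /* buf holds the old contents of x */
    memcpy(x, y, m);    /* x receives the old contents of y */
    memcpy(y, buf, m);  /* y receives the old contents of x */
}
```

With the dedicated axiom, a verifier can conclude x = y0 ∧ y = x0 in one step instead of applying the general call rule three times and reasoning about the intermediate contents of buf.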

Slide 11

Slide 11 text

Why new axiomatic rules: example 2
Given that M is a two-dimensional matrix and e(k, i) is an expression depending on matrix indices k and i, consider the following triple:
    {Q(M ← rep(M, mat(e1, e2, e3, e4), e(s, t)))}
    for(k = e1; k <= e2; k++)
      for(i = e3; i <= e4; i++)
        M[k][i] = e(k, i);
    {Q}
where the matrix rep(M, mat(e1, e2, e3, e4), e(s, t)) results from replacing all elements of the sub-matrix mat(e1, e2, e3, e4) by the expression e.
All these logical functions (rep, mat, etc.) are defined by a set of axioms. For example,
    rep(rep(M, S1, e(s, t)), S2, e(s, t)) = rep(M, S1 ∪ S2, e(s, t))
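The loop nest and the rep axiom above can be exercised on a small concrete case. The sketch below makes several assumptions for illustration only: a fixed 4x4 matrix of int, a hypothetical element expression e(k, i) = 10*k + i, and a helper `same_matrix` for comparison. Filling two adjacent row blocks one after another gives the same matrix as filling their union at once, which is an instance of the axiom.

```c
enum { ROWS = 4, COLS = 4 };

/* Hypothetical element expression e(k, i). */
static int e(int k, int i) {
    return 10 * k + i;
}

/* The loop nest from the triple: overwrite the sub-matrix
 * mat(e1, e2, e3, e4), i.e. rows e1..e2 and columns e3..e4, with e(k, i). */
static void fill(int M[ROWS][COLS], int e1, int e2, int e3, int e4) {
    for (int k = e1; k <= e2; k++)
        for (int i = e3; i <= e4; i++)
            M[k][i] = e(k, i);
}

/* Element-wise comparison of two matrices; returns 1 if they are equal. */
static int same_matrix(int A[ROWS][COLS], int B[ROWS][COLS]) {
    for (int k = 0; k < ROWS; k++)
        for (int i = 0; i < COLS; i++)
            if (A[k][i] != B[k][i])
                return 0;
    return 1;
}
```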

Slide 12

Slide 12 text

Why new axiomatic rules: example 3
From those methods of loop invariant elimination we can step to
1. program schemata. For example, Dijkstra's linear search scheme
    {P} d = d0; while(prop(d)) d = f(d) {Q}
where d, d0, prop, f are uninterpreted objects and
    Q : ¬prop(dk) ∧ ∀i (0 ≤ i < k ⇒ prop(di)) ∧ d = dk, with di = f(di−1);
2. and even further to program transformations
    {P} A {P}   {P} B {Q}   £   {P} A; B {Q}
    inv ≡ P   {P} if(e) A {P}   £   {P} {inv} while(e) A {P}
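The linear search scheme can be sketched in C by passing the uninterpreted objects as parameters. This is an illustration under assumptions: the domain of d is fixed to int, prop and f become function pointers, and the instantiation (smallest n with n*n >= 50) is hypothetical. Termination is not guaranteed by the scheme itself; it depends on the chosen prop and f.

```c
#include <stdbool.h>

typedef bool (*prop_fn)(int);
typedef int (*step_fn)(int);

/* The scheme  d = d0; while (prop(d)) d = f(d);
 * On exit, prop fails for the returned value and held for every earlier
 * element of the sequence d0, f(d0), f(f(d0)), ... */
static int linear_search(int d0, prop_fn prop, step_fn f) {
    int d = d0;
    while (prop(d))
        d = f(d);
    return d;
}

/* Hypothetical instantiation: search upwards while the square is below 50. */
static bool square_below_50(int d) {
    return d * d < 50;
}

static int successor(int d) {
    return d + 1;
}
```

Calling `linear_search(0, square_below_50, successor)` walks 0, 1, 2, ... and stops at the first value whose square is at least 50.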

Slide 13

Slide 13 text

MetaVCG
The examples above are not artificial. The corresponding studies are being conducted in our Lab:
Ilya Maryasov develops the Mixed-semantics approach;
Prof. Valery Nepomnyaschy develops the approach of Finite iterations over data structures;
Prof. Nikolay Shilov tries to apply program schemata to the verification of dynamic programming algorithms.
Two possible ways:
One huge VCG replenished by rules every time we move to a new domain: hardly a good idea.
A collection of specialized VCGs: much better.
Finally, the error-prone process of manual implementation of a VCG can compromise the verification.
A possible solution: the MetaVCG approach.

Slide 14

Slide 14 text

MetaVCG: origins
Can the correctness of a VCG be guaranteed not only by testing/verification but also by its construction?
Building on classical results by E.W. Dijkstra, R.L. London, D.C. Luckham and others, M. Moriconi and R. Schwartz proposed in 1981 a method for mechanically constructing VCGs from a useful class of Hoare logics.
Any VCG constructed by the method is shown to be sound and deduction-complete w.r.t. the associated Hoare logic.

Slide 15

Slide 15 text

MetaVCG: scheme
[Diagram. Elements: Annotated C program; Analysis and transformation; Program in the internal form; MetaVCG; Recursively defined VCG; Hoare system; Verification conditions; Axioms; Proof environment.]

Slide 16

Slide 16 text

MetaVCG: preliminaries
Metavariables P, Q, R denote partially interpreted first-order formulas (e.g. P, P ∧ x = 5, or x = 5).
For a Hoare triple of the form
    {P(P1, ..., Pm)} S {Q(Q1, ..., Qn)}
where predicate symbols Pi and Qj are logically free in P and Q, respectively, we have
    Pi ⇐ Qj, for i ∈ {1, ..., m} and j ∈ {1, ..., n}.
Given H ⇐⁺ T, H will be called the head of the dependency chain and T the tail.
For a rule of the form
    {P1} S1 {Q1}, ..., {Pn} Sn {Qn}, Γ
    ----------------------------------
    {P} S {Q}
we have S Si , for i ∈ {1, ..., n} ( +).

Slide 17

Slide 17 text

MetaVCG: preliminaries
Function FreePreds denotes the set of logically free predicate symbols in a formula, a Hoare triple or a rule.
Function FragVars denotes the set of "fragment variables" in the language fragment S of a Hoare triple {P} S {Q}. For example,
    FragVars(if B then S1 else S2 fi) = {B, S1, S2}
We use FreePreds and FragVars to distinguish those logically free variables that are bound in the program fragment when a rule is applied from those that must be bound by the wp-calculus.

Slide 18

Slide 18 text

MetaVCG: normal form rule
A normal form rule is any instance N of
    {P1} S1 {Q1}, ..., {Pn} Sn {Qn}, Γ
    ----------------------------------
    {P} S {Q}
that satisfies the following constraints:
1. P1, ..., Pn and Q are predicate symbols free in N.
2. Γ is a sentence in the underlying theory such that FreePreds(Γ) ⊆ FreePreds(N) ∪ FragVars(S).
3. The fragment variables of each Si must be bound in S. So ⋃ (1 ≤ i ≤ n) FragVars(Si) ⊆ FragVars(S).

Slide 19

Slide 19 text

MetaVCG: normal form rule
4. Dependency ordering. The Hoare-triple premises of N must satisfy two dependency constraints.
    a. Pi ⇐⁺ Pj ⊃ i < j
    b. T ⇐⁺ U ∧ ¬(∃R) U ⇐⁺ R ⊃ U ≡ Q ∨ U bound in N
5. Monotonicity. Let P[P ← false, P ∈ s] denote P with the proper substitution of false for each predicate P in the set s. Then
    P[P1, ..., Pn, Q ← true] ∨ ∀s ⊆ {P1, ..., Pn, Q} ¬P[P ← false, P ∈ s]
This constraint must hold for Γ and for each Qi.

Slide 20

Slide 20 text

MetaVCG: transforming proof rules into VCG
Given any rule of the form
    {P1} S1 {Q1}, ..., {Pn} Sn {Qn}, Γ
    ----------------------------------
    {P} S {Q}
wdp can be defined as follows:
    wdp(S, Q) = P[P1 ← wdp(S1, Q1), ..., Pn ← wdp(Sn, Qn)] ∧
                (∀v) Γ[P1 ← wdp(S1, Q1), ..., Pn ← wdp(Sn, Qn)]
where [P1 ← t1, ..., Pn ← tn] denotes n proper substitutions carried out sequentially in left-to-right order, and v is the set of all free logical variables in Γ.

Slide 21

Slide 21 text

MetaVCG: transforming proof rules into VCG
Taking as examples the classical axiom for assignment (without side effects)
    {P[x ← e]} x:=e {P}
and the rule of inference for statement composition
    {P} S1 {R},   {R} S2 {Q}
    ------------------------
    {P} S1;S2 {Q}
we obtain the following predicate transformers:
    wdp(x:=e, P) = P[x ← e]
    wdp(S1;S2, Q) = P[P ← wdp(S1, R), R ← wdp(S2, Q)] = wdp(S1, wdp(S2, Q))
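As a small worked instance of the two transformers above, consider the program x := x + 1; y := 2 * x with postcondition y > 4. The program and the postcondition are chosen only for illustration; the derivation applies the composition transformer once and the assignment transformer twice.

```latex
\begin{align*}
wdp(x := x + 1;\ y := 2x,\ y > 4)
  &= wdp\bigl(x := x + 1,\ wdp(y := 2x,\ y > 4)\bigr) \\
  &= wdp\bigl(x := x + 1,\ (y > 4)[y \leftarrow 2x]\bigr) \\
  &= wdp(x := x + 1,\ 2x > 4) \\
  &= (2x > 4)[x \leftarrow x + 1] \\
  &= 2(x + 1) > 4
\end{align*}
```

Over the integers the result simplifies to x > 1, so the verification condition for a precondition P is P ⊃ x > 1.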

Slide 22

Slide 22 text

MetaVCG: general form rule
The proof rules look rather unusual:
    {P1} S1 {Q},   {P2} S2 {Q}
    -------------------------------------------------------
    {(B ⊃ P1) ∧ (¬B ⊃ P2)} if B then S1 else S2 fi {Q}

    {P1} S {P},   P ∧ ¬B ⊃ Q,   P ∧ B ⊃ P1
    ----------------------------------------
    {P} while B inv P do S od {Q}
Moreover, an order on the premises is required, which narrows the class of applicable Hoare systems.
By the way, the axiomatic semantics of the C-kernel language does not satisfy these requirements.
Perhaps we could weaken the constraints somehow?

Slide 23

Slide 23 text

MetaVCG: general form rule
A general form rule is any instance G of
    I1, ..., In, Γ
    --------------
    {P} S {Q}
that satisfies normal form constraints 1-3 and 4b, where:
1. Each premise I is in one of the following forms
    a. {R} S {Q}
    b. {F} S {Q}
    c. {R ∧ F} S {Q}
where, in all three cases, R is a metavariable evaluating to a single predicate symbol free in G, F is a metavariable evaluating to a formula not containing any predicate symbols free in G, and Q is a metavariable.

Slide 24

Slide 24 text

MetaVCG: general form rule
2. The relation ⇐⁺ is irreflexive with respect to I1, ..., In.
3. Let r be the set of predicate symbols free in the preconditions of I1, ..., In. Then, the following constraint on P must hold:
    P[P ← true, P ∈ r ∪ {Q}] ∧ ∀s ⊆ r ∪ {Q} ¬P[P ← false, P ∈ s]
This constraint must hold for Γ and for each Qi.
Examples of general form rules:
    {P ∧ B} S {P},   P ∧ ¬B ⊃ Q
    ------------------------------
    {P} while B inv P do S od {Q}

    {P ∧ B} S1 {Q},   {P ∧ ¬B} S2 {Q}
    ----------------------------------
    {P} if B then S1 else S2 fi {Q}

Slide 25

Slide 25 text

MetaVCG: transformation to normal form
First, sort the rule according to the three classes of allowable premises, yielding a schema of the form
    {F1} S1 {Q1}, ..., {Fj} Sj {Qj},
    {Bj+1} Sj+1 {Qj+1}, ..., {Bk} Sk {Qk},
    {Bk+1 ∧ Fk+1} Sk+1 {Qk+1}, ..., {Bn ∧ Fn} Sn {Qn},   Γ
    ------------------------------------------------------
    {P} S {Q}
We now define two functions:
1. Duplicates(i) = {m : |Bm| = |Bi|, j + 1 ≤ m ≤ n}, for j + 1 ≤ i ≤ n,
where, for a metavariable B, |B| denotes the partially interpreted first-order formula bound to B, and
2. MkFormula(i) = Pi, for j + 1 ≤ i ≤ k
   MkFormula(i) = |Fi| ⊃ Pi, for k + 1 ≤ i ≤ n

Slide 26

Slide 26 text

MetaVCG: transformation to normal form
Now rewrite the sorted schema above as
    {P1} S1 {Q1}, ..., {Pn} Sn {Qn},   Γ ∧ (|F1| ⊃ P1) ∧ ... ∧ (|Fj| ⊃ Pj)
    ----------------------------------------------------------------------
    {P} S {Q}
with the subsequent overall proper substitution
    [|Bi| ← ⋀ (k ∈ Duplicates(i)) MkFormula(k)], for j + 1 ≤ i ≤ n
The last step is to reorder the premises of this rule to satisfy normal form constraint 4a.
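As a worked check, the transformation can be applied to the general form if rule given earlier. This is a sketch under one assumption: the substitution is taken to be the conjunction of MkFormula(k) over Duplicates(i). Both premises are of the form {B_i ∧ F_i} S_i {Q_i}, so j = k = 0 and n = 2, with |B_1| = |B_2| = P, F_1 = B, F_2 = ¬B and an empty Γ.

```latex
% General form if rule
\[
\frac{\{P \land B\}\ S_1\ \{Q\} \qquad \{P \land \lnot B\}\ S_2\ \{Q\}}
     {\{P\}\ \mathbf{if}\ B\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \mathbf{fi}\ \{Q\}}
\]

% Auxiliary functions for this rule
\begin{align*}
Duplicates(1) &= Duplicates(2) = \{1, 2\} \\
MkFormula(1) &= B \supset P_1 \\
MkFormula(2) &= \lnot B \supset P_2
\end{align*}

% Rewriting the premises as {P_i} S_i {Q} and applying
% [P <- (B \supset P_1) \land (\lnot B \supset P_2)] to the conclusion
\[
\frac{\{P_1\}\ S_1\ \{Q\} \qquad \{P_2\}\ S_2\ \{Q\}}
     {\{(B \supset P_1) \land (\lnot B \supset P_2)\}\
      \mathbf{if}\ B\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \mathbf{fi}\ \{Q\}}
\]

% Resulting predicate transformer
\[
wdp(\mathbf{if}\ B\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \mathbf{fi},\ Q)
  = (B \supset wdp(S_1, Q)) \land (\lnot B \supset wdp(S_2, Q))
\]
```

The result coincides with the normal form if rule shown in the first "general form rule" slide, which supports the conjunctive reading of the substitution.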

Slide 27

Slide 27 text

MetaVCG: correctness
It may be demonstrated that a VCG constructed by this method is sound and deduction-complete with respect to a general form axiomatic definition G:
Theorem: Let G be any general form axiom system augmented by the rule of consequence and the axiom {false} S {Q}, let τ denote the transformation from G to the normal form, and suppose that T is a complete (perhaps noneffective) proof system for the underlying theory. Then
    ⊢_G {P} S {Q} iff ⊢_T P ⊃ wdp_τ(G)(S, Q).
Note: This has nothing to do with soundness and completeness of the general form axiom system w.r.t. the operational definition of the language.

Slide 28

Slide 28 text

MetaVCG: practice
The Hoare logic for C-kernel satisfies the general form constraints (thanks to the two-level approach).
The prototype MetaVCG is implemented in C-light and displays the following features:
ineffective in some sense, since MetaVCG(H, AP) = VCG_H(AP), but more appropriate for verification;
bidirectional;
partially verified.

Slide 29

Slide 29 text

MetaVCG: the pattern language
It would be great to provide axioms and proof rules in classical graphical notation:
    {Q(MD ← upd(MD, loc(val(e, MeM..STD)), cast(e′)))} e = e′; {Q}

    {P} S {I},
    (I ∧ cast(val(e, MeM..STD), type(e, MeM, Γ), int) = 0) ⇒ Q,
    (I ∧ cast(val(e, MeM..STD), type(e, MeM, Γ), int) ≠ 0) ⇒ P
    -----------------------------------------------------------
    while(e) S
but significant effort will be required.
At the moment, a simple textual representation has been developed. The idea is that rules are patterns that must be matched against annotated programs. The syntax of C-light is accompanied by first-order logic, whereas some syntactic sugar denotes regexps.

Slide 30

Slide 30 text

MetaVCG: the pattern language
    {(any_predicate(Q))
       (MD <- upd(MD, loc(val(e, MeM..STD)),
                  cast(val(e', MeM..STD), type(e', MeM, TP), type(e, MeM, TP))))}
    e = simple_expression(e');
    {any_predicate(Q)}

    {P} S {I},
    (I /\ cast(val(e, MeM..STD), type(e, MeM, TP), int) = 0) => Q,
    (I /\ cast(val(e, MeM..STD), type(e, MeM, TP), int) != 0) => P
    |-
    {any_predicate(INV)} while(simple_expression(e)) any_code(S) {any_predicate(Q)}

Slide 31

Slide 31 text

MetaVCG: implementation
    MetaVCG(N, tree) {
    1: transform the Clang AST tree into struct program_node;
    2: transform N into collection of struct pattern_node;
    3: if (backward_strategy) goto 4 else goto 7;
    4: // wp-calculus
    5: take program_node, find an appropriate pattern_node and apply the corresponding wdp;
    6: exit;
    7: // sp-calculus
    ...
    }

Slide 32

Slide 32 text

MetaVCG: implementation
    struct pattern_node {
        int is_omitted;
        int has_category;
        char* category;
        int has_identifier;
        char identifier[64];
        int has_type;
        char* type;
        int has_value;
        char* value;
        int is_matched;
        int table_length;
        char match_identifiers[2][1000][64];
        int children_count;
        struct pattern_node* children[1000];
    };

Slide 33

Slide 33 text

MetaVCG: verification
    /*@ requires \valid(pattern) && \valid(code);
        assigns pattern->table_length;
        assigns pattern->match_identifiers[0..1]
                  [\old(pattern->table_length)]
                  [0..\max(strlen(pattern_identifier), 63)];
        ensures strncmp(pattern->match_identifiers[0][\old(pattern->table_length)],
                        pattern->identifier, 63) == 0;
        ensures strncmp(pattern->match_identifiers[1][\old(pattern->table_length)],
                        code->identifier, 63) == 0;
        ensures pattern->table_length == \old(pattern->table_length) + 1; */
    void add_identifier(struct pattern_node* pattern, struct program_node* code)
    {
        strncpy(pattern->match_identifiers[0][pattern->table_length],
                pattern->identifier, 63);
        strncpy(pattern->match_identifiers[1][pattern->table_length],
                code->identifier, 63);
        pattern->table_length++;
    }
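To experiment with the function above outside the tool, a reduced, self-contained variant can be compiled on its own. The sketch below is not the prototype's code: the structs keep only the fields that add_identifier touches, `struct program_node` is assumed to have an `identifier` array, and the name `add_identifier_checked`, the bounds check, and the explicit string termination are additions made for the illustration.

```c
#include <string.h>

#define ID_LEN 64
#define TABLE_MAX 1000

/* Reduced, hypothetical versions of the two node types. */
struct program_node {
    char identifier[ID_LEN];
};

struct pattern_node {
    char identifier[ID_LEN];
    int table_length;
    char match_identifiers[2][TABLE_MAX][ID_LEN];
};

/* Records the pair (pattern identifier, code identifier) in the next free
 * table row. Returns 0 on success and -1 if the table is already full.
 * Unlike the slide version, each copied string is explicitly terminated. */
static int add_identifier_checked(struct pattern_node *pattern,
                                  const struct program_node *code) {
    if (pattern->table_length < 0 || pattern->table_length >= TABLE_MAX) {
        return -1;
    }
    int n = pattern->table_length;
    strncpy(pattern->match_identifiers[0][n], pattern->identifier, ID_LEN - 1);
    pattern->match_identifiers[0][n][ID_LEN - 1] = '\0';
    strncpy(pattern->match_identifiers[1][n], code->identifier, ID_LEN - 1);
    pattern->match_identifiers[1][n][ID_LEN - 1] = '\0';
    pattern->table_length = n + 1;
    return 0;
}
```

The bounds check corresponds to a precondition that the contract on the slide leaves implicit: the table index must be below 1000 before the call.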

Slide 34

Slide 34 text

Conclusion
Results:
the axiomatic semantics of C-kernel can be automatically transformed into a recursively defined VCG;
MetaVCG was implemented using a mixture of C and C++;
apart from theoretical correctness, we were able to partially verify our prototype tool.
Plans:
the correctness theorem should be checked for the strongest postcondition approach;
reducing the C++ part of the implementation.

Slide 35

Slide 35 text

Conclusion Questions?