Deductive verification of unmodified Linux kernel library functions

Slide 1

Slide 1 text

Deductive verification of unmodified Linux kernel library functions Denis Efremov NRU HSE [email protected] ISoLA 2018, Limassol, Cyprus, 6 November 2018

Slide 2

Slide 2 text

Motivation

Tools evaluation (Frama-C + AstraVer + Why3) on real code:
• Is our specification language expressive enough to describe this code?
• How far can we go without posing restrictions on the C syntax?

Tests for our tools:
• When we change our memory/arithmetic models, will a fully proved function still be easy to reprove?

Slide 3

Slide 3 text

Linux kernel library functions

Linux kernel:
• Doesn’t rely on any other piece of software (no stdlib, only gcc builtins)
• Userspace/kernelspace pointers
• Contains no floating-point operations (well, almost)
• Wide use of gcc extensions
• Heavyweight casting operations: container_of/offsetof, unions, pointers to integers, void * to struct *, pointers to functions, bitwise operations
• You can’t rewrite the code to be more “suitable” for verification in all cases

Contains implementations of many «standard» string and memory functions from stdlib:
• Generic versions in C
• Architecture-optimized versions in assembler

Slide 4

Slide 4 text

What is deductive verification? What does it look like in practice?

What can we say about this function?
• This is a pure C function;
• It computes the average value of two int values.

int average(int a, int b)
{
    return (a + b) / 2;
}

Slide 5

Slide 5 text

What is deductive verification? What does it look like in practice?

What can we say about this function?
• This is a pure C function;
• It computes the average value of two int values;
• There is a signed integer overflow under certain conditions.

int average(int a, int b)
{
    return (a + b) / 2;
}
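The "certain conditions" on this slide can be made concrete by evaluating the sum in a wider type. The following is a minimal sketch, not part of the original slides: the helper name `sum_fits_in_int` is hypothetical, and the code assumes that `long long` is strictly wider than `int`, which holds on common platforms.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true iff a + b is representable as an int.
 * The sum is computed in long long, so (assuming long long is wider
 * than int) the check itself cannot overflow. */
bool sum_fits_in_int(int a, int b)
{
    long long wide = (long long)a + (long long)b;
    return INT_MIN <= wide && wide <= INT_MAX;
}

/* The function from the slide, unchanged. Calling it is only safe
 * when sum_fits_in_int(a, b) holds; otherwise a + b overflows, which
 * is undefined behavior for signed integers in C. */
int average(int a, int b)
{
    return (a + b) / 2;
}
```

For example, `sum_fits_in_int(INT_MAX, 1)` is false, so `average(INT_MAX, 1)` must not be called, while `average(2, 4)` is safe.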

Slide 6

Slide 6 text

What is deductive verification?
Context of a function call

• Calling context for average: binary search function;
• The indexes l and h are non-negative, l is not greater than h;
• Integer overflow in m may lead to out-of-bounds access (base[m]).

int average(int a, int b)
{
    return (a + b) / 2;
}

int *binsearch(int *base, int n, int key)
{
    int l = 0, h = n - 1;

    while (l <= h) {
        int m = average(l, h);
        int val = base[m];

        if (val < key) {
            l = m + 1;
        } else if (val > key) {
            h = m - 1;
        } else {
            return base + m;
        }
    }
    return NULL;
}

Slide 7

Slide 7 text

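Formal specification for a C function
Contract of a function

Describe the call context (pre-conditions):
$P : \mathrm{int} \times \mathrm{int} \to \{\top, \bot\}$, $P(a, b) \equiv a \ge 0 \land b \ge 0 \land a \le b$

Describe functional requirements on results (post-conditions):
$Q : \mathrm{int} \times \mathrm{int} \times \mathrm{int} \to \{\top, \bot\}$, $Q(a, b, c) \equiv c = (a + b) / 2$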

Slide 8

Slide 8 text

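Formal specification for a C function
Error model and code representation

• Define an error (an integer overflow):
  $\mathrm{in\_bounds} : \mathbb{Z} \to \{\top, \bot\}$, $\mathrm{in\_bounds}(n) \equiv \mathrm{INT\_MIN} \le n \le \mathrm{INT\_MAX}$
• Formalize the program code: the function $f$ returns the result $f(a, b)$ according to its program code iff it terminates and terminates without an error; otherwise, a special value $\omega$ is returned
• Prove the total correctness, where $P$ and $Q$ are the pre-condition and the post-condition of the contract:
  $\forall a, b.\; P(a, b) \Rightarrow f(a, b) \ne \omega \land Q(a, b, f(a, b))$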

Slide 9

Slide 9 text

Formal specification for a C function Code should comply with specification /*@ requires 0 <= a && 0 <= b; requires a <= b; ensures \result == (a + b) / 2; */ int average(int a, int b) { return (a + b) / 2; }

Slide 10

Slide 10 text

Formal specification for a C function Code should comply with specification Verification Condition /*@ requires 0 <= a && 0 <= b; requires a <= b; ensures \result == (a + b) / 2; */ int average(int a, int b) { return (a + b) / 2; } predicate in_bounds (n:int) = min <= n /\ n <= max constant a : t17 constant b : t17 axiom H : of_int 0 <= a /\ of_int 0 <= b /\ a <= b axiom H1: in_bounds 2 constant o : t17 axiom H2 : to_int o = 2 goal WP_parameter_average: in_bounds (to_int a + to_int b)

Slide 11

Slide 11 text

Formal specification for a C function Code should comply with specification Verification Condition Pre-condition update /*@ requires 0 <= a && 0 <= b; requires a <= b; ensures \result == (a + b) / 2; */ int average(int a, int b) { return (a + b) / 2; } /*@ requires 0 <= a <= INT_MAX / 2; requires 0 <= b <= INT_MAX / 2; requires a <= b; ensures \result == (a + b) / 2; */ int average(int a, int b) { return (a + b) / 2; } predicate in_bounds (n:int) = min <= n /\ n <= max constant a : t17 constant b : t17 axiom H : of_int 0 <= a /\ of_int 0 <= b /\ a <= b axiom H1: in_bounds 2 constant o : t17 axiom H2 : to_int o = 2 goal WP_parameter_average: in_bounds (to_int a + to_int b)

Slide 12

Slide 12 text

Formal specification for a C function Code should comply with specification Verification Condition Code fix Pre-condition update int average(int a, int b) { - return (a + b) / 2; + return a + (b - a) / 2; } /*@ requires 0 <= a && 0 <= b; requires a <= b; ensures \result == (a + b) / 2; */ int average(int a, int b) { return (a + b) / 2; } /*@ requires 0 <= a <= INT_MAX / 2; requires 0 <= b <= INT_MAX / 2; requires a <= b; ensures \result == (a + b) / 2; */ int average(int a, int b) { return (a + b) / 2; } predicate in_bounds (n:int) = min <= n /\ n <= max constant a : t17 constant b : t17 axiom H : of_int 0 <= a /\ of_int 0 <= b /\ a <= b axiom H1: in_bounds 2 constant o : t17 axiom H2 : to_int o = 2 goal WP_parameter_average: in_bounds (to_int a + to_int b)
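Why the code fix works can be checked with a small sketch. It is illustrative only: the names `average_fixed` and `average_reference` are hypothetical, and the reference value assumes that `long long` is wider than `int`. Under the contract's pre-condition `0 <= a <= b`, the difference `b - a` lies in `[0, INT_MAX]` and the result never exceeds `b`, so no intermediate value leaves the `int` range, and the result still equals `(a + b) / 2`.

```c
#include <assert.h>
#include <limits.h>

/* Fixed version from the slide, with the pre-condition of the
 * contract checked at run time. */
int average_fixed(int a, int b)
{
    assert(0 <= a && a <= b); /* pre-condition of the contract */
    return a + (b - a) / 2;
}

/* Reference value of (a + b) / 2, computed in a wider type
 * (assumes long long is wider than int). */
int average_reference(int a, int b)
{
    return (int)(((long long)a + (long long)b) / 2);
}
```

With this variant the stronger bounds `a <= INT_MAX / 2` and `b <= INT_MAX / 2` from the pre-condition update are no longer needed: `average_fixed(INT_MAX - 1, INT_MAX)` is well defined and agrees with the reference value.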

Slide 13

Slide 13 text

Related Work

• M. Torlakcik, «Contracts in OpenBSD» (2010)
  • Frama-C + Jessie deductive verification plugin
  • 12 functions (7 fully proved)
  • Solvers: Simplify (1.5.4), Alt-Ergo (0.7.3), Z3 (2.0)
• N. Carvalho, C. Silva Sousa, J. Sousa Pinto, A. Tomb, «Formal Verification of kLIBC with the WP Frama-C Plug-in» (2014)
  • Frama-C + WP deductive verification plugin
  • 26 functions (14 fully proved)
  • Solvers: Alt-Ergo (0.95.1), CVC3 (2.4.1), Z3 (4.3.1)
• D. R. Cok, I. Blissard, J. Robbins, «C Library annotations in ACSL for Frama-C: experience report» (2017)

Slide 14

Slide 14 text

Known problems (1) And how to handle them (ACSL extensions) char *strnchr(const char *s, size_t count, int c) { for (; count-- && *s != '\0'; ++s) if (*s == (char)c) return (char *)s; return NULL; }

Slide 15

Slide 15 text

Known problems (1) And how to handle them (ACSL extensions) char *strnchr(const char *s, size_t count, int c) { for (; count-- && *s != '\0'; ++s) if (*s == (char)c) return (char *)s; return NULL; } • The underflow of an unsigned loop iterator at the last iteration step due to the postfix decrement;

Slide 16

Slide 16 text

Known problems (1) And how to handle them (ACSL extensions) char *strnchr(const char *s, size_t count, int c) { for (; count-- /*@%*/ && *s != '\0'; ++s) if (*s == (char)c) return (char *)s; return NULL; } • The underflow of an unsigned loop iterator at the last iteration step due to the postfix decrement;
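The wrap-around mentioned on this slide can be observed directly. The sketch below is not from the slides; the function name `count_after_loop` is hypothetical. In C the behavior is well defined, because unsigned arithmetic is modular, but a verification tool with a strict arithmetic model reports it as a potential error unless the operation is explicitly marked as intended, which appears to be the purpose of the `/*@%*/` annotation on the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mimics the loop condition `count--` from strnchr and returns the
 * value of count after the loop has exited. */
size_t count_after_loop(size_t count)
{
    while (count--)
        ; /* loop body omitted */
    /* The final test read 0 and then decremented it, so count has
     * wrapped around to SIZE_MAX. */
    return count;
}
```

Whatever the starting value, the function returns `SIZE_MAX`, since the loop always exits through a decrement of zero.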

Slide 17

Slide 17 text

Known problems (2) And how to handle them (ACSL extensions) char *strnchr(const char *s, size_t count, int c) { for (; count-- /*@%*/ && *s != '\0'; ++s) if (*s == (char)c) return (char *)s; return NULL; } • The underflow of an unsigned loop iterator at the last iteration step due to the postfix decrement; • The intended cast to a smaller integer type;

Slide 18

Slide 18 text

Known problems (2) And how to handle them (ACSL extensions) char *strnchr(const char *s, size_t count, int c) { for (; count-- /*@%*/ && *s != '\0'; ++s) if (*s == (char) /*@%*/ c) return (char *)s; return NULL; } • The underflow of an unsigned loop iterator at the last iteration step due to the postfix decrement; • The intended cast to a smaller integer type;

Slide 19

Slide 19 text

Known problems (3) And how to handle them (ACSL extensions) char *strnchr(const char *s, size_t count, int c) { for (; count-- /*@%*/ && *s != '\0'; ++s) if (*s == (char) /*@%*/ c) return (char *)s; return NULL; } • The underflow of an unsigned loop iterator at the last iteration step due to the postfix decrement; • The intended cast to a smaller integer type; • Pointer casts. Example: unsigned char * to char *.
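The second and third problems can be illustrated in the same way. This is a sketch under stated assumptions, not code from the slides: the function names are hypothetical, the example assumes 8-bit bytes, and it narrows to `unsigned char` rather than `char`, because converting an out-of-range value to a signed type is implementation-defined in C.

```c
#include <assert.h>
#include <limits.h>

/* Converting an int to unsigned char reduces the value modulo
 * UCHAR_MAX + 1, so the high bits are silently dropped. */
unsigned char narrow_to_uchar(int c)
{
    return (unsigned char)c;
}

/* Reading the same byte through char * and unsigned char * is
 * allowed in C; only the interpretation of the bits differs.
 * Returns 1 when both views agree after conversion. */
int same_byte(const char *s)
{
    const unsigned char *u = (const unsigned char *)s;
    return (unsigned char)*s == *u;
}
```

For instance, `narrow_to_uchar(0x141)` yields `0x41`: the value changes, which is why a tool flags the cast unless it is marked as intended.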

Slide 20

Slide 20 text

VerKer Results (1)

• 26 library functions from Linux (Frama-C + AstraVer + Why3)
  • 17 str* functions
  • 6 mem* functions
  • 3 others
• 25 fully proved functions
  • in memmove we were not able to discharge one verification condition
• In 9 functions there was an intended integer overflow
• In 7 functions there was an intended integer cast to a smaller type
• In 2 functions we «slightly» changed the code to prove them
• Solvers: Alt-Ergo (2.0), CVC4 (1.4), CVC4 (1.6)

Slide 21

Slide 21 text

VerKer Results (2)

• VC transformation strategy for solvers benchmarking
• Total number of verification conditions: 2781
• Number of lemmas: 69 (37 proved automatically)
• Integration with a CI system (Travis CI)
• On average, ~2.6 specification lines per line of C code (about 900 C lines in total)
• Open specifications and verification artifacts (proofs):
  http://forge.ispras.ru/projects/verker
  https://github.com/evdenis/verker

Slide 22

Slide 22 text

Memmove

• The memory model implemented in the AstraVer (Jessie) plugin allows arithmetic operations on pointers only when the pointers belong to the same allocated memory block;
• For memmove, this is not necessarily the case;
• If we state in the specification contract that src and dest may belong to different allocated memory blocks, then it is impossible to prove the VC stating that they should belong to the same memory block;
• Comparison of pointers to different memory blocks is undefined behavior in ANSI C.

void *memmove(void *dest, const void *src, size_t count) { if (dest <= src)
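To make the pointer comparison issue concrete, here is a sketch of an overlap-safe copy that picks the direction by comparing addresses as integers. It is illustrative only: it is not the kernel's memmove and not the approach taken in the talk, the name `memmove_sketch` is hypothetical, and it assumes the platform provides `uintptr_t`. The pointer-to-integer mapping is implementation-defined, but the integer comparison itself is well defined in C. Whether such a formulation is any easier to handle in the AstraVer memory model is not claimed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies count bytes from src to dest; the regions may overlap.
 * The direction is chosen from the integer values of the addresses
 * instead of a relational comparison of the pointers themselves. */
void *memmove_sketch(void *dest, const void *src, size_t count)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    if ((uintptr_t)d <= (uintptr_t)s) {
        while (count--)          /* copy forwards */
            *d++ = *s++;
    } else {
        d += count;              /* copy backwards */
        s += count;
        while (count--)
            *--d = *--s;
    }
    return dest;
}
```

Moving four bytes of `"abcdef"` two positions to the right gives `"ababcd"`; moving them two positions to the left gives `"cdefef"`.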

Slide 23

Slide 23 text

The modified functions An implicit cast in memset, strcmp void *memset(void *s, int c, size_t count) { char *xs = s; while (count--) *xs++ = c; return s; } int strcmp(const char *cs, const char *ct) { unsigned char c1, c2; while (1) { c1 = *cs++; c2 = *ct++; if (c1 != c2) return c1 < c2 ? -1 : 1; if (!c1) break; } return 0; }

Slide 24

Slide 24 text

The modified functions An implicit cast in memset, strcmp void *memset(void *s, int c, size_t count) { char *xs = s; while (count--) *xs++ = (char) c; return s; } int strcmp(const char *cs, const char *ct) { unsigned char c1, c2; while (1) { c1 = (unsigned char) *cs++; c2 = (unsigned char) *ct++; if (c1 != c2) return c1 < c2 ? -1 : 1; if (!c1) break; } return 0; }

Slide 25

Slide 25 text

The modified functions An implicit cast in memset, strcmp void *memset(void *s, int c, size_t count) { char *xs = s; while (count--) *xs++ = (char) /*@%*/ c; return s; } int strcmp(const char *cs, const char *ct) { unsigned char c1, c2; while (1) { c1 = (unsigned char) /*@%*/ *cs++; c2 = (unsigned char) /*@%*/ *ct++; if (c1 != c2) return c1 < c2 ? -1 : 1; if (!c1) break; } return 0; }

Slide 26

Slide 26 text

What’s next (1)

• «Lemma Functions for Frama-C: C Programs as Proofs», G. Volkov, M. Mandrykin, D. Efremov, https://arxiv.org/abs/1811.05879
• Auto-active verification technique for the Frama-C framework
• Lemma-functions ACSL extension
• Interactive proving (Coq) vs auto-active verification technique
• 31 lemma-functions
• Source code: https://github.com/evdenis/verker/tree/lemma_functions

Slide 27

Slide 27 text

What’s next (2)
Arch-optimized implementations of functions

Function   Implementations on architectures
memmove    powerpc (2), s390, mips, x86_64, alpha, sparc (2)
memcpy     ia64 (2), powerpc (2), s390, mips (2), x86_64 (3), alpha, sparc
memset     ia64, powerpc (2), s390, mips, x86_64, alpha (2), sparc (2)
memchr     powerpc, alpha (2)
memcmp     powerpc (2), sparc
memscan    sparc (4)
strcat     alpha (2)
strchr     alpha (2)
strncmp    powerpc, sparc (2)
strcpy     alpha
strlen     ia64, powerpc, alpha (2), sparc
strrchr    alpha, arm64

Slide 28

Slide 28 text

How to verify all these implementations?

• Runtime verification
  • Translate a contract for a generic function into assertions (Frama-C + E-ACSL)
  • Extract a particular implementation
  • Integrate a fuzzer (e.g., libFuzzer) with the assertions
  • Run the testing in QEMU (user mode)
  • Catch the violations of postconditions

Slide 29

Slide 29 text

Translate specifications to assertions
Frama-C E-ACSL

/*@ requires 0 <= a && 0 <= b;
    requires a <= b;
    ensures \result == (a + b) / 2;
 */
int average(int a, int b)
{
    return (a + b) / 2;
}

bool average_precondition(int a, int b)
{
    if (0 <= a && 0 <= b)
        if (a <= b)
            return true;
    return false;
}

/* The arguments are passed in as well, since the postcondition refers to them. */
bool average_postcondition(int a, int b, int ret_value)
{
    long result = ((long) a + (long) b) / 2;
    if (result == ret_value)
        return true;
    return false;
}

int _average(int a, int b)
{
    assert(average_precondition(a, b));
    int _tmp = average(a, b);
    assert(average_postcondition(a, b, _tmp));
    return _tmp;
}

Slide 30

Slide 30 text

Translate specifications to assertions
Frama-C E-ACSL

/*@ requires 0 <= a && 0 <= b;
    requires a <= b;
    ensures \result == (a + b) / 2;
 */
int average(int a, int b)
{
    return (a + b) / 2;
}

bool average_precondition(int a, int b)
{
    if (0 <= a && 0 <= b)
        if (a <= b)
            return true;
    return false;
}

/* The arguments are passed in as well, since the postcondition refers to them. */
bool average_postcondition(int a, int b, int ret_value)
{
    long result = ((long) a + (long) b) / 2;
    if (result == ret_value)
        return true;
    return false;
}

void _fuzz_average(int fuzz_a, int fuzz_b)
{
    if (average_precondition(fuzz_a, fuzz_b)) {
        int _tmp = average(fuzz_a, fuzz_b);
        assert(average_postcondition(fuzz_a, fuzz_b, _tmp));
    }
}
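A wrapper of this kind could be connected to libFuzzer roughly as follows. This is a hedged sketch, not the setup used in the talk: `LLVMFuzzerTestOneInput` is the standard libFuzzer entry point, while the decoding of two `int` values from the raw input is an assumption made for illustration. The overflow-free variant of `average` is used as the function under test so that the harness itself does not execute undefined behavior, and the postcondition is evaluated in `long long`, assumed to be wider than `int`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Function under test (overflow-free variant). */
int average(int a, int b)
{
    return a + (b - a) / 2;
}

bool average_precondition(int a, int b)
{
    return 0 <= a && 0 <= b && a <= b;
}

bool average_postcondition(int a, int b, int ret_value)
{
    long long expected = ((long long)a + (long long)b) / 2;
    return expected == ret_value;
}

/* libFuzzer entry point: decode two ints from the raw input and
 * check the postcondition whenever the precondition holds. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    int a, b;

    if (size < 2 * sizeof(int))
        return 0;               /* not enough bytes, ignore the input */
    memcpy(&a, data, sizeof(int));
    memcpy(&b, data + sizeof(int), sizeof(int));

    if (average_precondition(a, b))
        assert(average_postcondition(a, b, average(a, b)));
    return 0;
}
```

With clang, such a file is typically built with `-fsanitize=fuzzer`, which supplies the `main` function and the input generation; a failed assertion then shows up as a crash together with the input that triggered it.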

Slide 31

Slide 31 text

Questions?

Slide 32

Slide 32 text

Logic errors. Can you see the contradiction? The artificial example /*@ requires 0 == 1; ensures \result == 0 && \result == 1 && \result == 2; */ int main(void) { int a = 1; return a / 0; } • The contradiction in the specification; • Division-by-zero in the main function; • Errors in specification may lead to missing errors in code.

Slide 33

Slide 33 text

Logic errors. Can you see the contradiction? The real example logic Z Count{L}(int *a, Z m, Z n, int v); axiom CountSectionEmpty: ∀ int *a, v, Z m, n; n ≤ m ⇒ Count(a, m, n, v) == 0; axiom CountSectionHit: ∀ int *a, v, Z n, m; a[n] == v ⇒ Count(a, m, n + 1, v) == Count(a, m, n, v) + 1;

Slide 34

Slide 34 text

Slide 35

Slide 35 text

logic Z Count{L}(int *a, Z m, Z n, int v); axiom CountSectionEmpty: ∀ int *a, v, Z m, n; n ≤ m ⇒ Count(a, m, n, v) == 0; axiom CountSectionHit: ∀ int *a, v, Z n, m; a[n] == v ⇒ Count(a, m, n + 1, v) == Count(a, m, n, v) + 1; int a = 5; assert Count(&a+1, 0, -1, 5) == 0 && Count(&a+1, 0, 0, 5) == 0; Logic errors. Can you see the contradiction? The real example

Slide 36

Slide 36 text

logic Z Count{L}(int *a, Z m, Z n, int v); axiom CountSectionEmpty: ∀ int *a, v, Z m, n; n ≤ m ⇒ Count(a, m, n, v) == 0; axiom CountSectionHit: ∀ int *a, v, Z n, m; a[n] == v ⇒ Count(a, m, n + 1, v) == Count(a, m, n, v) + 1; int a = 5; assert Count(&a+1, 0, -1, 5) == 0 && Count(&a+1, 0, 0, 5) == 0; assert Count(&a+1, 0, 0, 5) == Count(&a+1, 0, -1, 5) + 1; Logic errors. Can you see the contradiction? The real example

Slide 37

Slide 37 text

logic Z Count{L}(int *a, Z m, Z n, int v); axiom CountSectionEmpty: ∀ int *a, v, Z m, n; n ≤ m ⇒ Count(a, m, n, v) == 0; axiom CountSectionHit: ∀ int *a, v, Z n, m; a[n] == v ⇒ Count(a, m, n + 1, v) == Count(a, m, n, v) + 1; int a = 5; assert Count(&a+1, 0, -1, 5) == 0 && Count(&a+1, 0, 0, 5) == 0; assert Count(&a+1, 0, 0, 5) == Count(&a+1, 0, -1, 5) + 1; assert 0 == 1; Logic errors. Can you see the contradiction? The real example

Slide 38

Slide 38 text

Logic errors. Can you see the contradiction?
How to check yourself

• Insert an incorrect assertion. It should not be proved;
• Perform a special transformation of a verification condition:
  • Try to prove it;
  • Negate. Try to prove it;
  • If it holds in either case, you’ve got a problem.

/*@ requires 0 == 1;
    ensures \result == 0 && \result == 1 && \result == 2;
 */
int main(void)
{
    int a = 1;
    //@ assert 0 == 1;
    return a / 0;
}