Slide 1

Slide 1 text

Lenient Execution of C on a Java Virtual Machine
Manuel Rigger¹, Roland Schatz², Matthias Grimmer², Hanspeter Mössenböck¹
ManLang, September 27, 2017
¹ Johannes Kepler University Linz
² Oracle Labs

Slide 2

Slide 2 text

Background: Undefined Behavior in C 2 int* number = malloc(sizeof(int));

Slide 3

Slide 3 text

Background: Undefined Behavior in C 3 int* number = malloc(sizeof(int)); *number = 5;

Slide 4

Slide 4 text

Background: Undefined Behavior in C 4 int* number = malloc(sizeof(int)); *number = 5; free(number);

Slide 5

Slide 5 text

Background: Undefined Behavior in C 5 int* number = malloc(sizeof(int)); *number = 5; free(number); printf("%d\n", *number);

Slide 6

Slide 6 text

Background: Undefined Behavior (UB) in C 6 int* number = malloc(sizeof(int)); *number = 5; free(number); printf("%d\n", *number); UB

Slide 7

Slide 7 text

Background: Undefined Behavior (UB) in C 6 int* number = malloc(sizeof(int)); *number = 5; free(number); printf("%d\n", *number); 5 UB prints

Slide 8

Slide 8 text

Background: Undefined Behavior (UB) in C 6 int* number = malloc(sizeof(int)); *number = 5; free(number); printf("%d\n", *number); 5 0 UB prints

Slide 9

Slide 9 text

Background: Undefined Behavior (UB) in C 6 int* number = malloc(sizeof(int)); *number = 5; free(number); printf("%d\n", *number); 5 0 ManLang’17 UB prints

Slide 10

Slide 10 text

Background: Undefined Behavior (UB) in C 6 int* number = malloc(sizeof(int)); *number = 5; free(number); printf("%d\n", *number); 5 0 ManLang’17 rm -rf / UB prints

Slide 11

Slide 11 text

Background: Undefined Behavior (UB) in C 6 int* number = malloc(sizeof(int)); *number = 5; free(number); printf("%d\n", *number); 5 0 ManLang’17 rm -rf / ✓ UB prints

Slide 12

Slide 12 text

Background: Undefined Behavior (UB) in C 6 int* number = malloc(sizeof(int)); *number = 5; free(number); printf("%d\n", *number); 5 0 ManLang’17 rm -rf / ✓ UB prints UB renders the whole program invalid!

Slide 13

Slide 13 text

Problems with Undefined Behavior
• Exploited by compiler optimizations
• Different behavior between -O0 and -O3
• Time bombs
• Memory errors → exploitable by attackers
• Behavior differs between platforms

Slide 14

Slide 14 text

Is Addressing UB Relevant?
• 6/9 of SPEC CINT 2006 benchmarks contain undefined integer operations (Dietz 2012)
• 40% of the Debian packages contain unstable code (Wang 2013)
• Experts are unclear about many UB aspects (Memarian 2016)

Slide 15

Slide 15 text

Our Approach Provide a user-friendly environment for C programmers on the JVM 9

Slide 16

Slide 16 text

Why the JVM?
• Clear semantics for Java bytecodes
• Useful services/features
• Garbage collection
• Automatic checks → memory and type safety

Slide 17

Slide 17 text

Our Approach 11 [Diagram: Clang compiles program.c and libc.c to LLVM IR; an LLVM IR interpreter built on Truffle runs on the Java Virtual Machine with the Graal compiler]

Slide 24

Slide 24 text

What to do about Undefined Behavior? 18

Slide 25

Slide 25 text

What to do about Undefined Behavior? 18 Terminate the program

Slide 26

Slide 26 text

What to do about Undefined Behavior? 18 Terminate the program Continue execution

Slide 27

Slide 27 text

What to do about Undefined Behavior? 19 Terminate the program

Slide 28

Slide 28 text

What to do about Undefined Behavior? 20 int* number = malloc(sizeof(int)); *number = 5; free(number); printf("%d\n", *number);

Slide 29

Slide 29 text

What to do about Undefined Behavior? 20 UNDEFINED BEHAVIOR … according to §2 7.22.3.3 of ISO/IEC 9899:2011 int* number = malloc(sizeof(int)); *number = 5; free(number); printf("%d\n", *number);

Slide 30

Slide 30 text

Object Hierarchy in Safe Sulong
ManagedObject
• Address (pointee: ManagedObject, offset: int)
• I32 (value: int)
• I32Array (values: int[])
• DoubleArray (values: double[])
• Function (id: long)

Slide 31

Slide 31 text

Memory Allocation in Safe Sulong 22 int* number = malloc(sizeof(int)); Address offset = 0 pointee I32 value = null

Slide 32

Slide 32 text

Memory Allocation in Safe Sulong 23 int* number = malloc(sizeof(int)); *number = 5; Address offset = 0 pointee I32 value Integer value = 5

Slide 33

Slide 33 text

Memory Allocation in Safe Sulong 24 int* number = malloc(sizeof(int)); *number = 5; free(number); Address offset = 0 pointee I32 value = null

Slide 34

Slide 34 text

Memory Allocation in Safe Sulong 25 int* number = malloc(sizeof(int)); *number = 5; free(number); printf("%d\n", *number); Address offset = 0 pointee I32 value = null

Slide 35

Slide 35 text

Memory Allocation in Safe Sulong 25 int* number = malloc(sizeof(int)); *number = 5; free(number); printf("%d\n", *number); Address offset = 0 pointee I32 value = null NullPointerException causes

Slide 36

Slide 36 text

Memory Allocation in Safe Sulong 25 int* number = malloc(sizeof(int)); *number = 5; free(number); printf("%d\n", *number); Address offset = 0 pointee I32 value = null causes

Slide 37

Slide 37 text

Memory Allocation in Safe Sulong 25 int* number = malloc(sizeof(int)); *number = 5; free(number); printf("%d\n", *number); Address offset = 0 pointee I32 value = null causes [ASCII-art bug banner] UNDEFINED BEHAVIOR ...according to §2 7.22.3.3 of ISO/IEC 9899:2011

Slide 38

Slide 38 text

Expectations 26 Project maintainers can fix the bugs

Slide 39

Slide 39 text

Reality 27 Programmers do not always care about or understand the bugs

Slide 40

Slide 40 text

28 https://github.com/aidan-n/prime-number-generator/pull/1

Slide 41

Slide 41 text

What could go wrong? 29

Slide 42

Slide 42 text

What could go wrong? 29 If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. §2 6.5

Slide 43

Slide 43 text

UB in the Linux Kernel 30 For the kernel, we already really ignore some of the more idiotic C standard rules that introduce pointless undefined behavior: things like the strict aliasing rules are just insane, and the "overflow is u[nd]efined" is bad too. So we use -fno-strict-aliasing -fno-strict-overflow -fno-delete-null-pointer-checks to basically say "those optimizations are fundamentally stupid and wrong, and only encourage compilers to generate random code that doesn't actually match the source code". Linus 2017

Slide 44

Slide 44 text

What to do about Undefined Behavior? 31 Continue execution

Slide 45

Slide 45 text

Be Lenient with the User 32

Slide 46

Slide 46 text

Be Lenient with the User 32 The responsibility of tolerance lies with those who have the wider vision. – George Eliot

Slide 47

Slide 47 text

Be Lenient with the User 32 int* number = malloc(sizeof(int)); *number = 5; free(number); printf("%d\n", *number); 5 The responsibility of tolerance lies with those who have the wider vision. – George Eliot prints

Slide 48

Slide 48 text

Lenient C
• 26 rules that supersede the C11 standard
• Restricted to the core language
• Goals
  • Eliminate UB from the language
  • Do what the user might expect
  • Benefit from the JVM/Java semantics

Slide 49

Slide 49 text

Memory Allocation in Safe Sulong 34 int* number = malloc(sizeof(int)); *number = 5; free(number); Address offset = 0 pointee I32 value Integer value = 5

Slide 51

Slide 51 text

Memory Allocation in Safe Sulong 35 int* number = malloc(sizeof(int)); *number = 5; free(number); printf("%d\n", *number); Address offset = 0 pointee I32 value Integer value = 5

Slide 52

Slide 52 text

Memory Allocation in Safe Sulong 35 int* number = malloc(sizeof(int)); *number = 5; free(number); printf("%d\n", *number); 5 Address offset = 0 pointee I32 value Integer value = 5 prints

Slide 53

Slide 53 text

Memory Allocation in Safe Sulong 35 int* number = malloc(sizeof(int)); *number = 5; free(number); printf("%d\n", *number); 5 Address offset = 0 pointee I32 value Integer value = 5 prints The GC will collect the object if it is no longer referenced

Slide 54

Slide 54 text

Advantages Programmer • Is not required to “fix” the code • C rules that are easier to understand Employment • Can run the code 36

Slide 55

Slide 55 text

Example: Signed Integer Overflow 37 int a = 1, b = INT_MAX; int val = a + b; printf("%d\n", val); UB

Slide 56

Slide 56 text

Example: Signed Integer Overflow 37 int a = 1, b = INT_MAX; int val = a + b; printf("%d\n", val); Most compilers UB

Slide 57

Slide 57 text

Example: Signed Integer Overflow 37 int a = 1, b = INT_MAX; int val = a + b; printf("%d\n", val); mov edi, .L.str mov esi, -2147483648 call printf Most compilers UB

Slide 58

Slide 58 text

Example: Signed Integer Overflow 37 int a = 1, b = INT_MAX; int val = a + b; printf("%d\n", val); mov edi, .L.str mov esi, -2147483648 call printf Most compilers UB But: integer overflow is not handled consistently

Slide 59

Slide 59 text

Example: Signed Integer Overflow 37 int a = 1, b = INT_MAX; int val = a + b; printf("%d\n", val); mov edi, .L.str mov esi, -2147483648 call printf Most compilers UB Lenient C: Signed integer overflow has wraparound semantics (-fno-strict-overflow)

Slide 60

Slide 60 text

Example: Shifts 38 int a = 1, b = 63; int val = a << b; printf("%d\n", val); UB

Slide 61

Slide 61 text

Example: Shifts 38 int a = 1, b = 63; int val = a << b; printf("%d\n", val); -O3 ??? UB mov edi, .L.str mov esi, call printf

Slide 62

Slide 62 text

Example: Shifts 38 int a = 1, b = 63; int val = a << b; printf("%d\n", val); -O3 ??? x86 2147483648 UB mov edi, .L.str mov esi, call printf

Slide 63

Slide 63 text

Example: Shifts 38 int a = 1, b = 63; int val = a << b; printf("%d\n", val); -O3 ??? x86 2147483648 PowerPC UB mov edi, .L.str mov esi, call printf 0

Slide 64

Slide 64 text

Example: Shifts 38 int a = 1, b = 63; int val = a << b; printf("%d\n", val); Consensus is impossible! -O3 ??? x86 2147483648 PowerPC UB mov edi, .L.str mov esi, call printf 0

Slide 65

Slide 65 text

Example: Shifts 38 int a = 1, b = 63; int val = a << b; printf("%d\n", val); Consensus is impossible! -O3 ??? x86 2147483648 PowerPC UB mov edi, .L.str mov esi, call printf 0 Lenient C: Invalid shift values have x86 shift semantics

Slide 66

Slide 66 text

Example: Relational Operators 39
    void *memmove(void *dest, void const *src, size_t n) {
        char *dp = dest;
        char const *sp = src;
        if (dp < sp) {
            while (n-- > 0)
                *dp++ = *sp++;
        } else {
            dp += n;
            sp += n;
            while (n-- > 0)
                *--dp = *--sp;
        }
        return dest;
    }

Slide 67

Slide 67 text

Example: Relational Operators 40
    void *memmove(void *dest, void const *src, size_t n) {
        char *dp = dest;
        char const *sp = src;
        if (dp < sp) {
            while (n-- > 0)
                *dp++ = *sp++;
        } else {
            dp += n;
            sp += n;
            while (n-- > 0)
                *--dp = *--sp;
        }
        return dest;
    }
Using < to compare pointers to two different objects is UB!

Slide 69

Slide 69 text

Example: Relational Operators 41
    void *memmove(void *dest, void const *src, size_t n) {
        char *dp = dest;
        char const *sp = src;
        if (dp < sp) {
            while (n-- > 0)
                *dp++ = *sp++;
        } else {
            dp += n;
            sp += n;
            while (n-- > 0)
                *--dp = *--sp;
        }
        return dest;
    }
Using < to compare pointers to two different objects is UB!
Lenient C: Pointers are compared using their integer representation

Slide 70

Slide 70 text

Example: Relational Operators 42 a: Address (offset, pointee), b: Address (offset, pointee); a < b is evaluated as integer_rep(a) < integer_rep(b)

Slide 71

Slide 71 text

Integer Representation 43 integer_rep(a) = a.offset

Slide 72

Slide 72 text

Integer Representation 43 integer_rep(a) = a.offset✓

Slide 73

Slide 73 text

Integer Representation 43 integer_rep(a) = a.offset ✓ … what about pointers to different objects?

Slide 74

Slide 74 text

Integer Representation 44 integer_rep(a) = (long) System.identityHashCode(a.pointee) << 32 | offset

Slide 75

Slide 75 text

Integer Representation 44 integer_rep(a) = (long) System.identityHashCode(a.pointee) << 32 | offset Breaks antisymmetry as different objects might have the same hash code!

Slide 76

Slide 76 text

Integer Representation 45 integer_rep(a) = a.id

Slide 77

Slide 77 text

Integer Representation 45 integer_rep(a) = a.id Need to assign distinct IDs ☺

Slide 78

Slide 78 text

Example: Side Effects 46 void func(int *a) { printf("%d", *a); if (a == NULL) { abort(); } }

Slide 79

Slide 79 text

Example: Side Effects 46 void func(int *a) { printf("%d", *a); if (a == NULL) { abort(); } } func(int*): mov esi, DWORD PTR [rdi] xor eax, eax mov edi, OFFSET FLAT:.LC0 jmp printf GCC -O2

Slide 80

Slide 80 text

Example: Side Effects 46 void func(int *a) { printf("%d", *a); if (a == NULL) { abort(); } } func(int*): mov esi, DWORD PTR [rdi] xor eax, eax mov edi, OFFSET FLAT:.LC0 jmp printf Lenient C: NULL-dereferences (-fno-delete-null-pointer-checks), division by 0, etc. have visible side effects GCC -O2

Slide 81

Slide 81 text

int func(int *a , long *b) { *a = 5; *b = 8; return *a; } int main(){ long val = 3; printf("%d\n", func((int*) &val, &val)); } Examples: Strict-aliasing rules 47

Slide 82

Slide 82 text

int func(int *a , long *b) { *a = 5; *b = 8; return *a; } int main(){ long val = 3; printf("%d\n", func((int*) &val, &val)); } Examples: Strict-aliasing rules 47 5 prints

Slide 83

Slide 83 text

int func(int *a , long *b) { *a = 5; *b = 8; return *a; } int main(){ long val = 3; printf("%d\n", func((int*) &val, &val)); } Examples: Strict-aliasing rules 47 5 func(int*, long*): # @func(int*, long*) mov dword ptr [rdi], 5 mov qword ptr [rsi], 8 mov eax, 5 ret Clang or GCC -O2 prints

Slide 84

Slide 84 text

Examples: Strict-aliasing rules 48 Lenient C: Pointers can be dereferenced using any type (-fno-strict-aliasing)

Slide 85

Slide 85 text

Examples: Strict-aliasing rules 48 a: Address offset pointee Long value = 5 b: Address offset pointee Lenient C: Pointers can be dereferenced using any type (-fno-strict-aliasing)

Slide 86

Slide 86 text

Future Work • Extend Lenient C to cover multithreading, compile-time issues, library functions, … • Which parts of Lenient C apply to static compilers? 49

Slide 87

Slide 87 text

Related work • Boringcc by Bernstein • Friendly C by Cuoq et al. • C* by Ertl • C-like languages 50

Slide 88

Slide 88 text

Related work • Boringcc by Bernstein • Friendly C by Cuoq et al. • C* by Ertl • C-like languages 51 • Boring compiler for crypto software • Clear semantics • No concrete proposal

Slide 89

Slide 89 text

Related work • Boringcc by Bernstein • Friendly C by Cuoq et al. • C* by Ertl • C-like languages 52 • Replaces many occurrences of “X has undefined behavior” with “X results in an unspecified value” • Addresses 14 points

Slide 90

Slide 90 text

Related work • Boringcc by Bernstein • Friendly C by Cuoq et al. • C* by Ertl • C-like languages 53 • C* specifies language elements according to the hardware features • Behavior might be different for every platform

Slide 91

Slide 91 text

Related work • Boringcc by Bernstein • Friendly C by Cuoq et al. • C* by Ertl • C-like languages 54 • Cyclone • Polymorphic C • CCured → Not source-compatible

Slide 92

Slide 92 text

Conclusion
• Managed execution on the JVM
• Bug-finding mode to detect UB
• Lenient C as a user-friendly C dialect

Slide 93

Slide 93 text

Bibliography
• Linus 2017: Linus Torvalds, https://lkml.org/lkml/2017/7/5/486
• Dietz 2012: Will Dietz, Peng Li, John Regehr, and Vikram Adve. 2012. Understanding integer overflow in C/C++. In Proceedings of the 34th International Conference on Software Engineering (ICSE '12).
• Wang 2013: Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek, and Armando Solar-Lezama. 2013. Towards optimization-safe systems: analyzing the impact of undefined behavior. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP '13).
• Memarian 2016: Kayvan Memarian, Justus Matthiesen, James Lingard, Kyndylan Nienhuis, David Chisnall, Robert N. M. Watson, and Peter Sewell. 2016. Into the depths of C: elaborating the de facto standards. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '16).

Slide 94

Slide 94 text

Images
• https://www.timesheets.com/blog/2010/08/how-to-make-an-honest-hard-working-employee/
• http://thefeedingdoctor.com/hey-servers-leave-those-kids-alone-follow-up-and-part-2/kid-confused/
• https://shawneemissionpost.com/2016/08/26/bird-attacks-on-joggers-in-nejc-most-likely-attributed-to-great-horned-owl-recent-incident-this-week-in-mission-hills-53628
• https://upload.wikimedia.org/wikipedia/commons/0/09/Steph_Davis_wingsuit_BASE_brento.jpg
• https://commons.wikimedia.org/wiki/File:Programmer_writing_code_with_Unit_Tests.jpg
• https://commons.wikimedia.org/wiki/File:Server-multiple.svg
• https://commons.wikimedia.org/wiki/Category:Under_construction_icons#/media/File:UnderCon_icon_black.svg