ManLang'17: Lenient Execution of C on a Java Virtual Machine

Manuel Rigger
September 27, 2017

Talk at the 14th International Conference on Managed Languages & Runtimes (http://d3s.mff.cuni.cz/conferences/manlang17/)

Transcript

  1. Lenient Execution of C on a Java Virtual Machine

    Manuel Rigger (1), Roland Schatz (2), Matthias Grimmer (2), Hanspeter Mössenböck (1)
    ManLang, September 27, 2017
    (1) Johannes Kepler University Linz, (2) Oracle Labs
  2. Background: Undefined Behavior (UB) in C

        int* number = malloc(sizeof(int));
        *number = 5;
        free(number);
        printf("%d\n", *number);  // UB

    Any of these outcomes is permitted (✓):
    • prints 5
    • prints 0
    • prints ManLang’17
    • runs rm -rf /

    UB renders the whole program invalid!
  10. Problems with Undefined Behavior

    • Exploited by compiler optimizations
    • Different behavior between -O0 and -O3
    • Time bombs
    • Memory errors → exploitable by attackers
    • Behavior differs between platforms
  11. Is Addressing UB Relevant?

    • 6/9 of SPEC CINT 2006 benchmarks contain undefined integer operations (Dietz 2012)
    • 40% of the Debian packages contain unstable code (Wang 2013)
    • Experts are unclear about many UB aspects (Memarian 2016)
  12. Why the JVM?

    • Clear semantics for Java bytecodes
    • Useful services/features
      • Garbage collection
      • Automatic checks → memory and type safety
  13. Our Approach

    [Diagram: Clang compiles program.c and libc.c to LLVM IR. The LLVM IR Interpreter, built on Truffle, runs on the Java Virtual Machine with the Graal compiler.]
  20. What to do about Undefined Behavior?

        int* number = malloc(sizeof(int));
        *number = 5;
        free(number);
        printf("%d\n", *number);

    UNDEFINED BEHAVIOR … according to §2 7.22.3.3 of ISO/IEC 9899:2011
  22. Object Hierarchy in Safe Sulong

    ManagedObject (base class), with subclasses:
    • Address (pointee: ManagedObject, offset: int)
    • I32Array (values: int[])
    • Function (id: long)
    • I32 (value: int)
    • DoubleArray (values: double[])
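The hierarchy above can be approximated in C as a tagged struct. This is only a sketch: `ObjKind`, `ManagedObject`, `Address`, and `load_i32` are hypothetical names for illustration, not Safe Sulong's Java classes. The point it shows is that a pointer is a (pointee, offset) pair, so every access can be checked against the pointee before it happens.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the managed object hierarchy (the original is Java). */
typedef enum { OBJ_I32, OBJ_I32_ARRAY } ObjKind;

typedef struct {
    ObjKind kind;
    size_t length;    /* number of 32-bit elements; 1 for OBJ_I32 */
    int32_t *values;  /* backing storage, owned by the caller in this sketch */
} ManagedObject;

/* A pointer is a (pointee, offset) pair; the offset is counted in bytes. */
typedef struct {
    ManagedObject *pointee;
    int32_t offset;
} Address;

/* Checked 32-bit load: reports failure instead of reading out of bounds. */
bool load_i32(Address a, int32_t *out) {
    if (a.pointee == NULL || a.pointee->values == NULL) return false;
    if (a.offset < 0 || a.offset % (int32_t)sizeof(int32_t) != 0) return false;
    size_t index = (size_t)a.offset / sizeof(int32_t);
    if (index >= a.pointee->length) return false;
    *out = a.pointee->values[index];
    return true;
}
```

With a three-element array as pointee, an `Address` with offset 4 loads the second element, while offset 12 is rejected as out of bounds.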
  23. Memory Allocation in Safe Sulong

        int* number = malloc(sizeof(int));
        *number = 5;

    Address (offset = 0, pointee) → I32 (value) → Integer (value = 5)
  24. Memory Allocation in Safe Sulong

        int* number = malloc(sizeof(int));
        *number = 5;
        free(number);

    Address (offset = 0, pointee) → I32 (value = null)
  25. Memory Allocation in Safe Sulong

        int* number = malloc(sizeof(int));
        *number = 5;
        free(number);
        printf("%d\n", *number);

    Address (offset = 0, pointee) → I32 (value = null)

    The dereference causes a NullPointerException.
  28. Memory Allocation in Safe Sulong

        int* number = malloc(sizeof(int));
        *number = 5;
        free(number);
        printf("%d\n", *number);

    Address (offset = 0, pointee) → I32 (value = null)

    The dereference causes the error report:
    [ASCII-art figure]
    UNDEFINED BEHAVIOR ...according to §2 7.22.3.3 of ISO/IEC 9899:2011
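A rough C model of the bug-finding behavior shown above: freeing clears the value held by the pointee, so a later dereference is detected instead of silently reading stale memory. `I32Obj`, `managed_init`, `managed_free`, `managed_store`, and `managed_load` are hypothetical names. According to the slides, Safe Sulong itself gets this check from Java, where the access raises a NullPointerException.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an I32 managed object with a boxed value. */
typedef struct {
    int32_t *value;   /* NULL once the object has been freed */
    int32_t storage;  /* holds the integer while the object is live */
} I32Obj;

typedef struct {
    I32Obj *pointee;
    int32_t offset;
} Address;

/* Analogue of malloc(sizeof(int)): the value slot refers to live storage. */
void managed_init(I32Obj *obj) {
    obj->storage = 0;
    obj->value = &obj->storage;
}

/* Bug-finding free: drop the value so later accesses can be detected. */
void managed_free(Address a) {
    a.pointee->value = NULL;
}

/* Returns false on use-after-free or a bad offset instead of writing. */
bool managed_store(Address a, int32_t v) {
    if (a.offset != 0 || a.pointee->value == NULL) return false;
    *a.pointee->value = v;
    return true;
}

/* Returns false on use-after-free or a bad offset instead of reading. */
bool managed_load(Address a, int32_t *out) {
    if (a.offset != 0 || a.pointee->value == NULL) return false;
    *out = *a.pointee->value;
    return true;
}
```

Storing 5, freeing, and then loading mirrors the four statements on the slide: the final load fails rather than returning 5.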
  29. What could go wrong?

    “If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.” (§2 6.5)
  30. UB in the Linux Kernel

    For the kernel, we already really ignore some of the more idiotic C standard rules that introduce pointless undefined behavior: things like the strict aliasing rules are just insane, and the "overflow is u[nd]efined" is bad too. So we use -fno-strict-aliasing -fno-strict-overflow -fno-delete-null-pointer-checks to basically say "those optimizations are fundamentally stupid and wrong, and only encourage compilers to generate random code that doesn't actually match the source code".

    (Linus 2017)
  31. Be Lenient with the User

        int* number = malloc(sizeof(int));
        *number = 5;
        free(number);
        printf("%d\n", *number);  // prints 5

    The responsibility of tolerance lies with those who have the wider vision. – George Eliot
  33. Lenient C

    • 26 rules that supersede the C11 standard
    • Restricted to the core language
    • Goals
      • Eliminate UB from the language
      • Do what the user might expect
      • Benefit from the JVM/Java semantics
  34. Memory Allocation in Safe Sulong

        int* number = malloc(sizeof(int));
        *number = 5;
        free(number);
        printf("%d\n", *number);  // prints 5

    Address (offset = 0, pointee) → I32 (value) → Integer (value = 5)

    The GC will collect the object if it is no longer referenced.
  39. Advantages

    Programmer
    • Is not required to “fix” the code
    • Easier-to-understand C rules

    Employment
    • Can run the code
  40. Example: Signed Integer Overflow

        int a = 1, b = INT_MAX;
        int val = a + b;  // UB
        printf("%d\n", val);

    Most compilers:

        mov edi, .L.str
        mov esi, -2147483648
        call printf

    But: integer overflow is not handled consistently.

    Lenient C: Signed integer overflow has wraparound semantics (-fno-strict-overflow)
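Wraparound addition can be written in standard C without relying on signed overflow by doing the arithmetic on unsigned values. This is a sketch of the semantics the rule describes, not of how Safe Sulong implements it. `wrap_add_i32` is a hypothetical helper name.

```c
#include <stdint.h>

/* Two's-complement wraparound addition that never overflows a signed type. */
int32_t wrap_add_i32(int32_t a, int32_t b) {
    /* Unsigned arithmetic is defined to wrap modulo 2^32. */
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    /* Map the bit pattern back to a signed value without an out-of-range cast. */
    if (sum <= (uint32_t)INT32_MAX) return (int32_t)sum;
    return (int32_t)(sum - (uint32_t)INT32_MAX - 1u) + INT32_MIN;
}
```

For the slide's operands, `wrap_add_i32(1, INT32_MAX)` yields `INT32_MIN`, the same -2147483648 that appears in the compiler output above.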
  45. Example: Shifts

        int a = 1, b = 63;
        int val = a << b;  // UB
        printf("%d\n", val);

    • -O3: ???

            mov edi, .L.str
            mov esi, <some value>
            call printf

    • x86: 2147483648
    • PowerPC: 0

    Consensus is impossible!

    Lenient C: Invalid shift values have x86 shift semantics
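On x86, the shift instructions for 32-bit operands use only the low 5 bits of the shift count. A minimal sketch of that rule in portable C follows, assuming this masking is what "x86 shift semantics" refers to. `shl32_x86` is a hypothetical name, and the value is kept unsigned so the C expression itself stays well defined.

```c
#include <stdint.h>

/* Left shift with the x86 rule for 32-bit operands:
   only the low 5 bits of the count are used. */
uint32_t shl32_x86(uint32_t value, uint32_t count) {
    return value << (count & 31u);
}
```

With the slide's operands, `shl32_x86(1u, 63u)` shifts by 63 & 31 = 31 and yields 2147483648, matching the x86 result listed above.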
  51. Example: Relational Operators

        void* memmove(void *dest, void const *src, size_t n) {
            char *dp = dest;
            char const *sp = src;
            if (dp < sp) {  // UB
                while (n-- > 0)
                    *dp++ = *sp++;
            } else {
                dp += n;
                sp += n;
                while (n-- > 0)
                    *--dp = *--sp;
            }
            return dest;
        }

    Using < to compare pointers to two different objects is UB!

    Lenient C: Pointers are compared using their integer representation
  55. Example: Relational Operators

    a: Address (offset, pointee)
    b: Address (offset, pointee)

    a < b is evaluated as integer_rep(a) < integer_rep(b)
  56. Integer Representation

    integer_rep(a) = (long) System.identityHashCode(a.pointee) << 32 | offset

    Breaks antisymmetry as different objects might have the same hash code!
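The formula above packs a per-object hash into the high 32 bits and the offset into the low 32 bits. A small C sketch of that packing is given here. C has no counterpart to `System.identityHashCode`, so the hash is passed in as a plain number, and offsets are assumed to be non-negative. `integer_rep` follows the slide's name, while `address_less` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hash in the high 32 bits, offset in the low 32 bits. */
uint64_t integer_rep(uint32_t pointee_hash, uint32_t offset) {
    return ((uint64_t)pointee_hash << 32) | offset;
}

/* Relational comparison of two managed pointers via their integer representation. */
bool address_less(uint32_t hash_a, uint32_t off_a,
                  uint32_t hash_b, uint32_t off_b) {
    return integer_rep(hash_a, off_a) < integer_rep(hash_b, off_b);
}
```

Pointers into the same object order by offset, as a `memmove`-style comparison expects. Two distinct objects whose hashes collide get the same representation at equal offsets, which is the antisymmetry problem the slide points out.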
  57. Example: Side Effects

        void func(int *a) {
            printf("%d", *a);
            if (a == NULL) {
                abort();
            }
        }

    GCC -O2:

        func(int*):
            mov esi, DWORD PTR [rdi]
            xor eax, eax
            mov edi, OFFSET FLAT:.LC0
            jmp printf

    Lenient C: NULL-dereferences (-fno-delete-null-pointer-checks), division by 0, etc. have visible side effects
  59. Examples: Strict-aliasing rules

        int func(int *a, long *b) {
            *a = 5;
            *b = 8;
            return *a;
        }
        int main() {
            long val = 3;
            printf("%d\n", func((int*) &val, &val));  // prints 5
        }

    Clang or GCC -O2:

        func(int*, long*):  # @func(int*, long*)
            mov dword ptr [rdi], 5
            mov qword ptr [rsi], 8
            mov eax, 5
            ret
  62. Examples: Strict-aliasing rules

    a: Address (offset, pointee) → Long (value = 5)
    b: Address (offset, pointee) → the same Long

    Lenient C: Pointers can be dereferenced using any type (-fno-strict-aliasing)
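In standard C, the "any type" access described by this rule can be imitated with `memcpy`, which is defined for any object representation. The sketch below replays the three steps of `func` that way. It is an illustration of the resulting behavior, not Safe Sulong's implementation. `func_any_type` is a hypothetical name, and fixed-width types stand in for `int` and `long`.

```c
#include <stdint.h>
#include <string.h>

/* Same steps as func on the slide, but every access goes through memcpy,
   so the 4-byte and 8-byte accesses to one object are well defined. */
int32_t func_any_type(void *a, void *b) {
    int32_t five = 5;
    int64_t eight = 8;
    int32_t result;
    memcpy(a, &five, sizeof five);      /* *a = 5 */
    memcpy(b, &eight, sizeof eight);    /* *b = 8 */
    memcpy(&result, a, sizeof result);  /* return *a */
    return result;
}
```

When both arguments point at the same 8-byte object, the second store overwrites the first, so on a little-endian machine the function returns 8 rather than the 5 that the optimized code above hard-wires.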
  63. Future Work

    • Extend Lenient C to cover multithreading, compile-time issues, library functions, …
    • Which parts of Lenient C apply to static compilers?
  64. Related work

    • Boringcc by Bernstein
      • Boring compiler for crypto software
      • Clear semantics
      • No concrete proposal
    • Friendly C by Cuoq et al.
      • Replaces many occurrences of “X has undefined behavior” with “X results in an unspecified value”
      • Addresses 14 points
    • C* by Ertl
      • C* specifies language elements according to the hardware features
      • Behavior might be different for every platform
    • C-like languages
      • Cyclone
      • Polymorphic C
      • CCured
      → Not source-compatible
  69. Conclusion

    • Managed Execution on the JVM (object hierarchy: ManagedObject; Address with pointee: ManagedObject and offset: int; I32Array with values: int[]; Function with id: long; I32 with value: int; DoubleArray with values: double[])
    • Bug-finding mode to detect UB
    • Lenient C as a user-friendly C dialect
  70. Bibliography

    • Linus 2017: Linus Torvalds, https://lkml.org/lkml/2017/7/5/486
    • Dietz 2012: Will Dietz, Peng Li, John Regehr, and Vikram Adve. 2012. Understanding integer overflow in C/C++. In Proceedings of the 34th International Conference on Software Engineering (ICSE '12).
    • Wang 2013: Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek, and Armando Solar-Lezama. 2013. Towards optimization-safe systems: analyzing the impact of undefined behavior. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP '13).
    • Memarian 2016: Kayvan Memarian, Justus Matthiesen, James Lingard, Kyndylan Nienhuis, David Chisnall, Robert N. M. Watson, and Peter Sewell. 2016. Into the depths of C: elaborating the de facto standards. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '16).
  71. Images

    • https://www.timesheets.com/blog/2010/08/how-to-make-an-honest-hard-working-employee/
    • http://thefeedingdoctor.com/hey-servers-leave-those-kids-alone-follow-up-and-part-2/kid-confused/
    • https://shawneemissionpost.com/2016/08/26/bird-attacks-on-joggers-in-nejc-most-likely-attributed-to-great-horned-owl-recent-incident-this-week-in-mission-hills-53628
    • https://upload.wikimedia.org/wikipedia/commons/0/09/Steph_Davis_wingsuit_BASE_brento.jpg
    • https://commons.wikimedia.org/wiki/File:Programmer_writing_code_with_Unit_Tests.jpg
    • https://commons.wikimedia.org/wiki/File:Server-multiple.svg
    • https://commons.wikimedia.org/wiki/Category:Under_construction_icons#/media/File:UnderCon_icon_black.svg