ManLang'17: Lenient Execution of C on a Java Virtual Machine

Manuel Rigger
September 27, 2017

Talk at the 14th International Conference on Managed Languages & Runtimes (http://d3s.mff.cuni.cz/conferences/manlang17/)

Transcript

  1. Lenient Execution of C on a Java Virtual Machine

    Manuel Rigger¹, Roland Schatz², Matthias Grimmer², Hanspeter Mössenböck¹. ManLang, September 27, 2017. ¹Johannes Kepler University Linz, ²Oracle Labs
  12. Background: Undefined Behavior (UB) in C

        int* number = malloc(sizeof(int));
        *number = 5;
        free(number);
        printf("%d\n", *number);

    The read of *number after free is UB. Possible outcomes: prints 5, prints 0, prints ManLang'17, rm -rf /, or ✓. UB renders the whole program invalid!
  13. Problems with Undefined Behavior

    • Exploited by compiler optimizations
    • Different behavior between -O0 and -O3
    • Time bombs
    • Memory errors → exploitable by attackers
    • Behavior differs between platforms
  14. Is Addressing UB Relevant?

    • 6/9 of SPEC CINT 2006 benchmarks contain undefined integer operations (Dietz 2012)
    • 40% of the Debian packages contain unstable code (Wang 2013)
    • Experts are unclear about many UB aspects (Memarian 2016)
  15. Our Approach

    Provide a user-friendly environment for C programmers on the JVM
  16. Why the JVM?

    • Clear semantics for Java bytecodes
    • Useful services/features
    • Garbage collection
    • Automatic checks → memory and type safety
  23. Our Approach

    [Diagram] Clang compiles program.c and libc.c to LLVM IR. The LLVM IR Interpreter, built on Truffle and the Graal compiler, runs on the Java Virtual Machine.
  26. What to do about Undefined Behavior?

    • Terminate the program
    • Continue execution
  27. What to do about Undefined Behavior?

    Terminate the program

  29. What to do about Undefined Behavior?

        int* number = malloc(sizeof(int));
        *number = 5;
        free(number);
        printf("%d\n", *number);

    UNDEFINED BEHAVIOR … according to §2 7.22.3.3 of ISO/IEC 9899:2011
  30. Object Hierarchy in Safe Sulong

    ManagedObject (root of the hierarchy)
    • Address: pointee: ManagedObject, offset: int
    • I32: value: int
    • I32Array: values: int[]
    • DoubleArray: values: double[]
    • Function: id: long
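
The hierarchy on this slide could be written in Java roughly as follows. This is a minimal sketch, not Safe Sulong's actual source: the class names, field names, and field types come from the slide, while treating `ManagedObject` as an abstract base class and the constructors are assumptions made for illustration.

```java
// Sketch of the slide's object hierarchy. Names and field types follow the
// slide; the abstract base class and the constructors are assumptions.
abstract class ManagedObject {}

// A C pointer: a reference to the pointed-to object plus an offset into it.
final class Address extends ManagedObject {
    ManagedObject pointee;
    int offset;

    Address(ManagedObject pointee, int offset) {
        this.pointee = pointee;
        this.offset = offset;
    }
}

// A single C int.
final class I32 extends ManagedObject {
    int value;
}

// An array of C ints.
final class I32Array extends ManagedObject {
    final int[] values;

    I32Array(int length) {
        values = new int[length];
    }
}

// An array of C doubles.
final class DoubleArray extends ManagedObject {
    final double[] values;

    DoubleArray(int length) {
        values = new double[length];
    }
}

// A C function, identified by a numeric id.
final class Function extends ManagedObject {
    final long id;

    Function(long id) {
        this.id = id;
    }
}
```

With these classes, a pointer to the third element of a four-element int array would be `new Address(new I32Array(4), 2)`, assuming the offset counts elements (the slide does not say whether offsets are in elements or bytes).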
  37. Memory Allocation in Safe Sulong

    • int* number = malloc(sizeof(int)); → Address (offset = 0) whose pointee is an I32 with value = null
    • *number = 5; → the I32 value refers to an Integer with value = 5
    • free(number); → the I32 value is null again
    • printf("%d\n", *number); → the null value causes a NullPointerException, which is reported as: UNDEFINED BEHAVIOR ... according to §2 7.22.3.3 of ISO/IEC 9899:2011
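
The behavior traced on these slides can be sketched in Java: `free` clears the object's value, so a later read fails with a `NullPointerException`. The names `BugFindingI32`, `BugFindingAddress`, and `BugFindingHeap` are hypothetical, and modeling the value as a boxed `Integer` that is set to null is an assumption drawn from the diagram, not Safe Sulong's actual code.

```java
// Hypothetical model of a managed C int whose storage can be "freed".
final class BugFindingI32 {
    Integer value; // null until written, and null again after free()
}

// Hypothetical pointer: refers to a managed object at a given offset.
final class BugFindingAddress {
    final BugFindingI32 pointee;
    final int offset;

    BugFindingAddress(BugFindingI32 pointee, int offset) {
        this.pointee = pointee;
        this.offset = offset;
    }
}

final class BugFindingHeap {
    // malloc(sizeof(int)): a fresh object that holds no value yet.
    static BugFindingAddress mallocInt() {
        return new BugFindingAddress(new BugFindingI32(), 0);
    }

    // *p = v
    static void store(BugFindingAddress p, int v) {
        p.pointee.value = v;
    }

    // free(p): drop the value so that later accesses are detected.
    static void free(BugFindingAddress p) {
        p.pointee.value = null;
    }

    // *p: unboxing a null Integer throws NullPointerException.
    static int load(BugFindingAddress p) {
        return p.pointee.value;
    }
}
```

In this sketch, calling `load` after `free` throws, and an interpreter could catch that exception and turn it into the "UNDEFINED BEHAVIOR" report shown on the slide. A read before the first write fails the same way, since the value starts out as null.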
  38. Expectations

    Project maintainers can fix the bugs
  39. Reality

    Programmers do not always care about or understand the bugs
  40. https://github.com/aidan-n/prime-number-generator/pull/1

  42. What could go wrong?

    "If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined." (§2 6.5)
  43. UB in the Linux Kernel

    "For the kernel, we already really ignore some of the more idiotic C standard rules that introduce pointless undefined behavior: things like the strict aliasing rules are just insane, and the "overflow is u[nd]efined" is bad too. So we use -fno-strict-aliasing -fno-strict-overflow -fno-delete-null-pointer-checks to basically say "those optimizations are fundamentally stupid and wrong, and only encourage compilers to generate random code that doesn't actually match the source code"." (Linus 2017)
  44. What to do about Undefined Behavior? Continue execution

  47. Be Lenient with the User

    "The responsibility of tolerance lies with those who have the wider vision." (George Eliot)

        int* number = malloc(sizeof(int));
        *number = 5;
        free(number);
        printf("%d\n", *number);

    Prints 5.
  48. Lenient C

    • 26 rules that supersede the C11 standard
    • Restricted to the core language
    • Goals
      • Eliminate UB from the language
      • Do what the user might expect
      • Benefit from the JVM/Java semantics
  53. Memory Allocation in Safe Sulong

        int* number = malloc(sizeof(int));
        *number = 5;
        free(number);
        printf("%d\n", *number);

    After free(number), the Address (offset = 0) still points to the I32 whose value is an Integer with value = 5, so the program prints 5. The GC will collect the object if it is no longer referenced.
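
The lenient handling of `free` can be sketched in Java as a no-op that leaves the object readable while the garbage collector takes care of reclamation. The names `LenientI32` and `LenientHeap` are hypothetical; this illustrates the idea on the slide and is not Safe Sulong's code.

```java
// Hypothetical model of a managed C int in lenient mode.
final class LenientI32 {
    int value;
}

final class LenientHeap {
    // malloc(sizeof(int)): allocate a managed object.
    static LenientI32 mallocInt() {
        return new LenientI32();
    }

    // free(p): intentionally a no-op. The object stays valid for as long as
    // the program can still reach it; once the last reference is gone, the
    // JVM's garbage collector reclaims it.
    static void free(LenientI32 p) {
    }
}
```

Storing 5, calling `LenientHeap.free`, and reading the value again yields 5, which matches the output shown on the slide.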
  54. Advantages

    Programmer
    • Is not required to "fix" the code
    • C rules that are easier to understand
    Employment
    • Can run the code
  59. Example: Signed Integer Overflow

        int a = 1, b = INT_MAX;
        int val = a + b;
        printf("%d\n", val);

    The addition a + b is UB. Most compilers:

        mov edi, .L.str
        mov esi, -2147483648
        call printf

    But: integer overflow is not handled consistently.

    Lenient C: Signed integer overflow has wraparound semantics (-fno-strict-overflow)
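
One way a JVM-based interpreter can provide wraparound semantics is to rely on Java's `int` addition, which the Java language defines to wrap in two's complement. The sketch below is illustrative only; `WrapAroundAdd` and its methods are hypothetical names, and the overflow check is included just to show how a stricter mode could detect the same case.

```java
final class WrapAroundAdd {
    // C: int val = a + b; under the wraparound rule.
    // Java int addition is defined to wrap around, so plain + is enough.
    static int add(int a, int b) {
        return a + b;
    }

    // For contrast: Math.addExact throws ArithmeticException on overflow,
    // so it can be used to detect the overflow instead of wrapping.
    static boolean overflows(int a, int b) {
        try {
            Math.addExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }
}
```

`WrapAroundAdd.add(1, Integer.MAX_VALUE)` evaluates to -2147483648, the same constant that appears in the compiler output on the slide.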
  65. Example: Shifts

        int a = 1, b = 63;
        int val = a << b;
        printf("%d\n", val);

    The shift is UB. Consensus is impossible!
    • -O3: ???

        mov edi, .L.str
        mov esi, <some value>
        call printf

    • x86: 2147483648
    • PowerPC: 0

    Lenient C: Invalid shift values have x86 shift semantics
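
Java's shift operators happen to behave like the x86 shift instruction for 32-bit operands: only the low five bits of the shift count are used. A hedged sketch of how an interpreter could map the C shift onto Java (the class and method names are hypothetical):

```java
final class LenientShift {
    // C: int val = a << b; where b may be 32 or larger.
    // For int operands Java uses only the low five bits of the count
    // (b & 31), which is also what x86 does for 32-bit shifts.
    static int shiftLeft(int a, int b) {
        return a << b;
    }
}
```

`LenientShift.shiftLeft(1, 63)` shifts by 63 & 31 = 31 and yields the bit pattern 0x80000000. Read as an unsigned value this is 2147483648, the x86 result on the slide; as a signed Java `int` it is -2147483648.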
  69. Example: Relational Operators

        void* memmove(void *dest, void const *src, size_t n) {
            char *dp = dest;
            char const *sp = src;
            if (dp < sp) {
                while (n-- > 0) *dp++ = *sp++;
            } else {
                dp += n;
                sp += n;
                while (n-- > 0) *--dp = *--sp;
            }
            return dest;
        }

    Using < to compare pointers to two different objects is UB!

    Lenient C: Pointers are compared using their integer representation
  70. Example: Relational Operators

    [Diagram] a and b are Address objects, each with an offset and a pointee. The comparison a < b is evaluated as integer_rep(a) < integer_rep(b).
  73. Integer Representation

    integer_rep(a) = a.offset ✓ … what about pointers to different objects?
  75. Integer Representation

    integer_rep(a) = (long) System.identityHashCode(a.pointee) << 32 | offset

    Breaks antisymmetry as different objects might have the same hash code!
  77. Integer Representation

    integer_rep(a) = a.id

    Need to assign distinct IDs ☺
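
A rough Java sketch of comparing pointers through an ID-based integer representation follows. The slides only state that distinct IDs are needed; the counter-based ID assignment, the combination of ID and offset into one `long` (mirroring the layout of the hash-code variant), and all names here are assumptions made for illustration.

```java
// Hypothetical managed object that receives a distinct id on creation.
final class IdObject {
    private static long nextId = 1; // plain counter; not thread-safe
    final long id = nextId++;
}

// Hypothetical pointer: pointee plus offset.
final class IdAddress {
    final IdObject pointee;
    final int offset;

    IdAddress(IdObject pointee, int offset) {
        this.pointee = pointee;
        this.offset = offset;
    }

    // Assumed layout: distinct object id in the upper 32 bits,
    // offset (as an unsigned 32-bit value) in the lower 32 bits.
    long integerRep() {
        return (pointee.id << 32) | (offset & 0xFFFFFFFFL);
    }

    // C: a < b, evaluated on the integer representations.
    static boolean lessThan(IdAddress a, IdAddress b) {
        return Long.compareUnsigned(a.integerRep(), b.integerRep()) < 0;
    }
}
```

Because the IDs are distinct, two pointers into different objects never share an integer representation, so exactly one of `lessThan(a, b)` and `lessThan(b, a)` holds for them. Pointers into the same object are still ordered by offset.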
  80. Example: Side Effects

        void func(int *a) {
            printf("%d", *a);
            if (a == NULL) {
                abort();
            }
        }

    GCC -O2:

        func(int*):
            mov esi, DWORD PTR [rdi]
            xor eax, eax
            mov edi, OFFSET FLAT:.LC0
            jmp printf

    Lenient C: NULL-dereferences (-fno-delete-null-pointer-checks), division by 0, etc. have visible side effects
  83. Examples: Strict-aliasing rules

        int func(int *a, long *b) {
            *a = 5;
            *b = 8;
            return *a;
        }
        int main() {
            long val = 3;
            printf("%d\n", func((int*) &val, &val));
        }

    Clang or GCC -O2 (prints 5):

        func(int*, long*): # @func(int*, long*)
            mov dword ptr [rdi], 5
            mov qword ptr [rsi], 8
            mov eax, 5
            ret
  85. Examples: Strict-aliasing rules

    Lenient C: Pointers can be dereferenced using any type (-fno-strict-aliasing)

    [Diagram] a and b are Address objects (offset, pointee) whose pointee is a Long with value = 5.
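
To illustrate dereferencing one managed object through differently typed pointers, here is a minimal Java sketch in which a `long`-sized object also accepts `int`-sized reads and writes. The names `AliasedI64` and `AliasingDemo` are hypothetical, and the sketch assumes a little-endian layout with both pointers at offset 0; the slides do not specify how Safe Sulong implements this.

```java
// Hypothetical managed C long that also accepts int-typed accesses.
final class AliasedI64 {
    long value;

    // *(long*)p = v
    void writeI64(long v) {
        value = v;
    }

    // *(int*)p = v: overwrite the low 32 bits and keep the high 32 bits
    // (assumes a little-endian layout and offset 0).
    void writeI32(int v) {
        value = (value & 0xFFFFFFFF00000000L) | (v & 0xFFFFFFFFL);
    }

    // *(int*)p: read the low 32 bits.
    int readI32() {
        return (int) value;
    }
}

final class AliasingDemo {
    // Mirrors func(int *a, long *b) from the slide; a and b may refer to
    // the same object.
    static int func(AliasedI64 a, AliasedI64 b) {
        a.writeI32(5);
        b.writeI64(8);
        return a.readI32();
    }
}
```

Under these assumptions, calling `AliasingDemo.func(val, val)` on an object that initially holds 3 returns 8, because the second store is observed by the final read. This differs from the 5 that the optimized machine code on the slide returns.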
  86. Future Work

    • Extend Lenient C to cover multithreading, compile-time issues, library functions, …
    • Which parts of Lenient C apply to static compilers?
  91. Related work

    • Boringcc by Bernstein
      • Boring compiler for crypto software
      • Clear semantics
      • No concrete proposal
    • Friendly C by Cuoq et al.
      • Replaces many occurrences of "X has undefined behavior" with "X results in an unspecified value"
      • Addresses 14 points
    • C* by Ertl
      • C* specifies language elements according to the hardware features
      • Behavior might be different for every platform
    • C-like languages
      • Cyclone
      • Polymorphic C
      • CCured
      → Not source-compatible
  92. Conclusion

    • Managed Execution on the JVM (object hierarchy: ManagedObject; Address with pointee: ManagedObject and offset: int; I32 with value: int; I32Array with values: int[]; DoubleArray with values: double[]; Function with id: long)
    • Bug-finding mode to detect UB
    • Lenient C as a user-friendly C dialect
  93. Bibliography

    • Linus 2017: Linus Torvalds, https://lkml.org/lkml/2017/7/5/486
    • Dietz 2012: Will Dietz, Peng Li, John Regehr, and Vikram Adve. 2012. Understanding integer overflow in C/C++. In Proceedings of the 34th International Conference on Software Engineering (ICSE '12).
    • Wang 2013: Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek, and Armando Solar-Lezama. 2013. Towards optimization-safe systems: analyzing the impact of undefined behavior. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP '13).
    • Memarian 2016: Kayvan Memarian, Justus Matthiesen, James Lingard, Kyndylan Nienhuis, David Chisnall, Robert N. M. Watson, and Peter Sewell. 2016. Into the depths of C: elaborating the de facto standards. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '16).
  94. Images

    • https://www.timesheets.com/blog/2010/08/how-to-make-an-honest-hard-working-employee/
    • http://thefeedingdoctor.com/hey-servers-leave-those-kids-alone-follow-up-and-part-2/kid-confused/
    • https://shawneemissionpost.com/2016/08/26/bird-attacks-on-joggers-in-nejc-most-likely-attributed-to-great-horned-owl-recent-incident-this-week-in-mission-hills-53628
    • https://upload.wikimedia.org/wikipedia/commons/0/09/Steph_Davis_wingsuit_BASE_brento.jpg
    • https://commons.wikimedia.org/wiki/File:Programmer_writing_code_with_Unit_Tests.jpg
    • https://commons.wikimedia.org/wiki/File:Server-multiple.svg
    • https://commons.wikimedia.org/wiki/Category:Under_construction_icons#/media/File:UnderCon_icon_black.svg