Keeping Safe While Being on the Edge

Software systems evolve constantly, with new versions released on a continuous basis. Unfortunately, software updates present a difficult challenge: the hassle of updating (you have to stop the application to upgrade it), combined with the fear that an update will introduce new bugs, means that many users simply do not update, leaving their computers exposed to crash-prone, insecure code. In this talk, I will introduce a radically new approach that makes updating software less error-prone and disruptive by making use of idle cores in multicore machines.

Imperial ACM

March 21, 2014

Transcript

  1-2. Keeping Safe While Being on the Edge

    Petr Hošek (/ˈpiːtər/ /ˈhɒʃɛk/)
    Petr Hosek is a recipient of the Google European Fellowship in Software Engineering, and this research is supported in part by this Google Fellowship.
  3. "The fundamental problem with program maintenance is that fixing a defect has a substantial (20*-50%) chance of introducing another. So the whole process is two steps forward and one step back."

    F. Brooks, The Mythical Man-Month, 1975
    *More than 14.8~24.4% for major operating system patches: Yin, Z., Yuan, D., Zhou, Y., Pasupathy, S., and Bairavasundaram, L. How Do Fixes Become Bugs? ESEC/FSE’11
  4. Motivation

    Software updates often present a high risk
    Many admins (70%) and users refuse to upgrade software
    Reliance on outdated versions flawed with vulnerabilities
    Crameri, O., Knezevic, N., Kostic, D., Bianchini, R., Zwaenepoel, W. Staged deployment in Mirage, an integrated software upgrade testing and distribution system. SOSP’07
  5-11. Timeline, 2009 to 2010 (monthly axis)

    HTTP ETag hash value computation in etag_mutate, original loop:

        for (h = 0, i = 0; i < etag->used; ++i)
          h = (h << 5) ^ (h >> 27) ^ (etag->ptr[i]);

    Revised loop:

        for (h = 0, i = 0; i < etag->used - 1; ++i)
          h = (h << 5) ^ (h >> 27) ^ (etag->ptr[i]);

    Bug diagnosed in issue tracker

    File (re)compression in mod_compress_physical, original call:

        etag_mutate(con->physical.etag, srv->tmp_buf);

    Guarded call:

        if (use_etag) {
          etag_mutate(con->physical.etag, srv->tmp_buf);
        }
  12. Goal

    Improve the execution of upgraded software to provide:
    Benefits of the newer version
    Stability of the older version
  13. Multi-core CPU becoming a standard

    Abundance of resources and a high degree of parallelism with no benefit to inherently sequential applications
    [Plot: number of CPU cores (2 to 10) by market release date (’05 to ’13) for Intel Xeon, ARM Cortex and Intel Core]
    Cadar, C., Pietzuch, P., Wolf, A. L. Multiplicity computing: A vision of software engineering for next-generation computing platform applications. FoSER’10
  14. Multi-version execution based approach

    Run the new version in parallel with the existing one
    Synchronise the execution of the two versions
    Use output of correctly executing version at any given time
    Can be extended to work with multiple versions
    Cadar, C., Hosek, P. Multi-version Software Updates. HotSWUp’12
  15-18. Synchronisation possible at multiple levels of abstraction

    [Diagram: spectrum from Total Synchronisation to Uncoordinated Execution, with the levels Instructions, Function Calls, System Calls and Inputs/Outputs]
  19-20. System calls define external behaviour

    VERSION 1

        void fib(int n) {
          int f[n+1];
          f[1] = f[2] = 1;
          for (int i = 3; i <= n; ++i)
            f[i] = f[i-1] + f[i-2];
          printf("%d\n", f[n]);
        }

    VERSION 2

        void fib(int n) {
          int a = 1, b = 1;
          for (int i = 3; i <= n; ++i) {
            int c = a + b;
            a = b, b = c;
          }
          printf("%d\n", b);
        }

    Example testing code, tested with both implementations

        int main(int argc, char **argv) {
          fib(5);
          fib(6);
        }

    Snippet of system call trace, obtained using the strace tool (identical for VERSION 1 and VERSION 2)

        write(1, "5\n", 2) = 2
        write(1, "8\n", 2) = 2
  21. External behaviour evolves sporadically

    95% of revisions introduce no change
    [Plot: normalised difference (0 to 1) in traces and in source code across lighttpd Subversion revisions 2379 to 2635]
    Measured using lighttpd regression suite on 164 revisions
    Taken on Linux kernel 2.6.40 and glibc 2.14 using strace tool and custom post-processing (details in the paper)
  22. Mx

    Implementation of multi-version execution for x86 Linux:
    Combines binary static analysis, lightweight checkpointing and runtime code patching
    Completely transparent, runs on unmodified binaries
    Runs two versions with small differences in behaviour
    Focus on application crashes and recovery
    Hosek, P., Cadar, C. Safe Software Updates via Multi-version Execution. ICSE’13
  23. Mx architecture

    [Diagram labels: MULTI-VERSION APPLICATION, CONVENTIONAL APPLICATION, Mx Execution Environment, SYSTEM CALL INTERPOSITION, STATIC ANALYSIS, RUNTIME MANIPULATION, OPERATING SYSTEM, LINUX KERNEL]
  24-30. Synchronisation and fail-recovery mechanism

    Request handled by LIGHTTPD 1.4.22 and LIGHTTPD 1.4.23:

        GET /index.html HTTP/1.1
        Host: srg.doc.ic.ac.uk
        Accept-Encoding: gzip

    Synchronisation: Compare individual system calls and their arguments
    Checkpointing: Use clone to take a snapshot of a process

    LIGHTTPD 1.4.22:

        for (h = 0, i = 0; i < etag->used; ++i)
          h = (h << 5) ^ (h >> 27) ^ (etag->ptr[i]);

    LIGHTTPD 1.4.23 (Crash: Segmentation fault):

        for (h = 0, i = 0; i < etag->used - 1; ++i)
          h = (h << 5) ^ (h >> 27) ^ (etag->ptr[i]);

    Failure recovery: Restart the snapshot and replace the code with the code of the new version

        for (h = 0, i = 0; i < etag->used; ++i)
          h = (h << 5) ^ (h >> 27) ^ (etag->ptr[i]);

    Reconvergence: Return to the original code and continue execution
  31. Assumptions

    Recovery considered successful if versions exhibit the same externally observable behaviour after recovery:
    Assumes small bug propagation distance
    Crashes are the only type of observable divergences
    The non-crashing version used as an oracle
    If unrecoverable, continue with the non-crashing version
  32. Suitable for different types of changes and applications:

    Changes which do not affect memory layout, e.g., refactorings, security patches
    Applications which provide synchronisation points, e.g., servers structured around the main dispatch loop
    Where reliability is more important than performance, e.g., interactive apps, some server scenarios
  33-34. Redis regression bug #344 introduced during refactoring

    Redis: In-memory NoSQL database
    Survived a number of crash bugs in several popular server applications

    HMGET command implementation in hmgetCommand function, before refactoring:

        robj *o = lookupKeyRead(c->db, c->argv[1]);
        if (o == NULL) {
          addReplySds(c,sdscatprintf(sdsempty(), "*%d\r\n",c->argc-2));
          for (i = 2; i < c->argc; i++) {
            addReply(c,shared.nullbulk);
          }
          return;
        } else {
          if (o->type != REDIS_HASH) {
            addReply(c,shared.wrongtypeerr);
            return;
          }
        }
        addReplySds(c,sdscatprintf(sdsempty(), "*%d\r\n",c->argc-2));

    After refactoring (missing return statement):

        robj *o, *value;
        o = lookupKeyRead(c->db,c->argv[1]);
        if (o != NULL && o->type != REDIS_HASH) {
          addReply(c,shared.wrongtypeerr);
        }
        addReplySds(c,sdscatprintf(sdsempty(), "*%d\r\n",c->argc-2));
        for (i = 2; i < c->argc; i++) {
          if (o != NULL && (value = hashGet(o,c->argv[i])) != NULL) {
            addReplyBulk(c,value);
            decrRefCount(value);
          } else {
            addReply(c,shared.nullbulk);
          }
        }
  35. 17.91% overhead on SPEC CPU2006 over single version despite 2x utilisation cost

    [Bar chart: normalised execution time (0 to 2), Native vs Mx, for SPEC CPU2006 benchmarks 400.PERLBENCH to 483.XALANCBMK and 410.BWAVES to 482.SPHINX3]
    Measured using SPEC CPU2006 1.2
    Taken on 3.50 GHz Intel Xeon E3 1280 with 16 GB of RAM, Linux kernel 3.1.9
  36. Interactive applications:

    UTILITY               INPUT SIZE               OVERHEAD
    md5sum, sha1sum       <1.25MB                  <100ms (imperceptible)
    mkdir, mkfifo, mknod  <115 nested directories  <100ms (imperceptible)
    cut                   <1.10MB                  <100ms (imperceptible)

    Measured using Coreutils 6.10
    Taken on 3.50 GHz Intel Xeon E3 1280 with 16 GB of RAM, Linux kernel 3.1.9

    Server applications:

    APPLICATION  SCENARIO           OVERHEAD
    lighttpd     localhost/network  2.60x – 3.49x
    lighttpd     distant networks   1.01x – 1.04x
    redis        localhost/network  3.74x – 16.72x
    redis        distant networks   1.00x – 1.05x

    Measured using redis-benchmark and http_load
    Taken on 3.50 GHz Intel Xeon E3 1280 with 16 GB of RAM, Linux kernel 3.1.9
  39. Nx: “The New Mx”

    New implementation based on event streaming:
    Lightweight distributed, multi-process and highly parallel runtime for multi-version execution
    Running multiple versions (n ≥ 2) at the same time
    Focus on minimising performance overhead
    System call binary rewriting instead of ptrace
    Tolerance to certain system call divergences
  40-45. Snippet of system call trace, obtained using the strace tool

    MX

        ptrace(PTRACE_GETREGS, 7, {...}, NULL)
        ptrace(PTRACE_SETREGS, 7, {...}, {...})
        ptrace(PTRACE_SYSCALL, 7, {...}, NULL)
        read(8, "PING\r\n", 1024)
        ptrace(PTRACE_GETREGS, 7, {...}, NULL)
        process_vm_writev(7, {?}, 1, {?}, 1, 0)
        ptrace(PTRACE_SETREGS, 7, {...}, {...})
        ptrace(PTRACE_SYSCALL, 7, {...}, NULL)

    REDIS

        --- SIGTRAP ---
        getpid()
        --- SIGTRAP ---

    Synchronisation: Transferring data between process address spaces (diagram labels: VMA, VMA)
  46-48. Binary Rewriting: Replace syscall instruction with a direct jump

    Snippet of instruction code (REDIS, GLIBC and NX boxes repeated once per version; diagram label: VMA)

    REDIS, 0x4050f0 <anetRead>:

        405130: callq <read@plt>

    GLIBC, 0xdeadbeef <__libc_read>, before rewriting:

        2a: mov $0x0,%eax
        2f: syscall

    GLIBC, 0xdeadbeef <__libc_read>, after rewriting:

        2a: jmpq $0x13cd0

    NX, 0x13cd0 <syscall_enter>:

        13d31: cmp $0x1,%r10
        13d3a: callq *%r10
  49-50. [Diagram labels: DISRUPTOR, APPLICATION (LEADER), APPLICATION (FOLLOWER), APPLICATION (LEADER), COORDINATOR]

    Crash: Segmentation fault
    Divergence: Coalesce events to execute over new code
    Failover: Elect new leader after the crash to continue
  51. Nx: ≤30% overhead on Redis benchmark, increases sub-linearly with number of versions

    [Plot: performance overhead (0 to 2), horizontal axis 1 to 10]
    Measured using redis-benchmark
    Taken on 3.40 GHz Intel Core i7-2600 with 8 GB of RAM, Linux kernel 3.11.0
  52. Future Work

    Support for more complex code changes:
    Data structure inference & excavation
    Control flow graph isomorphisms
    Call stack reconstruction
    Support for non-crashing type of divergences:
    Infinite loops and deadlocks
  53. Summary

    Novel approach for improving software updates:
    Multi-version execution based approach
    Relying on abundance of resources to improve reliability
    Run the new version in parallel with the existing one
    Synchronise the execution of the versions
    Use output of correctly executing version
  54. Distinct code bases, manually-generated

    N-version programming: A fault-tolerance approach to reliability of software operation. Chen, L., and Avizienis, A. FTCS’78
    Using replicated execution for a more secure and reliable web browser. Xue, H., Dautenhahn, N., and King, S. T. NDSS’12

    Variants of the same code, automatically generated

    N-variant systems: a secretless framework for security through diversity. Cox, B., Evans, D., Filipi, A., Rowanhill, J., Hu, W., Davidson, J., Knight, J., Nguyen-Tuong, A., and Hiser, J. USENIX Security’06
    Run-time defense against code injection attacks using replicated execution. Salamat, B., Jackson, T., Wagner, G., Wimmer, C., and Franz, M. IEEE Transactions 2011

    Online validation of different manually-evolved versions

    Efficient online validation with delta execution. Tucek, J., Xiong, W., Zhou, Y. ASPLOS’09
    Tachyon: Tandem Execution for Efficient Live Patch Testing. Maurer, M., Brumley, D. USENIX Security’12