Keeping Safe While Being on the Edge

Software systems evolve constantly, with new versions released on a continuous basis. Unfortunately, software updates present a difficult challenge: the hassle of updating (you have to stop the application to upgrade it), combined with the fear that an update will introduce new bugs, means that many users simply do not update, leaving their computers running crash-prone, insecure code. In this talk, I will introduce a radically new approach that makes updating software less error-prone and less disruptive by making use of idle cores in multicore machines.

Imperial ACM

March 21, 2014

Transcript

  1. KEEPING SAFE WHILE BEING ON THE EDGE, PETR HOŠEK

    Petr Hosek is a recipient of the Google European Fellowship in Software Engineering, and this research is supported in part by this Google Fellowship.
  2. PETR HOŠEK, pronounced /ˈpiːtər/ /ˈhɒʃɛk/
  3. "The fundamental problem with program maintenance is that fixing a defect has a substantial (20*-50%) chance of introducing another. So the whole process is two steps forward and one step back." F. Brooks, The Mythical Man-Month, 1975

    *More than 14.8~24.4% for major operating system patches
    Yin, Z., Yuan, D., Zhou, Y., Pasupathy, S., and Bairavasundaram, L. How Do Fixes Become Bugs? ESEC/FSE’11
  4. Motivation

    Software updates often present a high risk
    Many admins (70%) and users refuse to upgrade software
    Reliance on outdated versions flawed with vulnerabilities
    Crameri, O., Knezevic, N., Kostic, D., Bianchini, R., Zwaenepoel, W. Staged deployment in Mirage, an integrated software upgrade testing and distribution system. SOSP’07
  5. [Timeline figure: months of 2009 and 2010]
  6. HTTP ETag hash value computation in etag_mutate

    for (h = 0, i = 0; i < etag->used; ++i)
      h = (h << 5) ^ (h >> 27) ^ (etag->ptr[i]);
  7. HTTP ETag hash value computation in etag_mutate

    for (h = 0, i = 0; i < etag->used - 1; ++i)
      h = (h << 5) ^ (h >> 27) ^ (etag->ptr[i]);
  8. Bug diagnosed in issue tracker
  9. File (re)compression in mod_compress_physical

    etag_mutate(con->physical.etag, srv->tmp_buf);
  10. File (re)compression in mod_compress_physical

    if (use_etag) {
      etag_mutate(con->physical.etag, srv->tmp_buf);
    }
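One plausible reading of why the changed loop bound `etag->used - 1` is dangerous: if `used` is an unsigned `size_t` and the buffer is empty, the subtraction wraps around to the largest representable value, so the loop runs far past the end of the buffer. The sketch below only illustrates that arithmetic. `demo_buffer`, `buggy_iterations` and `etag_hash_guarded` are hypothetical names, the struct is a stand-in rather than lighttpd's real buffer type, and the hash uses `uint32_t` to keep the shifts well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a length-tracked buffer (assumption: `used`
 * is unsigned and counts the terminating NUL). */
typedef struct {
    const char *ptr;
    size_t used;
} demo_buffer;

/* How many iterations `for (i = 0; i < used - 1; ++i)` would attempt. */
size_t buggy_iterations(const demo_buffer *b) {
    assert(b != NULL);
    return b->used - 1;   /* wraps to SIZE_MAX when used == 0 */
}

/* Same hash step as on the slides, with the bound guarded so that an
 * empty buffer hashes zero bytes instead of wrapping around. */
uint32_t etag_hash_guarded(const demo_buffer *b) {
    assert(b != NULL);
    uint32_t h = 0;
    size_t n = (b->used > 0) ? b->used - 1 : 0;
    for (size_t i = 0; i < n; ++i)
        h = (h << 5) ^ (h >> 27) ^ (uint32_t)(unsigned char)b->ptr[i];
    return h;
}
```

With an empty buffer, `buggy_iterations` reports `SIZE_MAX` iterations, while the guarded variant performs none.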
  12. Goal

    Improve the execution of upgraded software to provide:
      Benefits of the newer version
      Stability of the older version
  13. Multi-core CPU becoming a standard

    Abundance of resources and a high degree of parallelism with no benefit to inherently sequential applications
    [Chart: number of CPU cores by market release date, '05 to '13, for Intel Xeon, ARM Cortex and Intel Core]
    Cadar, C., Pietzuch P., Wolf, A. L. Multiplicity computing: A vision of software engineering for next-generation computing platform applications. FoSER’10
  14. Multi-version execution based approach

    Run the new version in parallel with the existing one
    Synchronise the execution of the two versions
    Use output of correctly executing version at any given time
    Can be extended to work with multiple versions
    Cadar, C., Hosek, P. Multi-version Software Updates. HotSWUp’12
  15. Synchronisation possible at multiple levels of abstraction

    [Diagram: a spectrum between Uncoordinated Execution and Total Synchronisation, with the levels Inputs/Outputs, System Calls, Function Calls and Instructions]
  21. Example testing code: system calls define external behaviour

    VERSION 1
    void fib(int n) {
      int f[n+1];
      f[1] = f[2] = 1;
      for (int i = 3; i <= n; ++i)
        f[i] = f[i-1] + f[i-2];
      printf("%d\n", f[n]);
    }

    VERSION 2
    void fib(int n) {
      int a = 1, b = 1;
      for (int i = 3; i <= n; ++i) {
        int c = a + b;
        a = b, b = c;
      }
      printf("%d\n", b);
    }

    Tested with both implementations:
    int main(int argc, char **argv) {
      fib(5);
      fib(6);
    }
  22. Snippet of system call trace, obtained using the strace tool

    VERSION 1
    write(1, "5\n", 2) = 2
    write(1, "8\n", 2) = 2

    VERSION 2
    write(1, "5\n", 2) = 2
    write(1, "8\n", 2) = 2
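The point that two different implementations can have identical external behaviour can be checked in miniature by recording what each version would write and comparing the records. This is only a sketch: `trace`, `trace_write` and `traces_equal` are hypothetical helpers that stand in for an strace log, Version 1 uses a fixed-size table instead of a variable-length array, and `n` is restricted to 1..46 so the result fits in a 32-bit `int`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TRACE_CAP 256

/* Hypothetical recorder for the payloads a version would pass to write(1, ...). */
typedef struct {
    char data[TRACE_CAP];
    size_t len;
} trace;

static void trace_write(trace *t, int value) {
    int n = snprintf(t->data + t->len, TRACE_CAP - t->len, "%d\n", value);
    assert(n > 0 && (size_t)n < TRACE_CAP - t->len);  /* record must fit */
    t->len += (size_t)n;
}

/* Version 1: table-based Fibonacci. */
void fib_v1(int n, trace *t) {
    int f[47] = {0};
    assert(n >= 1 && n <= 46);
    f[1] = f[2] = 1;
    for (int i = 3; i <= n; ++i)
        f[i] = f[i - 1] + f[i - 2];
    trace_write(t, f[n]);
}

/* Version 2: two running variables. */
void fib_v2(int n, trace *t) {
    int a = 1, b = 1;
    assert(n >= 1 && n <= 46);
    for (int i = 3; i <= n; ++i) {
        int c = a + b;
        a = b;
        b = c;
    }
    trace_write(t, b);
}

/* Two versions behave the same externally if their recorded output matches. */
int traces_equal(const trace *x, const trace *y) {
    return x->len == y->len && memcmp(x->data, y->data, x->len) == 0;
}
```

Calling both versions with 5 and then 6 fills each trace with the same two records, so `traces_equal` returns 1 even though the internals differ.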
  23. External behaviour evolves sporadically: 95% of revisions introduce no change

    [Chart: normalised difference in traces and in source code across lighttpd Subversion revisions 2379 to 2635]
    Measured using lighttpd regression suite on 164 revisions
    Taken on Linux kernel 2.6.40 and glibc 2.14 using strace tool and custom post-processing (details in the paper)
  24. Mx: implementation of multi-version execution for x86 Linux

    Combines binary static analysis, lightweight checkpointing and runtime code patching
    Completely transparent, runs on unmodified binaries
    Runs two versions with small differences in behaviour
    Focus on application crashes and recovery
    Hosek, P., Cadar, C. Safe Software Updates via Multi-version Execution. ICSE’13
  25. Mx architecture

    [Diagram labels: MULTI-VERSION APPLICATION, CONVENTIONAL APPLICATION, Mx Execution Environment, OPERATING SYSTEM, SYSTEM CALL INTERPOSITION, STATIC ANALYSIS, RUNTIME MANIPULATION, LINUX KERNEL]
  26. Synchronisation and fail-recovery mechanism (LIGHTTPD 1.4.22 and LIGHTTPD 1.4.23)

    Request:
    GET /index.html HTTP/1.1
    Host: srg.doc.ic.ac.uk
    Accept-Encoding: gzip

    Synchronisation: compare individual system calls and their arguments
    Checkpointing: use clone to take a snapshot of a process
    Crash: segmentation fault
    Failure recovery: restart the snapshot and replace the code with the code of the new version
    Reconvergence: return to the original code and continue execution

    Code shown, in order of appearance:
    for (h = 0, i = 0; i < etag->used; ++i)
      h = (h << 5) ^ (h >> 27) ^ (etag->ptr[i]);

    for (h = 0, i = 0; i < etag->used - 1; ++i)
      h = (h << 5) ^ (h >> 27) ^ (etag->ptr[i]);

    for (h = 0, i = 0; i < etag->used; ++i)
      h = (h << 5) ^ (h >> 27) ^ (etag->ptr[i]);
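The checkpoint, crash and recovery steps can be approximated in user space, although this sketch is not how Mx does it. It uses `fork` rather than calling `clone` directly, keeps the untouched parent process as the snapshot, and "replaces the code" by calling a different function instead of patching the binary. `step_new`, `step_old`, `try_in_copy` and `run_with_recovery` are hypothetical names, and POSIX is assumed.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*step_fn)(int);

/* Hypothetical versions: the new one crashes on input 0, the old one does not. */
int step_old(int x) { return x + 1; }
int step_new(int x) { if (x == 0) raise(SIGSEGV); return x + 1; }

/* Run step(input) in a forked copy of the process. Returns 0 and stores the
 * result if the copy finished normally, -1 if it crashed or an error occurred. */
static int try_in_copy(step_fn step, int input, int *result) {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return -1; }
    if (pid == 0) {                       /* child: the copy that may crash */
        close(fds[0]);
        int r = step(input);
        ssize_t w = write(fds[1], &r, sizeof r);
        _exit(w == (ssize_t)sizeof r ? 0 : 1);
    }
    close(fds[1]);                        /* parent: the preserved snapshot */
    int r = 0, status = 0;
    ssize_t got = read(fds[0], &r, sizeof r);
    close(fds[0]);
    if (waitpid(pid, &status, 0) != pid) return -1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0 && got == (ssize_t)sizeof r) {
        *result = r;
        return 0;
    }
    return -1;
}

/* Try one version; if its copy crashes, rerun the step from the snapshot
 * with the other version's code. *recovered reports which path was taken. */
int run_with_recovery(step_fn first, step_fn other, int input, int *recovered) {
    assert(recovered != NULL);
    int result = 0;
    *recovered = 0;
    if (try_in_copy(first, input, &result) == 0) return result;
    *recovered = 1;
    return other(input);
}
```

Calling `run_with_recovery(step_new, step_old, 0, &recovered)` returns 1 with `recovered` set, because the copy running `step_new` is killed by the signal and the snapshot reruns the step with `step_old`.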
  35. Assumptions

    Recovery considered successful if versions exhibit the same externally observable behaviour after recovery:
      Assumes small bug propagation distance
      Crashes are the only type of observable divergences
      The non-crashing version used as an oracle
      If unrecoverable, continue with the non-crashing version
  36. Suitable for different types of changes and applications:

    Changes which do not affect memory layout, e.g., refactorings, security patches
    Applications which provide synchronisation points, e.g., servers structured around the main dispatch loop
    Where reliability is more important than performance, e.g., interactive apps, some server scenarios
  37. Redis regression bug #344 introduced during refactoring

    In-memory NoSQL database
    Survived a number of crash bugs in several popular server applications
    HMGET command implementation in hmgetCommand function:

    robj *o = lookupKeyRead(c->db, c->argv[1]);
    if (o == NULL) {
      addReplySds(c,sdscatprintf(sdsempty(), "*%d\r\n",c->argc-2));
      for (i = 2; i < c->argc; i++) {
        addReply(c,shared.nullbulk);
      }
      return;
    } else {
      if (o->type != REDIS_HASH) {
        addReply(c,shared.wrongtypeerr);
        return;
      }
    }
    addReplySds(c,sdscatprintf(sdsempty(), "*%d\r\n",c->argc-2));
  38. Redis regression bug #344 introduced during refactoring: missing return statement

    robj *o, *value;
    o = lookupKeyRead(c->db,c->argv[1]);
    if (o != NULL && o->type != REDIS_HASH) {
      addReply(c,shared.wrongtypeerr);
    }
    addReplySds(c,sdscatprintf(sdsempty(), "*%d\r\n",c->argc-2));
    for (i = 2; i < c->argc; i++) {
      if (o != NULL && (value = hashGet(o,c->argv[i])) != NULL) {
        addReplyBulk(c,value);
        decrRefCount(value);
      } else {
        addReply(c,shared.nullbulk);
      }
    }
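The effect of the missing return can be seen in a stripped-down model. After the wrong-type error is sent, the refactored code still reaches the loop and reads the object as a hash, which is presumably where the real crash comes from. Everything below is hypothetical scaffolding, not Redis code: `obj`, `reply_log` and the two handlers only count what would happen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature types; these are not the real Redis definitions. */
enum obj_type { OBJ_STRING, OBJ_HASH };
typedef struct { enum obj_type type; } obj;

typedef struct {
    int errors;      /* wrong-type error replies sent */
    int hash_reads;  /* times the object was read as if it were a hash */
} reply_log;

/* Shape of the refactored code: the error is reported, then execution falls through. */
void hmget_without_return(const obj *o, int nfields, reply_log *log) {
    assert(log != NULL);
    if (o != NULL && o->type != OBJ_HASH) {
        log->errors++;             /* no return here */
    }
    for (int i = 0; i < nfields; i++) {
        if (o != NULL)
            log->hash_reads++;     /* stands in for hashGet(o, ...) */
    }
}

/* Same handler with the early return restored. */
void hmget_with_return(const obj *o, int nfields, reply_log *log) {
    assert(log != NULL);
    if (o != NULL && o->type != OBJ_HASH) {
        log->errors++;
        return;                    /* stop before touching the object as a hash */
    }
    for (int i = 0; i < nfields; i++) {
        if (o != NULL)
            log->hash_reads++;
    }
}
```

For a string-typed object and two requested fields, the first handler records one error and two hash reads, while the second records one error and none.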
  39. Mx: 17.91% overhead on SPEC CPU2006 over single version despite 2x utilisation cost

    [Chart: normalised execution time, Native vs Mx, per SPEC CPU2006 benchmark]
    Measured using SPEC CPU2006 1.2
    Taken on 3.50 GHz Intel Xeon E3 1280 with 16 GB of RAM, Linux kernel 3.1.9
  40. Interactive applications (measured using Coreutils 6.10):

    UTILITY                 INPUT SIZE                 OVERHEAD
    md5sum, sha1sum         <1.25MB                    <100ms (imperceptible)
    mkdir, mkfifo, mknod    <115 nested directories    <100ms (imperceptible)
    cut                     <1.10MB                    <100ms (imperceptible)

    Server applications (measured using redis-benchmark and http_load):

    APPLICATION    SCENARIO             OVERHEAD
    lighttpd       localhost/network    2.60x – 3.49x
    lighttpd       distant networks     1.01x – 1.04x
    redis          localhost/network    3.74x – 16.72x
    redis          distant networks     1.00x – 1.05x

    Taken on 3.50 GHz Intel Xeon E3 1280 with 16 GB of RAM, Linux kernel 3.1.9
  43. Nx: "The New Mx"

    New implementation based on event streaming:
      Lightweight distributed, multi-process and highly parallel runtime for multi-version execution
      Running multiple versions (n ≥ 2) at the same time
      Focus on minimising performance overhead
      System call binary rewriting instead of ptrace
      Tolerance to certain system call divergences
  44. System call synchronisation possible at different phases

    [Diagram labels: Every system call, Record-replay, Event streaming, with Mx and Nx placed among them]
  50. Snippet of system call trace, obtained using the strace tool

    REDIS
    read(6, "PING\r\n", 1024)
  51. Snippet of system call trace, obtained using the strace tool

    MX
    ptrace(PTRACE_GETREGS, 7, {...}, NULL)
    ptrace(PTRACE_SETREGS, 7, {...}, {...})
    ptrace(PTRACE_SYSCALL, 7, {...}, NULL)
    read(8, "PING\r\n", 1024)
    ptrace(PTRACE_GETREGS, 7, {...}, NULL)
    process_vm_writev(7, {?}, 1, {?}, 1, 0)
    ptrace(PTRACE_SETREGS, 7, {...}, {...})
    ptrace(PTRACE_SYSCALL, 7, {...}, NULL)

    REDIS
    --- SIGTRAP ---
    getpid()
    --- SIGTRAP ---
  56. Synchronisation: transferring data between process address spaces (VMA to VMA)
  57. Snippet of instruction code

    REDIS
    0x4050f0 <anetRead>:
      405130: callq <read@plt>

    GLIBC
    0xdeadbeef <__libc_read>:
      2a: mov $0x0,%eax
      2f: syscall
  58. Binary Rewriting: replace syscall instruction with a direct jump

    GLIBC
    0xdeadbeef <__libc_read>:
      2a: jmpq $0x13cd0

    NX
    0x13cd0 <syscall_enter>:
      13d31: cmp $0x1,%r10
      13d3a: callq *%r10
  60. [Diagram labels: COORDINATOR, DISRUPTOR, APPLICATION LEADER, APPLICATION FOLLOWER (two followers from slide 61 on)]

    Crash: segmentation fault
    Failover: elect new leader after the crash to continue
    Divergence: coalesce events to execute over new code
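A rough model of the leader and follower arrangement, assuming "Disruptor" here means a shared ring buffer in the style of the Disruptor pattern: the leader publishes the outcome of each system call as an event, and every follower consumes events at its own pace through its own cursor. The sketch is single-threaded and leaves out the atomics and memory barriers a real shared-memory ring would need. `event`, `ring`, `ring_publish`, `ring_consume` and `ring_promote` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8        /* power of two, so masking yields the slot index */
#define MAX_FOLLOWERS 4

/* Hypothetical event payload: which call the leader made and what it returned. */
typedef struct {
    long syscall_no;
    long result;
} event;

typedef struct {
    event slots[RING_SIZE];
    size_t head;                    /* next sequence number the leader writes */
    size_t cursor[MAX_FOLLOWERS];   /* next sequence number each follower reads */
    size_t nfollowers;
} ring;

/* Sequence number of the slowest follower (or head if there are none). */
static size_t slowest(const ring *r) {
    size_t min = r->head;
    for (size_t i = 0; i < r->nfollowers; ++i)
        if (r->cursor[i] < min) min = r->cursor[i];
    return min;
}

/* Leader publishes an event; returns 0 if the slowest follower has not
 * yet freed a slot, so unread events are never overwritten. */
int ring_publish(ring *r, event e) {
    if (r->head - slowest(r) >= RING_SIZE) return 0;
    r->slots[r->head & (RING_SIZE - 1)] = e;
    r->head++;
    return 1;
}

/* Follower reads its next event; returns 0 once it has caught up. */
int ring_consume(ring *r, size_t follower, event *out) {
    assert(follower < r->nfollowers);
    if (r->cursor[follower] == r->head) return 0;
    *out = r->slots[r->cursor[follower] & (RING_SIZE - 1)];
    r->cursor[follower]++;
    return 1;
}

/* Failover sketch: a follower that has replayed every pending event stops
 * being a follower and may take over publishing. Returns 0 if it is behind. */
int ring_promote(ring *r, size_t follower) {
    assert(follower < r->nfollowers);
    if (r->cursor[follower] != r->head) return 0;
    r->cursor[follower] = r->cursor[r->nfollowers - 1];
    r->nfollowers--;
    return 1;
}
```

Starting from a zero-initialised `ring` with `nfollowers` set, a follower can only be promoted after it has consumed everything the crashed leader published, which is one simple way to keep the new leader consistent with the events already streamed.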
  65. Nx: ≤30% overhead on Redis benchmark, increases sub-linearly with number of versions

    [Chart: performance overhead, vertical axis 0 to 2, horizontal axis 1 to 10]
    Measured using redis-benchmark
    Taken on 3.40 GHz Intel Core i7-2600 with 8 GB of RAM, Linux kernel 3.11.0
  66. Future Work

    Support for more complex code changes:
      Data structure inference & excavation
      Control flow graph isomorphisms
      Call stack reconstruction
    Support for non-crashing types of divergences:
      Infinite loops and deadlocks
  67. Summary

    Novel approach for improving software updates:
      Multi-version execution based approach
      Relying on abundance of resources to improve reliability
      Run the new version in parallel with the existing one
      Synchronise the execution of the versions
      Use output of correctly executing version
  68. Distinct code bases, manually-generated:

      N-version programming: A fault-tolerance approach to reliability of software operation. Chen, L., and Avizienis, A. FTCS’78
      Using replicated execution for a more secure and reliable web browser. Xue, H., Dautenhahn, N., and King, S. T. NDSS’12

    Variants of the same code, automatically generated:

      N-variant systems: a secretless framework for security through diversity. Cox, B., Evans, D., Filipi, A., Rowanhill, J., Hu, W., Davidson, J., Knight, J., Nguyen-Tuong, A., and Hiser, J. USENIX Security’06
      Run-time defense against code injection attacks using replicated execution. Salamat, B., Jackson, T., Wagner, G., Wimmer, C., and Franz, M. IEEE Transactions 2011

    Online validation of different manually-evolved versions:

      Efficient online validation with delta execution. Tucek, J., Xiong, W., Zhou, Y. ASPLOS’09
      Tachyon: Tandem Execution for Efficient Live Patch Testing. Maurer, M., Brumley, D. USENIX Security’12