Slide 1

Slide 1 text

KEEPING SAFE WHILE BEING ON THE EDGE
PETR HOŠEK
Petr Hosek is a recipient of the Google European Fellowship in Software Engineering, and this research is supported in part by this Google Fellowship.

Slide 2

Slide 2 text

PETR HOŠEK /ˈpiːtər/ /ˈhɒʃɛk/

Slide 3

Slide 3 text

“The fundamental problem with program maintenance is that fixing a defect has a substantial (20*-50%) chance of introducing another. So the whole process is two steps forward and one step back.”
F. Brooks, The Mythical Man-Month, 1975
*More than 14.8~24.4% for major operating system patches
Yin, Z., Yuan, D., Zhou, Y., Pasupathy, S., and Bairavasundaram, L. How Do Fixes Become Bugs? ESEC/FSE’11

Slide 4

Slide 4 text

Motivation
Software updates often present a high risk
  Many admins (70%) and users refuse to upgrade software
  Reliance on outdated versions flawed with vulnerabilities
Crameri, O., Knezevic, N., Kostic, D., Bianchini, R., Zwaenepoel, W. Staged deployment in Mirage, an integrated software upgrade testing and distribution system. SOSP’07

Slide 6

Slide 6 text

[Timeline: 2009 to 2010, by month]
HTTP ETag hash value computation in etag_mutate
    for (h = 0, i = 0; i < etag->used; ++i)
        h = (h << 5) ^ (h >> 27) ^ (etag->ptr[i]);

Slide 9

Slide 9 text

[Timeline: 2009 to 2010, by month]
HTTP ETag hash value computation in etag_mutate
    for (h = 0, i = 0; i < etag->used - 1; ++i)
        h = (h << 5) ^ (h >> 27) ^ (etag->ptr[i]);
Bug diagnosed in issue tracker
File (re)compression in mod_compress_physical
    etag_mutate(con->physical.etag, srv->tmp_buf);

Slide 11

Slide 11 text

[Timeline: 2009 to 2010, by month]
HTTP ETag hash value computation in etag_mutate
    for (h = 0, i = 0; i < etag->used - 1; ++i)
        h = (h << 5) ^ (h >> 27) ^ (etag->ptr[i]);
Bug diagnosed in issue tracker
File (re)compression in mod_compress_physical
    if (use_etag) {
        etag_mutate(con->physical.etag, srv->tmp_buf);
    }
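
A minimal sketch of why the `etag->used - 1` bound can be dangerous, under two assumptions not stated on the slide: that `used` is an unsigned size counting the stored bytes including the terminating NUL, and that it is 0 for an empty ETag. The `buffer` type and both function names are hypothetical stand-ins, not lighttpd's own definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the server's buffer type (assumption):
 * `used` counts stored bytes including the NUL, and is 0 when empty. */
typedef struct {
    const char *ptr;
    size_t used;
} buffer;

/* Loop bound of the updated hash loop. With used == 0 the unsigned
 * subtraction wraps around to SIZE_MAX instead of yielding -1. */
static size_t updated_loop_bound(const buffer *etag) {
    return etag->used - 1;
}

/* The updated hash loop from the slide, with an explicit guard for the
 * empty buffer so the wrapped bound is never used. */
static uint32_t etag_hash(const buffer *etag) {
    uint32_t h = 0;
    assert(etag != NULL);
    if (etag->used == 0)
        return 0; /* nothing to hash; avoids iterating up to SIZE_MAX */
    for (size_t i = 0; i < etag->used - 1; ++i)
        h = (h << 5) ^ (h >> 27) ^ (uint32_t)(unsigned char)etag->ptr[i];
    return h;
}
```

Under these assumptions, an unguarded call with an empty buffer would read far past the end of the data, which fits a segmentation fault. The `if (use_etag)` check on the slide guards at the call site; the guard inside `etag_hash` here is only an alternative placement.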

Slide 12

Slide 12 text

Goal
Improve the execution of upgraded software to provide:
  Benefits of the newer version
  Stability of the older version

Slide 13

Slide 13 text

Multi-core CPU becoming a standard
Abundance of resources and a high degree of parallelism with no benefit to inherently sequential applications
[Chart: number of CPU cores (2 to 10) by market release date (’05 to ’13) for Intel Xeon, ARM Cortex and Intel Core]
Cadar, C., Pietzuch, P., Wolf, A. L. Multiplicity computing: A vision of software engineering for next-generation computing platform applications. FoSER’10

Slide 14

Slide 14 text

Multi-version execution based approach
  Run the new version in parallel with the existing one
  Synchronise the execution of the two versions
  Use output of correctly executing version at any given time
  Can be extended to work with multiple versions
Cadar, C., Hosek, P. Multi-version Software Updates. HotSWUp’12
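
The idea of running two versions side by side and using the output of whichever one executes correctly can be sketched at a very coarse grain with POSIX processes. This is only an illustration: the two "versions" are hypothetical functions rather than binaries, there is no synchronisation during execution, and the preference for the new version's output is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*version_fn)(int);

typedef struct {
    pid_t pid;   /* child process running one version */
    int read_fd; /* read end of the pipe carrying its result */
} running_version;

/* Start fn(input) in a child process; returns 0 on success. */
static int start_version(version_fn fn, int input, running_version *rv) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) { /* child: compute, report, exit without cleanup */
        close(fds[0]);
        int result = fn(input);
        ssize_t n = write(fds[1], &result, sizeof result);
        _exit(n == (ssize_t)sizeof result ? 0 : 1);
    }
    close(fds[1]);
    rv->pid = pid;
    rv->read_fd = fds[0];
    return 0;
}

/* Wait for a version; returns 1 and stores its output if it exited cleanly. */
static int collect_version(running_version *rv, int *output) {
    int value = 0, status = 0;
    ssize_t n = read(rv->read_fd, &value, sizeof value);
    close(rv->read_fd);
    if (waitpid(rv->pid, &status, 0) != rv->pid)
        return 0;
    if (n != (ssize_t)sizeof value || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return 0;
    *output = value;
    return 1;
}

static int fib(int n) {
    int a = 0, b = 1;
    for (int i = 0; i < n; ++i) { int c = a + b; a = b; b = c; }
    return a;
}

/* Hypothetical versions: the new one crashes on one input. */
static int old_version(int n) { return fib(n); }
static int new_version(int n) { if (n == 0) abort(); return fib(n); }

/* Run both versions concurrently and use the output of one that did not
 * crash. Returns 1 on success, 0 if neither produced a result. */
static int run_multi_version(int input, int *output) {
    running_version newer, older;
    int out_new = 0, out_old = 0;
    assert(output != NULL);
    int started_new = start_version(new_version, input, &newer) == 0;
    int started_old = start_version(old_version, input, &older) == 0;
    int ok_new = started_new && collect_version(&newer, &out_new);
    int ok_old = started_old && collect_version(&older, &out_old);
    if (ok_new) { *output = out_new; return 1; }
    if (ok_old) { *output = out_old; return 1; }
    return 0;
}
```

Calling `run_multi_version(0, &out)` still yields a result here because the old version survives the input that makes the new one abort.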

Slide 19

Slide 19 text

Synchronisation possible at multiple levels of abstraction
[Diagram: spectrum between Uncoordinated Execution and Total Synchronisation, with levels marked: Inputs/Outputs, System Calls, Function Calls, Instructions]

Slide 22

Slide 22 text

System calls define external behaviour
Example testing code (tested with both implementations)
    int main(int argc, char **argv) {
        fib(5);
        fib(6);
    }
VERSION 1
    void fib(int n) {
        int f[n+1];
        f[1] = f[2] = 1;
        for (int i = 3; i <= n; ++i)
            f[i] = f[i-1] + f[i-2];
        printf("%d\n", f[n]);
    }
VERSION 2
    void fib(int n) {
        int a = 1, b = 1;
        for (int i = 3; i <= n; ++i) {
            int c = a + b;
            a = b, b = c;
        }
        printf("%d\n", b);
    }
Snippet of system call trace, obtained using the strace tool
VERSION 1
    write(1, "5\n", 2) = 2
    write(1, "8\n", 2) = 2
VERSION 2
    write(1, "5\n", 2) = 2
    write(1, "8\n", 2) = 2

Slide 23

Slide 23 text

External behaviour evolves sporadically
95% of revisions introduce no change
[Chart: normalised difference (0 to 1) in traces and in source code across lighttpd Subversion revisions 2379 to 2635]
Measured using lighttpd regression suite on 164 revisions
Taken on Linux kernel 2.6.40 and glibc 2.14 using strace tool and custom post-processing (details in the paper)

Slide 24

Slide 24 text

Mx
Implementation of multi-version execution for x86 Linux:
  Combines binary static analysis, lightweight checkpointing and runtime code patching
  Completely transparent, runs on unmodified binaries
  Runs two versions with small differences in behaviour
  Focus on application crashes and recovery
Hosek, P., Cadar, C. Safe Software Updates via Multi-version Execution. ICSE’13

Slide 25

Slide 25 text

Mx architecture
[Diagram: MULTI-VERSION APPLICATION and CONVENTIONAL APPLICATION; Mx Execution Environment (SYSTEM CALL INTERPOSITION, STATIC ANALYSIS, RUNTIME MANIPULATION); OPERATING SYSTEM (LINUX KERNEL)]

Slide 33

Slide 33 text

Synchronisation and fail-recovery mechanism
LIGHTTPD 1.4.22 and LIGHTTPD 1.4.23
Request:
    GET /index.html HTTP/1.1
    Host: srg.doc.ic.ac.uk
    Accept-Encoding: gzip
Code shown:
    for (h = 0, i = 0; i < etag->used; ++i)
        h = (h << 5) ^ (h >> 27) ^ (etag->ptr[i]);
    for (h = 0, i = 0; i < etag->used - 1; ++i)
        h = (h << 5) ^ (h >> 27) ^ (etag->ptr[i]);
    for (h = 0, i = 0; i < etag->used; ++i)
        h = (h << 5) ^ (h >> 27) ^ (etag->ptr[i]);
Synchronisation: Compare individual system calls and their arguments
Checkpointing: Use clone to take a snapshot of a process
Crash: Segmentation fault
Failure recovery: Restart the snapshot and replace the code with the code of the new version
Reconvergence: Return to the original code and continue execution
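
The synchronisation step, comparing individual system calls and their arguments, can be sketched as follows. The record layout and function names are hypothetical. Comparing raw argument values is a simplification, since a real monitor would have to compare the contents of buffers behind pointer arguments rather than the pointers themselves.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SYSCALL_ARGS 6

/* Hypothetical record of one intercepted system call. */
typedef struct {
    long number;                 /* system call number */
    long args[MAX_SYSCALL_ARGS]; /* raw argument values */
    int nargs;                   /* how many entries of args are valid */
} syscall_record;

/* Two calls match if the call number and every argument agree. */
static int same_syscall(const syscall_record *a, const syscall_record *b) {
    assert(a != NULL && b != NULL);
    if (a->number != b->number || a->nargs != b->nargs)
        return 0;
    for (int i = 0; i < a->nargs; ++i)
        if (a->args[i] != b->args[i])
            return 0;
    return 1;
}

/* Returns the index of the first position where the two traces differ,
 * or -1 if they are identical. A shorter trace diverges at its end. */
static long first_divergence(const syscall_record *v1, size_t n1,
                             const syscall_record *v2, size_t n2) {
    size_t common = n1 < n2 ? n1 : n2;
    for (size_t i = 0; i < common; ++i)
        if (!same_syscall(&v1[i], &v2[i]))
            return (long)i;
    return n1 == n2 ? -1 : (long)common;
}
```

A result of -1 means the two versions behaved identically as far as this comparison can tell; any other value marks the point where a monitor would have to decide how to handle the divergence.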

Slide 35

Slide 35 text

Assumptions
Recovery considered successful if versions exhibit the same externally observable behaviour after recovery:
  Assumes small bug propagation distance
  Crashes are the only type of observable divergences
  The non-crashing version used as an oracle
  If unrecoverable, continue with the non-crashing version

Slide 36

Slide 36 text

Suitable for different types of changes and applications:
  Changes which do not affect memory layout
    e.g., refactorings, security patches
  Applications which provide synchronisation points
    e.g., servers structured around the main dispatch loop
  Where reliability is more important than performance
    e.g., interactive apps, some server scenarios

Slide 37

Slide 37 text

Survived a number of crash bugs in several popular server applications
Redis regression bug #344 introduced during refactoring
In-memory NoSQL database
HMGET command implementation in hmgetCommand function
    robj *o = lookupKeyRead(c->db, c->argv[1]);
    if (o == NULL) {
        addReplySds(c,sdscatprintf(sdsempty(), "*%d\r\n",c->argc-2));
        for (i = 2; i < c->argc; i++) {
            addReply(c,shared.nullbulk);
        }
        return;
    } else {
        if (o->type != REDIS_HASH) {
            addReply(c,shared.wrongtypeerr);
            return;
        }
    }
    addReplySds(c,sdscatprintf(sdsempty(), "*%d\r\n",c->argc-2));

Slide 38

Slide 38 text

Survived a number of crash bugs in several popular server applications
Redis regression bug #344 introduced during refactoring
In-memory NoSQL database
HMGET command implementation in hmgetCommand function
    robj *o, *value;
    o = lookupKeyRead(c->db,c->argv[1]);
    if (o != NULL && o->type != REDIS_HASH) {
        addReply(c,shared.wrongtypeerr);
    }
    addReplySds(c,sdscatprintf(sdsempty(), "*%d\r\n",c->argc-2));
    for (i = 2; i < c->argc; i++) {
        if (o != NULL && (value = hashGet(o,c->argv[i])) != NULL) {
            addReplyBulk(c,value);
            decrRefCount(value);
        } else {
            addReply(c,shared.nullbulk);
        }
    }
Missing return statement (after the wrong-type reply)
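
The effect of the missing return can be reproduced in a self-contained mock. Everything below is hypothetical scaffolding, not Redis code: the `client` struct only counts what would be sent, and `handle_fields` stands in for the reply header plus the per-field loop.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { OBJ_STRING, OBJ_HASH } obj_type;
typedef struct { obj_type type; } object;

/* Counts what a client would receive, instead of writing to a socket. */
typedef struct {
    int errors;     /* wrong-type error replies */
    int headers;    /* multi-bulk length headers */
    int bulk_items; /* per-field replies */
} client;

/* Stand-in for the reply header and the per-field lookup loop. */
static void handle_fields(client *c, const object *o, int nfields) {
    (void)o; /* real code would look each field up in o */
    c->headers++;
    for (int i = 0; i < nfields; ++i)
        c->bulk_items++;
}

/* Shape of the refactored handler: the error is reported, but control
 * falls through and the non-hash object is still processed. */
static void hmget_refactored(client *c, const object *o, int nfields) {
    assert(c != NULL && nfields >= 0);
    if (o != NULL && o->type != OBJ_HASH) {
        c->errors++;
        /* no return here */
    }
    handle_fields(c, o, nfields);
}

/* Same handler with the early return restored. */
static void hmget_with_return(client *c, const object *o, int nfields) {
    assert(c != NULL && nfields >= 0);
    if (o != NULL && o->type != OBJ_HASH) {
        c->errors++;
        return;
    }
    handle_fields(c, o, nfields);
}
```

In the mock the fall-through only produces extra replies. In the code on the slide, the same path goes on to call `hashGet` on an object that is not a hash.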

Slide 39

Slide 39 text

17.91% overhead on SPEC CPU2006 over single version despite 2x utilisation cost
[Chart: normalised execution time (0 to 2), Native vs Mx, for 400.PERLBENCH, 401.BZIP2, 403.GCC, 429.MCF, 445.GOBMK, 456.HMMER, 458.SJENG, 462.LIBQUANTUM, 464.H264REF, 471.OMNETPP, 473.ASTAR, 483.XALANCBMK, 410.BWAVES, 416.GAMES, 433.MILC, 434.ZEUSMP, 435.GROMACS, 436.CACTUSADM, 437.LESLIE3D, 444.NAMD, 447.DEALII, 450.SOPLEX, 453.POVRAY, 454.CALCULIX, 459.GEMSFDTD, 465.TONTO, 470.LBM, 481.WRF, 482.SPHINX3]
Measured using SPEC CPU2006 1.2
Taken on 3.50 GHz Intel Xeon E3 1280 with 16 GB of RAM, Linux kernel 3.1.9

Slide 40

Slide 40 text

Interactive applications (measured using Coreutils 6.10):
    UTILITY               INPUT SIZE               OVERHEAD
    md5sum, sha1sum       <1.25MB                  <100ms (imperceptible)
    mkdir, mkfifo, mknod  <115 nested directories  <100ms (imperceptible)
    cut                   <1.10MB                  <100ms (imperceptible)
Server applications (measured using redis-benchmark and http_load):
    APPLICATION  SCENARIO           OVERHEAD
    lighttpd     localhost/network  2.60x – 3.49x
    lighttpd     distant networks   1.01x – 1.04x
    redis        localhost/network  3.74x – 16.72x
    redis        distant networks   1.00x – 1.05x
Taken on 3.50 GHz Intel Xeon E3 1280 with 16 GB of RAM, Linux kernel 3.1.9

Slide 43

Slide 43 text

Nx: “The New Mx”
New implementation based on event streaming:
  Lightweight distributed, multi-process and highly parallel runtime for multi-version execution
  Running multiple versions (n ≥ 2) at the same time
  Focus on minimising performance overhead
  System call binary rewriting instead of ptrace
  Tolerance to certain system call divergences
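
One way to picture event streaming is a fixed-size ring buffer into which one version publishes system call results while the others consume them at their own pace. The sketch below is single-threaded and assumes that design; the slides do not specify the actual data structure, and all names and sizes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8     /* hypothetical capacity */
#define MAX_FOLLOWERS 4 /* hypothetical limit */

typedef struct {
    long syscall_nr; /* which call the leader executed */
    long result;     /* its return value, replayed to followers */
} event;

typedef struct {
    event slots[RING_SIZE];
    size_t head;                  /* total events published so far */
    size_t cursor[MAX_FOLLOWERS]; /* total events consumed per follower */
    int nfollowers;
} event_ring;

static void ring_init(event_ring *r, int nfollowers) {
    assert(nfollowers > 0 && nfollowers <= MAX_FOLLOWERS);
    r->head = 0;
    r->nfollowers = nfollowers;
    for (int i = 0; i < MAX_FOLLOWERS; ++i)
        r->cursor[i] = 0;
}

/* Position of the follower that has consumed the least. */
static size_t slowest_cursor(const event_ring *r) {
    size_t min = r->cursor[0];
    for (int i = 1; i < r->nfollowers; ++i)
        if (r->cursor[i] < min)
            min = r->cursor[i];
    return min;
}

/* Leader side: returns 0 when the slowest follower has not yet freed a
 * slot, so unread events are never overwritten. */
static int ring_publish(event_ring *r, event e) {
    if (r->head - slowest_cursor(r) == RING_SIZE)
        return 0;
    r->slots[r->head % RING_SIZE] = e;
    r->head++;
    return 1;
}

/* Follower side: returns 0 when there is nothing new to consume. */
static int ring_consume(event_ring *r, int follower, event *out) {
    assert(follower >= 0 && follower < r->nfollowers);
    if (r->cursor[follower] == r->head)
        return 0;
    *out = r->slots[r->cursor[follower] % RING_SIZE];
    r->cursor[follower]++;
    return 1;
}
```

Because each follower keeps its own cursor, versions only wait on each other when the buffer is full or empty. A concurrent implementation would additionally need atomic updates and memory ordering, which this sketch omits.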

Slide 49

Slide 49 text

System call synchronisation possible at different phases
[Diagram: phases marked Every system call (Mx), Event streaming (Nx), and Record-replay]

Slide 50

Slide 50 text

Snippet of system call trace, obtained using the strace tool
REDIS
    read(6, "PING\r\n", 1024)

Slide 56

Slide 56 text

Snippet of system call trace, obtained using the strace tool
MX
    ptrace(PTRACE_GETREGS, 7, {...}, NULL)
    ptrace(PTRACE_SETREGS, 7, {...}, {...})
    ptrace(PTRACE_SYSCALL, 7, {...}, NULL)
    read(8, "PING\r\n", 1024)
    ptrace(PTRACE_GETREGS, 7, {...}, NULL)
    process_vm_writev(7, {?}, 1, {?}, 1, 0)
    ptrace(PTRACE_SETREGS, 7, {...}, {...})
    ptrace(PTRACE_SYSCALL, 7, {...}, NULL)
REDIS
    --- SIGTRAP ---
    getpid()
    --- SIGTRAP ---
Synchronisation: Transferring data between process address spaces
[Diagram: VMA, VMA]

Slide 59

Slide 59 text

Snippet of instruction code
Before rewriting:
REDIS
    0x4050f0 :
    405130: callq
GLIBC
    0xdeadbeef <__libc_read>:
    2a: mov $0x0,%eax
    2f: syscall
After rewriting:
GLIBC
    0xdeadbeef <__libc_read>:
    2a: jmpq $0x13cd0
NX
    0x13cd0 :
    13d31: cmp $0x1,%r10
    13d3a: callq *%r10
Binary Rewriting: Replace syscall instruction with a direct jump
[Diagram: VMA]

Slide 62

Slide 62 text

[Diagram: COORDINATOR, DISRUPTOR, and three applications: one LEADER and two FOLLOWERs]
Crash: Segmentation fault

Slide 64

Slide 64 text

[Diagram: COORDINATOR, DISRUPTOR, and three applications marked LEADER, FOLLOWER and LEADER]
Crash: Segmentation fault
Failover: Elect new leader after the crash to continue
Divergence: Coalesce events to execute over new code
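
The failover step can be sketched as a small piece of coordinator bookkeeping. The election policy used here, promoting the first surviving follower, is an assumption, as are all type and function names; the slide only states that a new leader is elected after the crash.

```c
#include <assert.h>

#define MAX_VERSIONS 8 /* hypothetical limit */

typedef enum { ROLE_LEADER, ROLE_FOLLOWER, ROLE_CRASHED } role;

typedef struct {
    role roles[MAX_VERSIONS];
    int count; /* number of versions being run */
} coordinator;

/* Version 0 starts as leader, all others as followers. */
static void coordinator_init(coordinator *c, int count) {
    assert(count > 0 && count <= MAX_VERSIONS);
    c->count = count;
    c->roles[0] = ROLE_LEADER;
    for (int i = 1; i < count; ++i)
        c->roles[i] = ROLE_FOLLOWER;
}

/* Index of the current leader, or -1 if no version is leading. */
static int current_leader(const coordinator *c) {
    for (int i = 0; i < c->count; ++i)
        if (c->roles[i] == ROLE_LEADER)
            return i;
    return -1;
}

/* Record a crash. If the leader crashed, promote the first surviving
 * follower. Returns the leader after the crash, or -1 if none is left. */
static int handle_crash(coordinator *c, int crashed) {
    assert(crashed >= 0 && crashed < c->count);
    int was_leader = (c->roles[crashed] == ROLE_LEADER);
    c->roles[crashed] = ROLE_CRASHED;
    if (was_leader) {
        for (int i = 0; i < c->count; ++i) {
            if (c->roles[i] == ROLE_FOLLOWER) {
                c->roles[i] = ROLE_LEADER;
                break;
            }
        }
    }
    return current_leader(c);
}
```

With three versions, a crash of version 0 promotes version 1, and a later crash of a follower leaves the leader unchanged.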

Slide 65

Slide 65 text

Nx
≤30% overhead on Redis benchmark, increases sub-linearly with number of versions
[Chart: performance overhead (0 to 2) for 1 to 10 on the horizontal axis]
Measured using redis-benchmark
Taken on 3.40 GHz Intel Core i7-2600 with 8 GB of RAM, Linux kernel 3.11.0

Slide 66

Slide 66 text

Future Work
Support for more complex code changes:
  Data structure inference & excavation
  Control flow graph isomorphisms
  Call stack reconstruction
Support for non-crashing type of divergences:
  Infinite loops and deadlocks

Slide 67

Slide 67 text

Summary
Novel approach for improving software updates:
  Multi-version execution based approach
  Relying on abundance of resources to improve reliability
  Run the new version in parallel with the existing one
  Synchronise the execution of the versions
  Use output of correctly executing version

Slide 68

Slide 68 text

Distinct code bases, manually-generated
  N-version programming: A fault-tolerance approach to reliability of software operation. Chen, L., and Avizienis, A. FTCS’78
  Using replicated execution for a more secure and reliable web browser. Xue, H., Dautenhahn, N., and King, S. T. NDSS’12
Variants of the same code, automatically generated
  N-variant systems: a secretless framework for security through diversity. Cox, B., Evans, D., Filipi, A., Rowanhill, J., Hu, W., Davidson, J., Knight, J., Nguyen-Tuong, A., and Hiser, J. USENIX Security’06
  Run-time defense against code injection attacks using replicated execution. Salamat, B., Jackson, T., Wagner, G., Wimmer, C., and Franz, M. IEEE Transactions 2011
Online validation of different manually-evolved versions
  Efficient online validation with delta execution. Tucek, J., Xiong, W., Zhou, Y. ASPLOS’09
  Tachyon: Tandem Execution for Efficient Live Patch Testing. Maurer, M., Brumley, D. USENIX Security’12