Slide 1

Tomasz Kuchta
Imperial College London
Imperial College ACM Student Chapter Seminar, 7th March 2014

Slide 2

Name: Tomasz Kuchta (http://www.doc.ic.ac.uk/~tk2512/)
From: Kraków (Cracow), Poland
Before:
  MSc in Computer Science (Cracow University of Technology)
  Work as a software engineer (telecommunications)
Interests:
  Music (https://soundcloud.com/gitaronek)
  Photography (http://www.flickr.com/photos/_tomek_/)

Slide 3

Problem overview
Symbolic execution
  Overview
  Basic definitions
  Concolic execution
Document recovery
  Proposed solution
  Challenges

Slide 4

No content

Slide 5

The user is unable to read or edit broken documents, because they cause abnormal application termination or do not load
  Documents can be corrupted, malformed or malicious
Such bugs are highly user-visible
Bad input accounts for a large number of security vulnerabilities
Example: Pine – a text mode e-mail client
  A message with a special "From:" field crashes the program:
  From: "\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\""@host.fubar

Slide 6

Program:
```c
int array[4] = {100, 200, 300, 400};
int tmp[4] = {10, 20, 30, 40};
int offset = /* value read from the document */;
array[offset] = 0;
for (int i = 0; i < 4; ++i) {
    array[i] = array[i] / tmp[i];
}
```
The value of offset comes from the document the program loads.

Slide 7

Program: the same code as on Slide 6; the document supplies offset = 4.

Memory before array[offset] = 0:
  array: 100 200 300 400   tmp: 10 20 30 40   offset: 4
Memory after array[offset] = 0:
  array: 100 200 300 400   tmp: 0 20 30 40

Slide 8

Program: the same code as on Slide 6, with offset = 4 read from the document:
  array[offset] = 0;               possible buffer overflow
  array[i] = array[i] / tmp[i];    possible division by zero

Memory before array[offset] = 0:
  array: 100 200 300 400   tmp: 10 20 30 40   offset: 4
Memory after array[offset] = 0 (tmp[0] has been overwritten):
  array: 100 200 300 400   tmp: 0 20 30 40
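Writing past the end of a real C array is undefined behaviour, so the memory picture above cannot be reproduced portably. The sketch below simulates it instead, assuming (as the slide's diagram does) that tmp sits directly after array and is followed by the offset read from the document. The function name run_program and the layout constants are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical flat memory matching the slide's diagram:
 * array[0..3] at slots 0..3, tmp[0..3] at slots 4..7, offset at slot 8. */
enum { ARRAY_BASE = 0, TMP_BASE = 4, OFFSET_SLOT = 8, MEM_SIZE = 9 };

/* Runs the slide's program on the simulated memory.
 * Returns false if the loop would divide by zero, true otherwise. */
bool run_program(int memory[MEM_SIZE], int document_offset) {
    static const int array_init[4] = {100, 200, 300, 400};
    static const int tmp_init[4] = {10, 20, 30, 40};
    for (int i = 0; i < 4; ++i) {
        memory[ARRAY_BASE + i] = array_init[i];
        memory[TMP_BASE + i] = tmp_init[i];
    }
    memory[OFFSET_SLOT] = document_offset;

    /* array[offset] = 0 without a bounds check: offset 4 lands on tmp[0].
     * Writes outside the simulated memory are simply ignored here. */
    int target = ARRAY_BASE + memory[OFFSET_SLOT];
    if (target >= 0 && target < MEM_SIZE)
        memory[target] = 0;

    for (int i = 0; i < 4; ++i) {
        if (memory[TMP_BASE + i] == 0)
            return false;  /* the real program would crash at this division */
        memory[ARRAY_BASE + i] /= memory[TMP_BASE + i];
    }
    return true;
}
```

With the document value 4, run_program reports the division by zero because tmp[0] has been overwritten with 0; with an in-bounds value such as 2 the loop completes.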

Slide 9

Truncate the file
  Possible loss of user data
Test the file against a specification
  Need to create a specification for each format
  What if the "buggy" file is correct?
Try to guess the right value
  Might be hard for highly structured formats
Or …

Slide 10

Is it possible to fix a malformed document without assuming any input format, in a way that preserves the original content as much as possible?

Slide 11

No content

Slide 12

```c
// x, y, z -> symbolic
if (x > 5) {
    if (y == 10) {
    }
}
if (z != 20) {
}
```
Execution tree:
```
x > 5
├── y = 10
│   ├── z = 20
│   └── z ≠ 20
└── y ≠ 10
    ├── z = 20
    └── z ≠ 20
x ≤ 5
├── z = 20
└── z ≠ 20
```
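For a program this small the tree can be reproduced by hand. The sketch below is a hand-written enumeration that mirrors the tree, not a general symbolic executor: it produces one path condition per leaf, and the nested y branch appears only on paths where x > 5 holds. The function and array names are illustrative.

```c
#include <string.h>

#define MAX_PATHS 8
#define PC_LEN 64

/* Writes one path condition per leaf of the execution tree into pcs[]
 * and returns the number of paths. */
int enumerate_paths(char pcs[MAX_PATHS][PC_LEN]) {
    static const char *const x_branch[2] = {"(x > 5)", "(x <= 5)"};
    static const char *const y_branch[2] = {"(y == 10)", "(y != 10)"};
    static const char *const z_branch[2] = {"(z == 20)", "(z != 20)"};
    int n = 0;
    for (int xb = 0; xb < 2; ++xb) {
        /* The inner y branch is only reached when x > 5 (xb == 0). */
        int y_choices = (xb == 0) ? 2 : 1;
        for (int yb = 0; yb < y_choices; ++yb) {
            for (int zb = 0; zb < 2; ++zb) {
                strcpy(pcs[n], x_branch[xb]);
                if (xb == 0) {
                    strcat(pcs[n], " && ");
                    strcat(pcs[n], y_branch[yb]);
                }
                /* The z branch is reached on every path. */
                strcat(pcs[n], " && ");
                strcat(pcs[n], z_branch[zb]);
                ++n;
            }
        }
    }
    return n;
}
```

It yields the six paths of the tree, from (x > 5) && (y == 10) && (z == 20) to (x <= 5) && (z != 20).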

Slide 13

Program and execution tree: the same as on Slide 12.
Highlighted path condition: (x > 5) ∧ (y = 10) ∧ (z = 20)

Slide 14

Program and execution tree: the same as on Slide 12.
Highlighted path condition: (x ≤ 5) ∧ (z ≠ 20)

Slide 15

Path condition (PC): a conjunction of constraints on symbolic variables encountered on a given execution path
SMT solver: a SAT solver extended with background theories (e.g. integer arithmetic, bit-vectors, arrays)
  A software tool
  Answers the question of satisfiability
  If the constraints are satisfiable, returns a counterexample: a set of values that satisfy them

Path condition                      Possible counterexample for {x, y, z}
(x > 5) ∧ (y = 10) ∧ (z = 20)       x = 7, y = 10, z = 20
(x ≤ 5) ∧ (z ≠ 20)                  x = 0, y = 0, z = 0
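To make the table concrete, the sketch below replaces the SMT solver with a brute-force search over a small integer range (0 to 30, an arbitrary assumption). It is only a stand-in for illustration: real SMT solvers reason about the constraints symbolically instead of enumerating values. The function names are hypothetical.

```c
#include <stdbool.h>

/* Search bounds for the toy solver: an assumption, not part of the method. */
#define DOMAIN_MIN 0
#define DOMAIN_MAX 30

typedef bool (*pc_fn)(int x, int y, int z);

/* The two path conditions from the table. */
bool pc_path_a(int x, int y, int z) { return x > 5 && y == 10 && z == 20; }
bool pc_path_b(int x, int y, int z) { (void)y; return x <= 5 && z != 20; }

/* Exhaustively searches the domain; on success writes a satisfying
 * assignment and returns true, otherwise returns false
 * ("unsatisfiable within the searched domain"). */
bool solve(pc_fn pc, int *x, int *y, int *z) {
    for (int a = DOMAIN_MIN; a <= DOMAIN_MAX; ++a)
        for (int b = DOMAIN_MIN; b <= DOMAIN_MAX; ++b)
            for (int c = DOMAIN_MIN; c <= DOMAIN_MAX; ++c)
                if (pc(a, b, c)) {
                    *x = a; *y = b; *z = c;
                    return true;
                }
    return false;
}
```

For the first path condition this search returns x = 6, y = 10, z = 20, the first solution in its search order; any satisfying assignment, such as x = 7 in the table, is equally valid. For the second it returns x = 0, y = 0, z = 0.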

Slide 16

Concolic execution / testing is a mix of concrete (standard) execution and symbolic execution
  Use concrete values at decision points
  Gather symbolic constraints along the way
Helps tackle the problems of state explosion and of reaching deep program states

Slide 17

Program and execution tree: the same as on Slide 12.
Path condition: (x > 5) ∧ (y = 10) ∧ (z = 20)
Concrete values: x = 7, y = 10, z = 20
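The concolic step can be sketched by instrumenting the slide's program by hand: the concrete values decide each branch, and the constraint that the concrete value satisfied is appended to the path condition. The record helper and the string representation of constraints are assumptions for illustration, not the API of any particular tool.

```c
#include <string.h>

#define MAX_CONSTRAINTS 8
#define CONSTRAINT_LEN 16

typedef struct {
    int count;
    char constraints[MAX_CONSTRAINTS][CONSTRAINT_LEN];
} path_condition;

/* Appends one constraint (as text) to the path condition. */
static void record(path_condition *pc, const char *constraint) {
    if (pc->count < MAX_CONSTRAINTS) {
        strncpy(pc->constraints[pc->count], constraint, CONSTRAINT_LEN - 1);
        pc->constraints[pc->count][CONSTRAINT_LEN - 1] = '\0';
        pc->count++;
    }
}

/* The slide's program, instrumented: concrete values choose the branch,
 * and the symbolic constraint for the taken side is recorded. */
void run_concolic(int x, int y, int z, path_condition *pc) {
    pc->count = 0;
    if (x > 5) {
        record(pc, "x > 5");
        if (y == 10)
            record(pc, "y == 10");
        else
            record(pc, "y != 10");
    } else {
        record(pc, "x <= 5");
    }
    if (z != 20)
        record(pc, "z != 20");
    else
        record(pc, "z == 20");
}
```

Running it on x = 7, y = 10, z = 20 yields x > 5, y == 10, z == 20, the path condition on this slide. A concolic tester would then negate one of these constraints and ask the solver for inputs that follow a different path.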

Slide 18

This work is supported by Microsoft Research through its PhD Scholarship Programme.
A joint project with Dr Cristian Cadar, Dr Miguel Castro and Dr Manuel Costa.

Slide 19

No content

Slide 20

Collect alternative execution paths
Explore alternative paths in a concolic manner

Original input (document): x = 5, y = 5, z = 5
Crash path's path condition: (x ≥ 5) ∧ (y ≥ 5) ∧ (z ≥ 5)
P3 path's path condition: (x ≥ 5) ∧ (y ≥ 5) ∧ (z < 5)
New input (recovery candidate): x = 5, y = 5, z = 0
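One way to derive such a candidate is sketched below, under stated assumptions: keep a prefix of the crash path's condition, negate the next constraint, and search for a value of the byte that the negated constraint mentions, leaving every other byte at its original value so the candidate stays close to the original document. The constraint encoding, the 0..255 byte range, the smallest-value-first search order and the function names are all hypothetical; a real implementation would query an SMT solver and may need to change several bytes at once.

```c
#include <stdbool.h>

enum { NUM_BYTES = 3 };  /* x, y, z from the slide */

typedef enum { GE, LT } op_t;

typedef struct {
    int byte;   /* input index: 0 = x, 1 = y, 2 = z */
    op_t op;
    int value;
} constraint;

static bool holds(const constraint *c, const int input[NUM_BYTES]) {
    int v = input[c->byte];
    return c->op == GE ? v >= c->value : v < c->value;
}

static constraint negate(constraint c) {
    c.op = (c.op == GE) ? LT : GE;
    return c;
}

/* Alternative path: crash[0..k-1] and not crash[k].
 * Only the byte constrained by crash[k] is changed; returns true and
 * fills candidate[] if a value in 0..255 satisfies the alternative path. */
bool make_candidate(const constraint crash[], int k,
                    const int original[NUM_BYTES], int candidate[NUM_BYTES]) {
    constraint flipped = negate(crash[k]);
    for (int i = 0; i < NUM_BYTES; ++i)
        candidate[i] = original[i];
    for (int v = 0; v <= 255; ++v) {
        candidate[flipped.byte] = v;
        bool ok = holds(&flipped, candidate);
        for (int i = 0; ok && i < k; ++i)  /* prefix must still hold */
            ok = holds(&crash[i], candidate);
        if (ok)
            return true;
    }
    candidate[flipped.byte] = original[flipped.byte];
    return false;
}
```

Negating the third constraint of the crash path (z ≥ 5) for the original input x = 5, y = 5, z = 5 gives the candidate x = 5, y = 5, z = 0, matching the P3 path on the slide.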

Slide 21

Collect alternative execution paths
Explore alternative paths in a concolic manner

Slide 22

Optimisations
  Using concolic execution
  Postponing SMT solver queries
  Collecting only the last N alternative paths
  Optimising the creation of recovery candidates
Partial symbolic execution
  Taint tracking to select the bytes to treat as symbolic
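Collecting only the last N alternative paths can be sketched as a fixed-size ring buffer that overwrites its oldest entry once full. In the sketch below an alternative path is represented only by the index of the branch it flips; N = 4, the type and the function names are assumptions for illustration.

```c
#include <assert.h>

#define LAST_N 4  /* arbitrary choice of N */

typedef struct {
    int entries[LAST_N];
    int next;   /* slot the next insertion overwrites */
    int size;   /* number of valid entries, at most LAST_N */
} alt_path_buffer;

void buffer_init(alt_path_buffer *b) {
    b->next = 0;
    b->size = 0;
}

/* Records an alternative path, evicting the oldest one when full. */
void buffer_push(alt_path_buffer *b, int branch_index) {
    b->entries[b->next] = branch_index;
    b->next = (b->next + 1) % LAST_N;
    if (b->size < LAST_N)
        ++b->size;
}

/* Returns the i-th most recent entry (0 = newest). */
int buffer_recent(const alt_path_buffer *b, int i) {
    assert(i >= 0 && i < b->size);
    return b->entries[(b->next - 1 - i + LAST_N) % LAST_N];
}
```

After six insertions with N = 4, only the four newest paths (6, 5, 4, 3) remain, so memory use stays bounded regardless of how long the execution runs before it crashes.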

Slide 23

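Taint tracking
```c
int x = Document[1];
int y = Document[2];
int a = x;          // {1}
int b = y;          // {2}
if (a > 5)          // {1}
{
    int c = b;      // {2} or {1,2}
    int d = a + b;  // {1,2}
}
```
The comments list the document bytes each value depends on. For c, tracking data flow only gives {2}; tracking both data and control flow gives {1,2}, because c is assigned under a branch that depends on a.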
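The propagation rules on this slide can be sketched with taint sets stored as bitmasks, where bit i means "depends on Document[i]". The code is hand-instrumented for this one snippet and assumes that, when control flow is tracked, values assigned inside a branch also inherit the taint of the branch condition. The type and function names are hypothetical.

```c
typedef unsigned taint_set;

#define TAINT_OF_BYTE(i) ((taint_set)1u << (i))

typedef struct {
    taint_set c;
    taint_set d;
} branch_taints;

/* Computes the taints of c and d inside "if (a > 5) { ... }".
 * When track_control_flow is non-zero, assignments under the branch
 * also inherit the taint of the branch condition. */
branch_taints propagate(int track_control_flow) {
    taint_set x = TAINT_OF_BYTE(1);   /* int x = Document[1]; */
    taint_set y = TAINT_OF_BYTE(2);   /* int y = Document[2]; */
    taint_set a = x;                  /* int a = x;  -> {1} */
    taint_set b = y;                  /* int b = y;  -> {2} */
    taint_set cond = a;               /* if (a > 5)  -> {1} */
    taint_set ctrl = track_control_flow ? cond : 0;
    branch_taints t;
    t.c = b | ctrl;                   /* int c = b;     -> {2} or {1,2} */
    t.d = (a | b) | ctrl;             /* int d = a + b; -> {1,2} */
    return t;
}
```

With data flow only, c is tainted by byte 2; with control flow tracked as well, it is tainted by bytes 1 and 2. Only tainted bytes would then need to be treated as symbolic.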

Slide 24

Tested benchmarks:
  pr – pagination utility for text files
  pine – text mode e-mail client
  dwarfdump – displays debug information of binary files (tested on executable files)
  readelf – similar to dwarfdump (tested on object files)

Slide 25

```
%PDF-1.7
...
3 0 obj
<< /Type /Page /Parent 2 0 R /Resources << /Font << /F2 11 0 R >> >> /Contents 4 0 R >>
endobj
4 0 obj % page content
<< /Length 44 >>
stream
BT 70 50 TD /F2 12 Tf (Hello, world!) Tj ET
endstream
endobj
9 0 obj
<< /Type /ObjStm ... >>
stream
11 0
<< /Type /Font ... >>
endstream
endobj
12 0 obj
<< /Type /XRef ... >>
stream
00 0000 FFFF
01 000a 0000
...
endstream
endobj
startxref
570
%%EOF
```

Slide 26

The same document as on Slide 25, except that the object number inside the object stream (object 9) is changed from 11 to -1:
```
9 0 obj
<< /Type /ObjStm ... >>
stream
-1 0
<< /Type /Font ... >>
endstream
endobj
```

Slide 28

Software faults and "broken" documents
Symbolic execution technique
  Symbolic variables
  Path condition and SMT solver
  Concolic execution / testing
Document recovery solution
  Approach based on concolic execution
  Various performance optimisations
  Partial symbolic execution of the bytes selected by taint tracking