Upgrade to Pro — share decks privately, control downloads, hide ads and more …

"Techniques of binary code analysis", Konstanti...

OWASP Moscow
December 04, 2017

"Techniques of binary code analysis", Konstantin Panarin

OWASP Russia Meetup #6

OWASP Moscow

December 04, 2017
Tweet

More Decks by OWASP Moscow

Other Decks in Technology

Transcript

  1. ptsecurity.ru Techniques of binary code analysis – methods, problems, tools

    Konstantin Panarin, low-level application analysis group developer
  2. • Purposes of binary analysis • Overview of techniques and

    arising problems • Overview of modern analysis tools AGENDA
  3. • Error discovering • Vulnerability discovering • Searching for backdoors

    and undocumented features • Recovery of program logic(RE) • Tests generation (more details later) Purposes of binary analysis
  4. Specificity of binary analysis • Almost complete lack of type-related

    information • Intensive use of obfuscation and antidebugging techniques: CFG flattening, virtual machines, dead code insertion • Complexity of gathering of “meta information” of different kinds (exception handlers) • Semantic difficulty of particular assembler instructions (especially considering x86 instruction set): XLAT, DIV, CMPXCHG and so on
  5. Types of analysis: • Static analysis • No program execution

    • Dynamic analysis • Analysis of one execution trace per every program run • Combined analysis Analysis techniques: • Symbolic execution • As a general rule, used in static analysis • Analysis of marked data (taint analysis) • As a general rule, used in dynamic analysis • Fuzzing • Expected input data is replaced by randomly generated bytes • And many others Methods of binary analysis
  6. Types of analysis: • Static analysis • No program execution

    • Dynamic analysis • Analysis of one execution trace per every program run • Combined analysis Analysis techniques: • Symbolic execution • As a general rule, used in static analysis • Analysis of marked data (taint analysis) • As a general rule, used in dynamic analysis • Fuzzing • Expected input data is replaced by randomly generated bytes • And many others Methods of binary analysis In practice, analysis tools use a mixture of different techniques because each instrumentation method has its own restrictions. Consistent use of various approaches helps to partially (and sometimes totally) overcome their limitations.
  7. Static vs dynamic analysis Dynamic analysis: • Availability of run-time

    information: process memory map, addresses of indirect calls and so on • Program execution may require specific environment • It’s not always possible to reproduce the results of previously ran analysis Static analysis: • In common, works faster • One analysis run is potentially able to cover infinite number of execution paths • Able to work in the case of absence of some parts of code\libraries • Unable to cope with obfuscation and encryption
  8. • Key idea – replacement of concrete input data (eg

    function arguments) by symbolic values • Analysis tool operates on symbolic expressions instead of their concrete counterpart • Symbolic execution is able to cover all execution paths on single run • Every execution path represents a “state” of program, which holds all the constraints on crafted symbolic variables (path and value constraints) • SMT-solver – tool, designed to resolve constraints on symbolic variables Techniques – symbolic execution
  9. int twice(int v) { return 2 * v; } void

    test(int x, int y) { z = twice(y); if (x == z) { if (x > y + 10) ERROR; } } int main() { x = read(); y = read(); test(x,y); } Symbolic execution: example Let’s find the values of x and y, which will force execution flow to reach the label ERROR Taken from http://www.srl.inf.ethz.ch/pa2015/Lecture8.pdf
  10. int twice(int v) { return 2 * v; } void

    test(int x, int y) { z = twice(y); if (x == z) { if (x > y + 10) ERROR; } } int main() { x = read(); y = read(); test(x,y); } Symbolic execution: example Value constraints: X->x0 Y->y0 Path constraints: True
  11. int twice(int v) { return 2 * v; } void

    test(int x, int y) { z = twice(y); if (x == z) { if (x > y + 10) ERROR; } } int main() { x = read(); y = read(); test(x,y); } Symbolic execution: example Value constraints: X->x0 Y->y0 Z->2*y0 Path constraints: True
  12. int twice(int v) { return 2 * v; } void

    test(int x, int y) { z = twice(y); if (x == z) { if (x > y + 10) ERROR; } } int main() { x = read(); y = read(); test(x,y); } Symbolic execution: example Value constraints: X->x0 Y->y0 Z->2*y0 Path constraints: x0 = 2y0 Value constraints: X->x0 Y->y0 Z->2*y0 Path constraints: x0 != 2y0 Create two different states after conditional branch - if (x==z)
  13. int twice(int v) { return 2 * v; } void

    test(int x, int y) { z = twice(y); if (x == z) { if (x > y + 10) ERROR; } } int main() { x = read(); y = read(); test(x,y); } Symbolic execution: example Value constraints: X->x0 Y->y0 Z->2*y0 Path constraints: x0 =2y0 ^ x0 > y0+10 Value constraints: X->x0 Y->y0 Z->2*y0 Path constraints: x0 =2y0 ^ x0 <= y0+10
  14. int twice(int v) { return 2 * v; } void

    test(int x, int y) { z = twice(y); if (x == z) { if (x > y + 10) ERROR; } } int main() { x = read(); y = read(); test(x,y); } Symbolic execution: example Value constraints: X->x0 Y->y0 Z->2*y0 Path constraints: x0 = 2y0 ^ x0 > y0+10 Reachability condition of label ERROR:
  15. int twice(int v) { return 2 * v; } void

    test(int x, int y) { z = twice(y); if (x == z) { if (x > y + 10) ERROR; } } int main() { x = read(); y = read(); test(x,y); } Symbolic execution: example Value constraints: X->x0 Y->y0 Z->2*y0 Path constraints: x0 = 2y0 ^ x0 > y0+10 Reachability condition of label ERROR: SMT Solver gives the following solution: x0 = 40, y0 = 20
  16. • Symbolic execution was originally used for tests generation (in

    the 70s of the last century): • Input values of tested program are marked as symbolic • In case of program error\vulnerability at some point in the executable we resolve the reachability constraints to this point • Found conditions on input data form the required test • This scheme is widely used in different verification systems Symbolic execution: applications
  17. • Symbolic execution: general scheme translator into IR assembler instruction

    Set of IR instructions Pool of states (one state per execution path) State №1 State №2 State №… State №500 Each state holds the following data: • Current IP (instruction pointer) • Symbolic context (registers, memory cells) • Constraints Executor (director) – processes particular state X86: mov eax, ecx ___________________ IR: STR R_ECX:32, , V_00:32 STR V_00:32, , R_EAX:32 Interpreter – contains handlers for each IR instruction translation Conditional branch with condition X If some “interesting” point is reached, check its reachability: extract path constraints from state and solve corresponding smt task SMT-Solvers: Z3, STP, Boolector New state a: Constraints += X New state b: Constraints += ~X Add new states into the pool Searcher selects state from the pool
  18. Symbolic execution – existing problems • path explosion (how to

    generate smaller number of states?) • cycle-unrolling (how to process cycles, the exit condition of which depend on symbolic variable?) • symbolic pointers (how to handle load and store operations, the address of which are also symbolic?) • constraint difficulty (some generated constraints are too difficult for all SMT-solvers to evaluate and find exact solutions) • external resources (how to process file handlers and other references to external objects?)
  19. Symbolic execution – possible solutions • path explosion – merge

    (unite) several states into bigger one (but when and how?) • path explosion – simultaneous processing of various states (parallel symbolic execution). • cycle unrolling, symbolic pointers – use specific SMT-logics (how effective is it?) • external resources – use DSL in order to describe external calls in terms of solver‘s expressions
  20. • Purely dynamic analysis method • Connects trace with the

    data processed by program in the course of execution • Answers the question of how the program processes certain pieces of input data Analysis of marked data (taint analysis)
  21. Taint analysis: basic idea • Main concepts: shadow memory and

    taint propagation. Shadow memory Taint propagation
  22. mov eax, tainted_input xor eax, eax ; eax is UNTAINTED

    ----------------------------------------- push tainted_input pop eax ; eax is TAINTED, ----------------------------------------------------------------- xor eax, eax cmp eax, tainted_input ; AF, CF, OF, PF, SF, ZF are TAINTED Taint propagation: examples mov eax, tainted _input mov ecx, untainted_input add ecx, eax ; ecx is TAINTED ----------------------------------------- mov eax, tainted_input mov ecx, untainted_input mov ax, cx ; ax is UNTAINTED, eax is TAINTED ----------------------------------------------------------------- Taken from http://defcon.org.ua/data/1/4_Oleksyk_Code_Analysis.pdf
  23. Taint analysis: scheme Program code: ___________________ __ push ebp Mov

    ebp, esp lea eax, [esp+8] … ret Runtime analysis of machine instructions add eax, [esp+8] Instruction handler: Syntax parsing, extraction of instruction’s operands, address resolution (for memory operands) Taint context EBX: not tainted Taint propagation ECX: tainted … EDI: tainted EAX: not tainted SHADOW MEMORY Operands: dest - eax, src: eax, 0x7f2300 Context reading: eax – not tainted 0x7f2300 - tainted Context writing: eax – tainted
  24. Taint analysis Usage of taint-analysis: • Tainted EIP suggest the

    possibility of control flow hijacking (for example, as a result of stack\heap overflow). • Tainted arguments of particular functions (printf family of functions, system) suggest the possibility of a vulnerability. • Tainted resources (handlers, mutexes, that do not depend directly on use input) suggest the possibility of logical error in the program. Drawbacks: • Requires detailed analysis of every assembler instruction, which may be tedious for some architectures (x86) • Ideal taints analysis should instrument the entire code executed by the operating system (both in user-mode and in kernel-mode) which is not always possible.
  25. Combined analysis aka concolic execution Concrete + symbolic = concolic:

    • Create snapshots of the entire process on chosen control points • Instrument concrete execution trace and at the same time fill the queue of symbolic constraints: for each conditional branch on the trace push its constraints into the queue • Roll back to the previous control point, select new symbolic condition from the queue, solve its corresponding SMT-task and substitute found solution into exact execution context (registers and memory areas) • Instrument new trace with new parameters
  26. Existing tools (OpenSource) KLEE • Created as test generation tool

    with high coverage • Based on LLVM IR • Uses symbolic execution • Automatic test generation
  27. Existing tools (OpenSource) Triton • Uses concolic execution • Directly

    converts asm instructions into solver’s (Z3) expressions (bypassing internal representation) Other: FuzzBall, BitBlaze • There are no tools of proper product quality • Every tool is focused on solving one particular task
  28. Existing tools(ClosedSource) MAYHEM • Designed to automatically search for vulnerabilities

    and generate exploits • Able to work with symbolic pointers • Winner of DARPA contest in 2016 CodeSurfer, VeraCode • A little is known about their inner structure
  29. Conclusion • Methods of binary analysis still require a lot

    of careful research • At present, there doesn’t exist a universal instrument of binary analysis • Every tool is focused on solving one particular task • Positive Technologies is working on its own tool – STAY TUNED!