"Techniques of binary code analysis", Konstantin Panarin

ptsecurity.ru Techniques of binary code analysis – methods, problems, tools
Konstantin Panarin, low-level application analysis group developer

#whoami
• Konstantin Panarin, Positive Technologies, [email protected]
• Developer in the low-level application analysis group

AGENDA
• Purposes of binary analysis
• Overview of techniques and the problems they raise
• Overview of modern analysis tools

Purposes of binary analysis
• Error discovery
• Vulnerability discovery
• Searching for backdoors and undocumented features
• Recovery of program logic (reverse engineering)
• Test generation (more details later)

Specifics of binary analysis
• Almost complete lack of type information
• Heavy use of obfuscation and anti-debugging techniques: CFG flattening, virtual machines, dead code insertion
• Difficulty of gathering various kinds of "meta information" (for example, exception handlers)
• Semantic complexity of particular assembler instructions (especially in the x86 instruction set): XLAT, DIV, CMPXCHG and so on

Methods of binary analysis
Types of analysis:
• Static analysis
    • No program execution
• Dynamic analysis
    • One execution trace is analyzed per program run
• Combined analysis
Analysis techniques:
• Symbolic execution
    • As a general rule, used in static analysis
• Analysis of marked data (taint analysis)
    • As a general rule, used in dynamic analysis
• Fuzzing
    • Expected input data is replaced by randomly generated bytes
• And many others
In practice, analysis tools combine several techniques, because each method has its own restrictions. Applying the approaches together helps to overcome their limitations partially, and sometimes completely.
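The fuzzing bullet above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: parse_record is a hypothetical target with a planted bug that is reported through the return value rather than a crash, and the random source is the standard rand() with a fixed seed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical target with a planted bug: it signals the bug by returning 1
 * when the first byte is 0xAB and the second byte is odd. A real target
 * would typically crash instead. */
int parse_record(const uint8_t *buf, size_t len)
{
    if (len < 2)
        return 0;
    return buf[0] == 0xAB && (buf[1] & 1);
}

/* Feed the target random byte strings of length len. Returns the iteration
 * that triggered the bug (the input is left in buf), or -1 if none did. */
long fuzz(unsigned seed, long max_iters, uint8_t *buf, size_t len)
{
    srand(seed);
    for (long i = 0; i < max_iters; i++) {
        for (size_t k = 0; k < len; k++)
            buf[k] = (uint8_t)((rand() >> 7) & 0xFF);  /* skip the low bits */
        if (parse_record(buf, len))
            return i;
    }
    return -1;
}
```

A call such as fuzz(1, 1000000, buf, 4) either returns -1 or leaves a triggering input in buf. Purely random input finds shallow conditions like this one quickly, but each additional exact byte comparison lowers the hit probability by a factor of 256, which is one reason fuzzing is usually combined with the other techniques.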

Static vs dynamic analysis
Dynamic analysis:
• Run-time information is available: process memory map, targets of indirect calls and so on
• Program execution may require a specific environment
• It is not always possible to reproduce the results of a previous analysis run
Static analysis:
• Generally works faster
• A single analysis run can potentially cover an unbounded number of execution paths
• Works even when some parts of the code or libraries are missing
• Unable to cope with obfuscated or encrypted code

Techniques – symbolic execution
• Key idea: concrete input data (e.g. function arguments) is replaced by symbolic values
• The analysis tool operates on symbolic expressions instead of their concrete counterparts
• In principle, symbolic execution can cover all execution paths in a single run
• Every execution path is represented by a program "state", which holds all constraints on the symbolic variables created so far (path and value constraints)
• SMT solver: a tool designed to solve constraints on symbolic variables

Symbolic execution: example
Let's find values of x and y that force the execution flow to reach the label ERROR.

    int twice(int v) { return 2 * v; }

    void test(int x, int y) {
        z = twice(y);
        if (x == z) {
            if (x > y + 10)
                ERROR;
        }
    }

    int main() {
        x = read();
        y = read();
        test(x, y);
    }

Taken from http://www.srl.inf.ethz.ch/pa2015/Lecture8.pdf

Symbolic execution: example (after the inputs are read)
Value constraints: x -> x0, y -> y0
Path constraints: true

Symbolic execution: example (after z = twice(y))
Value constraints: x -> x0, y -> y0, z -> 2*y0
Path constraints: true

Symbolic execution: example (after the branch if (x == z))
The conditional branch creates two different states:
• Branch taken. Value constraints: x -> x0, y -> y0, z -> 2*y0. Path constraints: x0 = 2*y0
• Branch not taken. Value constraints: x -> x0, y -> y0, z -> 2*y0. Path constraints: x0 != 2*y0

Symbolic execution: example (after the branch if (x > y + 10))
The state in which x0 = 2*y0 holds splits again:
• Branch taken. Value constraints: x -> x0, y -> y0, z -> 2*y0. Path constraints: x0 = 2*y0 ∧ x0 > y0 + 10
• Branch not taken. Value constraints: x -> x0, y -> y0, z -> 2*y0. Path constraints: x0 = 2*y0 ∧ x0 <= y0 + 10

Symbolic execution: example (solving for ERROR)
Value constraints: x -> x0, y -> y0, z -> 2*y0
Path constraints: x0 = 2*y0 ∧ x0 > y0 + 10
Reachability condition of label ERROR: the path constraints x0 = 2*y0 ∧ x0 > y0 + 10 must be satisfiable.
The SMT solver gives the following solution: x0 = 40, y0 = 20
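The final step of the example can be checked with a small sketch. It does not use an SMT solver: as a stand-in, it searches a bounded range of integers exhaustively, which only works because the example is tiny. The names reaches_error and solve are invented for this illustration.

```c
#include <stdbool.h>

/* Path constraints of the state that reaches ERROR:
 * x0 == 2*y0 and x0 > y0 + 10. */
bool reaches_error(int x0, int y0)
{
    return x0 == 2 * y0 && x0 > y0 + 10;
}

/* Brute-force stand-in for the solver: look for a satisfying assignment
 * with both values in [-bound, bound]. Returns false if there is none. */
bool solve(int bound, int *x_out, int *y_out)
{
    for (int y0 = -bound; y0 <= bound; y0++) {
        for (int x0 = -bound; x0 <= bound; x0++) {
            if (reaches_error(x0, y0)) {
                *x_out = x0;
                *y_out = y0;
                return true;
            }
        }
    }
    return false;
}
```

With bound = 100 the search returns x0 = 22, y0 = 11, the first assignment in iteration order. The solution from the slide, x0 = 40, y0 = 20, satisfies the same constraints; a solver is free to return any model.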

Symbolic execution: applications
• Symbolic execution was originally used for test generation (in the 1970s):
    • Input values of the tested program are marked as symbolic
    • When an error or vulnerability is suspected at some point in the executable, the reachability constraints for that point are solved
    • The resulting conditions on the input data form the required test
• This scheme is widely used in various verification systems

Symbolic execution: general scheme
• Translator into IR: assembler instruction -> set of IR instructions. Example:
    x86: mov eax, ecx
    IR:  STR R_ECX:32, , V_00:32
         STR V_00:32, , R_EAX:32
• Pool of states (one state per execution path): State №1, State №2, State №…, State №500
• Each state holds the following data:
    • Current IP (instruction pointer)
    • Symbolic context (registers, memory cells)
    • Constraints
• Searcher: selects a state from the pool
• Executor (director): processes a particular state
• Interpreter: contains a handler for each IR instruction
• On a conditional branch with condition X, two new states are added to the pool:
    • New state a: Constraints += X
    • New state b: Constraints += ~X
• If some "interesting" point is reached, its reachability is checked: the path constraints are extracted from the state and the corresponding SMT task is solved
• SMT solvers: Z3, STP, Boolector
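The pool, searcher and branch-forking parts of the scheme can be sketched as follows. This is a toy under stated assumptions, not how any particular engine is built: the "program" is just n consecutive conditional branches, constraints are stored as signed branch numbers instead of solver expressions, no feasibility check is made, and the names State, Pool and explore are invented here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STATES 1024
#define MAX_CONSTRAINTS 16

/* One state per execution path. A constraint +k means "the condition of
 * branch k holds", -k means its negation. */
typedef struct {
    int ip;                            /* number of the next branch */
    int constraints[MAX_CONSTRAINTS];
    size_t n_constraints;
} State;

typedef struct {
    State items[MAX_STATES];
    size_t count;
} Pool;

static void pool_push(Pool *p, const State *s)
{
    assert(p->count < MAX_STATES);
    p->items[p->count++] = *s;
}

/* Searcher: depth-first here, i.e. the most recently added state. */
static State pool_pop(Pool *p)
{
    assert(p->count > 0);
    return p->items[--p->count];
}

/* Explore a toy program of n_branches consecutive conditional branches on
 * symbolic data and return the number of completed paths. */
size_t explore(int n_branches)
{
    static Pool pool;
    size_t finished = 0;
    State init = { .ip = 1, .n_constraints = 0 };

    assert(n_branches >= 0 && n_branches <= MAX_CONSTRAINTS);
    pool.count = 0;
    pool_push(&pool, &init);
    while (pool.count > 0) {
        State s = pool_pop(&pool);
        if (s.ip > n_branches) {       /* end of program reached */
            finished++;
            continue;
        }
        State a = s, b = s;
        a.constraints[a.n_constraints++] = s.ip;    /* Constraints += X  */
        b.constraints[b.n_constraints++] = -s.ip;   /* Constraints += ~X */
        a.ip = b.ip = s.ip + 1;
        pool_push(&pool, &a);
        pool_push(&pool, &b);
    }
    return finished;
}
```

Since every branch forks the state, explore(3) completes 8 paths and explore(10) completes 1024: the number of states doubles with each branch on symbolic data.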

Symbolic execution – existing problems
• Path explosion (how to generate fewer states?)
• Loop unrolling (how to process loops whose exit condition depends on a symbolic variable?)
• Symbolic pointers (how to handle load and store operations whose addresses are symbolic?)
• Constraint complexity (some generated constraints are too hard for any SMT solver to solve exactly)
• External resources (how to process file handles and other references to external objects?)

Symbolic execution – possible solutions
• Path explosion: merge several states into a bigger one (but when and how?)
• Path explosion: process several states simultaneously (parallel symbolic execution)
• Loop unrolling, symbolic pointers: use specific SMT logics (how effective is it?)
• External resources: use a DSL to describe external calls in terms of solver expressions

Analysis of marked data (taint analysis)
• Purely dynamic analysis method
• Connects the execution trace with the data processed by the program during execution
• Answers the question of how the program processes particular pieces of input data

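Taint analysis: basic idea
• Main concepts: shadow memory and taint propagation
    • Shadow memory: stores a taint mark for every tracked register and memory location
    • Taint propagation: the rules by which each instruction transfers taint marks from its source operands to its destination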

Taint propagation: examples

    mov eax, tainted_input
    xor eax, eax                ; eax is UNTAINTED
    -----------------------------------------
    push tainted_input
    pop eax                     ; eax is TAINTED
    -----------------------------------------
    xor eax, eax
    cmp eax, tainted_input      ; AF, CF, OF, PF, SF, ZF are TAINTED
    -----------------------------------------
    mov eax, tainted_input
    mov ecx, untainted_input
    add ecx, eax                ; ecx is TAINTED
    -----------------------------------------
    mov eax, tainted_input
    mov ecx, untainted_input
    mov ax, cx                  ; ax is UNTAINTED, eax is TAINTED

Taken from http://defcon.org.ua/data/1/4_Oleksyk_Code_Analysis.pdf
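The register-only examples above can be expressed as propagation rules in a few lines. This is a sketch under assumptions: taint is tracked per register byte, only EAX and ECX exist, all flags share one taint mark, and add conservatively taints the whole result. The type and function names (Taint, t_mov32 and so on) are invented for illustration.

```c
#include <stdint.h>

enum { EAX, ECX, NUM_REGS };

/* One taint bit per register byte: bit 0 = lowest byte, bit 3 = highest. */
enum { T_NONE = 0x00, T_LOW16 = 0x03, T_ALL = 0x0F };

typedef struct {
    uint8_t reg[NUM_REGS];
    int flags;               /* 1 if the arithmetic flags are tainted */
} Taint;

/* mov r32, src: the destination takes the taint of the source. */
void t_mov32(Taint *t, int dst, uint8_t src_taint)
{
    t->reg[dst] = src_taint;
}

/* mov r16, r16: only the two low bytes of the destination are replaced. */
void t_mov16(Taint *t, int dst, int src)
{
    t->reg[dst] = (uint8_t)((t->reg[dst] & ~T_LOW16) | (t->reg[src] & T_LOW16));
}

/* add r32, r32: carries can spread taint upwards, so any tainted input byte
 * conservatively taints the whole result and the flags. */
void t_add32(Taint *t, int dst, int src)
{
    int any = (t->reg[dst] | t->reg[src]) != 0;
    t->reg[dst] = any ? T_ALL : T_NONE;
    t->flags = any;
}

/* xor r32, r32: "xor r, r" always yields zero, so it clears the taint. */
void t_xor32(Taint *t, int dst, int src)
{
    if (dst == src) {
        t->reg[dst] = T_NONE;
        t->flags = 0;
        return;
    }
    t->reg[dst] |= t->reg[src];
    t->flags = t->reg[dst] != 0;
}

/* cmp r32, src: registers are unchanged, only the flags are written. */
void t_cmp32(Taint *t, int a, uint8_t src_taint)
{
    t->flags = (t->reg[a] | src_taint) != 0;
}
```

Replaying the last example with these rules leaves EAX with the mask 0x0C: the two low bytes (ax) are clean while the two high bytes stay tainted, which matches the comment "ax is UNTAINTED, eax is TAINTED".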

Taint analysis: scheme
• Program code:
    push ebp
    mov ebp, esp
    lea eax, [esp+8]
    …
    ret
• Runtime analysis of machine instructions, for example: add eax, [esp+8]
• Instruction handler: syntax parsing, extraction of the instruction's operands, address resolution (for memory operands)
    • Operands: dest: eax; src: eax, 0x7f2300
• Taint propagation:
    • Context reading: eax – not tainted, 0x7f2300 – tainted
    • Context writing: eax – tainted
• Taint context: registers (EAX: not tainted, EBX: not tainted, ECX: tainted, …, EDI: tainted) and SHADOW MEMORY
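The handler step from the scheme can be sketched for the single instruction shown, add eax with a memory operand. The layout is an assumption made for illustration: the shadow memory is a flat array with one flag per byte covering a small region starting at MEM_BASE, only EAX is tracked, and the names TaintCtx, taint_mem, mem_tainted and handle_add_eax_mem are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MEM_BASE 0x7f2000u   /* start of the tracked region (assumed) */
#define MEM_SIZE 0x1000u     /* size of the tracked region (assumed)  */

typedef struct {
    bool eax;                /* register part of the taint context */
    bool shadow[MEM_SIZE];   /* shadow memory: one flag per byte   */
} TaintCtx;

static bool in_range(uint32_t addr, uint32_t len)
{
    return addr >= MEM_BASE && len <= MEM_SIZE &&
           addr - MEM_BASE <= MEM_SIZE - len;
}

/* Mark len bytes starting at addr as tainted or untainted. */
void taint_mem(TaintCtx *c, uint32_t addr, uint32_t len, bool tainted)
{
    assert(in_range(addr, len));
    for (uint32_t i = 0; i < len; i++)
        c->shadow[addr - MEM_BASE + i] = tainted;
}

/* True if any of the len bytes starting at addr is tainted. */
bool mem_tainted(const TaintCtx *c, uint32_t addr, uint32_t len)
{
    assert(in_range(addr, len));
    for (uint32_t i = 0; i < len; i++)
        if (c->shadow[addr - MEM_BASE + i])
            return true;
    return false;
}

/* Handler for "add eax, dword ptr [addr]", where addr is the already
 * resolved address of the memory operand. dest: eax; src: eax, [addr]. */
void handle_add_eax_mem(TaintCtx *c, uint32_t addr)
{
    bool src = c->eax || mem_tainted(c, addr, 4);  /* context reading */
    c->eax = src;                                  /* context writing */
}
```

With the four bytes at 0x7f2300 marked as tainted and EAX clean, handle_add_eax_mem(&ctx, 0x7f2300) leaves EAX tainted, as in the scheme.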

Taint analysis
Usage of taint analysis:
• A tainted EIP suggests the possibility of control flow hijacking (for example, as a result of a stack or heap overflow).
• Tainted arguments of particular functions (the printf family, system) suggest a possible vulnerability.
• Tainted resources (handles, mutexes) that should not depend directly on user input suggest a possible logic error in the program.
Drawbacks:
• Requires detailed analysis of every assembler instruction, which may be tedious for some architectures (x86).
• An ideal taint analysis would instrument all code executed by the operating system (both user mode and kernel mode), which is not always possible.

Combined analysis aka concolic execution
Concrete + symbolic = concolic:
• Create snapshots of the entire process at chosen control points
• Instrument the concrete execution trace and, at the same time, fill a queue of symbolic constraints: for each conditional branch on the trace, push its constraint into the queue
• Roll back to the previous control point, select a new symbolic condition from the queue, solve the corresponding SMT task and substitute the solution into the concrete execution context (registers and memory areas)
• Instrument the new trace with the new parameters
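The loop described above can be sketched on the test(x, y) function from the symbolic execution example. Several simplifications are assumptions of this sketch: there are no process snapshots (each round simply reruns the function), only the last branch of the trace is negated instead of keeping a queue, and the SMT solver is replaced by an exhaustive search over a small range. The names run, solve_trace and concolic are invented here.

```c
#include <stdbool.h>

/* Instrumented test(): records the outcome of every executed branch in
 * trace[] and returns true if ERROR is reached. */
bool run(int x, int y, bool trace[2], int *n)
{
    int z = 2 * y;
    *n = 0;
    trace[(*n)++] = (x == z);
    if (x == z) {
        trace[(*n)++] = (x > y + 10);
        if (x > y + 10)
            return true;               /* ERROR */
    }
    return false;
}

/* Stand-in for the solver: find an input in [-bound, bound]^2 whose trace
 * starts with want[0..len-1]. */
bool solve_trace(const bool *want, int len, int bound, int *x, int *y)
{
    for (int b = -bound; b <= bound; b++) {
        for (int a = -bound; a <= bound; a++) {
            bool t[2];
            int n, i;
            run(a, b, t, &n);
            if (n < len)
                continue;
            for (i = 0; i < len && t[i] == want[i]; i++)
                ;
            if (i == len) {
                *x = a;
                *y = b;
                return true;
            }
        }
    }
    return false;
}

/* Concolic loop: run concretely, negate the last recorded branch condition,
 * solve for a new input and run again. */
bool concolic(int x, int y, int bound, int max_rounds, int *ex, int *ey)
{
    for (int round = 0; round < max_rounds; round++) {
        bool trace[2];
        int n;
        if (run(x, y, trace, &n)) {
            *ex = x;
            *ey = y;
            return true;
        }
        trace[n - 1] = !trace[n - 1];  /* new condition to satisfy */
        if (!solve_trace(trace, n, bound, &x, &y))
            return false;
    }
    return false;
}
```

Starting from x = 0, y = 1 with bound = 50, the first run fails the check x == z, the second input (-50, -25) passes it but fails x > y + 10, and the third input (22, 11) reaches ERROR.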

Existing tools (open source)
KLEE
• Created as a tool for generating tests with high coverage
• Based on LLVM IR
• Uses symbolic execution
• Automatic test generation

Existing tools (open source)
Triton
• Uses concolic execution
• Directly converts asm instructions into solver (Z3) expressions, without an intermediate representation
Other: FuzzBall, BitBlaze
• There are no tools of production quality
• Every tool is focused on solving one particular task

Existing tools (closed source)
MAYHEM
• Designed to automatically search for vulnerabilities and generate exploits
• Able to work with symbolic pointers
• Winner of the DARPA contest (Cyber Grand Challenge) in 2016
CodeSurfer, VeraCode
• Little is known about their inner structure

Conclusion
• Methods of binary analysis still require a lot of careful research
• At present, there is no universal binary analysis tool
• Every tool is focused on solving one particular task
• Positive Technologies is working on its own tool – STAY TUNED!

Thank you for your attention! ptsecurity.ru

OWASP Moscow
