IR Transformation Synthesis for Assembly Instructions

EKOPARTY 2015

About me: Sebastian Fernandez
• Security Researcher
• Exploit writer (artist)
• Frustrated reverse engineering tool writer

Intermediate Representation: What is it?
“An abstract machine language designed to aid in the analysis of computer programs” - Wikipedia

Intermediate Representation: What is it?
• A small set of simple semantic operations that, combined, can represent (most) assembly instructions
• Reverse engineering: REIL (BinNavi), BIL (BAP), ESIL (Radare)
• Tooling: VEX (Valgrind), Tiny Code Generator (QEMU)
• Compilers: LLVM IR, SPIR-V; GCC uses several IRs at different stages

Intermediate Representation in RE: Why?
• The same IR for all architectures (tool portability)
• Reasoning directly on assembly languages is hard because of the sheer number of opcodes
• An IR describes the semantics of an instruction rather than its syntax, which makes it better suited for tooling

IR Transformation: What is it?
An algorithm that builds a list of IR statements (instructions or expressions) whose semantics are equivalent to those of the native instruction

IR Transformation (Example)
; asm: ret
; data (1): c3
00401006.00 LDM R_ESP:32, , V_01:32
00401006.01 ADD R_ESP:32, 4:32, R_ESP:32
00401006.02 JCC 1:1, , V_01:32

IR Transformation (Example)
; asm: test eax, eax
; data (2): 85 c0
00000000.00 STR R_EAX:32, , V_00:32
00000000.01 STR 0:1, , R_CF:1
00000000.02 AND V_00:32, ff:8, V_01:8
00000000.03 SHR V_01:8, 7:8, V_02:8
00000000.04 SHR V_01:8, 6:8, V_03:8
00000000.05 XOR V_02:8, V_03:8, V_04:8
00000000.06 SHR V_01:8, 5:8, V_05:8
00000000.07 SHR V_01:8, 4:8, V_06:8
00000000.08 XOR V_05:8, V_06:8, V_07:8
00000000.09 XOR V_04:8, V_07:8, V_08:8
00000000.0a SHR V_01:8, 3:8, V_09:8
00000000.0b SHR V_01:8, 2:8, V_10:8
00000000.0c XOR V_09:8, V_10:8, V_11:8
00000000.0d SHR V_01:8, 1:8, V_12:8
00000000.0e XOR V_12:8, V_01:8, V_13:8
00000000.0f XOR V_11:8, V_13:8, V_14:8
00000000.10 XOR V_08:8, V_14:8, V_15:8
00000000.11 AND V_15:8, 1:1, V_16:1
00000000.12 NOT V_16:1, , R_PF:1
00000000.13 STR 0:1, , R_AF:1
00000000.14 EQ V_00:32, 0:32, R_ZF:1
00000000.15 SHR V_00:32, 1f:32, V_17:32
00000000.16 AND 1:32, V_17:32, V_18:32
00000000.17 EQ 1:32, V_18:32, R_SF:1
00000000.18 STR 0:1, , R_OF:1
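The flag statements in the `test eax, eax` listing are easier to check when transcribed into ordinary code. The Python sketch below (the function name is ours, purely illustrative) mirrors them one by one: the chain of SHR/XOR statements folds the eight bits of the low byte together, so after `AND 1` and `NOT`, PF is 1 exactly when that byte has an even number of set bits.

```python
def flags_of_test_eax_eax(eax):
    """Mirror the flag statements of the listing (V_00 = EAX; both operands are EAX)."""
    v00 = eax & 0xFFFFFFFF              # STR R_EAX, V_00
    v01 = v00 & 0xFF                    # AND V_00, ff: keep the low byte only
    parity = 0
    for shift in range(8):              # SHR/XOR chain: fold the eight bits together
        parity ^= v01 >> shift
    pf = (parity & 1) ^ 1               # AND 1, then NOT: PF = 1 on even parity
    zf = int(v00 == 0)                  # EQ V_00, 0
    sf = int(((v00 >> 0x1F) & 1) == 1)  # SHR 1f, AND 1, EQ 1
    return {"CF": 0, "PF": pf, "AF": 0, "ZF": zf, "SF": sf, "OF": 0}
```

Only bit 0 of the XOR chain survives the final `AND 1`, which is why XORing whole shifted bytes (as the IR does) gives the same result as XORing individual bits.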

Why Automatic?
• Many thousands of opcodes, with different operand combinations, across different architectures

Why Automatic?
• Writing transformations is repetitive and error-prone (and boring)
• Why not?
  • I learned a lot while working on this project
  • It always feels good when someone else does the work for you, even if you had to code it from scratch

Objective
• Automate the process as much as possible
• Require as little information as possible
• Validate the obtained results somehow

Architecture
• Definition of the target processor architecture
• Agent executing on the target processor
• “Interesting” values sampler
• IR emulator
• Synthesis engine
• SMT solver

Agent
• Observable agent for the target architecture
  • Fork + self-debugger running on the target architecture, where we can observe the behavior of an instruction given a specified input

Arch Information
• Definition file with:
  • Registers, with their width and most common use: GPR, floating point, vector, flags, instruction pointer
  • Sub-registers
  • Endianness
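The slides do not show the definition file format, so the following Python sketch is only a guess at what such a file needs to capture: registers with width and kind, sub-registers as bit slices of a parent register, and endianness. All class, field and method names here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Register:
    name: str
    width: int        # in bits
    kind: str         # "gpr", "float", "vector", "flags" or "ip"


@dataclass(frozen=True)
class SubRegister:
    name: str
    parent: str       # full register this one aliases
    offset: int       # bit offset inside the parent
    width: int


@dataclass
class ArchDefinition:
    name: str
    endianness: str   # "little" or "big"
    registers: dict = field(default_factory=dict)
    sub_registers: dict = field(default_factory=dict)

    def add_register(self, reg):
        self.registers[reg.name] = reg

    def add_sub_register(self, sub):
        self.sub_registers[sub.name] = sub

    def read_sub(self, sub_name, parent_value):
        """Extract a sub-register's value from the value of its parent register."""
        sub = self.sub_registers[sub_name]
        return (parent_value >> sub.offset) & ((1 << sub.width) - 1)


# A tiny 32-bit x86 fragment
x86 = ArchDefinition("x86", "little")
x86.add_register(Register("EAX", 32, "gpr"))
x86.add_register(Register("EFLAGS", 32, "flags"))
x86.add_register(Register("EIP", 32, "ip"))
x86.add_sub_register(SubRegister("AX", "EAX", 0, 16))
x86.add_sub_register(SubRegister("AL", "EAX", 0, 8))
x86.add_sub_register(SubRegister("AH", "EAX", 8, 8))
```

Describing sub-registers as (parent, offset, width) is enough to map an observed change in EAX back to `al` or `ah` in the disassembly.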

Arch Information
• Assembler/Disassembler
  • Widely available now; several exist and have been tested against large code bases
  • Useful for associating modified registers with their syntactic representation

Intermediate Representation
• Defining yet another IR is arguably nonsense, but since we are writing a lightweight emulator, we define our own based on expression trees

Intermediate Representation
• Arithmetic operations are typed with width and representation
  • Integer+width, IEEE754_32 (float), IEEE754_64 (double)
  • Add, Sub, Mul, Mod, Left shift, Right shift, SDiv, SMod
• Logic operations
  • And, Xor, Or, Left shift, Right shift
• Compare
  • LargerThan, LargerOrEqual, SignedLargerThan, SignedLargerEqual, Equal, NotEqual
  • WIP: IEEE754 comparisons
• Cast
  • Signed and unsigned extension
• Syntactic sugar for the most widely used semantics
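A minimal Python sketch of a width-typed expression tree and a small emulator for it, covering only a subset of the integer operations listed (no IEEE754). The class names, the operation strings and the choice to raise on division by zero are our assumptions, not the tool's actual definitions.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Reg:
    name: str
    width: int


@dataclass(frozen=True)
class Const:
    value: int
    width: int


@dataclass(frozen=True)
class BinOp:
    op: str           # "Add", "Sub", "Mul", "Div", "And", "Or", "Xor", "Shl", "Shr",
                      # "Equal", "LargerThan", "SignedLargerThan"
    left: "Expr"
    right: "Expr"
    width: int        # width of the result in bits (1 for comparisons)


@dataclass(frozen=True)
class Extend:
    expr: "Expr"
    width: int        # target width
    signed: bool      # True: sign extension, False: zero extension


Expr = Union[Reg, Const, BinOp, Extend]


def mask(width: int) -> int:
    return (1 << width) - 1


def to_signed(value: int, width: int) -> int:
    return value - (1 << width) if (value >> (width - 1)) & 1 else value


def evaluate(e: Expr, state: Dict[str, int]) -> int:
    """Evaluate a tree on a register state; every result is masked to its node's width."""
    if isinstance(e, Const):
        return e.value & mask(e.width)
    if isinstance(e, Reg):
        return state[e.name] & mask(e.width)
    if isinstance(e, Extend):
        inner = evaluate(e.expr, state)
        if e.signed:
            inner = to_signed(inner, e.expr.width)
        return inner & mask(e.width)
    a, b = evaluate(e.left, state), evaluate(e.right, state)
    op, w = e.op, e.width
    if op == "Add":
        r = a + b
    elif op == "Sub":
        r = a - b
    elif op == "Mul":
        r = a * b
    elif op == "Div":
        r = a // b                    # unsigned; raises ZeroDivisionError when b == 0
    elif op == "And":
        r = a & b
    elif op == "Or":
        r = a | b
    elif op == "Xor":
        r = a ^ b
    elif op == "Shl":
        r = a << b if b < w else 0    # avoid building huge integers
    elif op == "Shr":
        r = a >> b                    # logical shift right
    elif op == "Equal":
        r = int(a == b)
    elif op == "LargerThan":
        r = int(a > b)                # unsigned comparison
    elif op == "SignedLargerThan":
        lw = e.left.width
        r = int(to_signed(a, lw) > to_signed(b, lw))
    else:
        raise ValueError(f"unknown operation {op}")
    return r & mask(w)
```

Masking every node to its declared width is what makes wrap-around (for example `0xFFFFFFFF + 1 == 0` at 32 bits) fall out naturally.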

IR Emulator

Automatic Synthesis •Stages

Dependency Graph

Dependency Graph
• Dependency analysis of assembly instructions is hard
  • Instructions have side effects or implicit arguments
  • MOVS (x86): uses EDI, ESI
  • Every stack operation uses the stack pointer implicitly
  • Most arithmetic instructions update the EFLAGS register
• This step bounds the state analysis for the following steps
  • Synthesizing over a bounded set of registers and memory regions is easier than over the full system state

Dependency Graph
• Approximation: running every possible system state would be too expensive
• Pre-define a set of inputs that maximizes output changes
• How:
  • Pre-define the number of rounds we will run to collect outputs
  • Prepare the agent's registers with a specific input set
  • Observe the changes in the agent
  • Bound the changes to a set of registers and memory regions
  • Repeat until the testing set is exhausted
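The steps above can be sketched in Python under strong simplifications: the "agent" is a hypothetical Python function standing in for the native `add eax, ebx` (only ZF modeled), memory is ignored, and the testing set is a small hypothetical list of boundary values. Each input register is perturbed in turn, and every output that changes records a dependency.

```python
from itertools import product

WIDTHS = {"EAX": 32, "EBX": 32, "ECX": 32, "ZF": 1}
INTERESTING = [0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]   # hypothetical testing set


def mock_add_eax_ebx(state):
    """Stand-in for the agent: observed effect of `add eax, ebx` (only ZF modeled)."""
    out = dict(state)
    out["EAX"] = (state["EAX"] + state["EBX"]) & 0xFFFFFFFF
    out["ZF"] = int(out["EAX"] == 0)
    return out


def find_dependencies(run, widths):
    regs = list(widths)
    written = set()
    deps = {r: set() for r in regs}   # output register -> inputs that influence it
    pools = [[0, 1] if widths[r] == 1 else INTERESTING for r in regs]
    for values in product(*pools):    # repeat until the testing set is exhausted
        base = dict(zip(regs, values))
        base_out = run(base)
        written |= {r for r in regs if base_out[r] != base[r]}
        for src in regs:
            for flip in {1, 1 << (widths[src] - 1)}:   # flip lowest and highest bit
                probe = dict(base)
                probe[src] ^= flip
                probe_out = run(probe)
                for dst in regs:
                    if probe_out[dst] != base_out[dst]:
                        deps[dst].add(src)
    # Registers never modified only "depend" on themselves (pass-through): drop them.
    return {dst: srcs for dst, srcs in deps.items() if dst in written}
```

The boundary values matter: with purely random 32-bit inputs the sum would practically never be zero, and the ZF dependency on EAX and EBX would go unnoticed.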

Dependency Graph

Dependency Graph •Simple example x86

Dependency Graph •Less simple example x86

Syntactic Association
• Once we have the dependencies, how are they associated with the disassembled text?
• Detect the explicit arguments
• Classify the dependencies as arguments when possible
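The slides do not show how this association is done. A minimal Python sketch, assuming Intel-syntax disassembly text and a hypothetical register/alias table: each register in the dependency graph is mapped to the index of the operand that names it, and anything no operand mentions is treated as implicit.

```python
import re

# Hypothetical register tables; a real tool would take them from the
# architecture definition file.
FULL_REGS = {"EAX", "EBX", "ECX", "EDX", "ESI", "EDI", "ESP", "EBP"}
ALIASES = {"AX": "EAX", "AL": "EAX", "AH": "EAX", "BX": "EBX", "BL": "EBX", "BH": "EBX"}


def explicit_operands(disasm):
    """Split Intel-syntax text such as 'add eax, ebx' into mnemonic and operand strings."""
    mnemonic, _, rest = disasm.strip().partition(" ")
    return mnemonic, [op.strip() for op in rest.split(",") if op.strip()]


def registers_in(operand):
    """Full registers named in one operand, e.g. 'byte ptr [esi+0x10]' -> {'ESI'}."""
    tokens = {t.upper() for t in re.findall(r"[a-z][a-z0-9]*", operand.lower())}
    return {ALIASES.get(t, t) for t in tokens if t in FULL_REGS or t in ALIASES}


def classify(disasm, deps):
    """Map each register of the dependency graph to the index of the operand
    that names it, or None when the dependency is implicit (e.g. flags)."""
    _, operands = explicit_operands(disasm)
    regs = set(deps).union(*deps.values())
    return {
        reg: next((i for i, op in enumerate(operands) if reg in registers_in(op)), None)
        for reg in sorted(regs)
    }
```

Because sub-registers are folded into their parent, `mov al, ...` is reported as touching EAX; a finer version would keep the bit range from the definition file.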

I/O Set Generation

I/O Set Generation
• Pretty much like the Dependency Graph finder, but with a more bounded state
• Approximation for creating useful I/O sets for each dependency
• We have to consider that instructions do not only copy values but also perform logic and arithmetic (integer and floating point) operations
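The slides do not list which values the "interesting" sampler produces. A plausible Python sketch (integers only, the value choice is our assumption) mixes boundary values that trigger carries, overflows and sign changes with single-bit values and a few random ones, then records (inputs, output) pairs for one output. Here `semantics` is a hypothetical stand-in for running the instruction on the agent.

```python
import random


def interesting_values(width, extra_random=4, seed=0):
    """Boundary, single-bit and a few random values for an integer of `width` bits."""
    m = (1 << width) - 1
    values = {0, 1, 2, m, m - 1,              # 0, 1, 2, -1, -2
              1 << (width - 1),               # most negative signed value
              (1 << (width - 1)) - 1}         # most positive signed value
    values |= {1 << i for i in range(width)}  # single-bit values
    rng = random.Random(seed)
    values |= {rng.getrandbits(width) for _ in range(extra_random)}
    return sorted(values)


def io_set(semantics, width, deps, n=64, seed=0):
    """Sample n (inputs, output) pairs for one output depending on the registers in deps."""
    rng = random.Random(seed)
    pool = interesting_values(width, seed=seed)
    pairs = []
    for _ in range(n):
        inputs = {d: rng.choice(pool) for d in deps}
        pairs.append((inputs, semantics(inputs)))
    return pairs
```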

Transformation Synthesis

Transformation Synthesis
• Three approaches
  • Template-based synthesis
  • Stochastic search synthesis
  • Mixed
• Expression based
  • Every transformation is a list of STORE(Expr, Expr)
  • Memory accesses are encoded as expressions too
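To illustrate what "a list of STORE(Expr, Expr)" with memory as expressions can look like, here is a small Python sketch encoding `push ebx`. Two details are our assumptions, not stated in the slides: stores have parallel semantics (every source and address is read from the pre-instruction state), and memory is modeled as a dictionary of 32-bit cells.

```python
from typing import NamedTuple

MASK32 = 0xFFFFFFFF


class Store(NamedTuple):
    dst: tuple   # ("reg", name) or ("mem", address_expression)
    src: tuple   # any expression


def ev(e, regs, mem):
    """Evaluate ("const", v), ("reg", name), ("mem", addr) and ("add"|"sub", a, b)."""
    kind = e[0]
    if kind == "const":
        return e[1] & MASK32
    if kind == "reg":
        return regs[e[1]]
    if kind == "mem":
        # Simplification: 32-bit cells keyed by address; unknown cells read as 0.
        return mem.get(ev(e[1], regs, mem), 0)
    a, b = ev(e[1], regs, mem), ev(e[2], regs, mem)
    if kind == "add":
        return (a + b) & MASK32
    if kind == "sub":
        return (a - b) & MASK32
    raise ValueError(f"unknown expression {kind}")


def apply_stores(transformation, regs, mem):
    """Evaluate every source and destination address on the pre-state, then commit."""
    pending = []
    for st in transformation:
        value = ev(st.src, regs, mem)
        key = st.dst[1] if st.dst[0] == "reg" else ev(st.dst[1], regs, mem)
        pending.append((st.dst[0], key, value))
    new_regs, new_mem = dict(regs), dict(mem)
    for kind, key, value in pending:
        (new_regs if kind == "reg" else new_mem)[key] = value
    return new_regs, new_mem


# push ebx: the memory write is just a STORE whose destination is a memory expression.
ESP = ("reg", "ESP")
push_ebx = [
    Store(("mem", ("sub", ESP, ("const", 4))), ("reg", "EBX")),
    Store(ESP, ("sub", ESP, ("const", 4))),
]
```

With parallel semantics the order of the two stores does not matter, which keeps the search from having to discover a correct ordering.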

Template based synthesis
• Most efficient approach
  • The search is bounded to a very small space
  • Enumeration of this space is deterministic, so we don't miss cases
• We can pre-define templates for specific cases
  • For example, flags in assembly languages tend to change under the same circumstances
  • Zero flag is always set when the instruction result is zero
  • Overflow flag is set when an arithmetic instruction overflows
  • ...
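A minimal Python sketch of template-based flag synthesis: a small, fixed set of candidate flag formulas is enumerated deterministically, and only the ones consistent with every observed sample survive. The template list and the mock 8-bit `add` observer (standing in for the agent) are our illustration.

```python
def msb(x, w):
    return (x >> (w - 1)) & 1


# Candidate flag templates over (result, a, b, width) for a two-operand ALU instruction.
FLAG_TEMPLATES = {
    "result == 0":            lambda r, a, b, w: int(r == 0),
    "msb(result)":            lambda r, a, b, w: msb(r, w),
    "even parity (low byte)": lambda r, a, b, w: int(bin(r & 0xFF).count("1") % 2 == 0),
    "unsigned carry (add)":   lambda r, a, b, w: int(a + b > (1 << w) - 1),
    "signed overflow (add)":  lambda r, a, b, w: int(msb(a, w) == msb(b, w) != msb(r, w)),
    "borrow (sub)":           lambda r, a, b, w: int(a < b),
    "signed overflow (sub)":  lambda r, a, b, w: int(msb(a, w) != msb(b, w) and msb(r, w) != msb(a, w)),
    "constant 0":             lambda r, a, b, w: 0,
    "constant 1":             lambda r, a, b, w: 1,
}


def match_flag(samples, width):
    """Names of every template consistent with all (a, b, result, flag) samples."""
    return [name for name, t in FLAG_TEMPLATES.items()
            if all(t(r, a, b, width) == flag for a, b, r, flag in samples)]


def observe_add8(a, b):
    """Stand-in for the agent: 8-bit add returning (result, OF, ZF)."""
    r = (a + b) & 0xFF
    return r, int(((a ^ r) & (b ^ r) & 0x80) != 0), int(r == 0)
```

For example, OF samples from `add` built from boundary values leave a single surviving template:

```python
vals = [0, 1, 0x7F, 0x80, 0xFF]
of_samples = [(a, b, observe_add8(a, b)[0], observe_add8(a, b)[1]) for a in vals for b in vals]
print(match_flag(of_samples, 8))   # ['signed overflow (add)']
```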

Template based synthesis

Stochastic Search
• The search space is huge
  • Seriously!
• Use Markov Chain Monte Carlo (MCMC) sampling to try to reach the solution
  • Start with a blank program
  • Propose a modification to the current program
  • Accept or reject it according to a cost function; if accepted, adopt it as the current program
  • Repeat

MCMC
• Transform a cost function into a probability density function
  • Our cost function depends on how close the outputs of the proposed program are to the desired outputs for each input
  • Always accept transformations that lower the cost, but occasionally accept ones that increase it slightly
  • Acceptance: with current program $R$, proposal $R^*$, I/O set $\tau$, a constant $\beta > 0$ and $p$ drawn uniformly from $(0, 1]$, accept $R^*$ if $c(R^*;\tau) < c(R;\tau) - \frac{\log p}{\beta}$
• Actions on the expression tree, sampled randomly
  • Insert expression
  • Replace expression
  • Remove expression
  • Replace operation
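The acceptance test can be derived as a Metropolis step. Assuming, as in the Stochastic Superoptimization paper, that the cost is turned into the density $\pi(R) \propto \exp(-\beta\, c(R;\tau))$, and that proposals are symmetric (an assumption here; the slides do not say), the acceptance probability is:

$$
\alpha(R \to R^*) = \min\left(1, \frac{\pi(R^*)}{\pi(R)}\right) = \min\left(1, e^{-\beta\,[c(R^*;\tau) - c(R;\tau)]}\right)
$$

Drawing $p$ uniformly from $(0, 1]$ and accepting when $p < e^{-\beta\,[c(R^*;\tau) - c(R;\tau)]}$:

$$
\log p < -\beta\,[c(R^*;\tau) - c(R;\tau)]
\iff c(R^*;\tau) < c(R;\tau) - \frac{\log p}{\beta}
$$

Since $-\log p / \beta \ge 0$, a cheaper proposal is always accepted, while a more expensive one is accepted only when the cost increase is below the random slack $-\log p / \beta$; a larger $\beta$ makes accepting cost increases less likely.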

Cost Function
• Hamming distance from output to desired output
• Integer distance from output to desired output
• Penalty for tree size
• Penalty for not using all the dependencies
• Infinite cost for expressions that raise an error (e.g. division by 0)
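A compact Python sketch combining the cost terms listed above with the MCMC loop. Everything concrete is an assumption for illustration: 8-bit values, two inputs `a` and `b`, the weights (0.01 per node, 1 per unused dependency, integer distance scaled to [0, 1]), a size cap, and simplified versions of the four moves. Acceptance uses the threshold test $c(R^*) < c(R) - \log(p)/\beta$.

```python
import math
import random

WIDTH = 8
MASK = (1 << WIDTH) - 1
BIN = {
    "add": lambda x, y: (x + y) & MASK,
    "sub": lambda x, y: (x - y) & MASK,
    "mul": lambda x, y: (x * y) & MASK,
    "div": lambda x, y: x // y,          # raises ZeroDivisionError when y == 0
    "and": lambda x, y: x & y,
    "or":  lambda x, y: x | y,
    "xor": lambda x, y: x ^ y,
}
OPS = list(BIN)
LEAVES = [("var", "a"), ("var", "b"), ("const", 0), ("const", 1)]
MAX_SIZE = 31                            # hypothetical cap on tree size


def ev(e, env):
    if e[0] == "var":
        return env[e[1]]
    if e[0] == "const":
        return e[1]
    return BIN[e[0]](ev(e[1], env), ev(e[2], env))


def size(e):
    return 1 if e[0] in ("var", "const") else 1 + size(e[1]) + size(e[2])


def used_vars(e):
    if e[0] == "var":
        return {e[1]}
    if e[0] == "const":
        return set()
    return used_vars(e[1]) | used_vars(e[2])


def cost(e, io, deps):
    total = 0.0
    try:
        for env, want in io:
            got = ev(e, env)
            total += bin(got ^ want).count("1")   # Hamming distance
            total += abs(got - want) / MASK       # integer distance, scaled to [0, 1]
    except ZeroDivisionError:
        return math.inf                           # erroring programs are never accepted
    total += 0.01 * size(e)                       # penalty for tree size
    total += len(set(deps) - used_vars(e))        # penalty per unused dependency
    return total


def mutate(e, rng):
    """One random move: replace a subtree, insert a node, remove a node or replace an operation."""
    is_leaf = e[0] in ("var", "const")
    if is_leaf or rng.random() < 0.3:
        move = rng.choice(["replace", "insert", "remove", "replace_op"])
        if move == "replace":
            return rng.choice(LEAVES)
        if move == "insert":
            return (rng.choice(OPS), e, rng.choice(LEAVES))
        if is_leaf:                               # remove/replace_op need an operation node
            return rng.choice(LEAVES)
        if move == "remove":
            return e[rng.choice([1, 2])]
        return (rng.choice(OPS), e[1], e[2])
    children = list(e)
    i = rng.choice([1, 2])
    children[i] = mutate(e[i], rng)
    return tuple(children)


def search(io, deps, iters=5000, beta=1.0, seed=0):
    rng = random.Random(seed)
    cur = ("const", 0)                            # "blank" starting program
    cur_cost = cost(cur, io, deps)
    best, best_cost = cur, cur_cost
    for _ in range(iters):
        prop = mutate(cur, rng)
        if size(prop) > MAX_SIZE:
            continue
        prop_cost = cost(prop, io, deps)
        p = 1.0 - rng.random()                    # uniform in (0, 1]
        if prop_cost < cur_cost - math.log(p) / beta:
            cur, cur_cost = prop, prop_cost
            if cur_cost < best_cost:
                best, best_cost = cur, cur_cost
    return best, best_cost
```

Tracking the best program seen, rather than only the current one, matters because the chain is allowed to wander uphill.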

Stochastic search (example)

Mixed search
• Best of both worlds
• Works well with vector instructions
• Use a template as the base and mark the expressions that are allowed to mutate
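A small Python sketch of the idea: the template fixes the lane structure of a vector instruction, and only the per-lane "hole" is searched. For brevity, the hole is filled by enumerating a handful of candidate operations instead of by stochastic mutation. The observed instruction is a hypothetical 32-bit, four-lane stand-in for PADDB, computed with a SWAR trick so that it is independent of the template.

```python
LANE_OPS = {
    "add":  lambda x, y: (x + y) & 0xFF,
    "sub":  lambda x, y: (x - y) & 0xFF,
    "and":  lambda x, y: x & y,
    "or":   lambda x, y: x | y,
    "xor":  lambda x, y: x ^ y,
    "umax": lambda x, y: max(x, y),
    "umin": lambda x, y: min(x, y),
}


def lanes(v, n=4):
    return [(v >> (8 * i)) & 0xFF for i in range(n)]


def template(hole):
    """Fixed skeleton: apply `hole` lane by lane, then reassemble the 32-bit result."""
    def run(a, b):
        out = 0
        for i, (x, y) in enumerate(zip(lanes(a), lanes(b))):
            out |= hole(x, y) << (8 * i)
        return out
    return run


def fill_hole(samples):
    """Names of every lane operation whose template matches all (a, b, output) samples."""
    return [name for name, op in LANE_OPS.items()
            if all(template(op)(a, b) == want for a, b, want in samples)]


def observed_paddb(a, b):
    """Stand-in for the agent: byte-wise add without carries between lanes (SWAR)."""
    low = (a & 0x7F7F7F7F) + (b & 0x7F7F7F7F)
    return (low ^ ((a ^ b) & 0x80808080)) & 0xFFFFFFFF
```

The template is what makes this cheap: a plain 32-bit `add` cannot reproduce the results (carries would cross lanes), and finding the four-lane structure by unguided search would be far more expensive.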

Mixed synthesis

Contrast Verification

Contrast Verification
• Sometimes we synthesize many programs for the same I/O set
• The IR can be translated to SMT formulas
  • Bit-vector logic
  • IEEE754/floating-point logic
• Formulas are sent to an SMT solver to check whether two programs are equivalent
  • If they are not, the solver generates a counterexample (an input for which the programs' outputs differ)
  • With the new input, go back to I/O generation and discard one or more programs
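A Python sketch of the translation step only (no solver is called): expressions as nested tuples are emitted as SMT-LIB bit-vector terms, and two programs are wrapped in a query that is satisfiable exactly when they differ for some input. The IR operation names are illustrative; the SMT-LIB operator names (`bvadd`, `bvxor`, `bvlshr`, ...) are standard.

```python
# Hypothetical mapping from illustrative IR operation names to SMT-LIB operators.
IR_TO_SMT = {
    "Add": "bvadd", "Sub": "bvsub", "Mul": "bvmul", "Div": "bvudiv",
    "And": "bvand", "Or": "bvor", "Xor": "bvxor",
    "Shl": "bvshl", "Shr": "bvlshr", "Not": "bvnot", "Neg": "bvneg",
}


def to_smt(e):
    """Translate ("var", name), ("const", value, width) or ("Op", args...) to SMT-LIB."""
    if e[0] == "var":
        return e[1]
    if e[0] == "const":
        return f"(_ bv{e[1]} {e[2]})"
    args = " ".join(to_smt(arg) for arg in e[1:])
    return f"({IR_TO_SMT[e[0]]} {args})"


def equivalence_query(p1, p2, inputs, width):
    """SMT-LIB script that is satisfiable exactly when p1 and p2 differ for some input."""
    decls = [f"(declare-const {name} (_ BitVec {width}))" for name in inputs]
    return "\n".join(decls + [
        f"(assert (not (= {to_smt(p1)} {to_smt(p2)})))",
        "(check-sat)",
        "(get-model)",
    ])
```

The resulting text can be fed to any SMT-LIB 2 solver (Z3, for instance). A `sat` answer comes with a model, which is the counterexample input; `unsat` means the programs are equivalent, and `(get-model)` is then not meaningful. Note that current SMT-LIB versions define `bvudiv` by zero (it returns all ones), so division needs a guard if the emulator treats it as an error.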

Contrast Verification (example)
(declare-const reg1 (_ BitVec 32))
(assert (not
  (= (bvxor reg1 (bvxor (bvnot reg1) (bvneg reg1)))
     (bvsub reg1 (_ bv1 32)))))
(check-sat)
(get-model)
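This particular query is unsatisfiable: the two programs are equivalent. Since $\lnot x = -x - 1$, the inner term is $(y - 1) \oplus y$ with $y = -x$, which for $x \ne 0$ is a mask of the bits up to and including the lowest set bit of $x$ (negation preserves the number of trailing zeros). XORing $x$ with that mask flips its trailing zeros and its lowest set bit, which is exactly $x - 1$; for $x = 0$ both sides are all ones. As a cheap stand-in for the solver at narrow widths (not a proof for 32 bits), an exhaustive Python check either finds a counterexample or confirms agreement:

```python
def p1(x, w):
    """reg1 xor ((not reg1) xor (neg reg1)) at width w."""
    m = (1 << w) - 1
    return (x ^ ((~x & m) ^ (-x & m))) & m


def p2(x, w):
    """reg1 - 1 at width w."""
    return (x - 1) & ((1 << w) - 1)


def counterexample(f, g, width):
    """First input where f and g disagree at this width, or None if they always agree."""
    for x in range(1 << width):
        if f(x, width) != g(x, width):
            return x
    return None
```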

Finish
• We have now discarded all the programs that are not equivalent
• Choose the shortest program

Finish (example)

Results
• We can approximate most of the instructions used in an x86 program automatically
• Most transformations finish in a minute or less
  • Emulator + sampler evaluate 50-80k expression trees/sec
  • Templates help avoid computation for common semantics (EFLAGS, for example)
• Some expressions are correct but not optimized
  • Div/Mul 1, Add/Sub 0, Bits(0, width)...
  • We run a simple pass that removes as many elements from the tree as possible while maintaining the cost
  • Writing an arithmetic/logic optimization pass would be a good idea too
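A Python sketch of a reduction pass in the spirit described: repeatedly replace an operation node with one of its operands as long as every I/O sample still produces the expected output (so the error part of the cost is unchanged and the size penalty drops). This is our illustration, not the tool's code; the 8-bit toy evaluator is hypothetical, and a tree reduced this way is only known to be correct on the sampled I/O set, so it would still need the SMT check.

```python
MASK = 0xFF   # 8-bit toy width

BIN = {
    "add": lambda x, y: (x + y) & MASK,
    "sub": lambda x, y: (x - y) & MASK,
    "mul": lambda x, y: (x * y) & MASK,
    "div": lambda x, y: x // y,          # raises ZeroDivisionError when y == 0
    "and": lambda x, y: x & y,
    "or":  lambda x, y: x | y,
    "xor": lambda x, y: x ^ y,
}


def ev(e, env):
    if e[0] == "var":
        return env[e[1]]
    if e[0] == "const":
        return e[1]
    return BIN[e[0]](ev(e[1], env), ev(e[2], env))


def smaller_trees(e):
    """Every tree obtained by replacing one operation node with one of its operands."""
    if e[0] in ("var", "const"):
        return
    yield e[1]
    yield e[2]
    for i in (1, 2):
        for sub in smaller_trees(e[i]):
            t = list(e)
            t[i] = sub
            yield tuple(t)


def reduce_tree(e, io):
    """Greedily drop nodes while every I/O sample still produces the expected output."""
    def still_correct(t):
        try:
            return all(ev(t, env) == want for env, want in io)
        except ZeroDivisionError:
            return False

    changed = True
    while changed:
        changed = False
        for cand in smaller_trees(e):
            if still_correct(cand):
                e, changed = cand, True
                break
    return e
```

For example, `("add", ("mul", ("var", "a"), ("const", 1)), ("const", 0))` reduces to `("var", "a")` on any I/O set for the identity function.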

Limitations
• This approach is by definition an approximation, as most steps work on a bounded CPU state
• The effort is focused on user-space instructions
• Instructions with very complex semantics won't be synthesized
  • Examples: syscall, int, sqrt, Intel AES extensions
• No way to detect when registers are undefined
  • We only observe changes, and undefined behavior might differ among processors of the same architecture
• No way to detect memory locking
• Loops are not supported (the x86 REP prefix is out of the question)

ToDo
• Finish IEEE754 (float) support
• Port the agent to other architectures (a simple task if the target supports ptrace)
  • ARM and MIPS are planned, as the hardware is easy to get
• Use the agent to run a full program while simultaneously executing it with our semantics, to check that all the transformations produce the same results as native code
• Optimizations
  • During dependency discovery we already generate many usable I/O sets that could be reused directly in later stages

Information
• Source will be available at https://github.com/snf/synthir
• More information about the techniques this tool is based on:
  • Automated Synthesis of Symbolic Instruction Encodings from I/O Samples (Godefroid and Taly, PLDI 2012): http://research.microsoft.com/pubs/156020/main.pdf
  • Stochastic Superoptimization (Schkufza, Sharma and Aiken, ASPLOS 2013): http://cs.stanford.edu/people/eschkufz/research/asplos291-schkufza.pdf

Questions •???

Thank you
• Contact: [email protected]
