
IR Transformation Synthesis for Assembly Instructions


Ekoparty 11 (2015)

Sebastian N. Fernandez

October 23, 2015


Transcript

  1. Intermediate Representation: What is it?
    “An abstract machine language designed to aid in the analysis of computer programs” - Wikipedia
  2. Intermediate Representation: What is it?
    • A small set of simple semantic operations that, combined, can represent (most) assembly instructions
    • Reverse engineering: REIL (BinNavi), BIL (BAP), ESIL (Radare)
    • Tooling: VEX (Valgrind), TCG, the Tiny Code Generator (QEMU)
    • Compilers: LLVM IR, SPIR-V; GCC uses several IRs at different stages
  3. Intermediate Representation in RE: Why?
    • The same IR serves all architectures (tool portability)
    • Reasoning directly about assembly languages is hard because of the sheer number of opcodes
    • The IR expresses semantics rather than syntax, which makes it better suited for tooling
  4. IR Transformation: What is it?
    An algorithm that builds a list of IR statements (instructions or expressions) whose semantics are equivalent to those of the native instruction
  5. IR Transformation (Example)
      ; asm: ret
      ; data (1): c3
      00401006.00 LDM R_ESP:32, , V_01:32
      00401006.01 ADD R_ESP:32, 4:32, R_ESP:32
      00401006.02 JCC 1:1, , V_01:32
  6. IR Transformation (Example)
      ; asm: test eax, eax
      ; data (2): 85 c0
      00000000.00 STR R_EAX:32, , V_00:32
      00000000.01 STR 0:1, , R_CF:1
      00000000.02 AND V_00:32, ff:8, V_01:8
      00000000.03 SHR V_01:8, 7:8, V_02:8
      00000000.04 SHR V_01:8, 6:8, V_03:8
      00000000.05 XOR V_02:8, V_03:8, V_04:8
      00000000.06 SHR V_01:8, 5:8, V_05:8
      00000000.07 SHR V_01:8, 4:8, V_06:8
      00000000.08 XOR V_05:8, V_06:8, V_07:8
      00000000.09 XOR V_04:8, V_07:8, V_08:8
      00000000.0a SHR V_01:8, 3:8, V_09:8
      00000000.0b SHR V_01:8, 2:8, V_10:8
      00000000.0c XOR V_09:8, V_10:8, V_11:8
      00000000.0d SHR V_01:8, 1:8, V_12:8
      00000000.0e XOR V_12:8, V_01:8, V_13:8
      00000000.0f XOR V_11:8, V_13:8, V_14:8
      00000000.10 XOR V_08:8, V_14:8, V_15:8
      00000000.11 AND V_15:8, 1:1, V_16:1
      00000000.12 NOT V_16:1, , R_PF:1
      00000000.13 STR 0:1, , R_AF:1
      00000000.14 EQ V_00:32, 0:32, R_ZF:1
      00000000.15 SHR V_00:32, 1f:32, V_17:32
      00000000.16 AND 1:32, V_17:32, V_18:32
      00000000.17 EQ 1:32, V_18:32, R_SF:1
      00000000.18 STR 0:1, , R_OF:1
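To make the flag statements of the `test eax, eax` listing easier to follow, here is a minimal Python model of the same semantics. It is a sketch, not part of the tool: `flags_of_test_eax_eax` is a hypothetical name, and the function simply mirrors the IR statements (parity folded from the low byte of the result, ZF from a comparison with zero, SF from bit 31, CF/OF/AF cleared).

```python
def flags_of_test_eax_eax(eax: int) -> dict:
    """Flags written by `test eax, eax`, mirroring the IR statements of the listing."""
    v00 = eax & 0xFFFFFFFF      # STR R_EAX -> V_00 (eax & eax is just eax)
    v01 = v00 & 0xFF            # AND V_00, ff -> V_01: parity only looks at the low byte
    # XOR the low byte shifted right by 0..7 (the SHR/XOR tree of the listing);
    # bit 0 of the result is the XOR of all eight bits, i.e. 1 on odd parity.
    folded = 0
    for shift in range(8):
        folded ^= v01 >> shift
    pf = (~folded) & 1          # AND ..., 1 then NOT -> PF is 1 on even parity
    return {
        "CF": 0,                    # STR 0 -> R_CF
        "PF": pf,
        "AF": 0,                    # STR 0 -> R_AF
        "ZF": int(v00 == 0),        # EQ V_00, 0 -> R_ZF
        "SF": (v00 >> 31) & 1,      # SHR V_00, 1f; AND 1; EQ 1 -> R_SF
        "OF": 0,                    # STR 0 -> R_OF
    }
```

For example, `flags_of_test_eax_eax(0x80000000)` yields SF = 1, ZF = 0 and PF = 1, since the low byte is zero and therefore has even parity.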
  7. Why Automate?
    • Writing transformations by hand is repetitive and error-prone (and boring)
    • Why not?
      • Learned a lot while working on this project
      • It always feels good when someone else does the work for you, even if you had to code it from scratch
  8. Objective
    • Automate the process as much as possible
    • Require as little information as possible
    • Find a way to validate the results obtained
  9. Architecture
    • Definition of the target processor architecture
    • Agent executing on the target processor
    • “Interesting” values sampler
    • IR emulator
    • Synthesis engine
    • SMT solver
  10. Agent
    • Observable agent for the target architecture
      • Fork + self-debugger running on the target architecture, where we can observe the behavior of an instruction for a specified input
  11. Arch Information
    • Definition file with:
      • Registers, with their width and most common use
        • GPR, floating point, vector, flags, instruction pointer
      • Sub-registers
      • Endianness
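A definition file like the one described could be modeled roughly as follows. This is a sketch under our own assumptions: the `Register` and `ArchDef` classes, their field names and the tiny x86 subset are illustrative and do not reflect the tool's actual file format. The point is that a sub-register is just a bit field (width and offset) of a parent register, so reads and writes of AL, AH or ZF can be mapped onto full-register state.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Register:
    name: str
    width: int                     # width in bits
    kind: str                      # e.g. "gpr", "fp", "vector", "flags", "ip"
    parent: Optional[str] = None   # full register this one aliases, if any
    offset: int = 0                # bit offset inside the parent register


@dataclass
class ArchDef:
    name: str
    endianness: str                # "little" or "big"
    registers: Dict[str, Register]

    def read(self, state: Dict[str, int], reg_name: str) -> int:
        """Read a (sub-)register from a state that only stores full registers."""
        reg = self.registers[reg_name]
        mask = (1 << reg.width) - 1
        if reg.parent is None:
            return state[reg_name] & mask
        return (state[reg.parent] >> reg.offset) & mask

    def write(self, state: Dict[str, int], reg_name: str, value: int) -> None:
        """Write a (sub-)register, preserving the other bits of its parent."""
        reg = self.registers[reg_name]
        mask = (1 << reg.width) - 1
        if reg.parent is None:
            state[reg_name] = value & mask
            return
        field = mask << reg.offset
        state[reg.parent] = (state[reg.parent] & ~field) | ((value & mask) << reg.offset)


# Hypothetical slice of an x86 definition (ZF is bit 6 of EFLAGS).
X86_SUBSET = ArchDef(
    name="x86",
    endianness="little",
    registers={
        "EAX": Register("EAX", 32, "gpr"),
        "AX": Register("AX", 16, "gpr", parent="EAX"),
        "AL": Register("AL", 8, "gpr", parent="EAX"),
        "AH": Register("AH", 8, "gpr", parent="EAX", offset=8),
        "EFLAGS": Register("EFLAGS", 32, "flags"),
        "ZF": Register("ZF", 1, "flags", parent="EFLAGS", offset=6),
        "EIP": Register("EIP", 32, "ip"),
    },
)
```

With `EAX = 0x12345678`, reading AH gives 0x56, and writing 0xAB to AH turns EAX into 0x1234AB78.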
  12. Arch Information
    • Assembler/disassembler
      • Widely available now; several exist and have been tested against large code bases
      • Useful for associating modified registers with their syntactic representation
  13. Intermediate Representation
    • Defining yet another IR may seem like nonsense, but since we are writing a lightweight emulator we define our own, based on expression trees
  14. Intermediate Representation
    • Arithmetic operations are typed with width and representation
      • Integer+width, IEEE754_32 (float), IEEE754_64 (double)
      • Add, Sub, Mul, Mod, Left shift, Right shift, SDiv, SMod
    • Logic operations
      • And, Xor, Or, Left shift, Right shift
    • Compare
      • LargerThan, LargerOrEqual, SignedLargerThan, SignedLargerEqual, Equal, NotEqual
      • WIP: IEEE754 comparisons
    • Cast
      • Signed and unsigned extension
    • Syntax sugar for the most widely used semantics
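A minimal sketch of an expression-tree IR with width-typed integer operations, in Python. The node classes (`Const`, `Reg`, `BinOp`, `Ext`), the operation names and the evaluator are assumptions for illustration and cover only a subset of the operations listed (no floating point, no division). Every result is truncated to the node's declared width, and signed comparisons and sign extension reinterpret values as two's complement.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Const:
    value: int
    width: int


@dataclass(frozen=True)
class Reg:
    name: str
    width: int


@dataclass(frozen=True)
class BinOp:
    op: str          # key into _OPS below
    left: "Expr"
    right: "Expr"
    width: int       # result width in bits (1 for comparisons)


@dataclass(frozen=True)
class Ext:
    signed: bool     # True: sign extension, False: zero extension
    inner: "Expr"
    width: int       # target width in bits


Expr = Union[Const, Reg, BinOp, Ext]


def to_signed(value: int, width: int) -> int:
    """Interpret an unsigned `width`-bit value as two's complement."""
    return value - (1 << width) if (value >> (width - 1)) & 1 else value


# Each operation receives both operands and the operand width `w`.
_OPS = {
    "Add": lambda a, b, w: a + b,
    "Sub": lambda a, b, w: a - b,
    "Mul": lambda a, b, w: a * b,
    "And": lambda a, b, w: a & b,
    "Or": lambda a, b, w: a | b,
    "Xor": lambda a, b, w: a ^ b,
    "Shl": lambda a, b, w: a << b if b < w else 0,
    "Shr": lambda a, b, w: a >> b if b < w else 0,
    "Equal": lambda a, b, w: int(a == b),
    "SignedLargerThan": lambda a, b, w: int(to_signed(a, w) > to_signed(b, w)),
}


def evaluate(e: Expr, regs: Dict[str, int]) -> int:
    """Evaluate an expression tree, truncating every result to the node's width."""
    mask = (1 << e.width) - 1
    if isinstance(e, Const):
        return e.value & mask
    if isinstance(e, Reg):
        return regs[e.name] & mask
    if isinstance(e, Ext):
        v = evaluate(e.inner, regs)
        return (to_signed(v, e.inner.width) if e.signed else v) & mask
    a, b = evaluate(e.left, regs), evaluate(e.right, regs)
    return _OPS[e.op](a, b, e.left.width) & mask
```

For instance, sign-extending an 8-bit register holding 0x80 to 32 bits gives 0xFFFFFF80, while zero extension gives 0x80.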
  15. Dependency Graph
    • Dependencies in assembly instructions are hard to determine
      • Instructions have side effects or implicit arguments
        • MOVS (x86): uses EDI, ESI
        • Every stack operation implicitly uses the stack pointer
        • Most arithmetic instructions update the EFLAGS register
    • This step bounds the state analyzed in the following steps
      • Synthesizing over a bounded set of registers and memory regions is easier than over the full system state
  16. Dependency Graph
    • Approximation: running every possible system state would be too expensive
    • Pre-define a set of inputs that maximizes output changes
    • How:
      • Pre-define the number of rounds to run to collect outputs
      • Prepare the agent's registers with a specific input set
      • Observe changes in the agent
      • Bound the changes to a set of registers and memory regions
      • Repeat until the test set is exhausted
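The probing loop in these steps can be sketched in Python as follows. It is a simplified illustration under stated assumptions: the agent is replaced by a hypothetical Python function (`fake_add_eax_ebx`) that plays the role of executing `add eax, ebx`, only registers and flags are modeled (no memory regions), and the boundary values in `INTERESTING` are our own choice. A location counts as written if its output differs from its input, or if its output ignores its own input; a written output depends on an input if flipping that input changes it.

```python
import random
from typing import Callable, Dict, Set

State = Dict[str, int]
INTERESTING = [0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]


def fake_add_eax_ebx(state: State) -> State:
    """Hypothetical stand-in for the agent executing `add eax, ebx` on a given input."""
    out = dict(state)
    total = state["EAX"] + state["EBX"]
    out["EAX"] = total & 0xFFFFFFFF
    out["CF"] = int(total > 0xFFFFFFFF)
    out["ZF"] = int(out["EAX"] == 0)
    return out


def sample(rng: random.Random, width: int) -> int:
    """Pick a boundary value or a random one, truncated to the location's width."""
    return rng.choice(INTERESTING + [rng.getrandbits(width)]) & ((1 << width) - 1)


def find_dependencies(run: Callable[[State], State], widths: Dict[str, int],
                      rounds: int = 16, seed: int = 0) -> Dict[str, Set[str]]:
    """Map every location the instruction writes to the input locations it reads."""
    rng = random.Random(seed)
    written: Set[str] = set()
    deps: Dict[str, Set[str]] = {loc: set() for loc in widths}
    for _ in range(rounds):
        base = {loc: sample(rng, w) for loc, w in widths.items()}
        base_out = run(base)
        written |= {loc for loc in widths if base_out[loc] != base[loc]}
        for src, w in widths.items():
            probe = dict(base)
            probe[src] ^= (1 << w) - 1           # flip every bit of one input location
            probe_out = run(probe)
            if probe_out[src] == base_out[src]:
                written.add(src)                 # output ignores its own input: overwritten
            for dst in widths:
                if probe_out[dst] != base_out[dst]:
                    deps[dst].add(src)
    # Untouched locations trivially "depend" on themselves; keep only written ones.
    return {loc: deps[loc] for loc in sorted(written)}
```

With `widths = {"EAX": 32, "EBX": 32, "ECX": 32, "CF": 1, "ZF": 1}`, EAX comes out depending on EAX and EBX, CF and ZF are reported as written, and EBX and ECX are left out because the instruction never changes them.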
  17. Syntactic Association
    • Once we have the dependencies, how are they associated with the disassembled text?
    • Detect the explicit arguments
    • Classify the dependencies as arguments when possible
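One simple way to perform the association is to look for register names in the operand text and map sub-registers to their parent register. The sketch below assumes Intel-syntax disassembly and uses hypothetical register tables for a small slice of x86. Anything found by the dependency step but not mentioned in the operands is classified as an implicit argument (for example flags, or ESI/EDI for `movsb`).

```python
import re
from typing import List, Set, Tuple

# Hypothetical register tables for a slice of x86; sub-registers map to their parent.
FULL_REGS = {"EAX", "EBX", "ECX", "EDX", "ESI", "EDI", "ESP", "EBP"}
PARENT = {"AX": "EAX", "AL": "EAX", "AH": "EAX", "BX": "EBX", "BL": "EBX",
          "CX": "ECX", "CL": "ECX", "DX": "EDX", "DL": "EDX"}


def explicit_registers(disasm: str) -> List[str]:
    """Registers that appear in the operand text, in order (e.g. 'add eax, ebx')."""
    parts = disasm.split(None, 1)
    if len(parts) < 2:
        return []                       # no explicit operands at all
    tokens = [tok.upper() for tok in re.findall(r"[A-Za-z][A-Za-z0-9]*", parts[1])]
    return [t for t in tokens if t in FULL_REGS or t in PARENT]


def classify(disasm: str, dependencies: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Split dependencies into (explicit, implicit) using the disassembled text."""
    mentioned = {PARENT.get(r, r) for r in explicit_registers(disasm)}
    explicit = {d for d in dependencies if PARENT.get(d, d) in mentioned}
    return explicit, dependencies - explicit
```

For `add eax, ebx` with dependencies `{"EAX", "EBX", "CF", "ZF"}`, EAX and EBX are explicit while CF and ZF are implicit.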
  18. I/O Set Generation
    • Much like the dependency graph finder, but with a more bounded state
    • Approximation for creating useful I/O sets for each dependency
    • Instructions do not only copy values: they can also perform logic and arithmetic (integer and floating-point) operations, and the I/O sets must account for that
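A possible shape for the I/O set generator, in Python. This is a sketch: the mix of boundary values (zero, one, the sign bit and its neighbors, all ones, single-bit patterns) and uniformly random values, the 50/50 split between them, and the `run` callback standing in for the agent are all assumptions, not the tool's actual sampler.

```python
import random
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]


def interesting_values(width: int) -> List[int]:
    """Boundary values that tend to expose carries, sign flips and overflows."""
    mask = (1 << width) - 1
    top = 1 << (width - 1)                       # sign bit
    values = {0, 1, 2, mask, mask - 1, top, top - 1, top + 1}
    values |= {1 << i for i in range(width)}     # every single-bit pattern
    return sorted({v & mask for v in values})


def generate_io_set(run: Callable[[State], State], widths: Dict[str, int],
                    count: int, seed: int = 0) -> List[Tuple[State, State]]:
    """Build `count` (input, output) pairs, mixing boundary values and random ones."""
    rng = random.Random(seed)
    pools = {loc: interesting_values(w) for loc, w in widths.items()}
    io_set = []
    for _ in range(count):
        state = {}
        for loc, w in widths.items():
            if rng.random() < 0.5:
                state[loc] = rng.choice(pools[loc])
            else:
                state[loc] = rng.getrandbits(w)
        io_set.append((state, run(state)))
    return io_set
```

Biasing toward boundary values matters because many interesting behaviors (carry out, signed overflow, zero results) only show up on a tiny fraction of uniformly random inputs.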
  19. Transformation Synthesis
    • Three approaches
      • Template-based synthesis
      • Stochastic search synthesis
      • Mixed
    • Expression based
      • Every transformation is a list of STORE(Expr, Expr)
      • Memory accesses are encoded as expressions too
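To illustrate the list-of-STORE(Expr, Expr) representation, the sketch below encodes `ret` and `push eax` as stores over tuple-based expressions, with memory accesses as `("mem", address_expr)` nodes. Assumptions: memory is a dictionary of 32-bit words keyed by address, only a handful of expression kinds are supported, and all sources and destination addresses are evaluated against the input state before any write is committed, so the order of the stores does not matter. This parallel-assignment semantics is our choice for the example; the IR listing for `ret` shown earlier achieves the same effect with a temporary (V_01).

```python
from typing import Dict, List, Tuple

Expr = tuple                       # ("const", n) | ("reg", name) | ("add", a, b) | ("mem", addr)
Store = Tuple[Expr, Expr]          # STORE(destination, source)
MASK32 = 0xFFFFFFFF


def ev(e: Expr, regs: Dict[str, int], mem: Dict[int, int]) -> int:
    """Evaluate an expression; ("mem", addr) reads one 32-bit word."""
    kind = e[0]
    if kind == "const":
        return e[1] & MASK32
    if kind == "reg":
        return regs[e[1]]
    if kind == "add":
        return (ev(e[1], regs, mem) + ev(e[2], regs, mem)) & MASK32
    if kind == "mem":
        return mem.get(ev(e[1], regs, mem), 0)
    raise ValueError(f"unknown expression kind {kind!r}")


def apply_stores(stores: List[Store], regs: Dict[str, int], mem: Dict[int, int]):
    """Evaluate all sources and addresses on the input state, then commit every write."""
    writes = []
    for dst, src in stores:
        value = ev(src, regs, mem)
        if dst[0] == "reg":
            writes.append((True, dst[1], value))
        elif dst[0] == "mem":
            writes.append((False, ev(dst[1], regs, mem), value))
        else:
            raise ValueError("a STORE destination must be a register or a memory access")
    new_regs, new_mem = dict(regs), dict(mem)
    for is_reg, where, value in writes:
        (new_regs if is_reg else new_mem)[where] = value
    return new_regs, new_mem


RET = [
    (("reg", "EIP"), ("mem", ("reg", "ESP"))),                          # EIP <- [ESP]
    (("reg", "ESP"), ("add", ("reg", "ESP"), ("const", 4))),            # ESP <- ESP + 4
]
PUSH_EAX = [
    (("mem", ("add", ("reg", "ESP"), ("const", -4))), ("reg", "EAX")),  # [ESP - 4] <- EAX
    (("reg", "ESP"), ("add", ("reg", "ESP"), ("const", -4))),           # ESP <- ESP - 4
]
```

Applying `RET` to `ESP = 0x1000` with the word 0xDEADBEEF stored at 0x1000 sets EIP to 0xDEADBEEF and ESP to 0x1004.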
  20. Template-based Synthesis
    • Most efficient approach
      • The search is bounded to a very small space
      • Enumeration of this space is deterministic, so no cases are missed
      • Templates can be pre-defined for specific cases
      • For example, flags in assembly languages tend to change under the same circumstances:
        • Zero flag is set when the instruction result is zero
        • Overflow flag is set when an arithmetic instruction overflows
        • ...
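A sketch of the template idea applied to flags: each template is a candidate formula over the instruction result, and the search keeps the templates that agree with every observed sample. The template names and set, the `fake_and` stand-in for the agent running `and eax, ebx`, and the sample inputs are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical flag templates: each maps (result, width) to a flag value.
TEMPLATES: Dict[str, Callable[[int, int], int]] = {
    "zero": lambda r, w: int(r == 0),
    "sign": lambda r, w: (r >> (w - 1)) & 1,
    "even_parity_low_byte": lambda r, w: int(bin(r & 0xFF).count("1") % 2 == 0),
    "always_0": lambda r, w: 0,
    "always_1": lambda r, w: 1,
}


def match_flag(samples: List[Tuple[int, int]], width: int) -> List[str]:
    """Return every template consistent with all (result, observed_flag) samples."""
    return [name for name, f in TEMPLATES.items()
            if all(f(r, width) == flag for r, flag in samples)]


def fake_and(a: int, b: int) -> Tuple[int, Dict[str, int]]:
    """Hypothetical stand-in for the agent running `and eax, ebx`."""
    r = a & b
    return r, {"ZF": int(r == 0), "SF": r >> 31, "CF": 0}


observed = [fake_and(a, b) for a, b in
            [(0, 0), (1, 1), (0x80000000, 0xFFFFFFFF), (0xF0, 0x0F), (3, 1)]]
for flag in ("ZF", "SF", "CF"):
    print(flag, match_flag([(r, flags[flag]) for r, flags in observed], 32))
```

On these samples, ZF matches only the `zero` template, SF only `sign`, and CF only `always_0`. Because the candidate list is small and enumerated exhaustively, a match either exists or the flag is handed to a more general search.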
  21. Stochastic Search
    • The search space is huge
      • Seriously!
    • Use Markov chain Monte Carlo (MCMC) sampling to try to reach the solution
      • Start with a blank program
      • Propose modifications to the current program
      • Accept or reject each proposal according to a cost function; an accepted proposal becomes the current program
      • Repeat
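  22. MCMC
    • Transform a cost function into a probability density function, $\pi(R) \propto \exp(-\beta\, c(R; T))$
      • Our cost function depends on how close the outputs of the proposed program are to the desired outputs for each input in the test set $T$
      • Always accept transformations that lower the cost, but sometimes accept ones that increase it slightly
      • Acceptance: draw $p$ uniformly from $(0, 1)$ and accept the proposal $R^*$ over the current program $R$ when $c(R^*; T) < c(R; T) - \log(p)/\beta$, where $\beta > 0$ controls how readily cost increases are accepted
    • Actions on the expression tree, sampled randomly:
      • Insert expression
      • Replace expression
      • Remove expression
      • Replace operation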
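The acceptance rule (accept a proposal $R^*$ when $c(R^*) < c(R) - \log(p)/\beta$ for $p$ drawn uniformly) translates directly into code. The sketch below is generic: `propose` and `cost` are caller-supplied, and the names and defaults (`beta = 1.0`, 10,000 iterations) are assumptions. A real search would propose the four expression-tree moves; here the mechanics can be exercised on any toy problem.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

P = TypeVar("P")   # whatever represents a candidate program


def accept(cost_current: float, cost_proposal: float, beta: float, rng: random.Random) -> bool:
    """Metropolis test written as c(R*) < c(R) - log(p) / beta, with p ~ U(0, 1]."""
    if cost_proposal < cost_current:
        return True                          # improvements are always accepted
    p = 1.0 - rng.random()                   # in (0, 1], so log(p) is defined and <= 0
    return cost_proposal < cost_current - math.log(p) / beta


def mcmc_search(start: P, propose: Callable[[P, random.Random], P], cost: Callable[[P], float],
                beta: float = 1.0, iterations: int = 10_000, seed: int = 0) -> Tuple[P, float]:
    """Random walk over candidate programs; returns the cheapest one visited."""
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    for _ in range(iterations):
        candidate = propose(current, rng)
        candidate_cost = cost(candidate)
        if accept(current_cost, candidate_cost, beta, rng):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
                if best_cost == 0:
                    break                    # nothing left to improve
    return best, best_cost
```

As a toy usage, `mcmc_search(0, lambda x, rng: x + rng.choice([-3, -2, -1, 1, 2, 3]), lambda x: abs(x - 42))` walks an integer toward 42. A proposal with infinite cost is never accepted, since no finite threshold exceeds it.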
  23. Cost Function
    • Hamming distance from output to desired output
    • Integer distance from output to desired output
    • Penalty for tree size
    • Penalty for not using all the dependencies
    • Infinite cost for expressions that raise an error (e.g. division by zero)
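These cost terms could be combined as a weighted sum, as in this sketch over 8-bit tuple-based expression trees. The weights (`int_weight`, `size_weight`, `unused_weight`), the tree encoding and the operator set are hypothetical; the slide does not specify how the terms are weighted. A program that raises an error on any test input (here, division by zero) gets infinite cost, so it can never be accepted over a finite-cost program.

```python
import math
from typing import Dict, List, Set, Tuple, Union

Tree = Union[str, int, tuple]     # input name, constant, or (op, left, right)
WIDTH = 8
MASK = (1 << WIDTH) - 1
OPS = {
    "add": lambda a, b: (a + b) & MASK,
    "sub": lambda a, b: (a - b) & MASK,
    "and": lambda a, b: a & b,
    "xor": lambda a, b: a ^ b,
    "div": lambda a, b: a // b,    # unsigned division; raises ZeroDivisionError on b == 0
}


def evaluate(t: Tree, env: Dict[str, int]) -> int:
    if isinstance(t, str):
        return env[t]
    if isinstance(t, int):
        return t & MASK
    op, left, right = t
    return OPS[op](evaluate(left, env), evaluate(right, env))


def size(t: Tree) -> int:
    return 1 + size(t[1]) + size(t[2]) if isinstance(t, tuple) else 1


def used_inputs(t: Tree) -> Set[str]:
    if isinstance(t, str):
        return {t}
    if isinstance(t, int):
        return set()
    return used_inputs(t[1]) | used_inputs(t[2])


def cost(t: Tree, tests: List[Tuple[Dict[str, int], int]], deps: Set[str],
         int_weight: float = 0.0, size_weight: float = 0.1, unused_weight: float = 1.0) -> float:
    total = 0.0
    for env, want in tests:
        try:
            got = evaluate(t, env)
        except ZeroDivisionError:
            return math.inf                              # erroring programs are unacceptable
        total += bin(got ^ want).count("1")              # Hamming distance
        total += int_weight * abs(got - want)            # integer distance
    total += size_weight * size(t)                       # prefer small trees
    total += unused_weight * len(deps - used_inputs(t))  # every dependency should be used
    return total
```

With target `x xor y`, the correct tree `("xor", "x", "y")` only pays the size penalty (3 nodes, 0.3), while the tree `"x"` pays for wrong bits, plus one node and one unused dependency.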
  24. Mixed Search
    • Best of both worlds
    • Works well with vector instructions
    • Use a template as the base and choose which expressions are allowed to mutate
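As a rough illustration of the mixed approach for a vector instruction, the template below fixes the lane structure (four 8-bit lanes in a 32-bit value, processed independently) and leaves only the per-lane operation open. For brevity the hole is filled by scanning a small hypothetical list of candidate operations; in the mixed approach, that part would be handled by the stochastic search instead. Names, lane sizes and candidates are assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple

LANE_BITS, LANES = 8, 4
LANE_MASK = (1 << LANE_BITS) - 1

# Hypothetical candidates that may fill the template's hole (the per-lane operation).
CANDIDATES: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: (a + b) & LANE_MASK,
    "sub": lambda a, b: (a - b) & LANE_MASK,
    "xor": lambda a, b: a ^ b,
    "and": lambda a, b: a & b,
    "max_unsigned": lambda a, b: max(a, b),
}


def lanewise(op: Callable[[int, int], int], x: int, y: int) -> int:
    """Template: apply the same operation independently to each 8-bit lane."""
    out = 0
    for i in range(LANES):
        shift = i * LANE_BITS
        a, b = (x >> shift) & LANE_MASK, (y >> shift) & LANE_MASK
        out |= (op(a, b) & LANE_MASK) << shift
    return out


def fill_hole(samples: List[Tuple[Tuple[int, int], int]]) -> Optional[str]:
    """Return the first candidate that reproduces every ((x, y), expected) sample."""
    for name, op in CANDIDATES.items():
        if all(lanewise(op, x, y) == want for (x, y), want in samples):
            return name
    return None
```

The template guarantees that carries never cross lane boundaries, so the search only has to discover an 8-bit operation instead of a full 32-bit expression.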
  25. Contrast Verification
    • Sometimes many programs are synthesized for the same I/O set
    • The IR can be translated to SMT formulas
      • Bit-vector logic
      • IEEE754/floating-point logic
    • The formulas are sent to an SMT solver to check whether two programs are equivalent
      • If they are not, the solver generates a counterexample (an input for which the programs' outputs differ)
      • With the new input, go back to I/O generation and discard one or more programs
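Translating the IR to SMT-LIB is a mostly mechanical tree walk. The sketch below assumes a tuple-based expression encoding and covers only a few bit-vector operations. It builds the same kind of equivalence query as in the slide 26 example: assert that the two programs differ and ask the solver whether that is satisfiable. `(get-model)` only produces a counterexample when the answer is `sat`.

```python
from typing import Set, Tuple, Union

Tree = Union[str, Tuple]          # variable name, ("const", n), or (op, arg, ...)
SMT_OPS = {"add": "bvadd", "sub": "bvsub", "and": "bvand", "or": "bvor", "xor": "bvxor",
           "not": "bvnot", "neg": "bvneg", "shl": "bvshl", "lshr": "bvlshr"}


def to_smt(t: Tree, width: int) -> str:
    """Render an expression tree as an SMT-LIB bit-vector term."""
    if isinstance(t, str):
        return t
    if t[0] == "const":
        return f"(_ bv{t[1] % (1 << width)} {width})"
    return "(" + " ".join([SMT_OPS[t[0]]] + [to_smt(a, width) for a in t[1:]]) + ")"


def variables(t: Tree) -> Set[str]:
    if isinstance(t, str):
        return {t}
    if t[0] == "const":
        return set()
    return set().union(*(variables(a) for a in t[1:]))


def equivalence_query(p: Tree, q: Tree, width: int = 32) -> str:
    """SMT-LIB script that is unsat iff p and q agree on every input."""
    decls = [f"(declare-const {v} (_ BitVec {width}))" for v in sorted(variables(p) | variables(q))]
    return "\n".join(decls + [
        f"(assert (not (= {to_smt(p, width)} {to_smt(q, width)})))",
        "(check-sat)",
        "(get-model)",   # only yields a counterexample when the answer is sat
    ])
```

Feeding it `("xor", "reg1", ("xor", ("not", "reg1"), ("neg", "reg1")))` and `("sub", "reg1", ("const", 1))` reproduces the query of slide 26.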
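  26. Contrast Verification (example)
      (declare-const reg1 (_ BitVec 32))
      (assert (not
        (= (bvxor reg1 (bvxor (bvnot reg1) (bvneg reg1)))
           (bvsub reg1 (_ bv1 32)))))
      (check-sat)
      (get-model)
    Here both sides are equivalent (reg1 XOR NOT reg1 is all ones, and NOT(-reg1) = reg1 - 1), so the solver answers unsat; (get-model) only yields a counterexample when the answer is sat.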
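For narrow bit widths, the same question can be answered without a solver by exhaustive enumeration. This is a stand-in technique, not what the tool does, and it only checks the identity for the width that is enumerated (8 bits here); the solver query covers 32 bits. `find_counterexample` is a hypothetical helper returning the first input on which two functions disagree, or `None` if they agree everywhere.

```python
from typing import Callable, Optional

Fn = Callable[[int], int]


def find_counterexample(f: Fn, g: Fn, width: int) -> Optional[int]:
    """Return the first width-bit input on which f and g differ, or None if none exists."""
    mask = (1 << width) - 1
    for x in range(1 << width):
        if (f(x) & mask) != (g(x) & mask):
            return x
    return None


W = 8
M = (1 << W) - 1


def lhs(r: int) -> int:
    return r ^ ((~r & M) ^ (-r & M))    # bvxor reg1 (bvxor (bvnot reg1) (bvneg reg1))


def rhs(r: int) -> int:
    return (r - 1) & M                  # bvsub reg1 1


print(find_counterexample(lhs, rhs, W))                            # None: equivalent on 8 bits
print(find_counterexample(lambda r: r ^ 1, lambda r: r + 1, W))    # 1
```

Enumeration grows as 2^width per variable, which is exactly why the real pipeline hands the question to a bit-vector solver instead.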
  27. Finish
    • At this point all non-equivalent programs have been discarded
    • Choose the shortest remaining program
  28. Results
    • Most of the instructions used in an x86 program can be approximated automatically
    • Most transformations finish in a minute or less
      • Emulator + sampler evaluates 50-80k expression trees per second
      • Templates help avoid computation for common semantics (EFLAGS, for example)
    • Some expressions are correct but not optimized
      • Div/Mul by 1, Add/Sub 0, Bits(0, width)...
      • A simple pass tries to remove as many elements from the tree as possible while maintaining the cost
      • Writing an arithmetic/logic optimization pass would be a good idea too
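The clean-up pass could look roughly like this sketch: repeatedly try to replace an operator node by one of its operands, and keep the change whenever the outputs on the I/O inputs stay the same. Here "maintaining the cost" is approximated as producing identical outputs on every test input, which is an assumption; the tuple tree encoding and operator set are also hypothetical. The pass removes identities such as `Mul 1` and `Add 0`, but not rewrites that need a new constant (turning `x xor x` into 0), which is what an arithmetic/logic optimization pass would add.

```python
from typing import Dict, Iterator, List, Union

Tree = Union[str, int, tuple]      # input name, constant, or (op, left, right)
MASK = 0xFFFFFFFF
OPS = {
    "add": lambda a, b: (a + b) & MASK,
    "sub": lambda a, b: (a - b) & MASK,
    "mul": lambda a, b: (a * b) & MASK,
    "xor": lambda a, b: a ^ b,
}


def evaluate(t: Tree, env: Dict[str, int]) -> int:
    if isinstance(t, str):
        return env[t]
    if isinstance(t, int):
        return t & MASK
    op, left, right = t
    return OPS[op](evaluate(left, env), evaluate(right, env))


def one_step_removals(t: Tree) -> Iterator[Tree]:
    """Every tree obtained by replacing one operator node with one of its operands."""
    if not isinstance(t, tuple):
        return
    op, left, right = t
    yield left
    yield right
    for smaller in one_step_removals(left):
        yield (op, smaller, right)
    for smaller in one_step_removals(right):
        yield (op, left, smaller)


def shrink(t: Tree, tests: List[Dict[str, int]]) -> Tree:
    """Greedily drop nodes as long as the outputs on the test inputs stay identical."""
    reference = [evaluate(t, env) for env in tests]
    changed = True
    while changed:
        changed = False
        for candidate in one_step_removals(t):
            if [evaluate(candidate, env) for env in tests] == reference:
                t, changed = candidate, True
                break
    return t
```

For example, `("add", ("mul", "x", 1), 0)` shrinks to `"x"`, while `("xor", "x", "x")` is left unchanged.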
  29. Limitations
    • This approach is by definition an approximation, as most steps operate on a bounded CPU state
    • The effort is focused on user-space instructions
    • Instructions with very complex semantics won't be synthesized
      • Examples: syscall, int, sqrt, Intel AES extension
    • No way to detect when registers are undefined
      • We only observe changes, and undefined behavior may differ among processors of the same architecture
    • No way to detect memory locking
    • Loops are not supported (the x86 REP prefix is out of the question)
  30. ToDo
    • Finish IEEE754 (float) support
    • Port the agent to other architectures (a simple task if the target supports ptrace)
      • ARM and MIPS are planned, as hardware is easy to get
    • Use the agent to run a full program while simultaneously executing it with our semantics, to check that all transformations produce the same results as native code
    • Optimizations
      • During dependency discovery we already generate many usable I/O sets that could be fed directly to later stages
  31. Information
    • Source will be available at https://github.com/snf/synthir
    • More information about the techniques this tool is based on:
      • Automated Synthesis of Symbolic Instruction Encodings: http://research.microsoft.com/pubs/156020/main.pdf
      • Stochastic Superoptimization: http://cs.stanford.edu/people/eschkufz/research/asplos291-schkufza.pdf