Using LLVM for malware deobfuscation


Yuma Kurogome

January 09, 2014


  1. Title
     WIP Presentation: Using LLVM for malware deobfuscation
     B1 Yuma Kurogome (@ntddk) a.k.a. gomachan
     Supervisor: none
  2. Contents
     ▪ Background
     ▪ Purpose
     ▪ Related work
     ▪ Approach
     ▪ Implementation
     ▪ Problem
     ▪ Future

  3. Background
     ▪ Malware analysis is becoming more difficult
        ▪ APT
        ▪ Botnets
        ▪ Code obfuscation, etc.
     ▪ Many obfuscation tools and methods exist
     ▪ No good deobfuscation tool is available
  4. Purpose
     ▪ Build a useful deobfuscator
        ▪ Use LLVM's code optimizer
        ▪ Implement an x86 frontend
          ➔ It is difficult to build an AST from x86 native code
     (Diagram labels: x86 Frontend, x86)
  5. Related work
     OptiCode: Machine Code Deobfuscation for Malware Analysis,
     Nguyen Anh Quynh, presentation, SyScan SG, Apr 2013
     ▪ Supports many obfuscation techniques
        ▪ Inserting dead instructions
        ▪ Inserting NOP-semantic instructions
        ▪ Inserting unreachable code
        ▪ Inserting a branch instruction to the next instruction
     ▪ Own x86 frontend (details unknown) and the default LLVM optimizer
        ▪ Generates a control flow graph (CFG) consisting of basic blocks (BB) from machine code
        ▪ Constant folding
        ▪ Eliminates dead store instructions
        ▪ Combines instructions
        ▪ Simplifies the CFG
        ▪ Merges BBs
     In this work, I wanted to reproduce OptiCode.
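To make the optimizer-based idea concrete, here is a minimal sketch of constant folding followed by dead store elimination. It runs on a made-up three-address format, not on LLVM IR, and the function names and tuple layout are assumptions for illustration only. It is not how OptiCode or LLVM implement these passes.

```python
# Toy three-address code: (dest, op, a, b). Operands are ints or register
# names (strings). This format is hypothetical; real LLVM IR is far richer.
OPS = {
    "mov": lambda a, b: a,
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "xor": lambda a, b: a ^ b,
}

def fold_constants(code):
    """Forward pass: substitute known constants and fold constant expressions."""
    known = {}          # register name -> constant value
    out = []
    for dest, op, a, b in code:
        a = known.get(a, a)
        b = known.get(b, b)
        if isinstance(a, int) and (b is None or isinstance(b, int)):
            value = OPS[op](a, b)
            known[dest] = value
            out.append((dest, "mov", value, None))
        else:
            known.pop(dest, None)   # dest no longer holds a known constant
            out.append((dest, op, a, b))
    return out

def eliminate_dead_stores(code, live_out):
    """Backward pass: drop writes to registers that are never read afterwards."""
    live = set(live_out)
    kept = []
    for dest, op, a, b in reversed(code):
        if dest not in live:
            continue                # result is never used
        live.discard(dest)
        live.update(x for x in (a, b) if isinstance(x, str))
        kept.append((dest, op, a, b))
    kept.reverse()
    return kept

obfuscated = [
    ("eax", "mov", 5, None),
    ("ebx", "mov", 1234, None),     # dead: overwritten before any use
    ("ebx", "xor", "eax", 5),
    ("ecx", "add", "ebx", 7),
    ("edx", "sub", "ecx", "ecx"),   # junk: edx is not live at the end
    ("eax", "add", "ecx", 3),
]
deobfuscated = eliminate_dead_stores(fold_constants(obfuscated), live_out={"eax"})
```

With only `eax` assumed live at the end, the six obfuscated instructions collapse to a single move of a constant into `eax`.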
  6. Related work
     Dynamically Translating x86 to LLVM using QEMU,
     Vitaly Chipounov, George Candea, 2010
     ▪ QEMU has a dynamic translator (now the Tiny Code Generator)
        ▪ Target code → IR → host code
        ▪ Disassembler
        ▪ Micro-operations
        ▪ Mapping
     ▪ Uses an LLVM code dictionary instead of the host code dictionary
        ▪ Referred to when mapping
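The dictionary-based mapping can be pictured roughly as follows. This is only a sketch: the micro-operation names and the IR templates are assumptions chosen for illustration, and neither QEMU nor the cited work is claimed to use this exact table.

```python
# Hypothetical "LLVM code dictionary": each micro-operation name maps to a
# textual LLVM IR template instead of a host machine-code snippet.
LLVM_CODE_DICTIONARY = {
    "mov_i32": "{dst} = add i32 {a}, 0",
    "add_i32": "{dst} = add i32 {a}, {b}",
    "sub_i32": "{dst} = sub i32 {a}, {b}",
    "xor_i32": "{dst} = xor i32 {a}, {b}",
}

def translate_block(micro_ops):
    """Map a list of (name, dst, a, b) micro-operations to LLVM IR text lines."""
    lines = []
    for name, dst, a, b in micro_ops:
        template = LLVM_CODE_DICTIONARY[name]   # dictionary lookup = "mapping"
        lines.append(template.format(dst=dst, a=a, b=b))
    return lines

ir_lines = translate_block([
    ("mov_i32", "%t0", "%eax", None),
    ("add_i32", "%t1", "%t0", "%ebx"),
])
```

The point of the design is that only the dictionary changes: the disassembler and micro-operation stages stay the same, while the lookup emits IR instead of host code.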
  7. Approach
     1. Read obfuscated code
     2. Dynamic translation
     3. LLVM bitcode
     4. Generate BBs and CFG
     5. Optimize
     6. Generate deobfuscated code
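The BB and CFG generation step can be sketched on a toy instruction list. The `(address, mnemonic, target)` tuples and the mnemonics `jmp`, `jcc`, and `ret` are simplifying assumptions, and this is the textbook leader-based algorithm rather than the code used in this work.

```python
def split_basic_blocks(instrs):
    """instrs: list of (addr, mnemonic, target); target is an address or None.
    Returns (blocks, cfg): blocks maps a leader address to its instructions,
    cfg maps a leader address to a sorted list of successor leaders."""
    addrs = [ins[0] for ins in instrs]
    leaders = {addrs[0]}                      # the first instruction starts a block
    for idx, (addr, mnem, target) in enumerate(instrs):
        if mnem in ("jmp", "jcc", "ret"):
            if target is not None:
                leaders.add(target)           # a branch target starts a block
            if idx + 1 < len(instrs):
                leaders.add(addrs[idx + 1])   # so does the instruction after a branch

    blocks, current = {}, None
    for ins in instrs:
        if ins[0] in leaders:
            current = ins[0]
            blocks[current] = []
        blocks[current].append(ins)

    cfg = {}
    order = list(blocks)                      # leaders in address order
    for n, leader in enumerate(order):
        addr, mnem, target = blocks[leader][-1]
        succs = set()
        if mnem in ("jmp", "jcc") and target in blocks:
            succs.add(target)                 # taken edge
        if mnem not in ("jmp", "ret") and n + 1 < len(order):
            succs.add(order[n + 1])           # fall-through edge
        cfg[leader] = sorted(succs)
    return blocks, cfg

program = [
    (0, "mov", None),
    (1, "jmp", 2),      # branch to the next instruction (an obfuscation trick)
    (2, "add", None),
    (3, "jcc", 5),
    (4, "nop", None),
    (5, "ret", None),
]
blocks, cfg = split_basic_blocks(program)
```

In this example block 0 has block 2 as its only successor and block 2 has no other predecessor, which is the kind of pair a CFG simplification step would merge.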
  8. Implementation
     ▪ Modify the QEMU dynamic translator
        ▪ Tiny Code Generator (TCG) ➔ BB
        ▪ Easy to map registers to LLVM IR
        ▪ Generate the CFG from the LLVMContext class
     ▪ Use the LLVM optimizer
        ▪ Inserted dead code ➔ -dse, -simplifycfg
        ▪ Substitution with equivalent instructions ➔ -constprop, -instcombine
        ▪ Reordered instructions ➔ -instcombine
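The pairing of obfuscation patterns with passes can be written down as a small helper that assembles an `opt` command line. This is a sketch: the pattern keys and file names are hypothetical, and the flags are the legacy-style ones listed on the slide. Newer LLVM releases use a different pass-selection syntax and may not ship every one of these passes.

```python
# Obfuscation pattern -> legacy `opt` pass flags, as listed on the slide.
# The dictionary keys are made-up labels for illustration.
PASSES_FOR = {
    "dead_code": ["-dse", "-simplifycfg"],
    "equivalent_instructions": ["-constprop", "-instcombine"],
    "reordered_instructions": ["-instcombine"],
}

def build_opt_command(patterns, in_bc, out_bc):
    """Return an argv list for `opt`, keeping each pass flag once, in order."""
    flags = []
    for pattern in patterns:
        for flag in PASSES_FOR[pattern]:
            if flag not in flags:
                flags.append(flag)
    return ["opt", *flags, in_bc, "-o", out_bc]

# Hypothetical file names; the list could be passed to subprocess.run().
command = build_opt_command(
    ["dead_code", "equivalent_instructions", "reordered_instructions"],
    "obfuscated.bc",
    "deobfuscated.bc",
)
```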
  9. Problem
     ▪ The methods described in OptiCode can be deobfuscated
        ▪ As long as no opaque predicates are involved
     However,
     ▪ The QEMU dynamic translator has problems
        ▪ Dependence on context
        ▪ Impossible to interpret the Win32 API
        ▪ Overhead
     ▪ Optimice is more sophisticated than my work
        ▪ Deobfuscation plugin for IDA
        ▪ Uses the CFG and BBs generated by IDA
        ▪ Overcomes the problems of my work
     ▪ The evaluation method is ambiguous...
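For readers unfamiliar with the term, an opaque predicate is a condition whose outcome is fixed but hard for a tool to see. The sketch below uses a common textbook example, not one taken from the slides. All function names are illustrative.

```python
def opaque_predicate(x):
    # x * (x + 1) is a product of two consecutive integers, so it is always
    # even. The condition is always true, but x itself is unknown, so plain
    # constant folding has nothing to fold.
    return (x * (x + 1)) % 2 == 0

def obfuscated(x):
    if opaque_predicate(x):
        return x + 1        # the real computation
    return x * 31337        # bogus branch that never executes

def original(x):
    return x + 1

same_behaviour = all(obfuscated(x) == original(x) for x in range(-1000, 1000))
```

Removing the bogus branch requires proving a property of the expression for every input, which is beyond the pattern-level simplifications listed on the previous slide.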
  10. Future work
      ▪ Continuation of the research for TERM
         ▪ How can we deobfuscate malware?
      ▪ Establishing an evaluation method
      ▪ Bringing in semantics
         ▪ Abstract interpretation
         ▪ Predicate logic
         ▪ There is little existing research...
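As a small illustration of what a semantic approach could offer, the sketch below applies abstract interpretation over a parity domain to the textbook predicate `x * (x + 1) % 2 == 0`. The domain, the function names, and the choice of predicate are all assumptions for illustration. The slides do not describe a concrete design.

```python
EVEN, ODD, TOP = "even", "odd", "top"   # TOP means "parity unknown"

def abs_const(n):
    """Abstract value of an integer constant."""
    return EVEN if n % 2 == 0 else ODD

def abs_add(a, b):
    """Parity of a sum: equal parities give even, different give odd."""
    if TOP in (a, b):
        return TOP
    return EVEN if a == b else ODD

def abs_mul(a, b):
    """Parity of a product: any even factor makes the product even."""
    if EVEN in (a, b):
        return EVEN
    if TOP in (a, b):
        return TOP
    return ODD

def parity_of_product_with_successor(x_parity):
    """Abstractly evaluate x * (x + 1) for an input of the given parity."""
    return abs_mul(x_parity, abs_add(x_parity, abs_const(1)))

def predicate_always_true():
    # Case split on the input parity: the predicate holds for every input
    # if the product is even in both cases.
    return all(parity_of_product_with_successor(p) == EVEN for p in (EVEN, ODD))

proved = predicate_always_true()
```

Evaluating with `TOP` directly yields `TOP`, so the case split on the input parity is what makes the proof go through. That trade-off between precision and cost is one reason the evaluation question on this slide matters.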