Using LLVM for malware deobfuscation

1 Title
WIP Presentation: Using LLVM for malware deobfuscation
B1 Yuma Kurogome (@ntddk) a.k.a. gomachan
Supervisor: none

2 Contents
▪ Background
▪ Purpose
▪ Related work
▪ Approach
▪ Implementation
▪ Problem
▪ Future work

3 Background
▪ Analysis of malware is becoming difficult
  - APT
  - Botnet
  - Code obfuscation, etc.
▪ Many obfuscation tools/methods
▪ No good deobfuscation tool available

4 Purpose
▪ Realization of a useful deobfuscator
  - Use the code optimizer of LLVM
  - Implementation of an x86 frontend
  ➔ It is difficult to make an AST from x86 native code
[Figure labels: x86 Frontend, x86]

5 Related work
OptiCode: Machine Code Deobfuscation for Malware Analysis, Nguyen Anh Quynh, Presentation, SyScan SG, Apr 2013
▪ Supports many obfuscation techniques
  - Insertion of dead instructions
  - Insertion of NOP-semantic instructions
  - Insertion of unreachable code
  - Insertion of branch instructions to the next instruction
▪ Own x86 frontend (details unknown) and default LLVM optimizer
  - Generate control flow graph (CFG) consisting of basic blocks (BB) from machine code
  - Constant folding
  - Eliminate dead store instructions
  - Combine instructions
  - Simplify CFG
  - Merge BB
In this work, I wanted to reproduce OptiCode.
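
Two of the obfuscation techniques listed on this slide, NOP-semantic instructions and branches to the next instruction, can be illustrated on a toy instruction list. The sketch below is not OptiCode's method (its frontend details are unknown); the tuple format `(address, mnemonic, operands)` and the function names are assumptions made for illustration, and flag side effects are ignored.

```python
# Toy instruction format (hypothetical): (address, mnemonic, operands)

def is_nop_equivalent(mnemonic, operands):
    """True if the instruction leaves every register unchanged (flags ignored)."""
    if mnemonic == "nop":
        return True
    if len(operands) != 2:
        return False
    # mov r, r and xchg r, r do nothing in this 32-bit toy model
    if mnemonic in ("mov", "xchg") and operands[0] == operands[1]:
        return True
    # add/sub/or/xor with an immediate 0 keeps the register value
    if mnemonic in ("add", "sub", "or", "xor") and operands[1] == 0:
        return True
    return False


def strip_junk(instructions):
    """Drop NOP-equivalent instructions and jumps to the next instruction."""
    cleaned = []
    for index, (address, mnemonic, operands) in enumerate(instructions):
        if is_nop_equivalent(mnemonic, operands):
            continue
        has_next = index + 1 < len(instructions)
        next_address = instructions[index + 1][0] if has_next else None
        if mnemonic == "jmp" and operands == (next_address,):
            continue  # branch to the fall-through address
        cleaned.append((address, mnemonic, operands))
    return cleaned


obfuscated = [
    (0x00, "mov", ("eax", 1)),
    (0x05, "nop", ()),
    (0x06, "xchg", ("ebx", "ebx")),
    (0x08, "jmp", (0x0A,)),
    (0x0A, "add", ("eax", 0)),
    (0x0D, "add", ("eax", 2)),
    (0x10, "ret", ()),
]

for address, mnemonic, operands in strip_junk(obfuscated):
    print(hex(address), mnemonic, operands)
```

Only `mov eax, 1`, `add eax, 2` and `ret` survive. A real deobfuscator would also have to re-resolve branch targets that pointed at removed instructions.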

6 Related work
Dynamically Translating x86 to LLVM using QEMU, Vitaly Chipounov, George Candea, 2010
▪ QEMU has a dynamic translator (now the Tiny Code Generator)
  - Target code → IR → host code
  - Disassembler
  - Micro-operations
  - Mapping
▪ Uses an LLVM code dictionary instead of the host code dictionary
  - Referred to when mapping
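
The idea of swapping the host code dictionary for an LLVM one can be sketched as a lookup table from micro-operations to IR templates. This is only an illustration of the mapping step: the micro-op names (`movi`, `add`, `sub`), the field names and the templates below are hypothetical and are not taken from QEMU or from the paper.

```python
# Hypothetical dictionary: micro-operation name -> LLVM IR text template.
LLVM_CODE_DICTIONARY = {
    "movi": "%{dst} = add i32 0, {imm}",
    "add": "%{dst} = add i32 %{lhs}, %{rhs}",
    "sub": "%{dst} = sub i32 %{lhs}, %{rhs}",
}


def translate(micro_ops):
    """Map each micro-op to one line of LLVM IR by dictionary lookup."""
    lines = []
    for name, fields in micro_ops:
        template = LLVM_CODE_DICTIONARY[name]  # KeyError for unknown micro-ops
        lines.append(template.format(**fields))
    return lines


micro_ops = [
    ("movi", {"dst": "t0", "imm": 5}),
    ("add", {"dst": "t1", "lhs": "t0", "rhs": "t0"}),
]

for line in translate(micro_ops):
    print(line)
```

The sketch assumes every temporary is assigned once, so the emitted text stays in SSA form.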

7 Approach
1. Read obfuscated code
2. Dynamic translation
3. LLVM bitcode
4. Generate BB and CFG
5. Optimize
6. Generate deobfuscated code
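
Step 4 (generating BB and CFG) can be sketched with the classic "leaders" algorithm. This is a generic textbook construction, not the code used in this work; the instruction format `(address, mnemonic, branch_target)` is an assumption, and instructions are assumed to be sorted by address.

```python
BRANCHES = {"jmp", "jz", "jnz"}  # toy branch set; "jmp" is unconditional


def build_cfg(instructions):
    """Split a sorted instruction list into basic blocks and compute successor edges."""
    addresses = [address for address, _, _ in instructions]
    # A leader starts a basic block: entry point, branch targets,
    # and the instruction following a branch or return.
    leaders = {addresses[0]}
    for index, (_, mnemonic, target) in enumerate(instructions):
        if mnemonic in BRANCHES or mnemonic == "ret":
            if target is not None:
                leaders.add(target)
            if index + 1 < len(instructions):
                leaders.add(addresses[index + 1])

    blocks = {}
    current = None
    for instruction in instructions:
        if instruction[0] in leaders:
            current = instruction[0]
            blocks[current] = []
        blocks[current].append(instruction)

    edges = {}
    ordered = sorted(blocks)
    for position, start in enumerate(ordered):
        _, mnemonic, target = blocks[start][-1]
        successors = []
        if mnemonic in BRANCHES:
            successors.append(target)  # taken branch
        if mnemonic not in ("jmp", "ret") and position + 1 < len(ordered):
            successors.append(ordered[position + 1])  # fall-through
        edges[start] = successors
    return blocks, edges


program = [
    (0, "cmp", None),
    (1, "jz", 4),
    (2, "mov", None),
    (3, "jmp", 5),
    (4, "mov", None),
    (5, "ret", None),
]

blocks, edges = build_cfg(program)
print(edges)
```

The blocks start at addresses 0, 2, 4 and 5, and the edge map records a taken-branch successor before the fall-through successor.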

8 Implementation
▪ Modify QEMU dynamic translator
  - Tiny Code Generator (TCG) ➔ BB
  - Easy to map registers to LLVM IR
  - Generate CFG from LLVMContext class
▪ Use LLVM optimizer
  - Insert dead code ➔ -dse, -simplifycfg
  - Substitute with equivalent instructions ➔ -constprop, -instcombine
  - Reorder instructions ➔ -instcombine
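
To show why compiler passes undo this kind of obfuscation, here is a toy analogue of constant propagation and dead-assignment removal on straight-line code. It is not LLVM's implementation of -constprop or -dse; the instruction format `(op, dst, args)` and the supported ops are assumptions made for illustration.

```python
def propagate_constants(code):
    """Forward pass: replace known registers by constants and fold add/xor."""
    known = {}
    result = []
    for op, dst, args in code:
        args = tuple(known.get(a, a) for a in args)
        if op in ("add", "xor") and all(isinstance(a, int) for a in args):
            value = args[0] + args[1] if op == "add" else args[0] ^ args[1]
            op, args = "const", (value & 0xFFFFFFFF,)  # 32-bit wraparound
        if op == "const":
            known[dst] = args[0]
        elif dst is not None:
            known.pop(dst, None)  # value is no longer a known constant
        result.append((op, dst, args))
    return result


def eliminate_dead_assignments(code):
    """Backward pass: drop assignments whose result is never read afterwards."""
    live = set()
    kept = []
    for op, dst, args in reversed(code):
        if dst is not None and dst not in live:
            continue
        live.discard(dst)
        live.update(a for a in args if isinstance(a, str))
        kept.append((op, dst, args))
    kept.reverse()
    return kept


obfuscated = [
    ("const", "eax", (5,)),
    ("const", "ebx", (0x1234,)),     # junk value
    ("xor", "ebx", ("ebx", "ebx")),  # junk computation
    ("const", "ecx", (3,)),
    ("add", "eax", ("eax", "ecx")),
    ("ret", None, ("eax",)),
]

print(eliminate_dead_assignments(propagate_constants(obfuscated)))
```

After both passes only the return of the constant 8 remains, because every register write became unused once its value was propagated.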

9 Problem
▪ Methods described in OptiCode can be deobfuscated
  - Not including opaque predicates
However,
▪ QEMU dynamic translator has problems
  - Dependence on context
  - Impossible to interpret Win32 API
  - Overhead
▪ Optimice is more sophisticated than my work
  - Deobfuscation plugin for IDA
  - Uses CFG and BB generated by IDA
  - Overcomes the problems of my work
▪ Evaluation method is ambiguous...
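
An opaque predicate is a condition whose outcome is fixed but not obvious from its syntax. The sketch below uses a well-known example, x * (x + 1) being always even; the function names and the junk path are hypothetical. A folder that only evaluates literal operands cannot remove the branch, because x is unknown.

```python
def opaque_predicate(x, bits=32):
    """Always True: the product of two consecutive integers is even,
    and truncating to a fixed width keeps the lowest bit."""
    mask = (1 << bits) - 1
    return ((x * (x + 1)) & mask) % 2 == 0


def obfuscated(x):
    if opaque_predicate(x):
        return x + 1       # real path
    return x ^ 0xDEAD      # junk path, never executed


# Exhaustive check over all 16-bit inputs (a smaller width keeps it fast).
print(all(opaque_predicate(x, bits=16) for x in range(1 << 16)))
```

Brute force works here only because the input space was shrunk; for real code the predicate has to be reasoned about symbolically.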

10 Future work
▪ Continuation of research for TERM
  - How can we deobfuscate malware?
▪ Establishment of an evaluation method
▪ Bringing in semantics
  - Abstract interpretation
  - Predicate logic
  - There is little existing research...
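
As a hint of what abstract interpretation could offer, here is a minimal parity domain that decides the condition x * (x + 1) % 2 == 0 without enumerating concrete values. This is a textbook-style sketch under my own naming, not something implemented in this work.

```python
EVEN, ODD, TOP = "even", "odd", "top"  # TOP means "parity unknown"


def abstract_add(a, b):
    if TOP in (a, b):
        return TOP
    return EVEN if a == b else ODD  # even+even, odd+odd are even


def abstract_mul(a, b):
    if EVEN in (a, b):
        return EVEN  # anything times an even number is even
    if TOP in (a, b):
        return TOP
    return ODD


def parity_of_expression(x):
    """Abstract value of x * (x + 1) for an input of parity x."""
    return abstract_mul(x, abstract_add(x, ODD))


def predicate_is_always_true():
    # Case split on the input parity; both cases must yield EVEN.
    return all(parity_of_expression(p) == EVEN for p in (EVEN, ODD))


print(predicate_is_always_true())
print(parity_of_expression(TOP))
```

Without the case split the analysis returns `top`, which shows why the precision of the abstract domain matters.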
