Slide 1

Slide 1 text

WIP Presentation
B1 Yuma Kurogome (@ntddk) a.k.a. gomachan
Supervisor: Pending...

Slide 2

Slide 2 text

Theme
Using LLVM for Malware Deobfuscation

Slide 3

Slide 3 text

Background
● Malware analysis is becoming more difficult
  – APT
  – Kernel rootkits
  – Code obfuscation
● Many obfuscation tools and methods exist
● No good deobfuscation tool is available

Slide 4

Slide 4 text

LLVM
● Compiler infrastructure written in C++
● Provides many ways to optimize code
● Frontend → Middle-end → Backend

Slide 5

Slide 5 text

LLVM
● Frontend
  – Generate parse tree
  – Generate LLVM IR
● Middle-end
  – Optimize LLVM IR
  – Many methods available
● Backend
  – Generate x86 code from optimized LLVM IR
● If there were an x86 frontend...

Slide 6

Slide 6 text

Related work (1/5)
Dagger: decompilation to LLVM IR, Ahmed Bougacha, Presentation, 2013 European LLVM Conference, Apr 2013
● LLVM IR decompiler
● Focuses on the semantic gaps between x86 and LLVM IR
  – LLVM IR is designed around Static Single Assignment (SSA) form
● Binary → Mir (its own IR) → LLVM IR
● Virtual Operand Expansion

Slide 7

Slide 7 text

Related work (1/5)
● Still private...

Slide 8

Slide 8 text

Related work (2/5)
OptiCode: Machine Code Deobfuscation for Malware Analysis, Nguyen Anh Quynh, Presentation, SyScan SG, Apr 2013
● Same motivation
● Supports many obfuscation techniques
  – Inserting dead instructions
  – Inserting instructions with NOP semantics
  – Inserting unreachable code
  – Inserting branches to the next instruction

Slide 9

Slide 9 text

Related work (2/5)
● Its own x86 frontend (details unknown) plus the default LLVM optimizer
  – Generate a control flow graph (CFG) consisting of basic blocks (BB) from machine code
  – Constant folding
  – Eliminate dead store instructions
  – Combine instructions
  – Simplify CFG
  – Merge BB
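To make two of the optimizations listed above concrete, here is a minimal sketch on a toy three-address IR. The IR, the function names `fold_constants` and `eliminate_dead`, and the register names are all hypothetical; this is not LLVM IR and not how OptiCode or LLVM implement these passes. The second pass is plain dead code elimination (dropping instructions whose results are never read), standing in for the dead-instruction removal on the slide.

```python
import operator

# Toy three-address IR (hypothetical, not LLVM IR): (dest, op, a, b).
# Operands are ints (constants) or strings (variable names).
OPS = {"add": operator.add, "sub": operator.sub,
       "mul": operator.mul, "xor": operator.xor}

def fold_constants(code):
    """Forward pass: substitute known constants and evaluate any
    instruction whose operands are all constant."""
    known = {}  # variable name -> constant value
    out = []
    for dest, op, a, b in code:
        a = known.get(a, a)
        b = known.get(b, b)
        if op == "const":
            known[dest] = a
            out.append((dest, "const", a, None))
        elif isinstance(a, int) and isinstance(b, int):
            value = OPS[op](a, b)
            known[dest] = value
            out.append((dest, "const", value, None))
        else:
            known.pop(dest, None)  # dest no longer holds a known constant
            out.append((dest, op, a, b))
    return out

def eliminate_dead(code, live_out):
    """Backward pass: drop instructions whose result is never read."""
    live = set(live_out)
    kept = []
    for dest, op, a, b in reversed(code):
        if dest not in live:
            continue  # result unused: dead instruction
        live.discard(dest)
        live.update(x for x in (a, b) if isinstance(x, str))
        kept.append((dest, op, a, b))
    kept.reverse()
    return kept

code = [
    ("t0", "add", 2, 3),           # constant expression
    ("t1", "mul", "t0", 4),        # constant once t0 is folded
    ("junk", "xor", "eax", 0x55),  # inserted dead instruction
    ("eax", "add", "eax", "t1"),   # the only useful work
]
optimized = eliminate_dead(fold_constants(code), live_out={"eax"})
print(optimized)
```

The sketch only handles straight-line code; real passes work over a CFG and must account for memory and side effects.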

Slide 10

Slide 10 text

Related work (2/5)
● Opaque predicates
  – LLVM cannot deal with these
  – Inserted conditions that always evaluate to true (or always to false)
● Theorem prover (SMT solver)
  – Proves the satisfiability/validity of a logical formula
  – Can generate a model if the formula is satisfiable
● Generate logical formulas from LLVM IR
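A minimal sketch of the opaque predicate check follows. An SMT solver would decide this symbolically; here an exhaustive search over all 8-bit inputs stands in for the solver, which is only practical for tiny domains. The function name `classify_predicate` and the sample predicates are assumptions made for illustration.

```python
def classify_predicate(pred, bits=8):
    """Classify a branch condition over all `bits`-wide inputs.

    Returns ("always_true", None), ("always_false", None), or
    ("contingent", (x_true, x_false)) with one witness for each outcome.
    Brute force stands in for an SMT solver here.
    """
    true_witness = None
    false_witness = None
    for x in range(1 << bits):
        if pred(x):
            if true_witness is None:
                true_witness = x
        elif false_witness is None:
            false_witness = x
        if true_witness is not None and false_witness is not None:
            return "contingent", (true_witness, false_witness)
    if false_witness is None:
        return "always_true", None
    return "always_false", None

# x * (x + 1) is a product of consecutive integers, so it is always even.
always = classify_predicate(lambda x: ((x * (x + 1)) & 0xFF) % 2 == 0)
# A square is congruent to 0 or 1 modulo 4, never 2.
never = classify_predicate(lambda x: (x * x) % 4 == 2)
# An ordinary condition that depends on the input.
normal = classify_predicate(lambda x: x > 10)
print(always, never, normal)
```

A branch classified as always true or always false can be replaced by an unconditional jump, after which the untaken side becomes unreachable code.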

Slide 11

Slide 11 text

Related work (3/5)
KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs, Cristian Cadar, Daniel Dunbar, Dawson Engler, OSDI 2008
● Source → LLVM bitcode
● Branch recording
  – Obtain the current path condition using a theorem prover (STP solver)
  – Execute every branch
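The idea of following every branch while tracking a path condition can be sketched as below. This is a toy, not KLEE: the program is a hand-written decision tree, and feasibility is checked by enumerating a small input domain instead of querying STP. The tree encoding and the name `explore` are hypothetical.

```python
def explore(node, constraints=(), domain=range(256)):
    """Yield (path_condition, outcome, example_input) for each feasible path.

    A node is either a string (a leaf outcome) or a tuple
    (name, cond, then_node, else_node). Enumerating `domain`
    stands in for a solver query on the path condition.
    """
    models = [x for x in domain
              if all(cond(x) == taken for _, cond, taken in constraints)]
    if not models:
        return  # path condition unsatisfiable: prune this path
    if isinstance(node, str):
        label = " and ".join(name if taken else f"not({name})"
                             for name, _, taken in constraints) or "true"
        yield label, node, models[0]
        return
    name, cond, then_node, else_node = node
    yield from explore(then_node, constraints + ((name, cond, True),), domain)
    yield from explore(else_node, constraints + ((name, cond, False),), domain)

program = (
    "x < 10", lambda x: x < 10,
    ("x > 20", lambda x: x > 20, "unreachable", "small"),
    ("x % 2 == 0", lambda x: x % 2 == 0, "big even", "big odd"),
)
results = list(explore(program))
for condition, outcome, example in results:
    print(f"{outcome}: {condition} (e.g. x = {example})")
```

The path `x < 10 and x > 20` has no satisfying input, so the leaf marked "unreachable" is never reported.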

Slide 12

Slide 12 text

Related work (4/5)
QEMU, a Fast and Portable Dynamic Translator, Fabrice Bellard, USENIX, 2005
● Dynamic binary translation
● Dynamic translator (now Tiny Code Generator)
  – Target code → IR → host code
  – Disassembler
  – Micro-operations
  – Mapping
● Similar to LLVM :)

Slide 13

Slide 13 text

Related work (5/5)
Dynamically Translating x86 to LLVM using QEMU, Vitaly Chipounov, George Candea, 2010
● QEMU backend for LLVM ≒ x86 frontend for LLVM
● LLVM Code Dictionary instead of Host Code Dictionary
  – Consulted during mapping
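The dictionary lookup described above can be pictured roughly as follows. The micro-op names, the tuple format, and the helpers `operand` and `translate` are hypothetical and do not mirror QEMU's actual micro-operations or the paper's implementation; only the emitted `add`, `sub`, and `xor` instructions follow real LLVM IR syntax.

```python
# Hypothetical code dictionary: micro-op name -> LLVM IR text template.
LLVM_DICT = {
    "add": "{dst} = add i32 {a}, {b}",
    "sub": "{dst} = sub i32 {a}, {b}",
    "xor": "{dst} = xor i32 {a}, {b}",
}

def operand(x):
    """Render an int as an immediate and a name as an LLVM value (%name)."""
    return str(x) if isinstance(x, int) else f"%{x}"

def translate(micro_ops):
    """Map each (op, dst, a, b) micro-op to one line of LLVM IR text.

    The caller must give every dst a fresh name, because LLVM IR is in
    SSA form and each value may be defined only once.
    """
    lines = []
    for op, dst, a, b in micro_ops:
        template = LLVM_DICT[op]  # raises KeyError for unmapped micro-ops
        lines.append(template.format(dst=f"%{dst}", a=operand(a), b=operand(b)))
    return lines

ir = translate([
    ("add", "t0", "eax", "ebx"),
    ("xor", "t1", "t0", 255),
])
print("\n".join(ir))
```

Swapping the dictionary is the whole trick: the same micro-op stream that would be mapped to host machine code is mapped to LLVM IR instead.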

Slide 14

Slide 14 text

Approach
● Previously I considered using the IDA disassembler...
● Writing an x86 frontend from scratch was abandoned...
● Modify the QEMU binary translator
● Generate LLVM IR
● Optimize!
● Symbolic execution?

Slide 15

Slide 15 text

Schedule
● November
  – Research
● December
  – Implementation (QEMU-based)
● January
  – Evaluation

Slide 16

Slide 16 text

Thank you!