Using LLVM for malware deobfuscation

Yuma Kurogome

January 09, 2014

Transcript

  1. 1
    Title
    WIP Presentation
    Using LLVM for malware
    deobfuscation
    B1
    Yuma Kurogome(@ntddk)
    a.k.a. gomachan
    Supervisor: none


  2. 2
    Contents
    ■Background
    ■Purpose
    ■Related work
    ■Approach
    ■Implementation
    ■Problem
    ■Future work


  3. 3
    Background
    ■Malware analysis is becoming more difficult

    APT

    Botnet

    Code obfuscation
    etc...
    ■Many obfuscation tools/methods
    ■No good deobfuscation tool available


  4. 4
    Purpose
    ■Realization of a useful deobfuscator

    Use the LLVM code optimizer

    Implementation of an x86 frontend

    It is difficult to build an AST from x86 native code
    x86 Frontend
    x86


  5. 5
    Related work
    OptiCode: Machine Code Deobfuscation for Malware Analysis,
    Nguyen Anh Quynh, Presentation, SyScan SG, Apr 2013
    ■Supports many obfuscation techniques

    Insert dead instructions

    Insert NOP semantic instructions

    Insert unreachable code

    Insert branch instructions to the next instruction
    ■Own x86 frontend (details unknown) and the default
    LLVM optimizer

    Generate a control flow graph (CFG) consisting of basic
    blocks (BB) from machine code

    Constant folding

    Eliminate dead store instructions

    Combine instructions

    Simplify CFG

    Merge BB
    In this work, I wanted to reproduce
    OptiCode
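
A minimal sketch of two of the optimizations listed above, constant folding and removal of dead assignments, written in Python over a toy straight-line IR. The IR format `(dest, op, lhs, rhs)`, the register names, and both function names are assumptions made for illustration; this is not LLVM IR and not OptiCode's implementation. Dead-assignment elimination on registers stands in here for LLVM's dead store elimination, and integer wraparound and CPU flags are ignored.

```python
# Toy straight-line IR: (dest, op, lhs, rhs). Operands are ints or register names.
# Hypothetical format for illustration only; not LLVM IR.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "xor": lambda a, b: a ^ b,
}

def constant_fold(code):
    """Rewrite operations whose operands are known constants into 'mov dest, const'."""
    known, out = {}, []
    for dest, op, lhs, rhs in code:
        lhs = known.get(lhs, lhs)   # substitute registers with known constant values
        rhs = known.get(rhs, rhs)
        if op == "mov" and isinstance(lhs, int):
            known[dest] = lhs
        elif op in OPS and isinstance(lhs, int) and isinstance(rhs, int):
            op, lhs, rhs = "mov", OPS[op](lhs, rhs), None
            known[dest] = lhs
        else:
            known.pop(dest, None)   # dest now holds an unknown value
        out.append((dest, op, lhs, rhs))
    return out

def eliminate_dead(code, live_out):
    """Drop assignments whose result is never read before being overwritten."""
    live, out = set(live_out), []
    for dest, op, lhs, rhs in reversed(code):
        if dest not in live:
            continue                # result is never used: dead instruction
        live.discard(dest)
        live.update(x for x in (lhs, rhs) if isinstance(x, str))
        out.append((dest, op, lhs, rhs))
    return out[::-1]

obfuscated = [
    ("eax", "mov", 5, None),
    ("ebx", "mov", 7, None),        # dead: ebx is never read
    ("eax", "add", "eax", 3),
    ("ecx", "xor", "eax", "eax"),   # junk: ecx is never read
    ("eax", "sub", "eax", 1),
]

cleaned = eliminate_dead(constant_fold(obfuscated), live_out={"eax"})
print(cleaned)
```

With only `eax` treated as live at the end, the five obfuscated instructions collapse into a single `mov eax, 7`.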


  6. 6
    Related work
    Dynamically Translating x86 to LLVM using QEMU, Vitaly
    Chipounov, George Candea, 2010
    ■QEMU has a dynamic translator (now the Tiny Code
    Generator)

    Target code → IR → host code

    Disassembler

    Micro-Operations

    Mapping
    ■Use LLVM Code Dictionary instead of Host Code
    Dictionary

    Referred to when mapping


  7. 7
    Approach
    1.Read obfuscated code
    2.Dynamic translation
    3.LLVM bitcode
    4.Generate BB and CFG
    5.Optimize
    6.Generate deobfuscated code
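
Step 4 can be sketched as follows: a minimal Python example that splits a toy instruction list into basic blocks and derives the CFG edges using the classic "leader" rule. The instruction format `(mnemonic, target_index)`, the mnemonics `jmp`/`jcc`/`ret`, and the function name `build_cfg` are assumptions for illustration; the actual work obtains blocks from QEMU's translator, which this sketch does not model.

```python
def build_cfg(insns):
    """insns: list of (mnemonic, target_index_or_None).
    Returns (blocks, edges): blocks maps a block's start index to its
    instruction indices; edges is a set of (from_block, to_block) pairs."""
    # A leader starts a basic block: the first instruction, every branch
    # target, and every instruction that follows a branch.
    leaders = {0}
    for i, (mnemonic, target) in enumerate(insns):
        if mnemonic in ("jmp", "jcc"):
            leaders.add(target)
            if i + 1 < len(insns):
                leaders.add(i + 1)
    starts = sorted(leaders)
    ends = starts[1:] + [len(insns)]
    blocks = {s: list(range(s, e)) for s, e in zip(starts, ends)}

    edges = set()
    for start, body in blocks.items():
        last = body[-1]
        mnemonic, target = insns[last]
        if mnemonic in ("jmp", "jcc"):
            edges.add((start, target))        # taken branch
        if mnemonic not in ("jmp", "ret") and last + 1 < len(insns):
            edges.add((start, last + 1))      # fall-through
    return blocks, edges

insns = [
    ("mov", None),  # 0
    ("jmp", 2),     # 1: branch to the next instruction (an obfuscation)
    ("add", None),  # 2
    ("jcc", 5),     # 3: conditional branch
    ("sub", None),  # 4
    ("ret", None),  # 5
]

blocks, edges = build_cfg(insns)
print(blocks)
print(sorted(edges))
```

The inserted `jmp` to the next instruction creates an extra block boundary at index 2, which is exactly the kind of artificial split that a later CFG simplification step can undo.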


  8. 8
    Implementation
    ■Modify QEMU Dynamic Translator

    Tiny code generator(tcg)

    BB

    Easy to map registers to LLVM IR

    Generate CFG from LLVMContext class
    ■Use LLVM optimizer

    Insert dead code

    -dse, -simplifycfg

    Substitute with equivalent instructions

    -constprop, -instcombine

    Reorder instructions

    -instcombine
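
To give a feel for what a CFG simplification pass such as -simplifycfg does against inserted dead code, here is a toy Python sketch that removes unreachable blocks and merges a block into its sole predecessor. The graph representation (a dict from block name to successor list), the block names, and the function name `simplify_cfg` are hypothetical; this is an illustration of the idea, not LLVM's algorithm.

```python
def simplify_cfg(succs, entry):
    """succs: dict mapping block -> list of successor blocks.
    Returns (new_succs, merged), where merged maps each surviving block
    to the list of original blocks folded into it."""
    # 1. Remove blocks that cannot be reached from the entry block.
    reachable, stack = set(), [entry]
    while stack:
        block = stack.pop()
        if block not in reachable:
            reachable.add(block)
            stack.extend(succs[block])
    succs = {b: list(s) for b, s in succs.items() if b in reachable}
    merged = {b: [b] for b in succs}

    # 2. Merge B into A when A's only successor is B and B's only
    #    predecessor is A. Repeat until nothing changes.
    changed = True
    while changed:
        changed = False
        preds = {b: [] for b in succs}
        for b, ss in succs.items():
            for s in ss:
                preds[s].append(b)
        for b, ss in list(succs.items()):
            if len(ss) == 1:
                s = ss[0]
                if s != b and s != entry and preds[s] == [b]:
                    succs[b] = succs.pop(s)
                    merged[b] += merged.pop(s)
                    changed = True
                    break
    return succs, merged

cfg = {
    "entry": ["a"],
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": [],
    "junk": ["d"],   # inserted unreachable block
}

new_cfg, merged = simplify_cfg(cfg, "entry")
print(new_cfg)
print(merged)
```

In this example the unreachable `junk` block disappears and `a` is folded into `entry`, while the diamond formed by `b`, `c`, and `d` is left alone because `d` has two predecessors.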


  9. 9
    Problem
    ■Methods described in OptiCode can be deobfuscated

    Except for opaque predicates
    However,
    ■QEMU Dynamic translator has problems

    Dependence on context

    Impossible to interpret Win32 API calls

    Overhead
    ■Optimice is more sophisticated than my work

    Deobfuscation plugin for IDA

    Use CFG and BB generated from IDA

    Overcomes the problems of my work
    ■Evaluation method is ambiguous...


  10. 10
    Future work
    ■Continuation of research for TERM

    How can we deobfuscate malware?
    ■Establishment of evaluation method
    ■Introducing semantics

    Abstract interpretation

    Predicate logic

    There is little existing research...
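
As a small illustration of how abstract interpretation could help with opaque predicates, the sketch below uses a parity domain (even, odd, unknown) to show that `x * (x + 1)` is always even, so a branch testing `x * (x + 1) % 2 == 0` is always taken. The choice of predicate, the domain, and all names here are assumptions for illustration and are not taken from the talk. Plain constant folding cannot decide this branch because `x` is unknown; the parity analysis decides it by splitting on the parity of `x`.

```python
EVEN, ODD, TOP = "even", "odd", "top"   # TOP means "parity unknown"

def abs_const(n):
    """Abstract a concrete integer to its parity."""
    return EVEN if n % 2 == 0 else ODD

def abs_add(a, b):
    """Parity of a sum."""
    if TOP in (a, b):
        return TOP
    return EVEN if a == b else ODD

def abs_mul(a, b):
    """Parity of a product: an even factor makes the product even."""
    if EVEN in (a, b):
        return EVEN
    if TOP in (a, b):
        return TOP
    return ODD

def product_parity(x):
    """Abstract value of x * (x + 1) for an abstract input x."""
    return abs_mul(x, abs_add(x, abs_const(1)))

# Evaluating with x = TOP loses the relation between x and x + 1.
print(product_parity(TOP))

# Splitting on the two possible parities of x recovers the result.
always_even = all(product_parity(p) == EVEN for p in (EVEN, ODD))
print(always_even)
```

Parity is preserved under fixed-width wraparound arithmetic, so the same reasoning applies to machine integers.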
