Reversing In Wonderland: Neural Network Based Malware Detection Techniques

adr
September 11, 2020

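As heavily modified and obfuscated malware variants are used at scale in real-world attacks, static signature techniques such as YARA, which depend on manual analysis and hand-written signatures, have become the main reason the security industry struggles to counter the many readily available open-source backdoors.

In the future, neural-network-based analysis will become mainstream. In this talk we release a vector-model engine built on neural networks: using word embeddings, it converts the surrounding context of executable code into vectors, so it can model the features of obfuscated programs whose code is heavily modified and highly mutated. This will change how the entire security ecosystem operates! The engine lets researchers perform unsupervised training on a small number of APT samples and accurately identify executables of the same kind.

We start from the implementation of the neural network and analyze the mathematical details of why this type of model works. In our experiments on 60,000 in-the-wild samples from VirusTotal, we trained on only a very small number of samples and still correctly identified malware of the same family among a large number of code-mutated variants. At the end of the talk, we discuss the many practical attack surfaces of this analysis approach, and how attackers could exploit and break such techniques to bypass security products.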

https://hitcon.org/2020/agenda/b05b106b-b5fd-489a-a73e-183e526b86ba/

Transcript

  1. 2020
    414141414141414141
    AAAAAAAAAA
    HITCON
    [email protected]
    Reversing In Wonderland
    Neural Network Based Malware Detection Techniques

  2. • Master's degree at CSIE, NTUST
    • Security Researcher - chrO.ot
    • Speaker - BlackHat, DEFCON, HITCON, CYBERSEC
    [email protected]
    • 30cm.tw & Hao's Arsenal
    #Windows #Reversing #Pwn #Exploit
    • Associate Professor of CSIE, NTUST
    • Joint Associate Research Fellow of
    CITI, Academia Sinica
    [email protected]
    #4G #5G #LTE_Attack #IoT

  3.
    1. Malware in the Wild
    2. Semantics
    3. Semantic-Aware: PV-DM
    4. Asm2Vec & Experiment
    5. Challenge
    /?outline

  4.
    〉〉〉Malware In the Wild

  5.
    #
    rule silent_banker : banker {
        meta:
            description = "malware in the wild"
            threat_level = 3
            in_the_wild = true
        strings:
            $a = {6A 40 68 00 30 00 00 6A 14 8D 91}
            $b = {8D 4D B0 2B C1 83 C0 27 59 F7 F9}
            $c = "UVODFRYSIHLNWPEJXQZAKCBGMT"
        condition:
            $a or $b or $c
    }
    YARA

  6.
    malware.exe [detected]
    File Header | Opt Header | PE Data
    signatures $a $b $c at offsets +a0 +1e8 +9f7c
    /?malware

  7.
    malware.exe [detected]
    File Header | Opt Header | PE Data
    signatures $a $b $c at offsets +a0 +1e8 +9f7c
    malware_test#1.bin
    File Header | Opt Header | PE Data (patched)
    patch #1: two regions overwritten with \x00\x00.. → detect
    /?malware

  8.
    /?malware
    malware.exe [detected]
    File Header | Opt Header | PE Data
    signatures $a $b $c at offsets +a0 +1e8 +9f7c
    malware_test#2.bin
    File Header | Opt Header | PE Data (patched)
    patch #2: two regions overwritten with \x00\x00.. → clear

  9.
    malware.exe [detected]
    File Header | Opt Header | PE Data
    signatures $a $b $c at offsets +a0 +1e8 +9f7c
    malware_test#3.bin
    File Header | Opt Header | PE Data (patched)
    patch #3: two regions overwritten with \x00\x00.. → detect
    /?malware
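The detect/clear experiments on slides 6 to 9 can be reproduced in miniature. This is a toy sketch, not how a real engine works: it assumes a scanner that does a plain byte search for the three strings of the slide 5 rule, and the helper names `scan` and `patch_zero` and the synthetic 0xA000-byte sample (with $a, $b, $c placed at +a0, +1e8, +9f7c) are hypothetical.

```python
# Toy model of signature detection and zero-patch testing (slides 5 to 9).
# Byte strings come from the slide 5 YARA rule; scan() is a hypothetical
# stand-in for a real engine, not YARA itself.
SIGNATURES = {
    "$a": bytes.fromhex("6A 40 68 00 30 00 00 6A 14 8D 91"),
    "$b": bytes.fromhex("8D 4D B0 2B C1 83 C0 27 59 F7 F9"),
    "$c": b"UVODFRYSIHLNWPEJXQZAKCBGMT",
}


def scan(data: bytes) -> bool:
    """Emulate `condition: $a or $b or $c`: detected if any string occurs."""
    return any(sig in data for sig in SIGNATURES.values())


def patch_zero(data: bytes, regions) -> bytes:
    """Return a copy of data with each (offset, length) region set to zero bytes."""
    buf = bytearray(data)
    for offset, length in regions:
        end = min(offset + length, len(buf))
        if end > offset:
            buf[offset:end] = bytes(end - offset)
    return bytes(buf)


# Synthetic stand-in for malware.exe (hypothetical layout).
sample = bytearray(0xA000)
for offset, name in [(0xA0, "$a"), (0x1E8, "$b"), (0x9F7C, "$c")]:
    sig = SIGNATURES[name]
    sample[offset:offset + len(sig)] = sig
sample = bytes(sample)

print(scan(sample))                                                        # True
print(scan(patch_zero(sample, [(0xA0, 11), (0x1E8, 11)])))                 # True
print(scan(patch_zero(sample, [(0xA0, 11), (0x1E8, 11), (0x9F7C, 26)])))   # False
```

Because the rule is an `or`, the toy file stays detected as long as any single string survives the patch.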

  10.
    • Active Protection System
    - rule-based, not strong enough against unknown attacks
    • Malware Pattern based on Reversing
    - lacks lexical semantics of assembly → false positives
    - too slow against fast-changing malware variants
    • Known Challenges
    - compiler optimization
    - Mirai, Hakai, Yowai, SpeakUp
    - Anti-AntiVirus Techniques
    • Word Embedding Techniques (NLP)
    - use only a few samples to predict incoming binary files
    - learn lexical semantics from instruction sequences
    /?challenge

  11. “You shall know a word by the company it keeps”
    (Firth, J. R. 1957:11)
    /?semantics

  12.
    /?semantics
    “... I can show you the world. Shining,
    shimmering, splendid. Tell me, princess,
    now when did you last let your heart
    decide? I can open your eyes, take you
    wonder by wonder ...”

  13.
    /?semantics
    “I drink beer.” and the other people

  14.
    /?semantics
    “we drink wine.”
    “I drink beer.”

  15.
    /?semantics
    “we drink wine.”
    “I drink beer.”
    “we guzzle wine.”
    “I guzzle beer.”

  16.
    /?freq
    drink guzzle cat dog puppy

  17.
    • Co-Occurrence Matrix
    - count-based, token frequency
    - able to capture lexical semantics
    - Cosine Similarity
    • Issues
    - vocabulary size (the matrix grows with the vocabulary)
    - online training
    → Paragraph Vector Distributed Memory (PV-DM)
    #semantics
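A minimal sketch of the count-based approach, using the toy corpus from slide 15 and assuming a symmetric context window of one token (the window size and the names `cooccurrence` and `cosine` are illustrative choices, not from the talk). Words used in the same contexts, such as drink and guzzle, get parallel rows:

```python
import math
from collections import Counter, defaultdict

# Toy corpus from slide 15.
CORPUS = ["we drink wine", "I drink beer", "we guzzle wine", "I guzzle beer"]


def cooccurrence(sentences, window=1):
    """Count how often each word appears within `window` tokens of another word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count rows of the matrix."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


matrix = cooccurrence(CORPUS)
print(cosine(matrix["drink"], matrix["guzzle"]))  # 1.0: identical contexts
print(cosine(matrix["drink"], matrix["wine"]))    # 0.0: no shared context words
```

The matrix needs one row and one column per vocabulary word, so it grows with the vocabulary, which is the first issue listed above.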

  18.
    /?tokenFreq
    drink
    behavior

  19.
    #Sim
    $$\mathrm{similar}() = \frac{0.13 \cdot 0.13 + 0.01 \cdot 0.01 + 0.99 \cdot 0.93 + 0.01 \cdot 0.01}{\sqrt{0.13^2 + 0.01^2 + 0.99^2 + 0.01^2} \times \sqrt{0.13^2 + 0.01^2 + 0.93^2 + 0.01^2}} = 0.9999650034397828$$

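The slide's arithmetic can be checked in a few lines (`cosine_similarity` is our name for the formula, not a function from the talk):

```python
import math


def cosine_similarity(a, b):
    """dot(a, b) / (|a| * |b|), the formula computed on slide 19."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


a = [0.13, 0.01, 0.99, 0.01]
b = [0.13, 0.01, 0.93, 0.01]
print(cosine_similarity(a, b))  # ≈ 0.999965
```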
  20.
    #Sim
    sim(King - Man) ≒ sigmoid(King・Man)
    King
    Man

  21.
    #Sim
    King
    Man
    Δ
    sim(King - Man) ≒ sigmoid(King・Man)
    [BACKWARD]: Man = Man - Δ(King - Man) * learningRate
    Δ(King - Man) = (sim(King - Man) - 1)・King

  22.
    #negative
    King
    Man
    sim(King - Man) ≒ sigmoid(King・Man)
    [BACKWARD]: Man = Man - Δ(King - Man) * learningRate
    Δ(King - Man) = sim(King - Man)・King
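A minimal sketch of the two updates, assuming the logistic loss used by word2vec-style negative sampling and, as on the slides, updating only the Man vector. The positive pair (label 1) pulls Man toward King; a negative sample (label 0) pushes it away. The vectors and the function name `sgd_step` are illustrative:

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sgd_step(king, man, label, learning_rate=0.025):
    """Return an updated copy of `man` after one step on the pair (king, man).

    label = 1 (positive pair):   loss -log(sim),     delta = (sim - 1) * king
    label = 0 (negative sample): loss -log(1 - sim), delta = sim * king
    where sim = sigmoid(king . man); both cases use man - delta * learning_rate.
    """
    sim = sigmoid(dot(king, man))
    delta = [(sim - label) * k for k in king]
    return [m - d * learning_rate for m, d in zip(man, delta)]


king = [0.5, 0.1, -0.2]
man = [0.2, 0.3, 0.1]
print(dot(king, sgd_step(king, man, label=1)) > dot(king, man))  # True: pulled closer
print(dot(king, sgd_step(king, man, label=0)) < dot(king, man))  # True: pushed apart
```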

  23.
    #paragraph
    File Header | Opt Header
    .AddressOfEntryPoint
    .text
    mov [ebp-0x04], 00
    jmp block_c
    cmp [ebp-0x04], Ah
    jg Exit
    push 0x3E8
    call Sleep
    jmp block_b
    mov eax, [ebp-0x04]
    add eax, 1
    mov [ebp-0x04], eax
    cmp [ebp-0x04], Ah
    jg Exit
    push 0x3E8
    call Sleep
    jmp block_b
    ...
    asm script

  24.
    #PE
    File Header | Opt Header
    .AddressOfEntryPoint
    .text
    6A 00
    68 AD DE 00 00
    68 EF BE 00 00
    6A 00
    FF 15 FE CA 00 00
    33 C0
    C3
    Control Flow Graph

  25.
    #1: block_a → block_c → Exit
    #2: block_a → block_c → block_d →
    block_b → block_c → Exit
    #3: block_a → block_c → block_d →
    block_b → block_c → block_d →
    block_b → block_c → Exit
    #4: block_a → block_c → block_d →
    block_b → block_c → block_d →
    block_b → block_c → block_d →
    block_b → block_c → Exit
    /?rndWalk
    block_a: mov [ebp-0x04], 00; jmp block_c
    block_b: mov eax, [ebp-0x04]; add eax, 1; mov [ebp-0x04], eax (→ block_c)
    block_c: cmp [ebp-0x04], Ah; jg Exit (else → block_d)
    block_d: push 0x3E8; call Sleep; jmp block_b

  26.
    mov [ebp-0x04], 00
    jmp block_c
    cmp [ebp-0x04], Ah
    jg Exit
    push 0x3E8
    call Sleep
    jmp block_b
    mov eax, [ebp-0x04]
    add eax, 1
    mov [ebp-0x04], eax
    cmp [ebp-0x04], Ah
    jg Exit
    push 0x3E8
    call Sleep
    jmp block_b
    ...
    /?rndWalk
    CFG: block_a, block_b, block_c, block_d (as on slide 25)
    asm
    script
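The random walks on slides 25 and 26 can be sketched as follows. The CFG encoding, the walk-length cap `max_blocks`, and the uniform choice among successors are assumptions for illustration; the talk does not specify how branches are sampled:

```python
import random

# Hypothetical encoding of the slide 25 CFG: block -> (instructions, successors).
CFG = {
    "block_a": (["mov [ebp-0x04], 00", "jmp block_c"], ["block_c"]),
    "block_b": (["mov eax, [ebp-0x04]", "add eax, 1", "mov [ebp-0x04], eax"], ["block_c"]),
    "block_c": (["cmp [ebp-0x04], Ah", "jg Exit"], ["Exit", "block_d"]),
    "block_d": (["push 0x3E8", "call Sleep", "jmp block_b"], ["block_b"]),
}


def random_walk(cfg, entry="block_a", exit_node="Exit", max_blocks=32, rng=random):
    """Follow randomly chosen edges from `entry` until `exit_node` or `max_blocks`.

    Returns the visited block names and the concatenated instructions,
    i.e. one serialized "asm script" sentence for the embedding model.
    """
    path, instructions = [], []
    node = entry
    while node != exit_node and len(path) < max_blocks:
        body, successors = cfg[node]
        path.append(node)
        instructions.extend(body)
        node = rng.choice(successors)  # uniform branch choice (assumption)
    if node == exit_node:
        path.append(exit_node)
    return path, instructions


rng = random.Random(0)
for _ in range(3):
    path, instructions = random_walk(CFG, rng=rng)
    print(" -> ".join(path))
```

Each walk yields one instruction sequence like the asm script above; the block_c, block_d, block_b loop can appear a different number of times in different walks, as in walks #1 to #4.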

  27.
    #Asm2Vec
    push rbp
    mov rbp, rsp
    sub rsp, 138h
    mov rax, 8h
    mov [rbp+0ch], rax
    xor eax, eax
    mov [rbp+04h], 0
    mov [rbp+32h], 1505h
    ...

  34.
    #Asm2Vec
    push rbp
    mov rbp, rsp
    sub rsp, 138h
    mov rax, 8h
    mov [rbp+0ch], rax
    xor eax, eax
    mov [rbp+04h], 0
    mov [rbp+32h], 1505h
    ...
    sub rsp, 138h
    lea eax, [ebx+4]
    push rbp
    vocab = {
    'sub': [-0.53, 0.01 ... -0.08],
    'rsp': [ 0.12, 0.31, ... 0.34],
    'lea': [-0.75,-0.42, ... -0.72],
    'push': [ 0.23, 0.37, ... -0.23],
    '[ebx+4]':[-0.02,-0.19, ... 0.11],
    ...
    }
    Tokenize
    200 dim
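Tokenization can be sketched like this, assuming Intel-syntax text where the first word is the operator and the operands are comma-separated. The 200 dimensions follow the slide; the word2vec-style uniform initialization and the helper names `tokenize` and `build_vocab` are assumptions:

```python
import random


def tokenize(instruction: str):
    """Split 'sub rsp, 138h' into the operator 'sub' and operands ['rsp', '138h']."""
    parts = instruction.strip().split(None, 1)
    operator = parts[0]
    operands = [op.strip() for op in parts[1].split(",")] if len(parts) > 1 else []
    return operator, operands


def build_vocab(instructions, dim=200, seed=0):
    """Map every distinct operator/operand token to a random `dim`-dim vector."""
    rng = random.Random(seed)
    vocab = {}
    for instruction in instructions:
        operator, operands = tokenize(instruction)
        for token in [operator, *operands]:
            if token not in vocab:
                # small uniform initialization, word2vec style (an assumption)
                vocab[token] = [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]
    return vocab


vocab = build_vocab(["sub rsp, 138h", "lea eax, [ebx+4]", "push rbp"])
print(sorted(vocab))  # ['138h', '[ebx+4]', 'eax', 'lea', 'push', 'rbp', 'rsp', 'sub']
print(len(vocab["sub"]))  # 200
```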

  35.
    #Asm2Vec
    push rbp
    mov rbp, rsp
    sub rsp, 138h
    mov rax, 8h
    mov [rbp+0ch], rax
    xor eax, eax
    mov [rbp+04h], 0
    mov [rbp+32h], 1505h
    ...
    sub rsp, 138h
    operands
    lea eax, [ebx+4]
    push rbp
    ...
    operator

  36.
    #Asm2Vec
    push rbp
    mov rbp, rsp
    sub rsp, 138h
    mov rax, 8h
    mov [rbp+0ch], rax
    xor eax, eax
    mov [rbp+04h], 0
    mov [rbp+32h], 1505h
    ...
    sub rsp, 138h
    operands
    operator
    Ƭ(instruction) = Ƭ(sub) || ( Ƭ(rsp)/2 + Ƭ(138h)/2 )

  37.
    #Asm2Vec
    push rbp
    mov rbp, rsp
    sub rsp, 138h
    mov rax, 8h
    mov [rbp+0ch], rax
    xor eax, eax
    mov [rbp+04h], 0
    mov [rbp+32h], 1505h
    ...
    push rbp
    operands
    operator
    Ƭ(instruction) = Ƭ(push) || ( Ƭ(rbp) )

  38.
    #Asm2Vec
    push rbp
    mov rbp, rsp
    sub rsp, 138h
    mov rax, 8h
    mov [rbp+0ch], rax
    xor eax, eax
    mov [rbp+04h], 0
    mov [rbp+32h], 1505h
    nop
    nop (null)
    operands
    operator
    Ƭ(instruction) = Ƭ(nop) || ( null )

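The instruction representation on slides 36 to 38 (operator vector concatenated with the average of the operand vectors) can be sketched as below. Representing the `null` operand part of `nop` as a zero vector is our assumption, and the 2-dim toy vocabulary stands in for the real 200-dim one:

```python
def mean(vectors, dim):
    """Element-wise mean; a zero vector when there are no operands (assumption for nop)."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def instruction_vector(vocab, operator, operands):
    """T(instruction) = T(operator) || mean of T(operand) over the operands."""
    op_vec = vocab[operator]
    return op_vec + mean([vocab[o] for o in operands], len(op_vec))  # list concat = ||


# Hypothetical 2-dim vocabulary.
vocab = {
    "sub": [1.0, 0.0], "rsp": [0.25, 0.5], "138h": [0.75, 0.0],
    "push": [0.0, 1.0], "rbp": [0.25, 0.25], "nop": [0.5, 0.5],
}
print(instruction_vector(vocab, "sub", ["rsp", "138h"]))  # [1.0, 0.0, 0.5, 0.25]
print(instruction_vector(vocab, "nop", []))               # [0.5, 0.5, 0.0, 0.0]
```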
  39.
    #Asm2Vec
    push rbp
    mov rbp, rsp
    sub rsp, 138h
    mov rax, 8h
    mov [rbp+0ch], rax
    xor eax, eax
    ...
    Ƭ("mov rbp, rsp") = Ƭ(mov) || Avg(Ƭ(rbp), Ƭ(rsp))
    Ƭ("mov rax, 8h") = Ƭ(mov) || Avg(Ƭ(rax), Ƭ(8h))
    x = Avg(θfs, Ƭ("mov rbp, rsp"), Ƭ("mov rax, 8h"))   (θfs: function vector)
    sigmoid(x) → predict Ƭ("sub rsp, 138h")
    (each vector e.g. [-0.53, 0.01 ... -0.08])

  40.
    #Asm2Vec
    push rbp
    mov rbp, rsp
    sub rsp, 138h
    mov rax, 8h
    mov [rbp+0ch], rax
    xor eax, eax
    ...
    Ƭ("mov rbp, rsp") = Ƭ(mov) || Avg(Ƭ(rbp), Ƭ(rsp))
    Ƭ("mov rax, 8h") = Ƭ(mov) || Avg(Ƭ(rax), Ƭ(8h))
    x = Avg(θfs, Ƭ("mov rbp, rsp"), Ƭ("mov rax, 8h"))
    loss of predicting Ƭ("sub rsp, 138h") from sigmoid(x)
    → loss1/3 back to each of θfs, Ƭ("mov rbp, rsp"), Ƭ("mov rax, 8h")

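A simplified sketch of one training step on slides 39 and 40: average θfs and the two neighbouring instruction vectors, score one target token with a sigmoid against a few negative samples, and send one third of the hidden-layer gradient back to each averaged input. Assumptions: a single target token per step, all vectors of one common length (with 200-dim tokens an instruction vector is 400-dim, so θfs and the output vectors must match that), and the names `pvdm_step` and `rand_vec`, which are ours:

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pvdm_step(theta_f, prev_ins, next_ins, target_out, negative_outs, lr=0.025):
    """One simplified PV-DM / Asm2Vec-style SGD step; vectors are updated in place.

    theta_f: function vector; prev_ins / next_ins: neighbouring instruction
    vectors; target_out: output vector of a token of the middle instruction;
    negative_outs: output vectors of sampled negative tokens.
    Returns the loss measured before the update.
    """
    n = len(theta_f)
    # Hidden layer x = Avg(theta_f, prev, next).
    x = [(theta_f[i] + prev_ins[i] + next_ins[i]) / 3.0 for i in range(n)]
    grad_x = [0.0] * n
    loss = 0.0
    samples = [(target_out, 1.0)] + [(out, 0.0) for out in negative_outs]
    for out, label in samples:
        s = sigmoid(dot(x, out))
        loss -= math.log(s) if label == 1.0 else math.log(1.0 - s)
        g = s - label  # derivative of the loss w.r.t. dot(x, out)
        for i in range(n):
            grad_x[i] += g * out[i]  # uses out before its own update
            out[i] -= lr * g * x[i]
    # x averages three inputs, so each receives 1/3 of the gradient (loss1/3).
    for vec in (theta_f, prev_ins, next_ins):
        for i in range(n):
            vec[i] -= lr * grad_x[i] / 3.0
    return loss


rng = random.Random(42)
dim = 8


def rand_vec():
    return [rng.uniform(-0.5, 0.5) for _ in range(dim)]


theta_f, prev_ins, next_ins, target = rand_vec(), rand_vec(), rand_vec(), rand_vec()
negatives = [rand_vec() for _ in range(3)]
losses = [pvdm_step(theta_f, prev_ins, next_ins, target, negatives, lr=0.1) for _ in range(200)]
print(losses[0], losses[-1])  # the loss shrinks as the step is repeated
```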
  41.
    • Dataset
    - malware: Mirai samples from VirusTotal (40000+)
    - benign: ELF from Linux-based IoT firmware (3600+)
    - stripped binary
    • Training
    - only 25 randomly chosen Mirai samples used for training
    - each token represented by a 200-dim vector (randomly initialized)
    - negative sampling: 25 tokens
    - learning rate decreasing from 0.025 to 0.0025
    • Cross-validation: repeated 10 times
    • Malicious: Similarity(binary, model) >= 95%
    $./exp
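Two of the settings above, sketched under stated assumptions: the slide gives only the endpoints 0.025 → 0.0025, so the linear decay below (the schedule word2vec-style trainers commonly use) is an assumption, as are the function names `learning_rate` and `is_malicious`:

```python
def learning_rate(step: int, total_steps: int, start: float = 0.025, end: float = 0.0025) -> float:
    """Linearly decay the learning rate from `start` to `end` over training (assumed schedule)."""
    if total_steps <= 1:
        return start
    frac = min(step / (total_steps - 1), 1.0)
    return start + (end - start) * frac


def is_malicious(similarity: float, threshold: float = 0.95) -> bool:
    """Decision rule from the slide: Similarity(binary, model) >= 95%."""
    return similarity >= threshold


print([learning_rate(s, 10) for s in range(10)])
print(is_malicious(0.97), is_malicious(0.90))  # True False
```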

  42.
    • MIPS
    - Mirai: 96.75% (18467 samples)
    - Benign: 96.41% (348 samples)
    • x86
    - Mirai: 96.75% (2564 samples)
    - Benign: 99.93% (1567 samples)
    • ARM
    - Mirai: 98.53% (23827 samples)
    - Benign: 93.87% (1699 samples)
    $./exp

  43.
    /!challenge
    github.com/aaaddress1/theArk

  44.
    /!PlugX
    DLL SIDE-LOADING: A Thorn in the Side of the Anti-Virus Industry

  45.
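    #include <cstdio>
    // Writing through a null pointer raises an access violation. With MSVC's
    // /EHa, catch (...) also catches such structured (SEH) exceptions, so
    // control reaches puts() only through the exception path; in standard
    // C++ this is undefined behavior. volatile keeps the write from being dropped.
    int main(void) {
        try {
            *(volatile char*)NULL = 1;
        } catch (...) {
            puts("Hell Kitty");
        }
    }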
    /!challenge

  46.
    /!challenge
    github.com/xoreaxeaxeax/movfuscator

  47.
    • Issues with control-flow walking
    - Self-modifying code
    1. Software packers, e.g. VMProtect, Themida
    2. Shellcode encoders
    - Control-flow rerouting
    1. Error handling, e.g. SEH
    2. Multithreading
    - Exported malicious functions
    - Virtual method tables
    • Vector Obfuscation
    - 95% benignware / 5% injected shellcode
    - Use common instructions as gadgets to build an obfuscation chain, e.g. movfuscator
    /!challenge
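Why a 95% / 5% mix can dilute a similarity score can be sketched with a deliberately crude model. Assumption (ours, not the talk's): a program's vector is the plain mean of its instruction vectors, with one direction for shellcode-like instructions and an orthogonal one for benign code; the names `cosine` and `mean_vector` are illustrative:

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


# Hypothetical 2-dim embeddings: orthogonal directions for the two kinds of code.
shellcode = [1.0, 0.0]
benign = [0.0, 1.0]

program = [benign] * 95 + [shellcode] * 5  # 95% benignware / 5% injected shellcode
print(cosine(mean_vector(program), shellcode))  # ≈ 0.053
```

Under this simplification the mixed program's similarity to the shellcode direction is about 0.05, far below the 95% threshold used in the experiment, even though every malicious instruction is still present.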

  48. 41414141414141414141414141
    Thanks!
    [email protected]
    Slide
    Github @aaaddress1
    Facebook
    AAAAAAAAAAAAAA
    AAAAAAA
    AAA
    HITCON
