Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Reversing In Wonderland: Neural Network Based Malware Detection Techniques

adr
September 11, 2020

Reversing In Wonderland: Neural Network Based Malware Detection Techniques

在程式碼大量異動與混淆的變種樣本大量在野外攻擊情況下,靜態特徵技術諸如 YARA 仰賴人工分析並撰寫特徵碼成為安全產業難以對抗各種垂手可得的開源後門的主因。

在未來基於神經網路的分析手段將成為主流。這場議程我們釋出一個基於神經網路向量模型引擎:基於詞嵌入手段將執行程式碼前後文轉向量,其能對程式碼大量異動且高度變種的混淆程式進行特徵建模,這將改變整個安全體系的運作方式!此一引擎使研究員能以少量 APT 樣本進行非監督訓練並精準識別出同一類型的執行程式。

在此議程,我們將從神經網路實作層面談起數學細節分析此類型模型成功的原因。並在實驗中我們針對六萬支 VirusTotal 上在野樣本 進行了極少量樣本的訓練:並能在大量遭程式碼變種的樣本中正確識別出同一家族的惡意程式。並在議程末討論此種分析手法實務上的有諸多攻擊面與攻擊者將如何利 用並攻破此類分析技術繞過安全防護產品。

https://hitcon.org/2020/agenda/b05b106b-b5fd-489a-a73e-183e526b86ba/

adr

September 11, 2020
Tweet

More Decks by adr

Other Decks in Technology

Transcript

  1. • Master degree at CSIE, NTUST • Security Researcher -

    chrO.ot • Speaker - BlackHat, DEFCON, HITCON, CYBERSEC • [email protected] • 30cm.tw & Hao's Arsenal #Windows #Reversing #Pwn #Exploit • Associate Professor of CSIE, NTUST • Joint Associate Research Fellow of CITI, Academia Sinica • [email protected] #4G #5G #LTE_Attack #IoT
  2. [email protected] 1. Malware in the Wild 2. Semantics 3. Semantic-Aware:

    PV-DM 4. Asm2Vec & Experiment 5. Challenge /?outline
  3. [email protected] # rule silent_banker : banker { meta: description =

    "malware in the wild" threat_level = 3 in_the_wild = true strings: $a = {6A 40 68 00 30 00 00 6A 14 8D 91} $b = {8D 4D B0 2B C1 83 C0 27 59 F7 F9} $c = "UVODFRYSIHLNWPEJXQZAKCBGMT" condition: $a or $b or $c } YARA
  4. [email protected] File Headr Opt Header PE Data $a $c +a0

    +1e8 +9f7c malware.exe [detected] $b /?malware
  5. [email protected] File Headr Opt Header PE Data $a $b $c

    +a0 +1e8 +9f7c malware.exe [detected] File Headr Opt Header PE Data (patched) malware_test#1.bin #1 \x00\x00.. \x00\x00.. detect /?malware
  6. [email protected] /?malware File Headr Opt Header PE Data $a $b

    $c +a0 +1e8 +9f7c malware.exe [detected] File Headr Opt Header PE Data (patched) malware_test#2.bin #2 \x00\x00.. \x00\x00.. clear
  7. [email protected] File Headr Opt Header PE Data $a $b $c

    +a0 +1e8 +9f7c malware.exe [detected] File Headr Opt Header PE Data (patched) malware_test#3.bin #3 \x00\x00.. \x00\x00.. detect /?malware
  8. [email protected] • Active Protection System - rule-based, not strong enough

    against unkown attacks • Malware Pattern based on Reversing - lack of lexical semantic of assembly → false positive - too slow against variability malware • Known Challenges - compiler optimization - Mirai, Hakai, Yowai, SpeakUp - Anti-AntiVirus Techniques • Word Embedding Techniques (NLP) - use only few samples to predict income binary files - learn lexical semantic from instruction sequences /?challenge
  9. “You shall know a word by the company it keeps“

    (Firth, J. R. 1957:11) /?semantics
  10. [email protected] /?semantics “... I can show you the world. Shining,

    shimmering, splendid. Tell me, princess, now when did. You last let your heart decide? I can open your eyes, Take you wonder by wonder ...”
  11. [email protected] /?semantics ” we drink wine. “ ” I drink

    beer. “ ” we guzzle wine. “ ” I guzzle beer. “
  12. [email protected] • Co-Occurrence Matrix - count based, token frequency -

    able to capture lexical semantic - Cosine Similarity • Issues - vocabulary - online training → Paragraph Vector Distributed Memory (PV-DM) #semantics
  13. [email protected] #Sim similar() = 0.13*0.13 + 0.01*0.01 + 0.99*0.93 +

    0.01*0.01 ——————————————————————————————————————————————— sqrt(0.13^2 + 0.01^2 + 0.99^2 + 0.01^2) x sqrt(0.13^2 + 0.01^2 + 0.93^2 + 0.01^2) = 0.9999650034397828
  14. [email protected] #Sim King Man Δ sim(King - Man) ≒ sigmoid(King・Man)

    [BACKWARD]: Man = Man - Δ(King - Man) * learningRate Δ(King - Man) = (1 - sim(King - Man))・King
  15. [email protected] #negative King Man sim(King - Man) ≒ sigmoid(King・Man) [BACKWARD]:

    Man = Man - Δ(King - Man) * learningRate Δ(King - Man) = sim(King - Man)・King
  16. [email protected] #paragraph File Headr Opt Header .AddressOfEntryPoint .text mov [ebp-0x04],

    00 jmp block_c cmp [ebp-0x04], Ah jg Exit push 0x3E8 call Sleep jmp block_b mov eax, [ebp-0x04] add eax, 1 mov [ebp-0x04], eax cmp [ebp-0x04], Ah jg Exit push 0x3E8 call Sleep jmp block_b ... asm script
  17. [email protected] #PE File Headr Opt Header .AddressOfEntryPoint .text 6A 00

    68 AD DE 00 00 68 EF BE 00 00 6A 00 FF 15 FE CA 00 00 33 C0 C3 Control Flow Graph
  18. [email protected] #1: block_a → block_c → Exit #2: block_a →

    block_c → block_d → block_b → block_c → Exit #3: block_a → block_c → block_d → block_b → block_c → block_d → block_b → block_c → Exit #4: block_a → block_c → block_d → block_b → block_c → block_d → block_b → block_c → block_d → block_b → block_c → Exit /?rndWalk mov [ebp-0x04], 00 jmp block_c cmp [ebp-0x04], Ah jg Exit mov eax, [ebp-0x04] add eax, 1 mov [ebp-0x04], eax block_c: block_b: block_a: jmp block_c push 0x3E8 call Sleep jmp block_b jmp block_b block_d: jg Exit
  19. [email protected] mov [ebp-0x04], 00 jmp block_c cmp [ebp-0x04], Ah jg

    Exit push 0x3E8 call Sleep jmp block_b mov eax, [ebp-0x04] add eax, 1 mov [ebp-0x04], eax cmp [ebp-0x04], Ah jg Exit push 0x3E8 call Sleep jmp block_b ... /?rndWalk mov [ebp-0x04], 00 jmp block_c cmp [ebp-0x04], Ah jg Exit mov eax, [ebp-0x04] add eax, 1 mov [ebp-0x04], eax block_c: block_b: block_a: jmp block_c push 0x3E8 call Sleep jmp block_b jmp block_b block_d: jg Exit asm script
  20. [email protected] #Asm2Vec push rbp mov rbp, rsp sub rsp, 138h

    mov rax, 8h mov [rbp+0ch], rax xor eax, eax mov [rbp+04h], 0 mov [rbp+32h], 1505h ...
  21. [email protected] #Asm2Vec push rbp mov rbp, rsp sub rsp, 138h

    mov rax, 8h mov [rbp+0ch], rax xor eax, eax mov [rbp+04h], 0 mov [rbp+32h], 1505h ...
  22. [email protected] #Asm2Vec push rbp mov rbp, rsp mov rsp, 138h

    mov rax, 8h mov [rbp+0ch], rax xor eax, eax mov [rbp+04h], 0 mov [rbp+32h], 1505h ...
  23. [email protected] #Asm2Vec push rbp mov rbp, rsp sub rsp, 138h

    mov rax, 8h mov [rbp+0ch], rax xor eax, eax mov [rbp+04h], 0 mov [rbp+32h], 1505h ...
  24. [email protected] #Asm2Vec push rbp mov rbp, rsp sub rsp, 138h

    mov rax, 8h mov [rbp+0ch], rax xor eax, eax mov [rbp+04h], 0 mov [rbp+32h], 1505h ...
  25. [email protected] #Asm2Vec push rbp mov rbp, rsp sub rsp, 138h

    mov rax, 8h mov [rbp+0ch], rax xor eax, eax mov [rbp+04h], 0 mov [rbp+32h], 1505h ...
  26. [email protected] #Asm2Vec push rbp mov rbp, rsp sub rsp, 138h

    mov rax, 8h mov [rbp+0ch], rax xor eax, eax mov [rbp+04h], 0 mov [rbp+32h], 1505h ...
  27. [email protected] #Asm2Vec push rbp mov rbp, rsp sub rsp, 138h

    mov rax, 8h mov [rbp+0ch], rax xor eax, eax mov [rbp+04h], 0 mov [rbp+32h], 1505h ... sub rsp, 138h lea eax, [ebx+4] push rbp vocab = { 'sub': [-0.53, 0.01 ... -0.08], 'rsp': [ 0.12, 0.31, ... 0.34], 'lea': [-0.75,-0.42, ... -0.72], 'push': [ 0.23, 0.37, ... -0.23], '[ebx+4]':[-0.02,-0.19, ... 0.11], ... } Tokenize 200 dim
  28. [email protected] #Asm2Vec push rbp mov rbp, rsp sub rsp, 138h

    mov rax, 8h mov [rbp+0ch], rax xor eax, eax mov [rbp+04h], 0 mov [rbp+32h], 1505h ... sub rsp, 138h operands lea eax, [ebx+4] push rbp ... operator
  29. [email protected] #Asm2Vec push rbp mov rbp, rsp sub rsp, 138h

    mov rax, 8h mov [rbp+0ch], rax xor eax, eax mov [rbp+04h], 0 mov [rbp+32h], 1505h ... sub rsp, 138h operands operator Ƭ(sub) || ( Ƭ(rsp)/2 + Ƭ(138h)/2 ) Ƭ(instruction) =
  30. [email protected] #Asm2Vec push rbp mov rbp, rsp sub rsp, 138h

    mov rax, 8h mov [rbp+0ch], rax xor eax, eax mov [rbp+04h], 0 mov [rbp+32h], 1505h ... push rbp operands operator Ƭ(push) || ( Ƭ(rbp) ) Ƭ(instruction) =
  31. [email protected] #Asm2Vec push rbp mov rbp, rsp sub rsp, 138h

    mov rax, 8h mov [rbp+0ch], rax xor eax, eax mov [rbp+04h], 0 mov [rbp+32h], 1505h nop nop (null) operands operator Ƭ(nop) || ( null ) Ƭ(instruction) =
  32. [email protected] #Asm2Vec push rbp mov rbp, rsp sub rsp, 138h

    mov rax, 8h mov [rbp+0ch], rax xor eax, eax ... Ƭ("sub rsp, 138h") Ƭ(rsp) [-0.53, 0.01 ... -0.08] sigmoid(x) Avg(x) Ƭ(rbp) Ƭ(mov)|| [-0.53, 0.01 ... -0.08] Ƭ(8h) Avg(x) Ƭ(rax) Ƭ(mov)|| [-0.53, 0.01 ... -0.08] predict θfs Avg(x)
  33. [email protected] #Asm2Vec push rbp mov rbp, rsp sub rsp, 138h

    mov rax, 8h mov [rbp+0ch], rax xor eax, eax ... Ƭ("sub rsp, 138h") Ƭ(rsp) [-0.53, 0.01 ... -0.08] sigmoid(x) Avg(x) Ƭ(rbp) Ƭ(mov)|| [-0.53, 0.01 ... -0.08] Ƭ(8h) Avg(x) Ƭ(rax) Ƭ(mov)|| [-0.53, 0.01 ... -0.08] loss θfs loss1/3 loss1/3 loss1/3 Avg(x)
  34. [email protected] • Dataset - malware: Mirai samples from VirusTotal (40000+)

    - benign: ELF from Linux-based IoT firmware (3600+) - stripped binary • Training - random choose only 25 Mirai samples to train - each token represented by 200-dim vector (random) - negative sampling: 25 tokens - decreasing learning rate: 0.025 → 0.0025 • Cross validation: 10 times • Malicious: Similarity(binary, model) >= 95% $./exp
  35. [email protected] • MIPS - Mirai: 96.75% (18467 samples) - Benign:

    96.41% (348 samples) • x86 - Mirai: 96.75% (2564 samples) - Benign: 99.93% (1567 samples) • ARM - Mirai: 98.53% (23827 samples) - Benign: 93.87% (1699 samples) $./exp
  36. [email protected] int main(void) { try { *(char*)NULL = 1; }

    catch (...) { puts("Hell Kitty"); } } /!challenge
  37. [email protected] • Issue based on Control Flow Walking - Self

    modifying code 1. Software Packer e.g. VMProtect, Themida 2. Shellcode Encoder - Control Flow Rerouting 1. Error handling e.g. SEH 2. MultiThread - Exported malicous function - Virtual Method Table • Vector Obfuscation - 95% benignware / 5% injected shellcode - Use common instructions as gadgets to build a obfuscation chain e.g. movfuscator /!challenge