adr
September 11, 2020

Reversing In Wonderland: Neural Network Based Malware Detection Techniques

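Heavily modified and obfuscated malware variants now attack in the wild in large numbers. Static signature techniques such as YARA depend on manual analysis and hand-written signatures, which is the main reason the security industry struggles to counter the many readily available open-source backdoors.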

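Neural-network-based analysis will become mainstream in the future. In this talk we release a neural-network-based vector model engine: it uses word embedding to turn the context of executable code into vectors, so it can model the features of obfuscated programs whose code has been heavily modified and mutated. This will change how the entire security ecosystem operates! The engine lets researchers run unsupervised training on a small number of APT samples and accurately identify executables of the same type.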

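In this talk we start from the neural network implementation and work through the mathematical details to analyze why this type of model succeeds. In our experiments we trained on a very small number of samples out of 60,000 in-the-wild samples from VirusTotal, and the model correctly identified malware of the same family among a large number of code-mutated samples. At the end of the talk we discuss the many attack surfaces this analysis technique has in practice, and how attackers could exploit and break it to bypass security products.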

https://hitcon.org/2020/agenda/b05b106b-b5fd-489a-a73e-183e526b86ba/

Transcript

  1. • Master's degree at CSIE, NTUST
    • Security Researcher - chrO.ot
    • Speaker - BlackHat, DEFCON, HITCON, CYBERSEC
    • aaaddress1@chroot.org
    • 30cm.tw & Hao's Arsenal
    #Windows #Reversing #Pwn #Exploit

    • Associate Professor of CSIE, NTUST
    • Joint Associate Research Fellow of CITI, Academia Sinica
    • smcheng@mail.ntust.edu.tw
    #4G #5G #LTE_Attack #IoT
  2. /?outline
    1. Malware in the Wild
    2. Semantics
    3. Semantic-Aware: PV-DM
    4. Asm2Vec & Experiment
    5. Challenge
  3. YARA

        rule silent_banker : banker
        {
            meta:
                description = "malware in the wild"
                threat_level = 3
                in_the_wild = true
            strings:
                $a = {6A 40 68 00 30 00 00 6A 14 8D 91}
                $b = {8D 4D B0 2B C1 83 C0 27 59 F7 F9}
                $c = "UVODFRYSIHLNWPEJXQZAKCBGMT"
            condition:
                $a or $b or $c
        }
  4. /?malware
    malware.exe [detected]: File Header | Opt Header | PE Data
    Signature strings $a, $b, $c matched at offsets +a0, +1e8, +9f7c
  5. /?malware
    malware.exe [detected]: File Header | Opt Header | PE Data, with $a, $b, $c at +a0, +1e8, +9f7c
    malware_test#1.bin: File Header | Opt Header | PE Data (patched), region #1 overwritten with \x00\x00.. → detect
  6. /?malware
    malware.exe [detected]: File Header | Opt Header | PE Data, with $a, $b, $c at +a0, +1e8, +9f7c
    malware_test#2.bin: File Header | Opt Header | PE Data (patched), region #2 overwritten with \x00\x00.. → clear
  7. /?malware
    malware.exe [detected]: File Header | Opt Header | PE Data, with $a, $b, $c at +a0, +1e8, +9f7c
    malware_test#3.bin: File Header | Opt Header | PE Data (patched), region #3 overwritten with \x00\x00.. → detect
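
The patch-and-rescan idea shown on slides 4 to 7 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three patterns are copied from the YARA rule on slide 3, while the toy file contents, the chunk size, and the helper names `scan`, `patch`, and `locate_signature_regions` are hypothetical and not part of the talk.

```python
# Patterns copied from the YARA rule on slide 3; everything else is illustrative.
SIGNATURES = {
    "$a": bytes.fromhex("6A 40 68 00 30 00 00 6A 14 8D 91"),
    "$b": bytes.fromhex("8D 4D B0 2B C1 83 C0 27 59 F7 F9"),
    "$c": b"UVODFRYSIHLNWPEJXQZAKCBGMT",
}


def scan(data: bytes) -> dict:
    """Return {string name: first offset} for every pattern found in data."""
    hits = {}
    for name, pattern in SIGNATURES.items():
        offset = data.find(pattern)
        if offset != -1:
            hits[name] = offset
    return hits


def patch(data: bytes, start: int, length: int) -> bytes:
    """Return a copy of data with [start, start + length) overwritten by NUL bytes."""
    length = min(length, len(data) - start)
    return data[:start] + b"\x00" * length + data[start + length:]


def locate_signature_regions(data: bytes, chunk: int = 16) -> list:
    """Zero one chunk at a time; a chunk whose removal changes the verdict holds a signature."""
    baseline = set(scan(data))
    regions = []
    for start in range(0, len(data), chunk):
        if set(scan(patch(data, start, chunk))) != baseline:
            regions.append(start)
    return regions


# Toy "binary": NOP padding with only $a embedded at a made-up offset.
sample = bytearray(b"\x90" * 0x200)
sample[0xA0:0xA0 + len(SIGNATURES["$a"])] = SIGNATURES["$a"]
sample = bytes(sample)

print(scan(sample))                      # detected
print(scan(patch(sample, 0xA0, 16)))     # cleared after patching
print(locate_signature_regions(sample))  # chunk offsets that carry the signature
```

The same loop works for an attacker as for a researcher: whoever can query the scanner can find which bytes it keys on, which is the weakness of static byte signatures that the talk goes on to address.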
  8. /?challenge
    • Active Protection System
      - rule-based, not strong enough against unknown attacks
    • Malware Pattern based on Reversing
      - lacks the lexical semantics of assembly → false positives
      - too slow against malware variants
    • Known Challenges
      - compiler optimization
      - Mirai, Hakai, Yowai, SpeakUp
      - Anti-AntiVirus Techniques
    • Word Embedding Techniques (NLP)
      - use only a few samples to predict incoming binary files
      - learn lexical semantics from instruction sequences
  9. /?semantics “You shall know a word by the company it keeps” (Firth, J. R. 1957:11)
  10. aaaddress1@chroot.org /?semantics “... I can show you the world. Shining,

    shimmering, splendid. Tell me, princess, now when did. You last let your heart decide? I can open your eyes, Take you wonder by wonder ...”
  11. /?semantics
    “we drink wine.”
    “I drink beer.”
    “we guzzle wine.”
    “I guzzle beer.”
  12. #semantics
    • Co-Occurrence Matrix
      - count based, token frequency
      - able to capture lexical semantics
      - Cosine Similarity
    • Issues
      - vocabulary
      - online training
    → Paragraph Vector Distributed Memory (PV-DM)
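
As a rough illustration of the count-based approach on slide 12, the sketch below builds a co-occurrence table from the four sentences on slide 11 and compares words by cosine similarity. The window size of 1 and the helper names `cooccurrence` and `cosine` are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

# The four example sentences from slide 11.
sentences = ["we drink wine", "I drink beer", "we guzzle wine", "I guzzle beer"]


def cooccurrence(sentences, window=1):
    """Count, for every word, how often each other word appears within `window` tokens."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in set(u) | set(v))
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


counts = cooccurrence(sentences)
print(cosine(counts["drink"], counts["guzzle"]))  # same neighbors, so maximal similarity
print(cosine(counts["drink"], counts["wine"]))    # no shared neighbors
```

"drink" and "guzzle" never appear together, yet they keep the same company, so their rows are identical. The table grows with the vocabulary and has to be recounted when new text arrives, which matches the two issues the slide lists as motivation for PV-DM.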
  13. #Sim
    $$\mathrm{similar}() = \frac{0.13 \cdot 0.13 + 0.01 \cdot 0.01 + 0.99 \cdot 0.93 + 0.01 \cdot 0.01}{\sqrt{0.13^2 + 0.01^2 + 0.99^2 + 0.01^2} \, \sqrt{0.13^2 + 0.01^2 + 0.93^2 + 0.01^2}} = 0.9999650034397828$$
  14. #Sim King, Man
    sim(King, Man) ≒ sigmoid(King・Man)
    Δ(King, Man) = (1 - sim(King, Man))・King
    [BACKWARD]: Man = Man + Δ(King, Man) * learningRate
  15. #negative King, Man
    sim(King, Man) ≒ sigmoid(King・Man)
    Δ(King, Man) = sim(King, Man)・King
    [BACKWARD]: Man = Man - Δ(King, Man) * learningRate
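
The two backward steps on slides 14 and 15 can be folded into one update rule. The sketch below assumes the usual logistic loss of negative sampling, where an observed pair has label 1 and a negative sample has label 0; the toy vectors and the function name `update` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def update(target, context, label, lr=0.025):
    """One SGD step on `target` for a (target, context) pair.

    label = 1 for an observed pair, 0 for a negative sample.
    The gradient of the logistic loss w.r.t. target is
    (sigmoid(context . target) - label) * context.
    """
    g = sigmoid(dot(context, target)) - label
    return [t - lr * g * c for t, c in zip(target, context)]


# Toy 3-dimensional vectors, made up for illustration.
king = [0.5, -0.2, 0.1]
man = [0.1, 0.3, -0.4]

pulled = update(man, king, label=1)  # observed pair: Man moves toward King
pushed = update(man, king, label=0)  # negative sample: Man moves away from King
print(dot(king, man), dot(king, pulled), dot(king, pushed))
```

With label 1 the step is proportional to (1 - sim)・King, and with label 0 it is proportional to sim・King in the opposite direction, so a pair that is already well separated or well aligned barely moves.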
  16. #paragraph
    File Header | Opt Header (.AddressOfEntryPoint → .text)
    asm script:
      mov [ebp-0x04], 00; jmp block_c;
      cmp [ebp-0x04], Ah; jg Exit;
      push 0x3E8; call Sleep; jmp block_b;
      mov eax, [ebp-0x04]; add eax, 1; mov [ebp-0x04], eax;
      cmp [ebp-0x04], Ah; jg Exit;
      push 0x3E8; call Sleep; jmp block_b; ...
  17. #PE
    File Header | Opt Header (.AddressOfEntryPoint → .text)
    .text: 6A 00 68 AD DE 00 00 68 EF BE 00 00 6A 00 FF 15 FE CA 00 00 33 C0 C3
    → Control Flow Graph
  18. /?rndWalk
    #1: block_a → block_c → Exit
    #2: block_a → block_c → block_d → block_b → block_c → Exit
    #3: block_a → block_c → block_d → block_b → block_c → block_d → block_b → block_c → Exit
    #4: block_a → block_c → block_d → block_b → block_c → block_d → block_b → block_c → block_d → block_b → block_c → Exit
    block_a: mov [ebp-0x04], 00; jmp block_c
    block_c: cmp [ebp-0x04], Ah; jg Exit
    block_d: push 0x3E8; call Sleep; jmp block_b
    block_b: mov eax, [ebp-0x04]; add eax, 1; mov [ebp-0x04], eax
  19. /?rndWalk
    block_a: mov [ebp-0x04], 00; jmp block_c
    block_c: cmp [ebp-0x04], Ah; jg Exit
    block_d: push 0x3E8; call Sleep; jmp block_b
    block_b: mov eax, [ebp-0x04]; add eax, 1; mov [ebp-0x04], eax
    asm script:
      mov [ebp-0x04], 00; jmp block_c;
      cmp [ebp-0x04], Ah; jg Exit;
      push 0x3E8; call Sleep; jmp block_b;
      mov eax, [ebp-0x04]; add eax, 1; mov [ebp-0x04], eax;
      cmp [ebp-0x04], Ah; jg Exit;
      push 0x3E8; call Sleep; jmp block_b; ...
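
The walks listed on slide 18 can be generated with a short random walk over the control flow graph. The following sketch hard-codes the four basic blocks from the slide; the edge table is inferred from the listed walks, and the names `random_walk` and `to_instructions` as well as the `max_blocks` cap are assumptions for this example.

```python
import random

# Basic blocks of the example loop on slides 18 and 19.
BLOCKS = {
    "block_a": ["mov [ebp-0x04], 00", "jmp block_c"],
    "block_c": ["cmp [ebp-0x04], Ah", "jg Exit"],
    "block_d": ["push 0x3E8", "call Sleep", "jmp block_b"],
    "block_b": ["mov eax, [ebp-0x04]", "add eax, 1", "mov [ebp-0x04], eax"],
}

# Successors inferred from walks #1 to #4.
EDGES = {
    "block_a": ["block_c"],
    "block_c": ["Exit", "block_d"],  # jg taken, or fall through
    "block_d": ["block_b"],
    "block_b": ["block_c"],
}


def random_walk(entry="block_a", max_blocks=32, rng=random):
    """Follow random successors from the entry block until Exit or the length cap."""
    path = [entry]
    while path[-1] != "Exit" and len(path) < max_blocks:
        path.append(rng.choice(EDGES[path[-1]]))
    return path


def to_instructions(path):
    """Flatten a block path into the instruction sequence ("asm script") it executes."""
    return [ins for block in path if block in BLOCKS for ins in BLOCKS[block]]


rng = random.Random(0)
for _ in range(3):
    walk = random_walk(rng=rng)
    print(" → ".join(walk))
print(to_instructions(["block_a", "block_c", "Exit"]))
```

Each walk yields one instruction sequence, which plays the role of a sentence when the embedding model is trained.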
  20. #Asm2Vec
    push rbp; mov rbp, rsp; sub rsp, 138h; mov rax, 8h; mov [rbp+0ch], rax; xor eax, eax; mov [rbp+04h], 0; mov [rbp+32h], 1505h; ...
  27. #Asm2Vec
    push rbp; mov rbp, rsp; sub rsp, 138h; mov rax, 8h; mov [rbp+0ch], rax; xor eax, eax; mov [rbp+04h], 0; mov [rbp+32h], 1505h; ...
    Tokenize: sub rsp, 138h / lea eax, [ebx+4] / push rbp
    vocab = {
        'sub':     [-0.53, 0.01 ... -0.08],
        'rsp':     [ 0.12, 0.31, ... 0.34],
        'lea':     [-0.75,-0.42, ... -0.72],
        'push':    [ 0.23, 0.37, ... -0.23],
        '[ebx+4]': [-0.02,-0.19, ... 0.11],
        ...
    }
    (200 dim)
  28. #Asm2Vec
    push rbp; mov rbp, rsp; sub rsp, 138h; mov rax, 8h; mov [rbp+0ch], rax; xor eax, eax; mov [rbp+04h], 0; mov [rbp+32h], 1505h; ...
    Each instruction splits into an operator and operands: sub rsp, 138h / lea eax, [ebx+4] / push rbp / ...
  29. #Asm2Vec
    push rbp; mov rbp, rsp; sub rsp, 138h; mov rax, 8h; mov [rbp+0ch], rax; xor eax, eax; mov [rbp+04h], 0; mov [rbp+32h], 1505h; ...
    sub rsp, 138h (operator: sub; operands: rsp, 138h)
    Ƭ(instruction) = Ƭ(sub) || ( Ƭ(rsp)/2 + Ƭ(138h)/2 )
  30. #Asm2Vec
    push rbp; mov rbp, rsp; sub rsp, 138h; mov rax, 8h; mov [rbp+0ch], rax; xor eax, eax; mov [rbp+04h], 0; mov [rbp+32h], 1505h; ...
    push rbp (operator: push; operands: rbp)
    Ƭ(instruction) = Ƭ(push) || ( Ƭ(rbp) )
  31. #Asm2Vec
    push rbp; mov rbp, rsp; sub rsp, 138h; mov rax, 8h; mov [rbp+0ch], rax; xor eax, eax; mov [rbp+04h], 0; mov [rbp+32h], 1505h; nop
    nop (operator: nop; operands: null)
    Ƭ(instruction) = Ƭ(nop) || ( null )
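
The rule Ƭ(instruction) = Ƭ(operator) || Avg(Ƭ(operands)) from slides 29 to 31 can be written out as follows. This is a minimal sketch: the 4-dimensional random vectors stand in for the 200-dimensional vocabulary of slide 27, representing the "null" operand as a zero vector is an assumption, and `tokenize` and `embed_instruction` are hypothetical helper names.

```python
import random

DIM = 4  # toy size; slide 27 uses 200 dimensions per token


def tokenize(instruction):
    """Split 'sub rsp, 138h' into ('sub', ['rsp', '138h'])."""
    operator, _, rest = instruction.strip().partition(" ")
    operands = [tok.strip() for tok in rest.split(",") if tok.strip()]
    return operator, operands


def embed_instruction(instruction, vocab, dim=DIM):
    """Concatenate the operator vector with the average of the operand vectors."""
    operator, operands = tokenize(instruction)
    if operands:
        avg = [sum(vocab[o][i] for o in operands) / len(operands) for i in range(dim)]
    else:
        avg = [0.0] * dim  # assumption: no operands ("null") becomes a zero vector
    return vocab[operator] + avg


# Randomly initialized token vectors, as in the experiment setup.
rng = random.Random(0)
tokens = ["sub", "rsp", "138h", "push", "rbp", "nop"]
vocab = {tok: [rng.uniform(-1, 1) for _ in range(DIM)] for tok in tokens}

for ins in ["sub rsp, 138h", "push rbp", "nop"]:
    print(ins, "→", len(embed_instruction(ins, vocab)), "values")
```

Averaging the operands keeps every instruction vector at the same length (twice the token dimension) no matter how many operands the instruction has.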
  32. #Asm2Vec
    push rbp; mov rbp, rsp; sub rsp, 138h; mov rax, 8h; mov [rbp+0ch], rax; xor eax, eax; ...
    [Diagram: predict. The function vector θfs and the two neighboring instructions, Ƭ(mov) || Avg(Ƭ(rbp), Ƭ(rsp)) and Ƭ(mov) || Avg(Ƭ(rax), Ƭ(8h)), are combined by Avg(x) and passed through sigmoid(x) to predict Ƭ("sub rsp, 138h").]
  33. #Asm2Vec
    push rbp; mov rbp, rsp; sub rsp, 138h; mov rax, 8h; mov [rbp+0ch], rax; xor eax, eax; ...
    [Diagram: loss. Same layout as the previous slide; the loss flows back through Avg(x), with 1/3 of it going to θfs and 1/3 to each of the two neighboring instruction vectors.]
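
The predict and loss diagrams on slides 32 and 33 appear to describe one PV-DM-style training step. The sketch below shows a single positive-sample update of the function vector under that reading; negative samples and the updates to the neighboring instruction vectors are left out for brevity, and `train_step`, its parameter names, and the toy vectors are assumptions rather than the authors' code.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def average(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def train_step(theta_fs, prev_vec, next_vec, target_out, lr=0.025):
    """One positive-sample update of the function vector theta_fs.

    theta_fs   -- function (paragraph) vector
    prev_vec   -- embedding of the preceding instruction
    next_vec   -- embedding of the following instruction
    target_out -- output vector of one token of the current instruction
    Returns (probability before the update, updated theta_fs).
    """
    delta = average([theta_fs, prev_vec, next_vec])
    p = sigmoid(sum(d * t for d, t in zip(delta, target_out)))
    # d(-log p)/d(delta) = (p - 1) * target_out; each averaged input gets 1/3 of it.
    grad_delta = [(p - 1.0) * t for t in target_out]
    new_theta = [w - lr * g / 3.0 for w, g in zip(theta_fs, grad_delta)]
    return p, new_theta


# Toy vectors, made up for illustration.
theta = [0.0, 0.0, 0.0, 0.0]
prev_vec = [0.2, -0.1, 0.4, 0.0]
next_vec = [-0.3, 0.5, 0.1, 0.2]
target = [0.6, -0.4, 0.3, 0.5]

p_before, theta = train_step(theta, prev_vec, next_vec, target)
p_after, _ = train_step(theta, prev_vec, next_vec, target)
print(p_before, p_after)
```

Because the function vector takes part in every prediction made inside that function, it ends up summarizing the function as a whole, and that summary is what gets compared between binaries.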
  34. $./exp
    • Dataset
      - malware: Mirai samples from VirusTotal (40000+)
      - benign: ELF from Linux-based IoT firmware (3600+)
      - stripped binary
    • Training
      - randomly choose only 25 Mirai samples to train
      - each token represented by a 200-dim vector (random)
      - negative sampling: 25 tokens
      - decreasing learning rate: 0.025 → 0.0025
    • Cross validation: 10 times
    • Malicious: Similarity(binary, model) >= 95%
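
Two of the settings on slide 34 translate directly into code: the decaying learning rate and the 95% decision threshold. The slide gives only the endpoints of the schedule and does not define Similarity(binary, model), so the sketch below assumes a linear decay and the best cosine similarity against the trained sample vectors; both choices, and the function names, are assumptions.

```python
import math


def learning_rate(step, total_steps, start=0.025, end=0.0025):
    """Interpolate linearly from start to end (the shape of the decay is assumed)."""
    frac = step / max(1, total_steps - 1)
    return start + (end - start) * frac


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def is_malicious(binary_vec, family_vecs, threshold=0.95):
    """Flag a binary whose best similarity to any trained sample reaches the threshold."""
    return max(cosine(binary_vec, f) for f in family_vecs) >= threshold


# Toy 2-dimensional vectors, made up for illustration.
family = [[0.99, 0.05], [0.90, 0.20]]
print(learning_rate(0, 10), learning_rate(9, 10))
print(is_malicious([1.0, 0.0], family))  # close to the family vectors
print(is_malicious([0.0, 1.0], family))  # far from the family vectors
```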
  35. $./exp
    • MIPS
      - Mirai: 96.75% (18467 samples)
      - Benign: 96.41% (348 samples)
    • x86
      - Mirai: 96.75% (2564 samples)
      - Benign: 99.93% (1567 samples)
    • ARM
      - Mirai: 98.53% (23827 samples)
      - Benign: 93.87% (1699 samples)
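  36. /!challenge

        #include <cstdio>

        int main(void) {
            try {
                // Writing through NULL faults; whether catch (...) sees it depends on
                // the compiler's exception model (e.g. MSVC /EHa maps SEH to catch (...)).
                *(char*)NULL = 1;
            } catch (...) {
                puts("Hell Kitty");
            }
        }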
  37. /!challenge
    • Issue based on Control Flow Walking
      - Self modifying code
        1. Software Packer e.g. VMProtect, Themida
        2. Shellcode Encoder
      - Control Flow Rerouting
        1. Error handling e.g. SEH
        2. MultiThread
      - Exported malicious function
      - Virtual Method Table
    • Vector Obfuscation
      - 95% benignware / 5% injected shellcode
      - Use common instructions as gadgets to build an obfuscation chain e.g. movfuscator