
Reversing In Wonderland: Neural Network Based Malware Detection Techniques

adr
September 11, 2020


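Heavily modified and obfuscated malware variants now attack in the wild in large numbers. Static signature techniques such as YARA depend on analysts manually reversing samples and writing signatures, and this dependence is the main reason the security industry struggles against the many readily available open-source backdoors.

Neural-network-based analysis will become mainstream. In this talk we release a vector model engine built on neural networks: it uses word embedding to turn the context of executable code into vectors, and it can model the features of obfuscated programs whose code has been heavily modified and mutated. This will change how the whole security ecosystem operates! The engine lets researchers run unsupervised training on a small number of APT samples and accurately identify executables of the same type.

We start from the implementation of the neural network and work through the mathematical details to analyze why this type of model succeeds. In our experiments on 60,000 in-the-wild samples from VirusTotal, we trained on a very small number of samples and still correctly identified malware of the same family among heavily mutated samples. At the end of the talk we discuss the many attack surfaces this analysis approach has in practice, and how attackers could exploit and defeat it to bypass security products.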

https://hitcon.org/2020/agenda/b05b106b-b5fd-489a-a73e-183e526b86ba/



Transcript

  1. HITCON 2020 [email protected] Reversing In Wonderland: Neural Network Based Malware Detection Techniques
  2. • Master degree at CSIE, NTUST
     • Security Researcher - chrO.ot
     • Speaker - BlackHat, DEFCON, HITCON, CYBERSEC
     • [email protected]
     • 30cm.tw & Hao's Arsenal
     #Windows #Reversing #Pwn #Exploit

     • Associate Professor of CSIE, NTUST
     • Joint Associate Research Fellow of CITI, Academia Sinica
     • [email protected]
     #4G #5G #LTE_Attack #IoT
  3. [email protected] /?outline
     1. Malware in the Wild
     2. Semantics
     3. Semantic-Aware: PV-DM
     4. Asm2Vec & Experiment
     5. Challenge
  4. [email protected] 〉〉〉Malware In the Wild

  5. [email protected] #behavior

  6. [email protected] #behavior

  7. [email protected] #behavior

  8. [email protected] #YARA
     rule silent_banker : banker
     {
         meta:
             description = "malware in the wild"
             threat_level = 3
             in_the_wild = true
         strings:
             $a = {6A 40 68 00 30 00 00 6A 14 8D 91}
             $b = {8D 4D B0 2B C1 83 C0 27 59 F7 F9}
             $c = "UVODFRYSIHLNWPEJXQZAKCBGMT"
         condition:
             $a or $b or $c
     }
  9. [email protected] /?malware
     malware.exe [detected]: File Header | Opt Header | PE Data; matches $a, $b, $c; offsets +a0, +1e8, +9f7c
  10. [email protected] /?malware
      malware.exe [detected]: File Header | Opt Header | PE Data; matches $a, $b, $c; offsets +a0, +1e8, +9f7c
      malware_test#1.bin (#1): File Header | Opt Header | PE Data (patched: \x00\x00.. \x00\x00..) → detect
  11. [email protected] /?malware
      malware.exe [detected]: File Header | Opt Header | PE Data; matches $a, $b, $c; offsets +a0, +1e8, +9f7c
      malware_test#2.bin (#2): File Header | Opt Header | PE Data (patched: \x00\x00.. \x00\x00..) → clear
  12. [email protected] /?malware
      malware.exe [detected]: File Header | Opt Header | PE Data; matches $a, $b, $c; offsets +a0, +1e8, +9f7c
      malware_test#3.bin (#3): File Header | Opt Header | PE Data (patched: \x00\x00.. \x00\x00..) → detect
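The patched test files on slides 10 to 12 suggest how fragile a byte-level signature is: once the matched bytes are overwritten, the rule no longer fires. Below is a minimal Python sketch of that idea, not the speakers' tooling. The three patterns are copied from the `silent_banker` rule on slide 8; the `sample` buffer, the filler bytes, and the helper names `scan` and `patch` are hypothetical, and a real scanner would use the YARA engine rather than `bytes.find`.

```python
# Byte patterns copied from the silent_banker rule on slide 8.
SIGNATURES = {
    "$a": bytes.fromhex("6A 40 68 00 30 00 00 6A 14 8D 91"),
    "$b": bytes.fromhex("8D 4D B0 2B C1 83 C0 27 59 F7 F9"),
    "$c": b"UVODFRYSIHLNWPEJXQZAKCBGMT",
}


def scan(data: bytes) -> dict:
    """Return {signature name: first offset} for every pattern found in data."""
    hits = {}
    for name, pattern in SIGNATURES.items():
        offset = data.find(pattern)
        if offset != -1:
            hits[name] = offset
    return hits


def patch(data: bytes, start: int, length: int) -> bytes:
    """Overwrite `length` bytes at `start` with zero bytes (like the \\x00\\x00.. patches)."""
    end = min(start + length, len(data))
    return data[:start] + b"\x00" * (end - start) + data[end:]


# Hypothetical stand-in for malware.exe: filler bytes with the three patterns embedded.
filler = b"\x90" * 16
sample = filler + SIGNATURES["$a"] + filler + SIGNATURES["$b"] + filler + SIGNATURES["$c"]

original_hits = scan(sample)

# The rule condition is "$a or $b or $c", so patching one pattern is not enough.
partly_patched = patch(sample, original_hits["$a"], len(SIGNATURES["$a"]))
partly_patched_hits = scan(partly_patched)

# Patching every matched region leaves nothing for the rule to match.
fully_patched = sample
for name, offset in original_hits.items():
    fully_patched = patch(fully_patched, offset, len(SIGNATURES[name]))
fully_patched_hits = scan(fully_patched)

print(original_hits, partly_patched_hits, fully_patched_hits)
```

Patching one region at a time and rescanning, as the three test files appear to do, is one way to locate which bytes a detector actually keys on.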
  13. [email protected] #免殺

  14. [email protected] #免殺

  15. [email protected] #AMSI

  16. [email protected] /?challenge
      • Active Protection System
        - rule-based, not strong enough against unknown attacks
      • Malware Pattern based on Reversing
        - lack of lexical semantics of assembly → false positives
        - too slow against highly variable malware
      • Known Challenges
        - compiler optimization
        - Mirai, Hakai, Yowai, SpeakUp
        - Anti-AntiVirus Techniques
      • Word Embedding Techniques (NLP)
        - use only a few samples to predict incoming binary files
        - learn lexical semantics from instruction sequences
  17. [email protected] 〉〉〉Semantics

  18. “You shall know a word by the company it keeps“

    (Firth, J. R. 1957:11) /?semantics
  19. [email protected] /?semantics “... I can show you the world. Shining,

    shimmering, splendid. Tell me, princess, now when did. You last let your heart decide? I can open your eyes, Take you wonder by wonder ...”
  20. [email protected] /?semantics ” I drink beer. and the other people“

  21. [email protected] /?semantics ” we drink wine. “ ” I drink

    beer. “
  22. [email protected] /?semantics ” we drink wine. “ ” I drink

    beer. “ ” we guzzle wine. “ ” I guzzle beer. “
  23. [email protected] /?tokenFreq

  24. [email protected] /?freq drink guzzle cat dog puppy

  25. [email protected] /?cos(θ) King Man θ

  26. [email protected] #semantics
      • Co-Occurrence Matrix
        - count based, token frequency
        - able to capture lexical semantics
        - Cosine Similarity
      • Issues
        - vocabulary
        - online training
      → Paragraph Vector Distributed Memory (PV-DM)
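The count-based idea from slides 20 to 26 can be sketched in a few lines of Python. The four sentences come from slide 22; the window size of 1 and the helper names `cooccurrence` and `cosine` are assumptions made for this illustration, not details from the talk.

```python
import math
from collections import defaultdict

# Sentences from slide 22.
SENTENCES = [
    "I drink beer",
    "we drink wine",
    "I guzzle beer",
    "we guzzle wine",
]


def cooccurrence(sentences, window=1):
    """Count how often each token appears within `window` positions of another."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[token][tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


counts = cooccurrence(SENTENCES)
drink_vs_guzzle = cosine(counts["drink"], counts["guzzle"])
drink_vs_beer = cosine(counts["drink"], counts["beer"])
print(drink_vs_guzzle, drink_vs_beer)
```

In this toy corpus "drink" and "guzzle" keep exactly the same company, so their count vectors point the same way, while "drink" and "beer" share no neighbours. The vocabulary issue on the slide shows up here too: every new token adds a dimension to every vector.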
  27. [email protected] 〉〉〉Word2Vec

  28. [email protected] /?tokenFreq drink behavior

  29. [email protected] /?tokenFreq 4 dim

  30. [email protected] #Sim

  31. [email protected] #Sim
      similar() = (0.13*0.13 + 0.01*0.01 + 0.99*0.93 + 0.01*0.01)
                  / (sqrt(0.13^2 + 0.01^2 + 0.99^2 + 0.01^2) x sqrt(0.13^2 + 0.01^2 + 0.93^2 + 0.01^2))
                = 0.9999650034397828
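The arithmetic on slide 31 is a cosine similarity between two 4-dimensional vectors. A short Python sketch reproduces it; the two vectors are read off the terms in the slide's formula, and the function name `cosine_similarity` is only illustrative.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Component values taken from the numerator and denominators on slide 31.
a = [0.13, 0.01, 0.99, 0.01]
b = [0.13, 0.01, 0.93, 0.01]

score = cosine_similarity(a, b)
print(score)
```

The result agrees with the value printed on the slide to within floating-point rounding: the vectors differ only slightly in their third component, so the angle between them is close to zero.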
  32. [email protected] #Sim more similar

  33. [email protected] #Sim sim(King - Man) ≒ sigmoid(King・Man) King Man

  34. [email protected] #Sim King Man Δ
      sim(King - Man) ≒ sigmoid(King・Man)
      [BACKWARD]:
      Man = Man - Δ(King - Man) * learningRate
      Δ(King - Man) = (1 - sim(King - Man))・King
  35. [email protected] #negative King Man
      sim(King - Man) ≒ sigmoid(King・Man)
      [BACKWARD]:
      Man = Man - Δ(King - Man) * learningRate
      Δ(King - Man) = sim(King - Man)・King
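Slides 33 to 35 describe one update step for a positive pair (#Sim) and one for a negative pair (#negative). The Python sketch below follows the standard skip-gram negative-sampling gradient, which has the same magnitudes as the slides: `(1 - sim)・King` for a positive pair and `sim・King` for a negative one. The sign convention is an assumption chosen so that positive pairs move closer and negative pairs move apart; the toy vectors and the `update` helper are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def update(target, context, label, learning_rate=0.025):
    """One gradient step on `target` for a (target, context) pair.

    label=1 marks an observed (positive) pair, label=0 a sampled negative pair.
    """
    score = sigmoid(dot(target, context))
    # Gradient of the log-likelihood w.r.t. target is (label - score) * context:
    # (1 - score) * context for a positive pair, -score * context for a negative one.
    step = (label - score) * learning_rate
    return [t + step * c for t, c in zip(target, context)]


# Toy 3-dimensional vectors, made up for this example.
king = [0.5, -0.2, 0.1]
man = [0.1, 0.3, -0.4]

before = sigmoid(dot(man, king))
after_positive = sigmoid(dot(update(man, king, label=1), king))
after_negative = sigmoid(dot(update(man, king, label=0), king))
print(before, after_positive, after_negative)
```

A positive step raises `sigmoid(King・Man)` and a negative step lowers it, which is the behaviour the two slides illustrate.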
  36. [email protected] #PV-DM

  37. [email protected] #Word2Vec

  38. [email protected] 〉〉〉Asm2Vec

  39. [email protected] #Asm2Vec

  40. [email protected] #paragraph
      File Header | Opt Header (.AddressOfEntryPoint) | .text
      asm script:
      mov [ebp-0x04], 00
      jmp block_c
      cmp [ebp-0x04], Ah
      jg Exit
      push 0x3E8
      call Sleep
      jmp block_b
      mov eax, [ebp-0x04]
      add eax, 1
      mov [ebp-0x04], eax
      cmp [ebp-0x04], Ah
      jg Exit
      push 0x3E8
      call Sleep
      jmp block_b
      ...
  41. [email protected] #Asm2Vec

  42. [email protected] #PE
      File Header | Opt Header (.AddressOfEntryPoint) | .text
      6A 00 68 AD DE 00 00 68 EF BE 00 00 6A 00 FF 15 FE CA 00 00 33 C0 C3
      → Control Flow Graph
  43. [email protected] /?rndWalk
      block_a:
          mov [ebp-0x04], 00
          jmp block_c
      block_c:
          cmp [ebp-0x04], Ah
          jg Exit
      block_d:
          push 0x3E8
          call Sleep
          jmp block_b
      block_b:
          mov eax, [ebp-0x04]
          add eax, 1
          mov [ebp-0x04], eax
      (edge labels in the diagram: jmp block_c, jmp block_b, jg Exit)
      #1: block_a → block_c → Exit
      #2: block_a → block_c → block_d → block_b → block_c → Exit
      #3: block_a → block_c → block_d → block_b → block_c → block_d → block_b → block_c → Exit
      #4: block_a → block_c → block_d → block_b → block_c → block_d → block_b → block_c → block_d → block_b → block_c → Exit
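The four paths on slide 43 are walks over the control flow graph of the loop. Random walks of this kind can be sketched in Python as follows; the edge list is inferred from the paths on the slide, while the `max_steps` cap, the seed, and the function name `random_walk` are assumptions for illustration.

```python
import random

# Successors of each basic block, inferred from walks #1 to #4 on slide 43.
CFG = {
    "block_a": ["block_c"],
    "block_c": ["Exit", "block_d"],  # jg Exit taken, or fall through to the Sleep call
    "block_d": ["block_b"],
    "block_b": ["block_c"],
    "Exit": [],
}


def random_walk(cfg, start="block_a", max_steps=20, rng=None):
    """Follow random successors from `start` until Exit or `max_steps` blocks."""
    if rng is None:
        rng = random.Random()
    path = [start]
    while cfg[path[-1]] and len(path) < max_steps:
        path.append(rng.choice(cfg[path[-1]]))
    return path


rng = random.Random(0)  # fixed seed so repeated runs give the same walks
walks = [random_walk(CFG, rng=rng) for _ in range(4)]
for number, walk in enumerate(walks, start=1):
    print(f"#{number}: " + " → ".join(walk))
```

Each walk can then be flattened into the instruction sequence of the blocks it visits, which is the "asm script" form shown on the neighbouring slides.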
  44. [email protected] mov [ebp-0x04], 00 jmp block_c cmp [ebp-0x04], Ah jg

    Exit push 0x3E8 call Sleep jmp block_b mov eax, [ebp-0x04] add eax, 1 mov [ebp-0x04], eax cmp [ebp-0x04], Ah jg Exit push 0x3E8 call Sleep jmp block_b ... /?rndWalk mov [ebp-0x04], 00 jmp block_c cmp [ebp-0x04], Ah jg Exit mov eax, [ebp-0x04] add eax, 1 mov [ebp-0x04], eax block_c: block_b: block_a: jmp block_c push 0x3E8 call Sleep jmp block_b jmp block_b block_d: jg Exit asm script
  45. [email protected] #Asm2Vec push rbp mov rbp, rsp sub rsp, 138h

    mov rax, 8h mov [rbp+0ch], rax xor eax, eax mov [rbp+04h], 0 mov [rbp+32h], 1505h ...
  46. [email protected] #Asm2Vec push rbp mov rbp, rsp sub rsp, 138h

    mov rax, 8h mov [rbp+0ch], rax xor eax, eax mov [rbp+04h], 0 mov [rbp+32h], 1505h ...
  47. [email protected] #Asm2Vec push rbp mov rbp, rsp mov rsp, 138h

    mov rax, 8h mov [rbp+0ch], rax xor eax, eax mov [rbp+04h], 0 mov [rbp+32h], 1505h ...
  48. [email protected] #Asm2Vec push rbp mov rbp, rsp sub rsp, 138h

    mov rax, 8h mov [rbp+0ch], rax xor eax, eax mov [rbp+04h], 0 mov [rbp+32h], 1505h ...
  49. [email protected] #Asm2Vec push rbp mov rbp, rsp sub rsp, 138h

    mov rax, 8h mov [rbp+0ch], rax xor eax, eax mov [rbp+04h], 0 mov [rbp+32h], 1505h ...
  50. [email protected] #Asm2Vec push rbp mov rbp, rsp sub rsp, 138h

    mov rax, 8h mov [rbp+0ch], rax xor eax, eax mov [rbp+04h], 0 mov [rbp+32h], 1505h ...
  51. [email protected] #Asm2Vec push rbp mov rbp, rsp sub rsp, 138h

    mov rax, 8h mov [rbp+0ch], rax xor eax, eax mov [rbp+04h], 0 mov [rbp+32h], 1505h ...
  52. [email protected] #Asm2Vec
      push rbp
      mov rbp, rsp
      sub rsp, 138h
      mov rax, 8h
      mov [rbp+0ch], rax
      xor eax, eax
      mov [rbp+04h], 0
      mov [rbp+32h], 1505h
      ...
      Tokenize: sub rsp, 138h / lea eax, [ebx+4] / push rbp
      vocab = {
        'sub':     [-0.53, 0.01 ... -0.08],
        'rsp':     [ 0.12, 0.31, ... 0.34],
        'lea':     [-0.75,-0.42, ... -0.72],
        'push':    [ 0.23, 0.37, ... -0.23],
        '[ebx+4]': [-0.02,-0.19, ... 0.11],
        ...
      }
      (200 dim)
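The tokenize step on slide 52 splits each instruction into an operator and its operands and gives every token a 200-dimensional vector. A minimal Python sketch, assuming comma-separated operands and a uniform random initialisation in [-1, 1] (the slide does not state the distribution); `tokenize` and `build_vocab` are hypothetical helper names.

```python
import random


def tokenize(instruction):
    """Split 'sub rsp, 138h' into ('sub', ['rsp', '138h'])."""
    operator, _, rest = instruction.strip().partition(" ")
    operands = [part.strip() for part in rest.split(",") if part.strip()]
    return operator, operands


def build_vocab(instructions, dim=200, seed=0):
    """Assign each distinct token a random `dim`-dimensional vector."""
    rng = random.Random(seed)
    vocab = {}
    for instruction in instructions:
        operator, operands = tokenize(instruction)
        for token in [operator, *operands]:
            if token not in vocab:
                # Assumption: uniform init; training would later adjust these values.
                vocab[token] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return vocab


# The three example instructions shown next to "Tokenize" on the slide.
vocab = build_vocab(["sub rsp, 138h", "lea eax, [ebx+4]", "push rbp"])
print(sorted(vocab), len(vocab["sub"]))
```

Splitting on commas is enough for the x86 examples here; operands that contain commas themselves, such as some ARM addressing forms, would need a real disassembler's operand list instead.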
  53. [email protected] #Asm2Vec push rbp mov rbp, rsp sub rsp, 138h

    mov rax, 8h mov [rbp+0ch], rax xor eax, eax mov [rbp+04h], 0 mov [rbp+32h], 1505h ... sub rsp, 138h operands lea eax, [ebx+4] push rbp ... operator
  54. [email protected] #Asm2Vec
      push rbp mov rbp, rsp sub rsp, 138h mov rax, 8h mov [rbp+0ch], rax xor eax, eax mov [rbp+04h], 0 mov [rbp+32h], 1505h ...
      sub rsp, 138h (operator: sub; operands: rsp, 138h)
      Ƭ(instruction) = Ƭ(sub) || ( Ƭ(rsp)/2 + Ƭ(138h)/2 )
  55. [email protected] #Asm2Vec
      push rbp mov rbp, rsp sub rsp, 138h mov rax, 8h mov [rbp+0ch], rax xor eax, eax mov [rbp+04h], 0 mov [rbp+32h], 1505h ...
      push rbp (operator: push; operands: rbp)
      Ƭ(instruction) = Ƭ(push) || ( Ƭ(rbp) )
  56. [email protected] #Asm2Vec
      push rbp mov rbp, rsp sub rsp, 138h mov rax, 8h mov [rbp+0ch], rax xor eax, eax mov [rbp+04h], 0 mov [rbp+32h], 1505h nop
      nop (operator: nop; operands: null)
      Ƭ(instruction) = Ƭ(nop) || ( null )
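Slides 54 to 56 build an instruction vector by concatenating the operator's vector with the average of the operand vectors. A small Python sketch of that rule follows. The 4-dimensional toy vectors are made up, and representing the `null` operand part of `nop` as a zero vector is an assumption, since the slide only writes `null`.

```python
def average(vectors, dim):
    """Element-wise mean of the vectors; a zero vector when there are none."""
    if not vectors:
        return [0.0] * dim  # assumption: "null" operands become zeros
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def embed_instruction(operator, operands, vocab, dim):
    """Ƭ(instruction) = Ƭ(operator) || average of Ƭ(operand)."""
    return vocab[operator] + average([vocab[o] for o in operands], dim)


DIM = 4  # toy size; the slides use 200 dimensions per token
vocab = {
    "sub": [0.1, 0.2, 0.3, 0.4],
    "push": [0.5, 0.5, 0.5, 0.5],
    "nop": [0.9, 0.8, 0.7, 0.6],
    "rsp": [0.5, 1.0, -1.0, 0.0],
    "138h": [1.5, 0.0, 1.0, 2.0],
    "rbp": [0.25, 0.75, 0.0, 1.0],
}

sub_vec = embed_instruction("sub", ["rsp", "138h"], vocab, DIM)
push_vec = embed_instruction("push", ["rbp"], vocab, DIM)
nop_vec = embed_instruction("nop", [], vocab, DIM)
print(sub_vec, push_vec, nop_vec, sep="\n")
```

Every instruction vector has twice the token dimension, whatever the operand count, so instructions with zero, one, or two operands stay comparable.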
  57. [email protected] #Asm2Vec push rbp mov rbp, rsp sub rsp, 138h

    mov rax, 8h mov [rbp+0ch], rax xor eax, eax ... Ƭ("sub rsp, 138h") Ƭ(rsp) [-0.53, 0.01 ... -0.08] sigmoid(x) Avg(x) Ƭ(rbp) Ƭ(mov)|| [-0.53, 0.01 ... -0.08] Ƭ(8h) Avg(x) Ƭ(rax) Ƭ(mov)|| [-0.53, 0.01 ... -0.08] predict θfs Avg(x)
  58. [email protected] #Asm2Vec push rbp mov rbp, rsp sub rsp, 138h

    mov rax, 8h mov [rbp+0ch], rax xor eax, eax ... Ƭ("sub rsp, 138h") Ƭ(rsp) [-0.53, 0.01 ... -0.08] sigmoid(x) Avg(x) Ƭ(rbp) Ƭ(mov)|| [-0.53, 0.01 ... -0.08] Ƭ(8h) Avg(x) Ƭ(rax) Ƭ(mov)|| [-0.53, 0.01 ... -0.08] loss θfs loss1/3 loss1/3 loss1/3 Avg(x)
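Slides 57 and 58 appear to show a PV-DM style step: the function vector θfs is averaged with the vectors of the two neighbouring instructions, the average is used to predict a token of the middle instruction through a sigmoid, and the loss is shared back in thirds. The Python sketch below is one reading of that diagram under stated assumptions: a single positive target token, a negative log-sigmoid loss, and made-up 4-dimensional vectors. Names such as `context_vector` and `token_loss_and_grads` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def context_vector(theta_fs, prev_vec, next_vec):
    """Avg(x): element-wise mean of the function vector and both neighbours."""
    return [(f + p + n) / 3.0 for f, p, n in zip(theta_fs, prev_vec, next_vec)]


def token_loss_and_grads(theta_fs, prev_vec, next_vec, target_out):
    """Loss for predicting one target token, plus the gradient each input receives."""
    delta = context_vector(theta_fs, prev_vec, next_vec)
    score = sigmoid(dot(target_out, delta))
    loss = -math.log(score)
    # d(loss)/d(delta) = (score - 1) * target_out
    grad_delta = [(score - 1.0) * t for t in target_out]
    # delta is a mean of three inputs, so each input gets one third of the gradient.
    grad_each = [g / 3.0 for g in grad_delta]
    return loss, grad_delta, grad_each


# Toy vectors, invented for this example.
theta_fs = [0.1, -0.2, 0.3, 0.0]
prev_vec = [0.4, 0.1, -0.3, 0.2]   # stands in for the previous instruction's vector
next_vec = [-0.1, 0.3, 0.0, 0.1]   # stands in for the next instruction's vector
target_out = [0.2, -0.1, 0.5, 0.3] # output vector of the token being predicted

loss, grad_delta, grad_each = token_loss_and_grads(theta_fs, prev_vec, next_vec, target_out)

# One descent step on theta_fs alone should reduce the loss for this token.
updated_theta = [t - 0.025 * g for t, g in zip(theta_fs, grad_each)]
new_loss, _, _ = token_loss_and_grads(updated_theta, prev_vec, next_vec, target_out)
print(loss, new_loss)
```

Because θfs takes part in every prediction made inside a function, it accumulates a summary of that function, which is what gets compared between binaries later.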
  59. [email protected] $./exp
      • Dataset
        - malware: Mirai samples from VirusTotal (40000+)
        - benign: ELF from Linux-based IoT firmware (3600+)
        - stripped binary
      • Training
        - randomly choose only 25 Mirai samples to train
        - each token represented by 200-dim vector (random)
        - negative sampling: 25 tokens
        - decreasing learning rate: 0.025 → 0.0025
      • Cross validation: 10 times
      • Malicious: Similarity(binary, model) >= 95%
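Two of the settings on slide 59 are easy to make concrete: the learning rate that decreases from 0.025 to 0.0025, and the 95% similarity threshold for calling a binary malicious. The slide gives only the endpoints of the schedule, so the linear decay below is an assumption, as are the function names and the ten-step example.

```python
def learning_rate(step, total_steps, start=0.025, end=0.0025):
    """Linearly interpolate from `start` to `end` over `total_steps` steps (assumed shape)."""
    if total_steps <= 1:
        return start
    fraction = step / (total_steps - 1)
    return start + (end - start) * fraction


def is_malicious(similarity, threshold=0.95):
    """Decision rule from the slide: Similarity(binary, model) >= 95%."""
    return similarity >= threshold


rates = [learning_rate(step, 10) for step in range(10)]
print(rates[0], rates[-1])
print(is_malicious(0.97), is_malicious(0.90))
```

The threshold trades missed variants against false alarms on benign firmware; the per-architecture numbers on the next slide were obtained with the 95% setting.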
  60. [email protected] $./exp
      • MIPS
        - Mirai: 96.75% (18467 samples)
        - Benign: 96.41% (348 samples)
      • x86
        - Mirai: 96.75% (2564 samples)
        - Benign: 99.93% (1567 samples)
      • ARM
        - Mirai: 98.53% (23827 samples)
        - Benign: 93.87% (1699 samples)
  61. />Demo

  62. [email protected] 〉〉〉Challenge

  63. [email protected] /!challenge github.com/aaaddress1/theArk

  64. [email protected] /!PluginX DLL SIDE-LOADING: A Thorn in the Side of

    the Anti-Virus Industry
  65. [email protected] /!challenge
      int main(void) {
          try { *(char*)NULL = 1; }
          catch (...) { puts("Hell Kitty"); }
      }
  66. [email protected] /!challenge github.com/xoreaxeaxeax/movfuscator

  67. [email protected] /!challenge
      • Issue based on Control Flow Walking
        - Self modifying code
          1. Software Packer e.g. VMProtect, Themida
          2. Shellcode Encoder
        - Control Flow Rerouting
          1. Error handling e.g. SEH
          2. MultiThread
        - Exported malicious function
        - Virtual Method Table
      • Vector Obfuscation
        - 95% benignware / 5% injected shellcode
        - Use common instructions as gadgets to build an obfuscation chain e.g. movfuscator
  68. Thanks! [email protected] Github @aaaddress1 HITCON