Reversing In Wonderland: Neural Network Based Malware Detection Techniques

adr
September 11, 2020

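Variant samples with heavily modified and obfuscated code are now attacking in the wild at scale. Static signature techniques such as YARA depend on manual analysis and hand-written signatures, and this dependence is the main reason the security industry struggles against the many readily available open-source backdoors.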

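Neural-network-based analysis will become mainstream. In this talk we release a neural-network vector-model engine: using word embedding, it converts the context of executable code into vectors, which lets it model the features of heavily modified, highly variant, obfuscated programs. This will change how the entire security ecosystem operates! The engine lets researchers train unsupervised on a small number of APT samples and accurately identify executables of the same type.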

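In this talk we start from the neural-network implementation and walk through the mathematical details of why this type of model succeeds. In our experiment we trained on a very small number of samples drawn from 60,000 in-the-wild samples on VirusTotal, and correctly identified malware of the same family among a large number of code-mutated samples. At the end of the talk we discuss the many practical attack surfaces of this analysis approach, and how attackers can exploit and break such techniques to bypass security products.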

https://hitcon.org/2020/agenda/b05b106b-b5fd-489a-a73e-183e526b86ba/

Transcript

  1. HITCON 2020 | aaaddress1@chroot.org | Reversing In Wonderland: Neural Network Based Malware Detection Techniques

  2. Speakers:
    • Master's degree, CSIE, NTUST; Security Researcher, CHROOT; speaker at BlackHat, DEFCON, HITCON, CYBERSEC; aaaddress1@chroot.org; 30cm.tw & Hao's Arsenal (#Windows #Reversing #Pwn #Exploit)
    • Associate Professor of CSIE, NTUST; Joint Associate Research Fellow of CITI, Academia Sinica; smcheng@mail.ntust.edu.tw (#4G #5G #LTE_Attack #IoT)

  3. /?outline
    1. Malware in the Wild
    2. Semantics
    3. Semantic-Aware: PV-DM
    4. Asm2Vec & Experiment
    5. Challenge

  4. 〉〉〉Malware In the Wild

  5. #behavior

  6. #behavior

  7. #behavior

  8. YARA rule:

    rule silent_banker : banker
    {
        meta:
            description = "malware in the wild"
            threat_level = 3
            in_the_wild = true
        strings:
            $a = {6A 40 68 00 30 00 00 6A 14 8D 91}
            $b = {8D 4D B0 2B C1 83 C0 27 59 F7 F9}
            $c = "UVODFRYSIHLNWPEJXQZAKCBGMT"
        condition:
            $a or $b or $c
    }

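To make the rule's semantics concrete, here is a minimal pure-Python mimic of how the `silent_banker` strings and its `$a or $b or $c` condition are evaluated against a byte buffer. It is a sketch, not the YARA engine: the helper names `match_strings` and `silent_banker` are hypothetical, and real YARA features such as wildcards, modifiers and atom-based scanning are ignored.

```python
# Hypothetical pure-Python mimic of the silent_banker rule (not the YARA engine).
RULE_STRINGS = {
    "$a": bytes.fromhex("6A 40 68 00 30 00 00 6A 14 8D 91"),
    "$b": bytes.fromhex("8D 4D B0 2B C1 83 C0 27 59 F7 F9"),
    "$c": b"UVODFRYSIHLNWPEJXQZAKCBGMT",
}


def match_strings(data: bytes) -> dict:
    """Return {identifier: [offsets]} for every rule string found in `data`."""
    hits = {}
    for name, pattern in RULE_STRINGS.items():
        offsets, start = [], 0
        while (pos := data.find(pattern, start)) != -1:
            offsets.append(pos)
            start = pos + 1
        if offsets:
            hits[name] = offsets
    return hits


def silent_banker(data: bytes) -> bool:
    """condition: $a or $b or $c"""
    hits = match_strings(data)
    return "$a" in hits or "$b" in hits or "$c" in hits


if __name__ == "__main__":
    sample = b"\x00" * 0xA0 + RULE_STRINGS["$a"] + b"\x00" * 16
    print(match_strings(sample))  # {'$a': [160]}
    print(silent_banker(sample))  # True
```

The brittleness is visible directly: changing one byte inside every matched region is enough to make `silent_banker` return False, which is the weakness the next slides exploit.
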
  9. /?malware (figure: malware.exe laid out as File Header, Optional Header and PE Data; signature strings $a, $b and $c match at offsets +a0, +1e8 and +9f7c; scan result: [detected])

  10. /?malware (figure: the same malware.exe next to malware_test#1.bin, a patched copy in which region #1 is overwritten with \x00 bytes; scan result: detect)

  11. /?malware (figure: malware_test#2.bin, a patched copy in which region #2 is overwritten with \x00 bytes; scan result: clear)

  12. /?malware (figure: malware_test#3.bin, a patched copy in which region #3 is overwritten with \x00 bytes; scan result: detect)

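The patch-and-rescan procedure of slides 9 to 12 can be sketched as follows, assuming a black-box `scan(data) -> bool` callback that stands in for the antivirus engine. The function name `locate_signature` and the fixed chunk size are illustrative choices, not taken from the talk: each chunk is overwritten with `\x00` bytes, and chunks whose removal turns the verdict from detect to clear are reported as holding the signature.

```python
from typing import Callable, List, Tuple


def locate_signature(data: bytes, scan: Callable[[bytes], bool],
                     chunk: int = 0x100) -> List[Tuple[int, int]]:
    """Zero out one chunk at a time and return the (start, end) ranges whose
    removal makes `scan` go from detected to clear."""
    if not scan(data):
        return []  # nothing to locate
    hits = []
    for start in range(0, len(data), chunk):
        end = min(start + chunk, len(data))
        patched = data[:start] + b"\x00" * (end - start) + data[end:]
        if not scan(patched):  # "clear": this chunk held the signature
            hits.append((start, end))
    return hits


if __name__ == "__main__":
    sig = bytes.fromhex("8D 4D B0 2B C1 83 C0 27 59 F7 F9")
    blob = bytearray(0x400)
    blob[0x1E8:0x1E8 + len(sig)] = sig  # plant the pattern at offset +1e8
    print(locate_signature(bytes(blob), lambda d: sig in d))  # [(256, 512)]
```

With a rule like `$a or $b or $c`, zeroing a single chunk only clears the verdict when no other string still matches, so in practice the patched regions are shrunk or combined over several rounds.
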
  13. #免殺 (AV evasion)

  14. #免殺 (AV evasion)

  15. #AMSI

  16. /?challenge
    • Active Protection System
      - rule-based, not strong enough against unknown attacks
    • Malware patterns based on reversing
      - lack the lexical semantics of assembly → false positives
      - too slow to keep up with highly variable malware
    • Known challenges
      - compiler optimization
      - Mirai, Hakai, Yowai, SpeakUp
      - anti-antivirus techniques
    • Word embedding techniques (NLP)
      - use only a few samples to predict incoming binary files
      - learn lexical semantics from instruction sequences

  17. 〉〉〉Semantics

  18. /?semantics “You shall know a word by the company it keeps.” (Firth, J. R. 1957:11)

  19. /?semantics “... I can show you the world. Shining, shimmering, splendid. Tell me, princess, now when did you last let your heart decide? I can open your eyes, take you wonder by wonder ...”

  20. /?semantics “I drink beer.” and the other people

  21. /?semantics “we drink wine.” “I drink beer.”

  22. /?semantics “we drink wine.” “I drink beer.” “we guzzle wine.” “I guzzle beer.”

  23. /?tokenFreq

  24. /?freq (figure: drink, guzzle, cat, dog, puppy)

  25. /?cos(θ) (figure: vectors King and Man separated by the angle θ)

  26. #semantics
    • Co-occurrence matrix
      - count-based (token frequency)
      - able to capture lexical semantics
      - compared with cosine similarity
    • Issues
      - vocabulary size (one row and one column per token)
      - online training
    → Paragraph Vector Distributed Memory (PV-DM)

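A count-based co-occurrence matrix for the toy sentences of slides 21 and 22 takes only a few lines to build. This sketch assumes a context window of one word on each side and lower-cased, punctuation-stripped tokens (both are our assumptions, and the helper names are hypothetical). It shows that "drink" and "guzzle" end up with identical context counts, and therefore a cosine similarity of 1.0, while "drink" and "wine" share no contexts at all.

```python
import math
from collections import Counter, defaultdict

CORPUS = ["we drink wine.", "I drink beer.", "we guzzle wine.", "I guzzle beer."]


def tokenize(sentence):
    return sentence.lower().replace(".", "").split()


def cooccurrence(sentences, window=1):
    """counts[w][c] = how often context word c appears within `window` of w."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in set(u) | set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    m = cooccurrence(CORPUS)
    print(dict(m["drink"]))                 # {'we': 1, 'wine': 1, 'i': 1, 'beer': 1}
    print(cosine(m["drink"], m["guzzle"]))  # 1.0
    print(cosine(m["drink"], m["wine"]))    # 0.0
```
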
  27. 〉〉〉Word2Vec

  28. /?tokenFreq drink behavior

  29. /?tokenFreq 4 dim

  30. #Sim

  31. #Sim
    $$\text{similar}() = \frac{0.13 \times 0.13 + 0.01 \times 0.01 + 0.99 \times 0.93 + 0.01 \times 0.01}{\sqrt{0.13^2 + 0.01^2 + 0.99^2 + 0.01^2} \times \sqrt{0.13^2 + 0.01^2 + 0.93^2 + 0.01^2}} = 0.9999650034397828$$

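The arithmetic on slide 31 is ordinary cosine similarity. A minimal Python version (the helper name `cosine_similarity` is ours) reproduces the slide's value for the two 4-dimensional vectors.

```python
import math


def cosine_similarity(a, b):
    """dot(a, b) / (|a| * |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    u = [0.13, 0.01, 0.99, 0.01]
    v = [0.13, 0.01, 0.93, 0.01]
    print(cosine_similarity(u, v))  # ≈ 0.999965
```
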
  32. #Sim more similar

  33. #Sim (figure: vectors King and Man)
    sim(King, Man) ≈ sigmoid(King · Man)

  34. #Sim backward pass for an observed pair; the step Δ pulls Man toward King:
    Δ(King, Man) = (1 - sim(King, Man)) · King
    [BACKWARD]: Man = Man + Δ(King, Man) * learningRate

  35. #negative backward pass for a negative sample; the step pushes Man away from King:
    Δ(King, Man) = sim(King, Man) · King
    [BACKWARD]: Man = Man - Δ(King, Man) * learningRate

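Slides 33 to 35 describe one step of word2vec-style training with negative sampling. The sketch below follows the slides in updating only `man` (full word2vec also updates the other vector) and uses the standard `(label - sigmoid)` error term, so one function covers both cases: an observed pair (label 1) pulls `man` toward `king`, and a negative sample (label 0) pushes it away. The names `update_man`, `king` and `man` are illustrative.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def update_man(king, man, label, learning_rate=0.025):
    """label=1: Man += lr * (1 - sim) * King ; label=0: Man -= lr * sim * King."""
    sim = sigmoid(dot(king, man))
    step = (label - sim) * learning_rate
    return [m + step * k for m, k in zip(man, king)]


if __name__ == "__main__":
    rng = random.Random(42)
    king = [rng.uniform(-0.5, 0.5) for _ in range(8)]
    man = [rng.uniform(-0.5, 0.5) for _ in range(8)]
    print("before  ", sigmoid(dot(king, man)))
    print("positive", sigmoid(dot(king, update_man(king, man, 1))))  # higher
    print("negative", sigmoid(dot(king, update_man(king, man, 0))))  # lower
```

Writing the update as `man + (label - sim) * lr * king` makes the sign convention explicit: the positive case adds a multiple of `king`, the negative case subtracts one.
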
  36. #PV-DM

  37. #Word2Vec

  38. 〉〉〉Asm2Vec

  39. #Asm2Vec

  40. #paragraph (figure: File Header → Optional Header .AddressOfEntryPoint → .text; the code at the entry point is read as an "asm script"):
    mov [ebp-0x04], 00 | jmp block_c | cmp [ebp-0x04], 0Ah | jg Exit | push 0x3E8 | call Sleep | jmp block_b | mov eax, [ebp-0x04] | add eax, 1 | mov [ebp-0x04], eax | cmp [ebp-0x04], 0Ah | jg Exit | push 0x3E8 | call Sleep | jmp block_b | ...

  41. #Asm2Vec

  42. #PE (figure: the bytes at .AddressOfEntryPoint in .text, 6A 00 68 AD DE 00 00 68 EF BE 00 00 6A 00 FF 15 FE CA 00 00 33 C0 C3, are disassembled into a Control Flow Graph)

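To show how the entry-point bytes on slide 42 become instruction tokens, here is a toy decoder that understands only the five x86-32 encodings occurring in that byte string: push imm8, push imm32, call through a 32-bit absolute address, xor eax, eax, and ret. The helper name `decode` is hypothetical and this is an illustration only; a real pipeline would use a full disassembler.

```python
import struct


def decode(code: bytes):
    """Toy x86-32 decoder: handles only the encodings that appear on slide 42."""
    out, i = [], 0
    while i < len(code):
        op = code[i]
        if op == 0x6A:                            # push imm8 (sign-extended)
            imm = struct.unpack_from("<b", code, i + 1)[0]
            out.append(("push", hex(imm & 0xFFFFFFFF)))
            i += 2
        elif op == 0x68:                          # push imm32
            imm = struct.unpack_from("<I", code, i + 1)[0]
            out.append(("push", hex(imm)))
            i += 5
        elif op == 0xFF and code[i + 1] == 0x15:  # FF /2, ModRM 15h: call [disp32]
            disp = struct.unpack_from("<I", code, i + 2)[0]
            out.append(("call", f"dword ptr [{hex(disp)}]"))
            i += 6
        elif op == 0x33 and code[i + 1] == 0xC0:  # 33 /r, ModRM C0h: xor eax, eax
            out.append(("xor", "eax, eax"))
            i += 2
        elif op == 0xC3:                          # ret
            out.append(("ret", ""))
            i += 1
        else:
            raise ValueError(f"unsupported opcode {op:#04x} at offset {i:#x}")
    return out


if __name__ == "__main__":
    entry = bytes.fromhex("6A 00 68 AD DE 00 00 68 EF BE 00 00 6A 00 FF 15 FE CA 00 00 33 C0 C3")
    for mnemonic, operands in decode(entry):
        print(f"{mnemonic} {operands}".rstrip())
```

The resulting (mnemonic, operands) pairs are the kind of operator and operand tokens that the later Asm2Vec slides embed.
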
  43. /?rndWalk: random walks over the control flow graph
    #1: block_a → block_c → Exit
    #2: block_a → block_c → block_d → block_b → block_c → Exit
    #3: block_a → block_c → block_d → block_b → block_c → block_d → block_b → block_c → Exit
    #4: block_a → block_c → block_d → block_b → block_c → block_d → block_b → block_c → block_d → block_b → block_c → Exit
    Blocks:
      block_a: mov [ebp-0x04], 00 | jmp block_c
      block_b: mov eax, [ebp-0x04] | add eax, 1 | mov [ebp-0x04], eax
      block_c: cmp [ebp-0x04], 0Ah | jg Exit
      block_d: push 0x3E8 | call Sleep | jmp block_b

  44. /?rndWalk: a walk such as block_a → block_c → block_d → block_b → block_c → block_d → block_b → ... is flattened into a single asm script:
    mov [ebp-0x04], 00 | jmp block_c | cmp [ebp-0x04], 0Ah | jg Exit | push 0x3E8 | call Sleep | jmp block_b | mov eax, [ebp-0x04] | add eax, 1 | mov [ebp-0x04], eax | cmp [ebp-0x04], 0Ah | jg Exit | push 0x3E8 | call Sleep | jmp block_b | ...

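The random walks of slide 43 can be reproduced on a hand-built version of that control flow graph. The adjacency list and block contents are read off slides 40 to 44. Choosing successors uniformly at random and capping the walk with `max_blocks` are our assumptions, as are the helper names `random_walk` and `to_asm_script`.

```python
import random

# Control flow graph of the loop on slides 40-44 (edges read off the slides).
CFG = {
    "block_a": ["block_c"],
    "block_c": ["Exit", "block_d"],  # jg Exit taken / not taken
    "block_d": ["block_b"],
    "block_b": ["block_c"],
    "Exit": [],
}

BLOCKS = {
    "block_a": ["mov [ebp-0x04], 00", "jmp block_c"],
    "block_b": ["mov eax, [ebp-0x04]", "add eax, 1", "mov [ebp-0x04], eax"],
    "block_c": ["cmp [ebp-0x04], 0Ah", "jg Exit"],
    "block_d": ["push 0x3E8", "call Sleep", "jmp block_b"],
    "Exit": [],
}


def random_walk(cfg, start="block_a", rng=None, max_blocks=64):
    """Follow uniformly chosen successor edges until a block with no
    successors (or the length cap) is reached."""
    rng = rng or random.Random()
    path = [start]
    while cfg[path[-1]] and len(path) < max_blocks:
        path.append(rng.choice(cfg[path[-1]]))
    return path


def to_asm_script(path, blocks):
    """Flatten a walk into one instruction sequence (the 'asm script')."""
    return [ins for block in path for ins in blocks[block]]


if __name__ == "__main__":
    rng = random.Random(7)
    for n in range(1, 5):
        print(f"#{n}:", " -> ".join(random_walk(CFG, rng=rng)))
    print(to_asm_script(["block_a", "block_c", "block_d", "block_b", "block_c", "Exit"], BLOCKS))
```

Each walk yields a different "sentence" over the same instructions, which is how a single function supplies many training sequences.
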
  45-51. #Asm2Vec (animation build: the same listing appears on slides 45 to 51, except that slide 47 shows "mov rsp, 138h" in place of "sub rsp, 138h")
    push rbp
    mov rbp, rsp
    sub rsp, 138h
    mov rax, 8h
    mov [rbp+0ch], rax
    xor eax, eax
    mov [rbp+04h], 0
    mov [rbp+32h], 1505h
    ...

  52. #Asm2Vec Tokenize (same listing): the tokens of instructions such as "sub rsp, 138h", "lea eax, [ebx+4]" and "push rbp" each map to a 200-dim vector:
    vocab = {
        'sub':     [-0.53,  0.01, ..., -0.08],
        'rsp':     [ 0.12,  0.31, ...,  0.34],
        'lea':     [-0.75, -0.42, ..., -0.72],
        'push':    [ 0.23,  0.37, ..., -0.23],
        '[ebx+4]': [-0.02, -0.19, ...,  0.11],
        ...
    }

  53. #Asm2Vec (same listing): each instruction is split into an operator and its operands, e.g. "sub rsp, 138h" has operator sub and operands rsp, 138h; likewise "lea eax, [ebx+4]" and "push rbp".

  54. #Asm2Vec instruction vector = operator vector concatenated (||) with the mean of the operand vectors:
    $$\mathcal{T}(\texttt{sub rsp, 138h}) = \mathcal{T}(\texttt{sub}) \,\Vert\, \left(\tfrac{1}{2}\mathcal{T}(\texttt{rsp}) + \tfrac{1}{2}\mathcal{T}(\texttt{138h})\right)$$

  55. #Asm2Vec with a single operand:
    $$\mathcal{T}(\texttt{push rbp}) = \mathcal{T}(\texttt{push}) \,\Vert\, \mathcal{T}(\texttt{rbp})$$

  56. #Asm2Vec with no operands:
    $$\mathcal{T}(\texttt{nop}) = \mathcal{T}(\texttt{nop}) \,\Vert\, \text{null}$$

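The rule of slides 54 to 56 (operator vector concatenated with the mean of the operand vectors) can be sketched as follows. Assumptions: operands are split on commas, vectors are randomly initialised with 4 dimensions instead of the slides' 200 for readability, and the `null` operand part for `nop` is modelled as a zero vector. The helper names `make_vocab`, `split_instruction` and `embed_instruction` are hypothetical.

```python
import random

DIM = 4  # the slides use 200 dimensions; 4 keeps the printout readable


def make_vocab(tokens, dim=DIM, seed=0):
    """Randomly initialised token vectors, as in the 'Tokenize' step."""
    rng = random.Random(seed)
    return {t: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for t in tokens}


def split_instruction(instr):
    """'sub rsp, 138h' -> ('sub', ['rsp', '138h'])"""
    operator, _, rest = instr.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    return operator, operands


def embed_instruction(instr, vocab, dim=DIM):
    """T(instruction) = T(operator) || mean(T(operand_i)); zero vector if no operands."""
    operator, operands = split_instruction(instr)
    if operands:
        mean = [sum(vocab[o][k] for o in operands) / len(operands) for k in range(dim)]
    else:
        mean = [0.0] * dim  # 'null' modelled as a zero vector (assumption)
    return vocab[operator] + mean  # list concatenation plays the role of '||'


if __name__ == "__main__":
    vocab = make_vocab(["sub", "rsp", "138h", "push", "rbp", "nop"])
    for ins in ("sub rsp, 138h", "push rbp", "nop"):
        print(ins, [round(x, 3) for x in embed_instruction(ins, vocab)])
```

Every instruction vector has twice the token dimension, so instructions with zero, one or two operands all land in the same vector space.
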
  57. #Asm2Vec (same listing): to predict the target instruction "sub rsp, 138h", the previous instruction "mov rbp, rsp" and the next instruction "mov rax, 8h" (each encoded as operator || mean of operands) are averaged together with the function vector θfs (Avg(x)), and the result is passed through sigmoid(x) to predict the target.

  58. #Asm2Vec: the loss of this prediction is propagated back through Avg(x), so θfs and each of the two neighboring instructions receive one third of it (loss/3 each).

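Slides 57 and 58 describe a PV-DM-style training step. The sketch below assumes a single output vector `target_out` for the target instruction and omits the selection of negative samples; `pvdm_step`, `context_score` and `target_out` are hypothetical names. It shows the two ideas from the slides: the context is Avg(θfs, previous instruction, next instruction), and the update flowing back through that average is split into thirds. All vectors must share one length (in Asm2Vec, twice the token dimension, to match the concatenated instruction vectors).

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def average(*vectors):
    """Avg(x): element-wise mean of the context vectors."""
    return [sum(xs) / len(vectors) for xs in zip(*vectors)]


def context_score(theta_fs, prev_ins, next_ins, target_out):
    """sigmoid(target_out . Avg(theta_fs, prev_ins, next_ins))"""
    return sigmoid(dot(target_out, average(theta_fs, prev_ins, next_ins)))


def pvdm_step(theta_fs, prev_ins, next_ins, target_out, label=1, lr=0.025):
    """One update; returns new (theta_fs, prev_ins, next_ins, target_out)."""
    delta = average(theta_fs, prev_ins, next_ins)
    g = (label - sigmoid(dot(target_out, delta))) * lr
    # The ascent step for Avg(x) is g * target_out; each of the three inputs gets 1/3 of it.
    share = [g * t / 3 for t in target_out]

    def bump(vec):
        return [x + s for x, s in zip(vec, share)]

    new_target = [t + g * d for t, d in zip(target_out, delta)]
    return bump(theta_fs), bump(prev_ins), bump(next_ins), new_target


if __name__ == "__main__":
    rng = random.Random(1)
    vecs = [[rng.uniform(-0.5, 0.5) for _ in range(8)] for _ in range(4)]
    print("before", context_score(*vecs))
    print("after ", context_score(*pvdm_step(*vecs)))  # higher for label=1
```
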
  59. $./exp
    • Dataset
      - malware: Mirai samples from VirusTotal (40000+)
      - benign: ELF binaries from Linux-based IoT firmware (3600+)
      - stripped binaries
    • Training
      - randomly choose only 25 Mirai samples for training
      - each token represented by a 200-dim vector (randomly initialized)
      - negative sampling: 25 tokens
      - learning rate decreasing from 0.025 to 0.0025
    • Cross-validation: 10 times
    • Malicious if Similarity(binary, model) >= 95%

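The experiment settings above map onto a few small helpers. This is a sketch under stated assumptions: the sample names are made up, the linear shape of the 0.025 → 0.0025 decay is our assumption (the slide only says the rate decreases), and `Similarity(binary, model)` is treated as a number in [0, 1] compared against the 95% threshold. The helper names are hypothetical.

```python
import random


def pick_training_set(samples, k=25, seed=None):
    """Randomly choose k samples (the slides train on only 25 Mirai binaries)."""
    return random.Random(seed).sample(list(samples), k)


def learning_rate(step, total_steps, start=0.025, end=0.0025):
    """Decay from `start` to `end`; the linear shape is an assumption."""
    if total_steps <= 1:
        return end
    return start + (end - start) * step / (total_steps - 1)


def is_malicious(similarity, threshold=0.95):
    """Decision rule from the slides: Similarity(binary, model) >= 95%."""
    return similarity >= threshold


if __name__ == "__main__":
    train = pick_training_set([f"mirai_{i}" for i in range(40000)], seed=0)
    print(len(train), [round(learning_rate(s, 5), 5) for s in range(5)])
    print(is_malicious(0.97), is_malicious(0.90))  # True False
```
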
  60. $./exp (results by architecture)

    Arch   Mirai                     Benign
    MIPS   96.75% (18467 samples)    96.41% (348 samples)
    x86    96.75% (2564 samples)     99.93% (1567 samples)
    ARM    98.53% (23827 samples)    93.87% (1699 samples)

  61. />Demo

  62. 〉〉〉Challenge

  63. /!challenge github.com/aaaddress1/theArk

  64. /!PluginX DLL SIDE-LOADING: A Thorn in the Side of the Anti-Virus Industry

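  65. /!challenge

    #include <cstdio>

    int main(void) {
        try { *(char*)NULL = 1; }            // null write: hardware fault, undefined behavior in standard C++
        catch (...) { puts("Hell Kitty"); }  // reached only if the fault becomes a C++ exception (e.g. MSVC /EHa)
    }
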
  66. /!challenge github.com/xoreaxeaxeax/movfuscator

  67. /!challenge
    • Issues with control-flow walking
      - Self-modifying code
        1. Software packers, e.g. VMProtect, Themida
        2. Shellcode encoders
      - Control-flow rerouting
        1. Error handling, e.g. SEH
        2. Multithreading
      - Exported malicious functions
      - Virtual method tables
    • Vector obfuscation
      - 95% benignware / 5% injected shellcode
      - Use common instructions as gadgets to build an obfuscation chain, e.g. movfuscator

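The "95% benignware / 5% injected shellcode" point can be illustrated with a toy model. Assume (this is our simplification, not how Asm2Vec actually infers a binary's vector) that a binary's embedding behaves like a weighted mean of its benign and malicious parts. A 5% shellcode share then barely moves the vector away from the benign direction, so a 95% similarity threshold against a malware model is never reached. The helper `blend` and the two toy directions are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def blend(benign_vec, shellcode_vec, ratio):
    """Hypothetical whole-binary vector: weighted mean of benign and injected code."""
    return [(1 - ratio) * b + ratio * s for b, s in zip(benign_vec, shellcode_vec)]


if __name__ == "__main__":
    benign = [1.0, 0.0]     # toy "benign" direction
    shellcode = [0.0, 1.0]  # toy "malicious" direction
    sample = blend(benign, shellcode, 0.05)     # 95% benignware / 5% shellcode
    print(round(cosine(sample, benign), 4))     # 0.9986
    print(round(cosine(sample, shellcode), 4))  # 0.0526
```
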
  68. Thanks! aaaddress1@chroot.org | Slide | GitHub @aaaddress1 | Facebook | HITCON