Slide 1

Slide 1 text

HITCON 2020
Reversing In Wonderland: Neural Network Based Malware Detection Techniques
[email protected]

Slide 2

Slide 2 text

• Master's degree at CSIE, NTUST
• Security Researcher - chrO.ot
• Speaker - BlackHat, DEFCON, HITCON, CYBERSEC
• [email protected]
• 30cm.tw & Hao's Arsenal
#Windows #Reversing #Pwn #Exploit

• Associate Professor of CSIE, NTUST
• Joint Associate Research Fellow of CITI, Academia Sinica
• [email protected]
#4G #5G #LTE_Attack #IoT

Slide 3

Slide 3 text

/?outline
1. Malware in the Wild
2. Semantics
3. Semantic-Aware: PV-DM
4. Asm2Vec & Experiment
5. Challenge

Slide 4

Slide 4 text

〉〉〉Malware In the Wild

Slide 5

Slide 5 text

#behavior

Slide 6

Slide 6 text

#behavior

Slide 7

Slide 7 text

#behavior

Slide 8

Slide 8 text

#YARA

    rule silent_banker : banker
    {
        meta:
            description = "malware in the wild"
            threat_level = 3
            in_the_wild = true
        strings:
            $a = {6A 40 68 00 30 00 00 6A 14 8D 91}
            $b = {8D 4D B0 2B C1 83 C0 27 59 F7 F9}
            $c = "UVODFRYSIHLNWPEJXQZAKCBGMT"
        condition:
            $a or $b or $c
    }
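The rule's condition can be mimicked in a few lines of Python to make the matching logic concrete. This is a minimal sketch, not how YARA itself is implemented: the byte strings are copied from the rule, while `scan`, `is_detected`, and the padded `sample` buffer are hypothetical names introduced only for illustration.

```python
# Patterns copied from the silent_banker rule; everything else is illustrative.
PATTERNS = {
    "$a": bytes.fromhex("6A 40 68 00 30 00 00 6A 14 8D 91"),
    "$b": bytes.fromhex("8D 4D B0 2B C1 83 C0 27 59 F7 F9"),
    "$c": b"UVODFRYSIHLNWPEJXQZAKCBGMT",
}


def scan(data: bytes) -> dict:
    """Return {pattern name: first offset} for every pattern found in data."""
    hits = {}
    for name, needle in PATTERNS.items():
        offset = data.find(needle)
        if offset != -1:
            hits[name] = offset
    return hits


def is_detected(data: bytes) -> bool:
    """Mirror the rule's condition: $a or $b or $c."""
    return bool(scan(data))


# Hypothetical sample: padding followed by one embedded signature.
sample = b"\x90" * 0xA0 + PATTERNS["$a"] + b"\x00" * 16
print(scan(sample), is_detected(sample))
```

Because the condition is a plain `or`, a single surviving pattern is enough to flag the file, which is what makes exact byte signatures easy to locate and patch out.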

Slide 9

Slide 9 text

/?malware
malware.exe [detected]: File Header | Opt Header | PE Data; signatures $a, $b, $c; offsets +a0, +1e8, +9f7c

Slide 10

Slide 10 text

/?malware
malware.exe [detected]: File Header | Opt Header | PE Data; signatures $a, $b, $c; offsets +a0, +1e8, +9f7c
malware_test#1.bin (patch #1, \x00\x00..): File Header | Opt Header | PE Data (patched) → detect

Slide 11

Slide 11 text

/?malware
malware.exe [detected]: File Header | Opt Header | PE Data; signatures $a, $b, $c; offsets +a0, +1e8, +9f7c
malware_test#2.bin (patch #2, \x00\x00..): File Header | Opt Header | PE Data (patched) → clear

Slide 12

Slide 12 text

/?malware
malware.exe [detected]: File Header | Opt Header | PE Data; signatures $a, $b, $c; offsets +a0, +1e8, +9f7c
malware_test#3.bin (patch #3, \x00\x00..): File Header | Opt Header | PE Data (patched) → detect
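The patch-and-rescan idea on these slides (zero out one region, scan again, and see whether the verdict flips from detect to clear) can be sketched as below. This is an illustration under assumptions: `toy_detect` is a hypothetical stand-in for a real scanner with a single byte signature, and the chunk size and sample bytes are made up.

```python
from typing import Callable, List, Tuple


def locate_signature(data: bytes,
                     detect: Callable[[bytes], bool],
                     chunk: int) -> List[Tuple[int, int]]:
    """Zero one chunk at a time; return (start, end) ranges whose
    removal makes the detector report the file as clean."""
    critical = []
    for start in range(0, len(data), chunk):
        end = min(start + chunk, len(data))
        patched = data[:start] + b"\x00" * (end - start) + data[end:]
        if not detect(patched):
            critical.append((start, end))
    return critical


# Hypothetical detector: flags any buffer containing one fixed byte pattern.
SIGNATURE = bytes.fromhex("8D 4D B0 2B C1 83 C0 27 59 F7 F9")


def toy_detect(data: bytes) -> bool:
    return SIGNATURE in data


# 64-byte toy sample with the signature starting at offset 32.
sample = b"\x90" * 32 + SIGNATURE + b"\x90" * 21
ranges = locate_signature(sample, toy_detect, chunk=16)
print(ranges)
```

Only the chunk overlapping the signature clears the verdict, so an attacker learns which bytes to change while leaving the rest of the binary intact.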

Slide 13

Slide 13 text

Slide 14

Slide 14 text

Slide 15

Slide 15 text

Slide 16

Slide 16 text

/?challenge
• Active Protection System
  - rule-based, not strong enough against unknown attacks
• Malware Pattern based on Reversing
  - lacks the lexical semantics of assembly → false positives
  - too slow to keep up with malware variants
• Known Challenges
  - compiler optimization
  - Mirai, Hakai, Yowai, SpeakUp
  - Anti-AntiVirus Techniques
• Word Embedding Techniques (NLP)
  - use only a few samples to predict incoming binary files
  - learn lexical semantics from instruction sequences

Slide 17

Slide 17 text

〉〉〉Semantics

Slide 18

Slide 18 text

“You shall know a word by the company it keeps” (Firth, J. R. 1957:11) /?semantics

Slide 19

Slide 19 text

/?semantics “... I can show you the world. Shining, shimmering, splendid. Tell me, princess, now when did you last let your heart decide? I can open your eyes, Take you wonder by wonder ...”

Slide 20

Slide 20 text

/?semantics “I drink beer.” and the other people

Slide 21

Slide 21 text

/?semantics “we drink wine.” “I drink beer.”

Slide 22

Slide 22 text

/?semantics “we drink wine.” “I drink beer.” “we guzzle wine.” “I guzzle beer.”

Slide 23

Slide 23 text

/?tokenFreq

Slide 24

Slide 24 text

/?freq drink guzzle cat dog puppy

Slide 25

Slide 25 text

/?cos(θ) King Man θ

Slide 26

Slide 26 text

#semantics
• Co-Occurrence Matrix
  - count-based, token frequency
  - able to capture lexical semantics
  - Cosine Similarity
• Issues
  - vocabulary
  - online training
→ Paragraph Vector Distributed Memory (PV-DM)
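A count-based co-occurrence matrix can be built directly from the four example sentences used earlier in this section. The sketch below assumes the whole sentence is the context window and uses hypothetical helper names (`cooc`, `vector`, `cosine`); it only illustrates why "drink" and "guzzle" end up with the same vector.

```python
import math
from collections import defaultdict
from itertools import permutations

sentences = ["I drink beer", "we drink wine", "I guzzle beer", "we guzzle wine"]

# cooc[a][b] = how often token b appears in the same sentence as token a.
cooc = defaultdict(lambda: defaultdict(int))
for sentence in sentences:
    for a, b in permutations(sentence.split(), 2):
        cooc[a][b] += 1

vocab = sorted(cooc)  # fixed column order for every row vector


def vector(word):
    """Row of the co-occurrence matrix for `word`."""
    return [cooc[word][other] for other in vocab]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


sim_drink_guzzle = cosine(vector("drink"), vector("guzzle"))
sim_drink_beer = cosine(vector("drink"), vector("beer"))
print(sim_drink_guzzle, sim_drink_beer)
```

The matrix has one column per vocabulary entry, so it grows with the vocabulary and has to be recounted when new text arrives, which is the motivation the slide gives for moving to a learned embedding.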

Slide 27

Slide 27 text

〉〉〉Word2Vec

Slide 28

Slide 28 text

/?tokenFreq drink behavior

Slide 29

Slide 29 text

/?tokenFreq 4 dim

Slide 30

Slide 30 text

Slide 31

Slide 31 text

#Sim
similar() = (0.13*0.13 + 0.01*0.01 + 0.99*0.93 + 0.01*0.01) / (sqrt(0.13^2 + 0.01^2 + 0.99^2 + 0.01^2) * sqrt(0.13^2 + 0.01^2 + 0.93^2 + 0.01^2)) = 0.9999650034397828
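The same cosine similarity can be reproduced with a short Python function. The two 4-dimensional vectors are the ones from the slide; the names `vec_a`, `vec_b`, and `cosine_similarity` are placeholders, since the slide does not say which tokens the vectors belong to.

```python
import math


def cosine_similarity(u, v):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


vec_a = [0.13, 0.01, 0.99, 0.01]
vec_b = [0.13, 0.01, 0.93, 0.01]
similarity = cosine_similarity(vec_a, vec_b)
print(similarity)  # approximately 0.99996500
```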

Slide 32

Slide 32 text

#Sim more similar

Slide 33

Slide 33 text

#Sim sim(King - Man) ≒ sigmoid(King・Man) King Man

Slide 34

Slide 34 text

#Sim King Man Δ
sim(King - Man) ≒ sigmoid(King・Man)
[BACKWARD]:
Man = Man - Δ(King - Man) * learningRate
Δ(King - Man) = (sim(King - Man) - 1)・King

Slide 35

Slide 35 text

#negative King Man
sim(King - Man) ≒ sigmoid(King・Man)
[BACKWARD]:
Man = Man - Δ(King - Man) * learningRate
Δ(King - Man) = sim(King - Man)・King
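Both backward steps can be written as one update if Δ is taken as the gradient of the logistic loss, Δ = (sim - label)・King, with label 1 for a true neighbor and label 0 for a negative sample. The sketch below works under that assumption; the 3-dimensional vectors and the function names are invented for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def update(man, king, label, learning_rate):
    """One SGD step on `man`: Man = Man - Δ * learningRate,
    where Δ = (sim - label) * King (assumed gradient convention)."""
    sim = sigmoid(dot(king, man))
    return [m - learning_rate * (sim - label) * k for m, k in zip(man, king)]


king = [0.5, -0.2, 0.8]   # toy vectors, not taken from the slides
man = [0.1, 0.4, -0.3]

before = sigmoid(dot(king, man))
after_positive = sigmoid(dot(king, update(man, king, 1, 0.025)))
after_negative = sigmoid(dot(king, update(man, king, 0, 0.025)))
print(before, after_positive, after_negative)
```

A positive pair pulls Man toward King and raises the similarity, while a negative sample pushes it away and lowers it.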

Slide 36

Slide 36 text

Slide 37

Slide 37 text

#Word2Vec

Slide 38

Slide 38 text

〉〉〉Asm2Vec

Slide 39

Slide 39 text

Slide 40

Slide 40 text

#paragraph
File Header | Opt Header | .AddressOfEntryPoint → .text
asm script:
    mov [ebp-0x04], 00
    jmp block_c
    cmp [ebp-0x04], Ah
    jg Exit
    push 0x3E8
    call Sleep
    jmp block_b
    mov eax, [ebp-0x04]
    add eax, 1
    mov [ebp-0x04], eax
    cmp [ebp-0x04], Ah
    jg Exit
    push 0x3E8
    call Sleep
    jmp block_b
    ...

Slide 41

Slide 41 text

Slide 42

Slide 42 text

#PE
File Header | Opt Header | .AddressOfEntryPoint → .text
    6A 00 68 AD DE 00 00 68 EF BE 00 00 6A 00 FF 15 FE CA 00 00 33 C0 C3
→ Control Flow Graph
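To show what these .text bytes turn into before a control flow graph is built, here is a toy decoder. It handles only the five 32-bit x86 encodings that occur in this byte string and is not a general disassembler; a real pipeline would use a full disassembly engine. The function name `decode` is hypothetical.

```python
import struct

CODE = bytes.fromhex(
    "6A 00 68 AD DE 00 00 68 EF BE 00 00 6A 00 FF 15 FE CA 00 00 33 C0 C3"
)


def decode(code: bytes):
    """Decode only the encodings present in CODE (32-bit x86)."""
    out, i = [], 0
    while i < len(code):
        if code[i] == 0x6A:                      # push imm8
            out.append(f"push {code[i + 1]:#x}")
            i += 2
        elif code[i] == 0x68:                    # push imm32, little-endian
            (imm,) = struct.unpack_from("<I", code, i + 1)
            out.append(f"push {imm:#x}")
            i += 5
        elif code[i:i + 2] == b"\xFF\x15":       # call dword ptr [disp32]
            (addr,) = struct.unpack_from("<I", code, i + 2)
            out.append(f"call [{addr:#x}]")
            i += 6
        elif code[i:i + 2] == b"\x33\xC0":       # xor eax, eax
            out.append("xor eax, eax")
            i += 2
        elif code[i] == 0xC3:                    # ret
            out.append("ret")
            i += 1
        else:
            raise ValueError(f"unsupported byte {code[i]:#x} at offset {i}")
    return out


for line in decode(CODE):
    print(line)
```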

Slide 43

Slide 43 text

/?rndWalk
#1: block_a → block_c → Exit
#2: block_a → block_c → block_d → block_b → block_c → Exit
#3: block_a → block_c → block_d → block_b → block_c → block_d → block_b → block_c → Exit
#4: block_a → block_c → block_d → block_b → block_c → block_d → block_b → block_c → block_d → block_b → block_c → Exit

block_a:
    mov [ebp-0x04], 00
    jmp block_c
block_c:
    cmp [ebp-0x04], Ah
    jg Exit
block_d:
    push 0x3E8
    call Sleep
    jmp block_b
block_b:
    mov eax, [ebp-0x04]
    add eax, 1
    mov [ebp-0x04], eax
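The paths above can be generated by a random walk over the control flow graph that concatenates the instructions of each visited block. The sketch below hard-codes the four blocks from the slide; the dictionary layout, the `random_walk` name, and the `max_blocks` cap are assumptions made for illustration.

```python
import random

# block name -> (instructions, possible successors), transcribed from the slide
CFG = {
    "block_a": (["mov [ebp-0x04], 00", "jmp block_c"], ["block_c"]),
    "block_c": (["cmp [ebp-0x04], Ah", "jg Exit"], ["Exit", "block_d"]),
    "block_d": (["push 0x3E8", "call Sleep", "jmp block_b"], ["block_b"]),
    "block_b": (["mov eax, [ebp-0x04]", "add eax, 1", "mov [ebp-0x04], eax"],
                ["block_c"]),
}


def random_walk(cfg, entry, rng, max_blocks=32):
    """Walk from `entry`, picking a random successor at each block.
    Returns (visited block names, flattened instruction sequence)."""
    path, script = [], []
    node = entry
    while len(path) < max_blocks:      # cap keeps loops from running forever
        path.append(node)
        if node not in cfg:            # reached Exit
            break
        instructions, successors = cfg[node]
        script.extend(instructions)
        node = rng.choice(successors)
    return path, script


path, script = random_walk(CFG, "block_a", random.Random(0))
print(" → ".join(path))
print("\n".join(script))
```

Each walk yields one linear asm script, so a function with loops and branches contributes several different instruction sequences to training.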

Slide 44

Slide 44 text

/?rndWalk
Control flow graph (block_a, block_c, block_d, block_b) → asm script:
    mov [ebp-0x04], 00
    jmp block_c
    cmp [ebp-0x04], Ah
    jg Exit
    push 0x3E8
    call Sleep
    jmp block_b
    mov eax, [ebp-0x04]
    add eax, 1
    mov [ebp-0x04], eax
    cmp [ebp-0x04], Ah
    jg Exit
    push 0x3E8
    call Sleep
    jmp block_b
    ...

Slide 45

Slide 45 text

#Asm2Vec
    push rbp
    mov rbp, rsp
    sub rsp, 138h
    mov rax, 8h
    mov [rbp+0ch], rax
    xor eax, eax
    mov [rbp+04h], 0
    mov [rbp+32h], 1505h
    ...

Slide 46

Slide 46 text

#Asm2Vec

Slide 47

Slide 47 text

#Asm2Vec

Slide 48

Slide 48 text

#Asm2Vec

Slide 49

Slide 49 text

#Asm2Vec

Slide 50

Slide 50 text

#Asm2Vec

Slide 51

Slide 51 text

#Asm2Vec

Slide 52

Slide 52 text

#Asm2Vec
Tokenize (200 dim):
    sub rsp, 138h
    lea eax, [ebx+4]
    push rbp

    vocab = {
        'sub':     [-0.53, 0.01 ... -0.08],
        'rsp':     [ 0.12, 0.31, ... 0.34],
        'lea':     [-0.75,-0.42, ... -0.72],
        'push':    [ 0.23, 0.37, ... -0.23],
        '[ebx+4]': [-0.02,-0.19, ... 0.11],
        ...
    }
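The tokenization step can be sketched as follows: split every instruction into an operator and its operands, then give each distinct token a 200-dimensional vector. The uniform initialization range and the names `tokenize` and `build_vocab` are assumptions; the slide only states that the vectors start out random.

```python
import random

DIM = 200  # vector size per token, as on the slide


def tokenize(instruction: str):
    """'sub rsp, 138h' -> ('sub', ['rsp', '138h']); 'nop' -> ('nop', [])."""
    operator, _, rest = instruction.strip().partition(" ")
    operands = [part.strip() for part in rest.split(",") if part.strip()]
    return operator, operands


def build_vocab(instructions, rng):
    """Assign a random DIM-dimensional vector to every distinct token."""
    vocab = {}
    for instruction in instructions:
        operator, operands = tokenize(instruction)
        for token in [operator, *operands]:
            if token not in vocab:
                # Assumed init range; the slide only says "random".
                vocab[token] = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
    return vocab


instructions = ["sub rsp, 138h", "lea eax, [ebx+4]", "push rbp"]
vocab = build_vocab(instructions, random.Random(0))
print(sorted(vocab))
```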

Slide 53

Slide 53 text

#Asm2Vec
Each instruction splits into an operator and its operands:
    sub rsp, 138h       operator: sub, operands: rsp, 138h
    lea eax, [ebx+4]    operator: lea, operands: eax, [ebx+4]
    push rbp            operator: push, operands: rbp
    ...

Slide 54

Slide 54 text

#Asm2Vec
sub rsp, 138h (operator: sub; operands: rsp, 138h)
Ƭ(instruction) = Ƭ(sub) || ( Ƭ(rsp)/2 + Ƭ(138h)/2 )

Slide 55

Slide 55 text

#Asm2Vec
push rbp (operator: push; operands: rbp)
Ƭ(instruction) = Ƭ(push) || ( Ƭ(rbp) )

Slide 56

Slide 56 text

#Asm2Vec
nop (operator: nop; operands: null)
Ƭ(instruction) = Ƭ(nop) || ( null )
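The three cases above (two operands, one operand, no operands) follow one rule: concatenate the operator vector with the average of the operand vectors. A minimal sketch, assuming a zero vector for the null operand part and using a made-up 2-dimensional vocabulary so the arithmetic is easy to check by hand:

```python
def tokenize(instruction: str):
    """Split 'sub rsp, 138h' into ('sub', ['rsp', '138h'])."""
    operator, _, rest = instruction.strip().partition(" ")
    operands = [part.strip() for part in rest.split(",") if part.strip()]
    return operator, operands


def embed_instruction(instruction, vocab, dim):
    """T(instruction) = T(operator) || average of T(operand) vectors."""
    operator, operands = tokenize(instruction)
    if operands:
        averaged = [
            sum(vocab[operand][i] for operand in operands) / len(operands)
            for i in range(dim)
        ]
    else:
        averaged = [0.0] * dim  # assumption: "null" is encoded as zeros
    return list(vocab[operator]) + averaged


# Toy 2-dim vocabulary (values invented for illustration).
toy_vocab = {
    "sub": [1.0, 2.0], "rsp": [2.0, 4.0], "138h": [4.0, 0.0],
    "push": [0.5, 0.5], "rbp": [3.0, 1.0], "nop": [9.0, 9.0],
}

print(embed_instruction("sub rsp, 138h", toy_vocab, 2))
print(embed_instruction("push rbp", toy_vocab, 2))
print(embed_instruction("nop", toy_vocab, 2))
```

With 200-dimensional token vectors, every instruction therefore becomes a 400-dimensional vector regardless of how many operands it has.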

Slide 57

Slide 57 text

#Asm2Vec
    push rbp
    mov rbp, rsp
    sub rsp, 138h
    mov rax, 8h
    mov [rbp+0ch], rax
    xor eax, eax
    ...
Context of "sub rsp, 138h" (each Ƭ(·) is a vector such as [-0.53, 0.01 ... -0.08]):
  - θfs
  - previous instruction: Ƭ(mov) || Avg(Ƭ(rbp), Ƭ(rsp))
  - next instruction: Ƭ(mov) || Avg(Ƭ(rax), Ƭ(8h))
Avg(x) of the three → sigmoid(x) → predict Ƭ("sub rsp, 138h")

Slide 58

Slide 58 text

#Asm2Vec
    push rbp
    mov rbp, rsp
    sub rsp, 138h
    mov rax, 8h
    mov [rbp+0ch], rax
    xor eax, eax
    ...
Context of "sub rsp, 138h" (each Ƭ(·) is a vector such as [-0.53, 0.01 ... -0.08]):
  - θfs
  - previous instruction: Ƭ(mov) || Avg(Ƭ(rbp), Ƭ(rsp))
  - next instruction: Ƭ(mov) || Avg(Ƭ(rax), Ƭ(8h))
Avg(x) of the three → sigmoid(x) → loss against Ƭ("sub rsp, 138h")
Backward: loss 1/3 to θfs, loss 1/3 to the previous instruction, loss 1/3 to the next instruction
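One training step of this scheme can be sketched as follows: average θfs with the two neighbor-instruction vectors, score a target token with a sigmoid, and hand one third of the correction back to each of the three inputs. The 4-dimensional toy vectors, the function names, and the use of a single target token are simplifying assumptions; the sketch also stops at the instruction level and does not split the correction further among operator and operand tokens.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def context_vector(theta_fs, prev_vec, next_vec):
    """Avg(x): element-wise mean of the function vector and both neighbors."""
    return [(f + p + n) / 3.0 for f, p, n in zip(theta_fs, prev_vec, next_vec)]


def train_step(theta_fs, prev_vec, next_vec, target_vec, label, lr):
    """Return updated (theta_fs, prev_vec, next_vec, target_vec)."""
    delta = context_vector(theta_fs, prev_vec, next_vec)
    error = sigmoid(dot(delta, target_vec)) - label     # prediction error
    share = [lr * error * t / 3.0 for t in target_vec]  # 1/3 per input
    new_theta = [f - s for f, s in zip(theta_fs, share)]
    new_prev = [p - s for p, s in zip(prev_vec, share)]
    new_next = [n - s for n, s in zip(next_vec, share)]
    new_target = [t - lr * error * d for t, d in zip(target_vec, delta)]
    return new_theta, new_prev, new_next, new_target


# Toy vectors, invented for illustration only.
theta_fs = [0.1, 0.2, -0.1, 0.3]
prev_vec = [0.4, -0.2, 0.1, 0.0]
next_vec = [0.4, -0.2, 0.3, 0.2]
target = [0.2, 0.1, 0.5, -0.3]

score_before = sigmoid(dot(context_vector(theta_fs, prev_vec, next_vec), target))
theta2, prev2, next2, target2 = train_step(
    theta_fs, prev_vec, next_vec, target, label=1, lr=0.025
)
score_after = sigmoid(dot(context_vector(theta2, prev2, next2), target2))
print(score_before, score_after)
```

After a step with label 1 the context scores the true token slightly higher; negative samples would use label 0 and push the score down.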

Slide 59

Slide 59 text

$./exp
• Dataset
  - malware: Mirai samples from VirusTotal (40000+)
  - benign: ELF from Linux-based IoT firmware (3600+)
  - stripped binary
• Training
  - randomly choose only 25 Mirai samples to train
  - each token represented by 200-dim vector (random)
  - negative sampling: 25 tokens
  - decreasing learning rate: 0.025 → 0.0025
• Cross validation: 10 times
• Malicious: Similarity(binary, model) >= 95%
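The training setup and the decision rule can be expressed as a few small helpers. This is a sketch under assumptions: the slide gives only the start and end learning rates, so the linear schedule is a guess, and all function names and the placeholder sample names are hypothetical.

```python
import random


def pick_training_set(samples, rng, k=25):
    """Randomly choose k samples (25 on the slide) for training."""
    return rng.sample(samples, k)


def learning_rate(step, total_steps, start=0.025, end=0.0025):
    """Assumed linear decay from `start` to `end` over the training run."""
    fraction = step / max(total_steps - 1, 1)
    return start + (end - start) * fraction


def is_malicious(similarity, threshold=0.95):
    """Flag a binary when its similarity to the model reaches the threshold."""
    return similarity >= threshold


# Placeholder sample names; a real run would use file paths or hashes.
all_samples = [f"sample_{i}" for i in range(1000)]
training_set = pick_training_set(all_samples, random.Random(0))
print(len(training_set), learning_rate(0, 10), learning_rate(9, 10))
print(is_malicious(0.97), is_malicious(0.80))
```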

Slide 60

Slide 60 text

$./exp
• MIPS
  - Mirai: 96.75% (18467 samples)
  - Benign: 96.41% (348 samples)
• x86
  - Mirai: 96.75% (2564 samples)
  - Benign: 99.93% (1567 samples)
• ARM
  - Mirai: 98.53% (23827 samples)
  - Benign: 93.87% (1699 samples)

Slide 61

Slide 61 text

/>Demo

Slide 62

Slide 62 text

〉〉〉Challenge

Slide 63

Slide 63 text

/!challenge github.com/aaaddress1/theArk

Slide 64

Slide 64 text

/!PluginX DLL SIDE-LOADING: A Thorn in the Side of the Anti-Virus Industry

Slide 65

Slide 65 text

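/!challenge

    #include <cstdio>

    int main(void) {
        try {
            // Writing through NULL raises an access violation, not a C++ exception.
            *(char*)NULL = 1;
        } catch (...) {
            // Reached only where the runtime routes the fault into catch (...),
            // e.g. MSVC built with /EHa; the handler is invisible to a static walk.
            puts("Hell Kitty");
        }
    }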

Slide 66

Slide 66 text

/!challenge github.com/xoreaxeaxeax/movfuscator

Slide 67

Slide 67 text

/!challenge
• Issue based on Control Flow Walking
  - Self modifying code
    1. Software Packer e.g. VMProtect, Themida
    2. Shellcode Encoder
  - Control Flow Rerouting
    1. Error handling e.g. SEH
    2. MultiThread
  - Exported malicious function
  - Virtual Method Table
• Vector Obfuscation
  - 95% benignware / 5% injected shellcode
  - Use common instructions as gadgets to build an obfuscation chain e.g. movfuscator
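The 95% / 5% case can be illustrated numerically. The sketch below assumes, purely for illustration, that a binary's vector behaves like a weighted average of the vectors of its parts; the two 3-dimensional vectors and the `mix` helper are invented and do not come from the experiment.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mix(benign, malicious, ratio):
    """Assumed model: blended vector = (1 - ratio) * benign + ratio * malicious."""
    return [(1.0 - ratio) * b + ratio * m for b, m in zip(benign, malicious)]


benign_vec = [0.9, 0.1, 0.2]       # toy vector for the benign host program
malicious_vec = [-0.3, 0.8, 0.5]   # toy vector for the injected shellcode

blended = mix(benign_vec, malicious_vec, 0.05)
sim_to_benign = cosine(blended, benign_vec)
sim_to_malicious = cosine(blended, malicious_vec)
print(sim_to_benign, sim_to_malicious)
```

Under this assumption the blended vector stays very close to the benign one and far below a 95% similarity threshold against the malicious one, which is why a small injected payload is hard to catch with a whole-binary similarity score.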

Slide 68

Slide 68 text

Thanks!
[email protected]
Slide, Github @aaaddress1, Facebook
HITCON