Slide 1

Slide 1 text

Reversing In Wonderland: Neural Network Based Malware Detection Techniques
HITCON 2020
aaaddress1@chroot.org

Slide 2

Slide 2 text

• Master's degree at CSIE, NTUST
• Security Researcher - chrO.ot
• Speaker - BlackHat, DEFCON, HITCON, CYBERSEC
• aaaddress1@chroot.org
• 30cm.tw & Hao's Arsenal
#Windows #Reversing #Pwn #Exploit

• Associate Professor of CSIE, NTUST
• Joint Associate Research Fellow of CITI, Academia Sinica
• smcheng@mail.ntust.edu.tw
#4G #5G #LTE_Attack #IoT

Slide 3

Slide 3 text

/?outline
1. Malware in the Wild
2. Semantics
3. Semantic-Aware: PV-DM
4. Asm2Vec & Experiment
5. Challenge

Slide 4

Slide 4 text

〉〉〉Malware In the Wild

Slide 5

Slide 5 text

#behavior

Slide 6

Slide 6 text

#behavior

Slide 7

Slide 7 text

#behavior

Slide 8

Slide 8 text

#YARA
rule silent_banker : banker
{
    meta:
        description = "malware in the wild"
        threat_level = 3
        in_the_wild = true
    strings:
        $a = {6A 40 68 00 30 00 00 6A 14 8D 91}
        $b = {8D 4D B0 2B C1 83 C0 27 59 F7 F9}
        $c = "UVODFRYSIHLNWPEJXQZAKCBGMT"
    condition:
        $a or $b or $c
}
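To make the rule's condition concrete, here is a minimal sketch in plain Python that searches a byte buffer for the same three strings. It is not the YARA engine and does not use the YARA library; the helper names `match_rule` and `is_detected` are hypothetical, and only the pattern bytes come from the slide.

```python
# Byte patterns copied from the silent_banker rule on the slide.
# The helper names are hypothetical and only mimic "$a or $b or $c".
PATTERNS = {
    "$a": bytes.fromhex("6A 40 68 00 30 00 00 6A 14 8D 91"),
    "$b": bytes.fromhex("8D 4D B0 2B C1 83 C0 27 59 F7 F9"),
    "$c": b"UVODFRYSIHLNWPEJXQZAKCBGMT",
}


def match_rule(data: bytes) -> dict:
    """Return {pattern name: first offset} for every pattern found in data."""
    hits = {}
    for name, pattern in PATTERNS.items():
        offset = data.find(pattern)
        if offset != -1:
            hits[name] = offset
    return hits


def is_detected(data: bytes) -> bool:
    """The rule condition is a plain OR over the three strings."""
    return bool(match_rule(data))
```

Because the condition is an OR, a sample is flagged as soon as any one of the byte sequences survives in the file, which is what makes such rules easy to locate and patch out.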

Slide 9

Slide 9 text

/?malware
[Figure: layout of malware.exe (File Header, Opt Header, PE Data) with the YARA strings $a, $b, $c marked in the file; offsets shown: +a0, +1e8, +9f7c. Scan result: detected]

Slide 10

Slide 10 text

/?malware
[Figure: malware.exe (detected) next to malware_test#1.bin, a patched copy in which regions of PE Data are overwritten with \x00\x00.. Scan result for #1: detect]

Slide 11

Slide 11 text

/?malware
[Figure: malware.exe (detected) next to malware_test#2.bin, a patched copy in which regions of PE Data are overwritten with \x00\x00.. Scan result for #2: clear]

Slide 12

Slide 12 text

/?malware
[Figure: malware.exe (detected) next to malware_test#3.bin, a patched copy in which regions of PE Data are overwritten with \x00\x00.. Scan result for #3: detect]
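The patched test files above suggest a simple way to locate a byte signature: overwrite parts of the file with zeros and watch whether the scanner still fires. The sketch below is one possible reading of that idea, not the speaker's tool. The `scan` callback stands in for a real scanner and is an assumption, as are the function name and the fixed chunk size.

```python
from typing import Callable, List, Tuple


def locate_signatures(
    data: bytes,
    scan: Callable[[bytes], bool],  # hypothetical oracle: True means "detected"
    chunk: int = 16,
) -> List[Tuple[int, int]]:
    """Keep one chunk at a time, zero everything else, and record the chunks
    that still trigger the scanner on their own."""
    hits = []
    for start in range(0, len(data), chunk):
        end = min(start + chunk, len(data))
        # bytes(n) yields n zero bytes, so only data[start:end] survives.
        patched = bytes(start) + data[start:end] + bytes(len(data) - end)
        if scan(patched):
            hits.append((start, end))
    return hits
```

A signature that straddles a chunk boundary is missed by this simple version; overlapping windows or a second pass with a shifted start would be needed to cover that case.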

Slide 13

Slide 13 text

#AV-evasion (免殺)

Slide 14

Slide 14 text

#AV-evasion (免殺)

Slide 15

Slide 15 text

#AMSI

Slide 16

Slide 16 text

/?challenge
• Active Protection System
  - rule-based, not strong enough against unknown attacks
• Malware Pattern based on Reversing
  - lacks the lexical semantics of assembly → false positives
  - too slow against highly variable malware
• Known Challenges
  - compiler optimization
  - Mirai, Hakai, Yowai, SpeakUp
  - Anti-AntiVirus Techniques
• Word Embedding Techniques (NLP)
  - use only a few samples to predict incoming binary files
  - learn lexical semantics from instruction sequences

Slide 17

Slide 17 text

〉〉〉Semantics

Slide 18

Slide 18 text

/?semantics
“You shall know a word by the company it keeps” (Firth, J. R. 1957:11)

Slide 19

Slide 19 text

/?semantics
“... I can show you the world. Shining, shimmering, splendid. Tell me, princess, now when did you last let your heart decide? I can open your eyes, take you wonder by wonder ...”

Slide 20

Slide 20 text

/?semantics
“I drink beer.” and the other people

Slide 21

Slide 21 text

/?semantics
“we drink wine.”
“I drink beer.”

Slide 22

Slide 22 text

/?semantics
“we drink wine.”
“I drink beer.”
“we guzzle wine.”
“I guzzle beer.”

Slide 23

Slide 23 text

/?tokenFreq

Slide 24

Slide 24 text

/?freq drink guzzle cat dog puppy

Slide 25

Slide 25 text

/?cos(θ) King Man θ

Slide 26

Slide 26 text

#semantics
• Co-Occurrence Matrix
  - count based, token frequency
  - able to capture lexical semantics
  - Cosine Similarity
• Issues
  - vocabulary
  - online training
→ Paragraph Vector Distributed Memory (PV-DM)
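As a rough illustration of the count-based approach, the sketch below builds a co-occurrence table from the four example sentences used earlier in the deck and compares tokens with cosine similarity. Treating a whole sentence as the context window, and the function names, are assumptions made for this example.

```python
import math
from collections import defaultdict
from itertools import permutations

# The four example sentences from the earlier slides.
SENTENCES = ["we drink wine", "I drink beer", "we guzzle wine", "I guzzle beer"]


def cooccurrence(sentences):
    """counts[a][b] = number of times token b appears in a sentence with token a."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        tokens = sentence.lower().split()
        for a, b in permutations(tokens, 2):
            counts[a][b] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in set(u) | set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


counts = cooccurrence(SENTENCES)
sim_verbs = cosine(counts["drink"], counts["guzzle"])  # identical contexts
sim_nouns = cosine(counts["wine"], counts["beer"])     # only the verbs are shared
```

"drink" and "guzzle" keep exactly the same company, so their count vectors point in the same direction, while "wine" and "beer" share only the verbs. The table grows with the vocabulary and must be recounted when new text arrives, which are the two issues listed above.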

Slide 27

Slide 27 text

〉〉〉Word2Vec

Slide 28

Slide 28 text

/?tokenFreq drink behavior

Slide 29

Slide 29 text

/?tokenFreq 4 dim

Slide 30

Slide 30 text

#Sim

Slide 31

Slide 31 text

#Sim
$$
\mathrm{similar}() = \frac{0.13 \cdot 0.13 + 0.01 \cdot 0.01 + 0.99 \cdot 0.93 + 0.01 \cdot 0.01}{\sqrt{0.13^2 + 0.01^2 + 0.99^2 + 0.01^2} \times \sqrt{0.13^2 + 0.01^2 + 0.93^2 + 0.01^2}} = 0.9999650034397828
$$
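The arithmetic on this slide can be checked with a few lines of Python. The two 4-dimensional vectors are read off the formula; the names `u`, `v` and `cosine_similarity` are placeholders chosen for this sketch.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Vectors taken from the slide's formula; they differ only in the third component.
u = [0.13, 0.01, 0.99, 0.01]
v = [0.13, 0.01, 0.93, 0.01]
similarity = cosine_similarity(u, v)
```

The result agrees with the slide's value to within rounding (about 0.999965), because the two vectors point in almost the same direction even though their lengths differ.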

Slide 32

Slide 32 text

#Sim more similar

Slide 33

Slide 33 text

#Sim
(figure labels: King, Man)
sim(King - Man) ≒ sigmoid(King・Man)

Slide 34

Slide 34 text

#Sim
(figure labels: King, Man, Δ)
sim(King - Man) ≒ sigmoid(King・Man)
[BACKWARD]:
  Man = Man - Δ(King - Man) * learningRate
  Δ(King - Man) = (1 - sim(King - Man))・King

Slide 35

Slide 35 text

#negative
(figure labels: King, Man)
sim(King - Man) ≒ sigmoid(King・Man)
[BACKWARD]:
  Man = Man - Δ(King - Man) * learningRate
  Δ(King - Man) = sim(King - Man)・King
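The two backward rules can be folded into one update step. The sketch below follows the common textbook form of skip-gram training with negative sampling, which is an assumption about what the slides intend: a positive pair is pulled together by (1 - sim)・King and a negative pair is pushed apart by sim・King. The function names and the toy 3-dimensional vectors are made up for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def update(target, context, label, learning_rate=0.025):
    """One gradient step on `context` for the pair (target, context).

    label = 1: observed pair, step is +(1 - sim) * target * learning_rate
    label = 0: negative sample, step is -sim * target * learning_rate
    where sim = sigmoid(target . context).
    """
    sim = sigmoid(dot(target, context))
    step = (label - sim) * learning_rate
    return [c + step * t for c, t in zip(context, target)]


# Toy vectors (hypothetical values, 3 dimensions instead of 200).
king = [0.5, -0.2, 0.1]
man = [0.1, 0.3, -0.4]
before = sigmoid(dot(king, man))
after_positive = sigmoid(dot(king, update(king, man, label=1)))
after_negative = sigmoid(dot(king, update(king, man, label=0)))
```

After a positive step the predicted similarity rises, and after a negative step it falls, which is the behavior the "more similar" and "#negative" slides describe.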

Slide 36

Slide 36 text

#PV-DM

Slide 37

Slide 37 text

#Word2Vec

Slide 38

Slide 38 text

〉〉〉Asm2Vec

Slide 39

Slide 39 text

#Asm2Vec

Slide 40

Slide 40 text

#paragraph
[Figure: PE layout: File Header, Opt Header (.AddressOfEntryPoint), .text]

asm script:
    mov [ebp-0x04], 00
    jmp block_c
    cmp [ebp-0x04], Ah
    jg Exit
    push 0x3E8
    call Sleep
    jmp block_b
    mov eax, [ebp-0x04]
    add eax, 1
    mov [ebp-0x04], eax
    cmp [ebp-0x04], Ah
    jg Exit
    push 0x3E8
    call Sleep
    jmp block_b
    ...

Slide 41

Slide 41 text

#Asm2Vec

Slide 42

Slide 42 text

#PE
[Figure: PE layout: File Header, Opt Header (.AddressOfEntryPoint), .text; the .text bytes are lifted into a Control Flow Graph]
.text bytes: 6A 00 68 AD DE 00 00 68 EF BE 00 00 6A 00 FF 15 FE CA 00 00 33 C0 C3

Slide 43

Slide 43 text

/?rndWalk
#1: block_a → block_c → Exit
#2: block_a → block_c → block_d → block_b → block_c → Exit
#3: block_a → block_c → block_d → block_b → block_c → block_d → block_b → block_c → Exit
#4: block_a → block_c → block_d → block_b → block_c → block_d → block_b → block_c → block_d → block_b → block_c → Exit

block_a:
    mov [ebp-0x04], 00
    jmp block_c
block_c:
    cmp [ebp-0x04], Ah
    jg Exit
block_d:
    push 0x3E8
    call Sleep
    jmp block_b
block_b:
    mov eax, [ebp-0x04]
    add eax, 1
    mov [ebp-0x04], eax

(edge labels in the figure: jmp block_c, jg Exit, jmp block_b)
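The walks listed on this slide can be generated by a random walk over the control flow graph. The sketch below encodes the four blocks and their successors as read from the walks; the dictionary layout, the step limit, and the function name are assumptions made for this example.

```python
import random

# Successors of each basic block, read off walks #1 to #4 on the slide.
CFG = {
    "block_a": ["block_c"],
    "block_c": ["Exit", "block_d"],  # jg Exit, otherwise continue into the loop body
    "block_d": ["block_b"],
    "block_b": ["block_c"],
    "Exit": [],
}


def random_walk(cfg, start="block_a", max_steps=32, rng=None):
    """Follow randomly chosen successors until a block has none
    or the step limit is reached."""
    if rng is None:
        rng = random.Random()
    path = [start]
    while cfg[path[-1]] and len(path) < max_steps:
        path.append(rng.choice(cfg[path[-1]]))
    return path


walk = random_walk(CFG, rng=random.Random(1))
```

Concatenating the instructions of the visited blocks, in order, gives one linear instruction sequence per walk, which can then be treated like a paragraph of text.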

Slide 44

Slide 44 text

/?rndWalk
block_a:
    mov [ebp-0x04], 00
    jmp block_c
block_c:
    cmp [ebp-0x04], Ah
    jg Exit
block_d:
    push 0x3E8
    call Sleep
    jmp block_b
block_b:
    mov eax, [ebp-0x04]
    add eax, 1
    mov [ebp-0x04], eax

(edge labels in the figure: jmp block_c, jg Exit, jmp block_b)

asm script:
    mov [ebp-0x04], 00
    jmp block_c
    cmp [ebp-0x04], Ah
    jg Exit
    push 0x3E8
    call Sleep
    jmp block_b
    mov eax, [ebp-0x04]
    add eax, 1
    mov [ebp-0x04], eax
    cmp [ebp-0x04], Ah
    jg Exit
    push 0x3E8
    call Sleep
    jmp block_b
    ...

Slide 45

Slide 45 text

#Asm2Vec
    push rbp
    mov rbp, rsp
    sub rsp, 138h
    mov rax, 8h
    mov [rbp+0ch], rax
    xor eax, eax
    mov [rbp+04h], 0
    mov [rbp+32h], 1505h
    ...

Slide 46

Slide 46 text

#Asm2Vec (listing as on slide 45)

Slide 47

Slide 47 text

#Asm2Vec (listing as on slide 45)

Slide 48

Slide 48 text

#Asm2Vec (listing as on slide 45)

Slide 49

Slide 49 text

#Asm2Vec (listing as on slide 45)

Slide 50

Slide 50 text

#Asm2Vec (listing as on slide 45)

Slide 51

Slide 51 text

#Asm2Vec (listing as on slide 45)

Slide 52

Slide 52 text

#Asm2Vec
Tokenize (instructions taken from the listing):
    sub rsp, 138h
    lea eax, [ebx+4]
    push rbp

vocab = {
    'sub':     [-0.53,  0.01, ... -0.08],
    'rsp':     [ 0.12,  0.31, ...  0.34],
    'lea':     [-0.75, -0.42, ... -0.72],
    'push':    [ 0.23,  0.37, ... -0.23],
    '[ebx+4]': [-0.02, -0.19, ...  0.11],
    ...
}
(each token: 200 dim)

Slide 53

Slide 53 text

#Asm2Vec
Each instruction is split into an operator and its operands:
    sub rsp, 138h       operator: sub, operands: rsp, 138h
    lea eax, [ebx+4]
    push rbp
    ...

Slide 54

Slide 54 text

#Asm2Vec
sub rsp, 138h (operator: sub, operands: rsp, 138h)
Ƭ(instruction) = Ƭ(sub) || ( Ƭ(rsp)/2 + Ƭ(138h)/2 )

Slide 55

Slide 55 text

#Asm2Vec
push rbp (operator: push, operands: rbp)
Ƭ(instruction) = Ƭ(push) || ( Ƭ(rbp) )

Slide 56

Slide 56 text

#Asm2Vec
nop (operator: nop, operands: null)
Ƭ(instruction) = Ƭ(nop) || ( null )
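The three cases above (two operands, one operand, no operand) can be sketched as one function: concatenate the operator's vector with the average of the operand vectors. The random vocabulary, the simple comma-based tokenizer, and the use of a zero vector for the "null" operand part are assumptions made for this example, not details confirmed by the slides.

```python
import random

DIM = 200  # per-token dimension, as stated on the vocab slide


def make_vocab(tokens, dim=DIM, seed=0):
    """Assign each token a random vector (illustrative initialization only)."""
    rng = random.Random(seed)
    return {t: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for t in tokens}


def tokenize(instruction):
    """'sub rsp, 138h' -> ('sub', ['rsp', '138h']); 'nop' -> ('nop', [])."""
    operator, _, rest = instruction.strip().partition(" ")
    operands = [o.strip() for o in rest.split(",") if o.strip()]
    return operator, operands


def embed_instruction(instruction, vocab, dim=DIM):
    """T(instruction) = T(operator) || average of T(operand) over all operands."""
    operator, operands = tokenize(instruction)
    if operands:
        avg = [sum(vocab[o][i] for o in operands) / len(operands) for i in range(dim)]
    else:
        avg = [0.0] * dim  # assumption: "null" operands become a zero vector
    return vocab[operator] + avg


vocab = make_vocab(["sub", "rsp", "138h", "push", "rbp", "nop"])
vec = embed_instruction("sub rsp, 138h", vocab)  # 400 values: 200 + 200
```

Keeping the operator and the operands in separate halves means two instructions with the same mnemonic but different registers still share half of their representation.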

Slide 57

Slide 57 text

#Asm2Vec
[Figure: predicting the target instruction Ƭ("sub rsp, 138h") from its context. Inputs: the function vector θfs and the two neighboring instructions, Ƭ(mov) || Avg(Ƭ(rbp), Ƭ(rsp)) and Ƭ(mov) || Avg(Ƭ(rax), Ƭ(8h)). The inputs are combined with Avg(x) and passed through sigmoid(x) to predict the target.]

Slide 58

Slide 58 text

#Asm2Vec
[Figure: the same diagram in the backward pass. The loss is split into three equal parts (loss 1/3 each) and propagated to θfs and to the two neighboring instruction vectors.]

Slide 59

Slide 59 text

$./exp
• Dataset
  - malware: Mirai samples from VirusTotal (40000+)
  - benign: ELF from Linux-based IoT firmware (3600+)
  - stripped binaries
• Training
  - randomly choose only 25 Mirai samples to train
  - each token represented by a 200-dim vector (random)
  - negative sampling: 25 tokens
  - decreasing learning rate: 0.025 → 0.0025
• Cross validation: 10 times
• Malicious: Similarity(binary, model) >= 95%
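Two of the settings above are easy to show in code: the decaying learning rate and the 95% decision threshold. The slide gives only the start and end rates, so the linear schedule below is an assumption, as are the function names; how Similarity(binary, model) is computed is not specified here and is left out.

```python
def learning_rate(step, total_steps, start=0.025, end=0.0025):
    """Linearly interpolate from `start` to `end` over the training steps
    (the linear shape is an assumption; the slide states only the endpoints)."""
    if total_steps <= 1:
        return start
    clamped = min(max(step, 0), total_steps - 1)
    return start + (end - start) * clamped / (total_steps - 1)


def is_malicious(similarity, threshold=0.95):
    """Flag a binary when its similarity to the trained model is at least 95%."""
    return similarity >= threshold


rates = [learning_rate(s, 10) for s in range(10)]
```

With 10 steps the rate starts at 0.025 and ends at 0.0025, shrinking by the same amount each step.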

Slide 60

Slide 60 text

$./exp
• MIPS
  - Mirai: 96.75% (18467 samples)
  - Benign: 96.41% (348 samples)
• x86
  - Mirai: 96.75% (2564 samples)
  - Benign: 99.93% (1567 samples)
• ARM
  - Mirai: 98.53% (23827 samples)
  - Benign: 93.87% (1699 samples)

Slide 61

Slide 61 text

/>Demo

Slide 62

Slide 62 text

〉〉〉Challenge

Slide 63

Slide 63 text

/!challenge github.com/aaaddress1/theArk

Slide 64

Slide 64 text

/!PluginX DLL SIDE-LOADING: A Thorn in the Side of the Anti-Virus Industry

Slide 65

Slide 65 text

/!challenge
#include <cstdio>

int main(void) {
    try {
        *(char*)NULL = 1;        // null write raises an access violation
    } catch (...) {
        puts("Hell Kitty");      // reached only if the fault is routed to this handler
    }
}

Slide 66

Slide 66 text

/!challenge github.com/xoreaxeaxeax/movfuscator

Slide 67

Slide 67 text

/!challenge
• Issue based on Control Flow Walking
  - Self modifying code
    1. Software Packer e.g. VMProtect, Themida
    2. Shellcode Encoder
  - Control Flow Rerouting
    1. Error handling e.g. SEH
    2. MultiThread
  - Exported malicious function
  - Virtual Method Table
• Vector Obfuscation
  - 95% benignware / 5% injected shellcode
  - Use common instructions as gadgets to build an obfuscation chain e.g. movfuscator

Slide 68

Slide 68 text

Thanks!
aaaddress1@chroot.org
Slide / Github @aaaddress1 / Facebook
HITCON