Before Applying Machine Learning to Malware

@ntddk Kaggle - Malware Classification Challenge 2016.02.13 1

• http://ntddk.github.io/ • 2

Kaggle 5 https://www.kaggle.com/

6 • • • ※ David H. Wolpert, The Supervised Learning No-Free-Lunch Theorems, In Proc. 6th Online World Conference on Soft Computing in Industrial Applications, pp. 25-42, 2001.

8 There ain't no such thing as a free lunch
http://www.amazon.co.jp/dp/4150117489 http://www.amazon.co.jp/dp/B00GJMUKMG/ http://www.amazon.co.jp/dp/4150312133/

10 http://blog.kaggle.com/

11 [Figure; surviving labels: x, η, g, a, b, c, x, …]

13 • • A B Satoshi Watanabe, Knowing and Guessing: A Quantitative Study of Inference and Information, John Wiley & Sons, 1969.

15 • • • •

16 https://www.av-test.org/en/statistics/malware/

17 http://www.mcafee.com/jp/resources/reports/rp-quarterly-threat-q2-2015.pdf

18 http://www.mcafee.com/jp/resources/reports/rp-quarterly-threat-q2-2015.pdf http://www.mcafee.com/jp/resources/reports/rp-threats-predictions-2016.pdf

19
• KERNEL32!VirtualAllocStub
• KERNEL32!VirtualProtectStub
• KERNEL32!OpenProcessStub
• KERNEL32!OpenThreadStub
• …

20 CSEC: MWS: http://www.iwsec.org/mws/2015/about.html

21 https://www.kaggle.com/c/malware-classification/data 16

22 • https://virusshare.com/ • http://malware-traffic-analysis.net/

23 • • • •

24 • • • • API PE

25 https://github.com/corkami/

26 • • • • • •

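27
#include <windows.h>

/* Function-pointer type matching the signature of MessageBoxW. */
typedef int (WINAPI *LPFNMESSAGEBOXW)(HWND, LPCWSTR, LPCWSTR, UINT);

int main(void) {
    /* Load user32.dll at run time instead of linking against it. */
    HMODULE hmod = LoadLibrary(TEXT("user32.dll"));
    if (hmod == NULL) return 1;

    /* Resolve MessageBoxW by name, so it does not appear in the import table. */
    LPFNMESSAGEBOXW lpfnMessageBoxW = (LPFNMESSAGEBOXW)GetProcAddress(hmod, "MessageBoxW");
    if (lpfnMessageBoxW != NULL)
        lpfnMessageBoxW(NULL, L"Hello, world!", L"Test", MB_OK);

    FreeLibrary(hmod);
    return 0;
}
•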

28
{
  "category": "registry",
  "status": true,
  "return": "0x00000000",
  "timestamp": "2015-05-24 02:46:50,773",
  "thread_id": "3220",
  "repeated": 0,
  "api": "NtOpenKey",
  "arguments": [
    { "name": "DesiredAccess", "value": "33554432" },
    { "name": "KeyHandle", "value": "0x00000154" },
    { "name": "ObjectAttributes", "value": "\\REGISTRY\\USER\\S-1-5-21-916742657-1382504153-4155998892-1001" }
  ],
  "id": 83
},
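
One way a log like the record on slide 28 could be turned into learning features is to count how often each API name appears. The sketch below assumes the log is a JSON array of records that each carry an "api" field, as in the slide's sample; the helper names `api_call_counts` and `to_feature_vector`, the sample records, and the vocabulary are hypothetical and not taken from the deck.

```python
import json
from collections import Counter


def api_call_counts(records):
    """Count how often each API name occurs in a list of call records."""
    return Counter(rec["api"] for rec in records if "api" in rec)


def to_feature_vector(counts, vocabulary):
    """Project the counts onto a fixed, ordered API vocabulary."""
    return [counts.get(name, 0) for name in vocabulary]


# Hypothetical sample shaped like the record on the slide (fields trimmed).
SAMPLE_LOG = json.loads("""
[
  {"category": "registry", "api": "NtOpenKey", "status": true, "id": 83},
  {"category": "registry", "api": "NtOpenKey", "status": true, "id": 84},
  {"category": "process", "api": "NtAllocateVirtualMemory", "status": true, "id": 85}
]
""")

# The vocabulary is an assumption; in practice it would come from the training set.
VOCABULARY = ["NtOpenKey", "NtAllocateVirtualMemory", "NtCreateFile"]

counts = api_call_counts(SAMPLE_LOG)
vector = to_feature_vector(counts, VOCABULARY)
```

Counting discards call order and arguments, so it is only a starting point; sequences of calls or argument values such as registry paths may carry more signal.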

30 • AdaBoost, Gradient Boosting • Kaggle
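
The slide only names the algorithms. As a reminder of what AdaBoost does, here is a minimal pure-Python sketch that uses one-feature threshold stumps as weak learners. It is an illustration under simplifying assumptions (labels in {-1, +1}, exhaustive stump search), not the deck's method; the names `train_adaboost` and `predict` and the toy data are hypothetical, and real work would more likely use a library implementation.

```python
import math


def train_adaboost(X, y, rounds=10):
    """Train AdaBoost with threshold stumps. Labels must be -1 or +1."""
    n = len(X)
    weights = [1.0 / n] * n
    model = []  # list of (alpha, feature, threshold, polarity)
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for f in range(len(X[0])):
            for thr in sorted({row[f] for row in X}):
                for pol in (1, -1):
                    preds = [pol if row[f] >= thr else -pol for row in X]
                    err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol, preds)
        err, f, thr, pol, preds = best
        # Clamp the error so the logarithm below stays finite.
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, pol))
        # Misclassified samples gain weight, correctly classified ones lose it.
        weights = [w * math.exp(-alpha * t * p) for w, p, t in zip(weights, preds, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, row):
    """Return the sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (pol if row[f] >= thr else -pol) for a, f, thr, pol in model)
    return 1 if score >= 0 else -1


# Toy data (hypothetical): one feature, separable at 3.0.
X_TOY = [[1.0], [2.0], [3.0], [4.0]]
Y_TOY = [-1, -1, 1, 1]
toy_model = train_adaboost(X_TOY, Y_TOY, rounds=3)
```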

DAF 31 Mohammad M. Masud, Latifur Khan, Bhavani Thuraisingham, A scalable multi-level feature extraction technique to detect malicious executables, Information Systems Frontiers, Vol. 10, Issue 1, pp. 33-45, 2008. 16
DAF: Derived Assembly Features
BFS: Binary N-gram Features
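
To make "binary n-gram features" concrete, the following sketch counts overlapping byte n-grams in a buffer and turns a chosen set of n-grams into a presence vector. It is a plain n-gram counter, not the feature-selection pipeline of the cited paper; the function names, the default n, and the sample bytes are assumptions made for illustration.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int = 4) -> Counter:
    """Count every overlapping n-byte window in data."""
    if n <= 0:
        raise ValueError("n must be positive")
    # range() is empty when data is shorter than n, giving an empty Counter.
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def presence_vector(data: bytes, selected, n: int):
    """Return 1/0 flags telling whether each selected n-gram occurs in data."""
    grams = byte_ngrams(data, n)
    return [1 if gram in grams else 0 for gram in selected]


# Hypothetical sample: three NOP bytes followed by RET (x86 opcodes).
SAMPLE_BYTES = b"\x90\x90\x90\xc3"
sample_counts = byte_ngrams(SAMPLE_BYTES, n=2)
sample_flags = presence_vector(SAMPLE_BYTES, [b"\x90\xc3", b"\xcc\xcc"], n=2)
```

The number of distinct n-grams grows quickly with n and with corpus size, so some form of feature selection is usually needed before training.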

Yuma Kurogome
