Before Applying Machine Learning to Malware

Yuma Kurogome
February 13, 2016

Slides presented at the Kaggle - Malware Classification Challenge study group (connpass.com/event/25007/)

Transcript

  1. @ntddk
    Kaggle - Malware Classification Challenge
    2016.02.13

  2. http://ntddk.github.io/

  3. Kaggle
    https://www.kaggle.com/

  4. David H. Wolpert,
    The Supervised Learning No-Free-Lunch Theorems,
    In Proc. 6th Online World Conference on Soft Computing in
    Industrial Applications, pp.25-42, 2001.

  6. There ain't no such thing as a free lunch
    http://www.amazon.co.jp/dp/4150117489
    http://www.amazon.co.jp/dp/B00GJMUKMG/
    http://www.amazon.co.jp/dp/4150312133/

  8. http://blog.kaggle.com/

  9. [Diagram; surviving labels: x, η, g, a, b, c, x]

  11. • A B
    Satoshi Watanabe,
    Knowing and Guessing: A Quantitative Study of Inference and Information,
    John Wiley & Sons, 1969.

  14. https://www.av-test.org/en/statistics/malware/

  15. http://www.mcafee.com/jp/resources/reports/rp-quarterly-threat-q2-2015.pdf

  16. http://www.mcafee.com/jp/resources/reports/rp-quarterly-threat-q2-2015.pdf
    http://www.mcafee.com/jp/resources/reports/rp-threats-predictions-2016.pdf

  17. • KERNEL32!VirtualAllocStub
    • KERNEL32!VirtualProtectStub
    • KERNEL32!OpenProcessStub
    • KERNEL32!OpenThreadStub
    • …

  18. CSEC:
    MWS:
    http://www.iwsec.org/mws/2015/about.html

  19. https://www.kaggle.com/c/malware-classification/data
    16

  20. • https://virusshare.com/
    • http://malware-traffic-analysis.net/

  22. • API
    PE

  23. https://github.com/corkami/

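  25. #include <windows.h>

    /* Function-pointer type matching the signature of MessageBoxW. */
    typedef int (WINAPI *LPFNMESSAGEBOXW)(HWND, LPCWSTR, LPCWSTR, UINT);

    int main()
    {
        /* Resolve MessageBoxW at run time instead of calling it directly. */
        HMODULE hmod = LoadLibrary(TEXT("user32.dll"));
        if (hmod == NULL)
            return 1;
        LPFNMESSAGEBOXW lpfnMessageBoxW = (LPFNMESSAGEBOXW)GetProcAddress(hmod, "MessageBoxW");
        if (lpfnMessageBoxW != NULL)
            lpfnMessageBoxW(NULL, L"Hello, world!", L"Test", MB_OK);
        FreeLibrary(hmod);
        return 0;
    }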
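
Because the program above looks up MessageBoxW by name at run time, an import-table listing would typically not show that function, while the name itself still sits in the binary as a plain ASCII string. The following is a minimal sketch of a string scan that could complement import-based features. The byte blob, the `WATCHLIST` set, and the function name `ascii_strings` are all hypothetical and chosen for illustration only; packed or obfuscated samples would defeat this simple approach.

```python
import re

def ascii_strings(data: bytes, min_len: int = 4) -> list:
    """Return printable-ASCII runs of at least min_len bytes found in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Hypothetical byte blob standing in for a compiled executable.
blob = b"\x00\x01user32.dll\x00\x00MessageBoxW\x00\xff\x10ab\x00"

# Hypothetical list of API names worth flagging.
WATCHLIST = {"LoadLibraryA", "GetProcAddress", "MessageBoxW"}

strings = ascii_strings(blob)
found = [s for s in strings if s in WATCHLIST]
print(found)
```

Runs shorter than `min_len` are dropped to cut noise, which is the same trade-off the Unix `strings` tool makes.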

  26. {
      "category": "registry",
      "status": true,
      "return": "0x00000000",
      "timestamp": "2015-05-24 02:46:50,773",
      "thread_id": "3220",
      "repeated": 0,
      "api": "NtOpenKey",
      "arguments": [
        {
          "name": "DesiredAccess",
          "value": "33554432"
        },
        {
          "name": "KeyHandle",
          "value": "0x00000154"
        },
        {
          "name": "ObjectAttributes",
          "value": "\\REGISTRY\\USER\\S-1-5-21-916742657-1382504153-4155998892-1001"
        }
      ],
      "id": 83
    },
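
A log of call records like the one above can be turned into simple numeric features. The sketch below counts API names and adjacent API pairs. It assumes the records arrive as a JSON array and that the `id` field reflects call order; neither is stated in the slides. The sample records are hypothetical and carry only the fields the code reads.

```python
import json
from collections import Counter

# Hypothetical excerpt shaped like the record above; only "api" and "id" are read.
SAMPLE_LOG = """
[
  {"category": "registry", "api": "NtOpenKey", "id": 83},
  {"category": "registry", "api": "NtQueryValueKey", "id": 84},
  {"category": "registry", "api": "NtOpenKey", "id": 85},
  {"category": "process", "api": "NtAllocateVirtualMemory", "id": 86}
]
"""

def api_counts(calls):
    """Count how often each API name occurs in a list of call records."""
    return Counter(call["api"] for call in calls)

def api_bigrams(calls):
    """Count adjacent API pairs after sorting records by "id" (assumed call order)."""
    ordered = sorted(calls, key=lambda call: call["id"])
    names = [call["api"] for call in ordered]
    return Counter(zip(names, names[1:]))

calls = json.loads(SAMPLE_LOG)
counts = api_counts(calls)
bigrams = api_bigrams(calls)
print(counts.most_common())
print(bigrams.most_common())
```

Counts discard ordering entirely, while the pair counts keep a little of it; which representation suits a given classifier is a modelling choice this sketch does not settle.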

  27. David H. Wolpert,
    The Supervised Learning No-Free-Lunch Theorems,
    In Proc. 6th Online World Conference on Soft Computing in
    Industrial Applications, pp.25-42, 2001.

  28. • AdaBoost, Gradient Boosting
    • Kaggle

  29. DAF
    Mohammad M. Masud, Latifur Khan, Bhavani Thuraisingham,
    A scalable multi-level feature extraction technique to detect
    malicious executables,
    Information Systems Frontiers, Vol.10, Issue.1, pp.33-45, 2008.
    16
    DAF: Derived Assembly Features
    BFS: Binary N-gram Features
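
Binary n-gram features, as named on the slide above, can be illustrated with a sliding window over raw bytes. This is a minimal sketch only: it does not reproduce the cited paper's pipeline, omits any feature selection step, and uses a hypothetical six-byte sample and a hypothetical three-entry vocabulary.

```python
from collections import Counter

def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count every contiguous run of n bytes in data (sliding window, step 1)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

def to_feature_vector(counts: Counter, vocabulary: list) -> list:
    """Binary presence vector over a fixed n-gram vocabulary."""
    return [1 if gram in counts else 0 for gram in vocabulary]

# Hypothetical byte string standing in for an executable's contents.
sample = bytes.fromhex("4d5a90004d5a")
counts = byte_ngrams(sample, n=2)

# Hypothetical vocabulary; in practice it would be chosen from training data.
vocab = [b"\x4d\x5a", b"\x90\x00", b"\xff\xff"]
vector = to_feature_vector(counts, vocab)
print(vector)
```

The number of distinct n-grams grows quickly with n and with corpus size, so a fixed vocabulary keeps every sample's vector the same length.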