
Meta learning for fun and profit

LINE DevDay 2020

November 25, 2020

Transcript

  1. About me
     › Manager for the Security R&D team in LINE
     › Stanford MS in CS (network and system security)
     › Seoul Nat'l Univ BS in CSE w/ Economics, Psychology
     › I love interdisciplinary stuff
     › 10+ years of pen-testing in public and private sectors
     › CISSP
  2. Finding bugs
     › Code auditing is one of the best strategies to protect IT systems from malicious hackers
     › It is a highly sophisticated task: we need a team of ethical hackers
     › Scalability issue
     › Can we make an autonomous system for this?
  3. Deep learning?
     › We need an intelligent system
     › First of all, the system must be able to understand code
     › Natural Language Processing (NLP) has progressed a lot with deep learning!
     › Programming languages and natural languages share many properties
  4. Small dataset challenge
     › Deep learning requires a huge amount of data for training
     › More parameters mean more data to train them; GPT-3 used 499B tokens
     › It is almost impossible to collect that many legitimate security bug samples
     › This is the most common pain point in the security area
  5. Transformer and BERT
     [Diagram: the Transformer encoder: scaled dot-product attention (MatMul, Scale, SoftMax, MatMul over Q, K, V), Multi-Head Attention, Add & Norm, Feed Forward, Add & Norm, Input Embedding with Positional Encoding, and a stack of Encoder blocks producing the embeddings]
  6. Meta-Learning
     › Meta learning: learning a learner
     › Given experience on previous tasks, learn a new task quickly
     [Diagram: meta-training tasks vs. meta-testing tasks]
  7. Model-Agnostic Meta-Learning
     › MAML, Finn et al.
       › Optimization-based meta learning
       › min_θ ∑_{task i} L(θ − α ∇_θ L(θ, D_i^tr), D_i^ts)
     › Reptile, Nichol et al., 2018
       › First-order approximation (a minimal code sketch follows this slide)
       › θ ← θ + α (1/n) ∑_{i=1}^{n} (U_{τ_i}^k(θ) − θ)
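To make the Reptile update above concrete, here is a minimal PyTorch sketch. The toy linear model, the random regression tasks, and the learning rates are illustrative assumptions, not the setup used in the talk.

    import copy
    import torch

    # Minimal sketch of the Reptile outer update (Nichol et al., 2018).
    model = torch.nn.Linear(4, 1)
    outer_lr, inner_lr, inner_steps, n_tasks = 0.1, 0.01, 5, 8

    def make_task_batch():
        # Hypothetical task sampler: a fresh random linear-regression task.
        w = torch.randn(4, 1)
        x = torch.randn(24, 4)
        return x, x @ w

    theta = copy.deepcopy(model.state_dict())      # shared initialization θ
    deltas = []
    for _ in range(n_tasks):
        model.load_state_dict(theta)               # every task starts from θ
        opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
        x, y = make_task_batch()
        for _ in range(inner_steps):               # U^k_τ(θ): k inner SGD steps
            loss = torch.nn.functional.mse_loss(model(x), y)
            opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            deltas.append({k: v - theta[k] for k, v in model.state_dict().items()})

    # θ ← θ + α (1/n) ∑_i (U^k_τi(θ) − θ)
    with torch.no_grad():
        theta = {k: theta[k] + outer_lr * sum(d[k] for d in deltas) / n_tasks
                 for k in theta}
    model.load_state_dict(theta)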
  8. Architecture
     [Diagram: source codes → preprocessing (code slicing, JSON query) → BPE tokenizer → stacked ALBERT encoders → Bi-LSTM → FCNN + Softmax heads for the start and end positions, plus an FCNN head for "Is vuln?"]
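A minimal sketch of the model in the architecture diagram above, assuming PyTorch and the Hugging Face transformers package. The checkpoint name "albert-base-v2", the LSTM size, and the exact wiring of the heads (e.g., taking the "is vuln?" prediction from the [CLS] embedding) are assumptions for illustration, not the production model.

    import torch.nn as nn
    from transformers import AlbertModel

    class BugFinder(nn.Module):
        # Sketch: ALBERT encoder stack -> Bi-LSTM -> start/end heads + "is vuln?" head.
        def __init__(self, albert_name="albert-base-v2", lstm_hidden=256):
            super().__init__()
            self.encoder = AlbertModel.from_pretrained(albert_name)
            hidden = self.encoder.config.hidden_size
            self.bilstm = nn.LSTM(hidden, lstm_hidden, batch_first=True, bidirectional=True)
            self.start_head = nn.Linear(2 * lstm_hidden, 1)   # per-token start logit
            self.end_head = nn.Linear(2 * lstm_hidden, 1)     # per-token end logit
            self.vuln_head = nn.Linear(hidden, 2)             # binary "is vuln?"

        def forward(self, input_ids, attention_mask):
            out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
            seq, _ = self.bilstm(out.last_hidden_state)        # (batch, seq, 2*lstm_hidden)
            start_logits = self.start_head(seq).squeeze(-1)
            end_logits = self.end_head(seq).squeeze(-1)
            vuln_logits = self.vuln_head(out.last_hidden_state[:, 0])   # [CLS] token
            return start_logits, end_logits, vuln_logits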
  12. Prediction process
      [Diagram: a tokenized input such as "▁< script > ▁docu ment . onload = alert ( …" flows through the embedding layer, stacked Transformer layers, the LSTM layer, and the Softmax layer that predicts the answer span]
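As a rough illustration of the final step of this prediction pipeline, the sketch below decodes a span from the two per-token heads; the sequence length and the random placeholder logits are assumptions standing in for real model output.

    import torch

    seq_len = 16
    start_logits = torch.randn(seq_len)    # placeholder start-position logits
    end_logits = torch.randn(seq_len)      # placeholder end-position logits

    start_probs = torch.softmax(start_logits, dim=-1)
    end_probs = torch.softmax(end_logits, dim=-1)

    # Score every valid span (start <= end) and pick the best one.
    scores = torch.triu(start_probs.unsqueeze(1) * end_probs.unsqueeze(0))
    best = torch.argmax(scores).item()
    start_idx, end_idx = divmod(best, seq_len)
    print(f"predicted span: tokens {start_idx}..{end_idx}")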
  13. Training strategy
      [Diagram: three phases sharing the BPE tokenizer and ALBERT encoder stack.
       Phase 1 (Pre-training): the encoder stack alone, initialized from the English ALBERT, is the training target.
       Phase 2 (Meta-training): the full model (encoders + Bi-LSTM + FCNN/Softmax heads for start, end, and vuln), starting from the Phase 1 ALBERT, is the training target.
       Phase 3 (Fine-tuning): the same full model is the training target.]
  16. Experiment target
      › DOM-based XSS: the XSS happens in the DOM instead of the HTML
      › The HTML code is intact -> runtime investigation is necessary
      › Source and sink:
        <script> document.write("You are visiting: " + document.baseURI); </script>
        http://www.example.com/vuln.html#<script>alert('xss')</script>
  17. Experiment datasets: DOM-based XSS bug finding
      › Meta-learning data (foreign domain)
        › The Stanford Question Answering Dataset (SQuAD 2.0)
        › Generated mini-batch tasks (24 samples for each task; a sampling sketch follows this slide)
      › Fine-tuning data (XSS bug samples)
        › Patch history from public and private Git repos
        › 29 samples of the bug (23 for training, 6 for validation)
      › Pre-training data
        › HTML corpus from the web (367M)
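A minimal sketch of how 24-sample mini-batch tasks could be drawn from SQuAD 2.0 for the meta-training phase, assuming the Hugging Face datasets package; the uniform random sampling shown here is an assumption, not necessarily the scheme used in the experiment.

    import random
    from datasets import load_dataset

    squad = load_dataset("squad_v2", split="train")   # the foreign-domain QA data
    TASK_SIZE, N_TASKS = 24, 100

    def sample_task(rng):
        # Hypothetical sampler: one task = 24 random (question, context, answers) triples.
        idx = rng.sample(range(len(squad)), TASK_SIZE)
        return [(squad[i]["question"], squad[i]["context"], squad[i]["answers"]) for i in idx]

    rng = random.Random(0)
    meta_training_tasks = [sample_task(rng) for _ in range(N_TASKS)]
    print(len(meta_training_tasks), "tasks of", TASK_SIZE, "samples each")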
  20. Experiment
      › Experiment setup
        › Baseline: random init for the RNN and FCNN
        › Meta-learned: meta-trained parameters
      [Charts: meta-learning curve; fine-tuning curve comparison (EM/F1 score)]
  21. Is it promising?
      › Parameter size matters
        › In GPT-3's few-shot learning experiments, it achieved 32.1 (125M), 55.9 (2.7B), and 69.8 (175B) on SQuAD 2.0
        › The F1 score of human performance on SQuAD 2.0 is 89.452
        › Our model has 18M parameters; even though the task is different, ours got 40.1
      › The point of the experiment: our ingredients actually led to better performance
  22. Where are we?
      [Chart: number of parameters (billions) over time. ELMo 0.094 (2018.4), GPT 0.11 (2018.7), BERT-Large 0.34 (2018.10), GPT-2 1.5 (2019.2), T-NLG 17 (2020.1), GPT-3 175 (2020.6). "We are here": 0.018B]
  23. Conclusion
      › Meta-learning algorithms can be helpful for small-dataset problems
        › This is a huge point in the security area
      › A foreign domain can be used for meta-training
        › Structural similarity is required
      › The Transformer model is useful, but it requires lots of data
  24. Future work
      › Longer-term dependency
        › Handling nested structure
      › Problem extension
        › Polyglot code, different kinds of bugs
      › Ensemble models
        › Better performance w/o increasing the number of parameters (training is so expensive)
      › Leveraging the programming language's grammar and structure
  25. Thank you
      Paper: "Cross-domain meta-learning for bug finding in the source codes with a small dataset", presented at EICC 2020, France