Slide 2

About me › Manager of the Security R&D team at LINE › Stanford MS in CS (network and system security) › Seoul Nat'l Univ BS in CSE w/ Economics and Psychology › I love interdisciplinary stuff › 10+ years of pen-testing in the public and private sectors › CISSP

Slide 3

Agenda › Small data challenge › Preliminaries › System architecture › Experiment › Conclusion

Slide 4

Finding bugs › One of the best strategies for protecting IT systems from malicious hackers › Code auditing is a highly sophisticated task; we need a team of ethical hackers › Scalability issue › Can we build an autonomous system for this?

Slide 5

Deep learning? › We need an intelligent system › First of all, the system must be able to understand code › Natural Language Processing (NLP) has progressed a lot w/ deep learning › Programming languages and natural languages share a lot of properties

Slide 6

Small dataset challenge › Deep learning requires a humongous amount of data for training › More parameters, more data to train them › GPT-3 used 499B tokens › It is almost impossible to get that many legitimate security bug samples › This is the most common pain point in the security area

Slide 7

Our ingredients (diagram) › Transfer learning → reduce training params › Foreign-domain data → meta-training › Meta-learning → few-shot learner

Slide 8

Transfer learning (diagram) › Pre-train on source data (Dog, Wolf, …, Rat) › Weight transfer › Fine-tune on target data (Bike, Bicycle, …, Car)
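
Below is a minimal sketch of this weight-transfer-then-fine-tune recipe, using PyTorch and the Hugging Face transformers library; the model name and head size are illustrative, not the ones used in this work:

    import torch
    from transformers import AutoModel

    # Weight transfer: start from an encoder pre-trained on a source domain.
    backbone = AutoModel.from_pretrained("albert-base-v2")

    # A fresh task head is the only randomly initialized part.
    head = torch.nn.Linear(backbone.config.hidden_size, 2)

    # Fine-tuning: update the weights on the (small) target dataset, typically
    # with a small learning rate to stay near the pre-trained solution.
    optimizer = torch.optim.Adam(
        list(backbone.parameters()) + list(head.parameters()), lr=1e-5)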

Slide 9

Transformer and BERT (diagram) › Scaled dot-product attention: MatMul → Scale → SoftMax → MatMul over Q, K, V › Encoder block: Multi-Head Attention → Add & Norm → Feed Forward → Add & Norm › Input Embedding + Positional Encoding › A stack of encoders produces the output embeddings
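
The scaled dot-product attention from the diagram, written out as a small PyTorch function (a textbook sketch, not code from the paper):

    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(Q, K, V):
        # MatMul + Scale: query/key similarity, scaled by sqrt(d_k)
        scores = Q @ K.transpose(-2, -1) / K.size(-1) ** 0.5
        # SoftMax: normalize scores into attention weights
        weights = F.softmax(scores, dim=-1)
        # MatMul: weighted sum of the values
        return weights @ V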

Slide 10

Few-shot learning (diagram) › 5-way, 1-shot image classification › 2-way, 1-shot image classification

Slide 11

Meta-learning › Learning a learner › Given experience on previous tasks, learn a new task quickly › Diagram: meta-training tasks and meta-testing tasks

Slide 12

Model-Agnostic Meta-Learning › MAML, Finn et al., 2017 › Optimization-based meta-learning › $\min_\theta \sum_{\text{task } i} \mathcal{L}\big(\theta - \alpha \nabla_\theta \mathcal{L}(\theta, D^{tr}_i),\; D^{ts}_i\big)$ › Reptile, Nichol et al., 2018 › First-order approximation › $\theta \leftarrow \theta + \alpha \frac{1}{n} \sum_{i=1}^{n} \big(U^k_{\tau_i}(\theta) - \theta\big)$
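
A minimal sketch of the Reptile outer update above, in PyTorch; the task.loss interface is hypothetical and the inner/outer learning rates are illustrative:

    import torch

    def reptile_step(model, tasks, inner_steps=5, inner_lr=1e-3, meta_lr=0.1):
        meta_weights = {n: p.detach().clone() for n, p in model.named_parameters()}
        deltas = {n: torch.zeros_like(w) for n, w in meta_weights.items()}
        for task in tasks:
            # Reset to the meta-parameters, then take k inner SGD steps on task i
            with torch.no_grad():
                for n, p in model.named_parameters():
                    p.copy_(meta_weights[n])
            opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
            for _ in range(inner_steps):
                loss = task.loss(model)   # hypothetical: the task supplies its loss
                opt.zero_grad(); loss.backward(); opt.step()
            with torch.no_grad():
                for n, p in model.named_parameters():
                    deltas[n] += p - meta_weights[n]   # U^k_tau_i(theta) - theta
        # theta <- theta + alpha * (1/n) * sum_i (U^k_tau_i(theta) - theta)
        with torch.no_grad():
            for n, p in model.named_parameters():
                p.copy_(meta_weights[n] + meta_lr * deltas[n] / len(tasks))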

Slide 13

Architecture (diagram) › Source codes → preprocessing (code slicing, JSON query) → BPE tokenizer → ALBERT encoder stack → Bi-LSTM → FCNN + Softmax for the start position, FCNN + Softmax for the end position, and an FCNN for "Is vuln?"
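
A hypothetical reconstruction of this architecture as a PyTorch module; the layer sizes and the pooling choice for the "is vuln?" head are assumptions read off the diagram, not the authors' code:

    import torch
    from transformers import AlbertModel

    class VulnFinder(torch.nn.Module):
        def __init__(self, lstm_hidden=256):
            super().__init__()
            self.encoder = AlbertModel.from_pretrained("albert-base-v2")
            hidden = self.encoder.config.hidden_size
            self.bilstm = torch.nn.LSTM(hidden, lstm_hidden,
                                        batch_first=True, bidirectional=True)
            self.span_head = torch.nn.Linear(2 * lstm_hidden, 2)  # start/end logits
            self.vuln_head = torch.nn.Linear(hidden, 2)           # "is vuln?" logits

        def forward(self, input_ids, attention_mask):
            out = self.encoder(input_ids, attention_mask=attention_mask)
            seq, _ = self.bilstm(out.last_hidden_state)
            start_logits, end_logits = self.span_head(seq).split(1, dim=-1)
            is_vuln = self.vuln_head(out.last_hidden_state[:, 0])  # first token
            return start_logits.squeeze(-1), end_logits.squeeze(-1), is_vuln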

Slide 17

Prediction process (diagram) › BPE tokens (e.g., ▁< script > ▁docu ment . onload = alert ( …) flow through the embedding layer, the transformer layers, an LSTM layer, and a softmax layer
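
Turning the two softmax outputs into a predicted span is a small search; a minimal sketch (the span-length cap is an assumption):

    import torch

    def decode_span(start_logits, end_logits, tokens, max_len=30):
        # Pick the (start, end) pair with the highest combined score,
        # requiring start <= end and a bounded span length.
        start_lp = torch.log_softmax(start_logits, dim=-1)
        end_lp = torch.log_softmax(end_logits, dim=-1)
        best, span = float("-inf"), (0, 0)
        for s in range(len(tokens)):
            for e in range(s, min(s + max_len, len(tokens))):
                if start_lp[s] + end_lp[e] > best:
                    best, span = start_lp[s] + end_lp[e], (s, e)
        return tokens[span[0]:span[1] + 1]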

Slide 18

Training strategy (diagram) › Phase 1 (pre-training): train the BPE tokenizer and ALBERT encoder stack, starting from English ALBERT › Phase 2 (meta-training): train the full model (encoders + Bi-LSTM + FCNN/Softmax heads for start, end, and vuln), starting from the Phase 1 ALBERT › Phase 3 (fine-tuning): train the same full model on the target task
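
Phase 3 is ordinary supervised training; a minimal sketch of that loop (the batch keys and the unweighted sum of the three losses are assumptions):

    import torch

    def finetune(model, loader, epochs=3, lr=2e-5):
        # Fine-tune the meta-trained model on the small set of real bug samples.
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        ce = torch.nn.CrossEntropyLoss()
        for _ in range(epochs):
            for batch in loader:
                start_logits, end_logits, vuln_logits = model(
                    batch["input_ids"], batch["attention_mask"])
                loss = (ce(start_logits, batch["start"]) +
                        ce(end_logits, batch["end"]) +
                        ce(vuln_logits, batch["is_vuln"]))
                opt.zero_grad(); loss.backward(); opt.step()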

Slide 21

Experiment target › DOM-based XSS › XSS that happens in the DOM instead of the HTML › The HTML code is intact → runtime investigation is necessary › Source and sink example: document.write("You are visiting: " + document.baseURI); triggered via http://www.example.com/vuln.html#alert('xss')
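
One plausible way to cast such a sample into the QA-style "JSON query" input from the architecture slides; the field names and the question wording are hypothetical:

    # Hypothetical record: the code slice is the context, and the answer
    # span marks where untrusted data reaches the sink.
    context = 'document.write("You are visiting: " + document.baseURI);'
    sample = {
        "question": "Where does untrusted input reach a sink?",
        "context": context,
        "answers": [{"text": "document.baseURI",
                     "answer_start": context.index("document.baseURI")}],
        "is_vulnerable": True,
    }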

Slide 22

Experiment datasets: DOM-based XSS bug finding › Pre-training data › HTML corpus from the web (367M) › Meta-learning data (foreign domain) › The Stanford Question Answering Dataset (SQuAD 2.0) › Generated mini-batch tasks (24 samples per task) › Fine-tuning data (XSS bug samples) › Patch history from public and private Git repos › 29 bug samples (23 for training, 6 for validation)
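
A sketch of slicing SQuAD 2.0 into fixed-size meta-training tasks of 24 samples each (a plausible reading of the slide, not necessarily the authors' exact sampling procedure):

    import random

    def make_tasks(squad_samples, task_size=24, seed=0):
        # Shuffle the pool and cut it into fixed-size tasks; each task acts
        # as one episode for the meta-training updates.
        rng = random.Random(seed)
        pool = list(squad_samples)
        rng.shuffle(pool)
        return [pool[i:i + task_size]
                for i in range(0, len(pool) - task_size + 1, task_size)]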

Slide 25

Experiment › Experiment setup › Baseline: random init for the RNN and FCNN › Meta-learned: meta-trained parameters › Charts: meta-learning curve; fine-tuning curve comparison (EM/F1 score)

Slide 26

Is it promising? › Parameter size matters › In GPT-3's few-shot learning experiments, it achieved SQuAD 2.0 F1 scores of 32.1 (125M params), 55.9 (2.7B), and 69.8 (175B) › Our model has 18M parameters › The F1 score of human performance on SQuAD 2.0 is 89.452 › Even though the task is different, ours got 40.1 › The point of the experiment › Our ingredients actually led to better performance
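
For reference, the token-overlap F1 used in SQuAD-style scoring, sketched from its standard definition (the official evaluation script adds answer-normalization steps omitted here):

    from collections import Counter

    def span_f1(prediction, truth):
        # Token-level F1: harmonic mean of precision and recall over the
        # tokens shared by the predicted and gold answer spans.
        pred, gold = prediction.split(), truth.split()
        overlap = sum((Counter(pred) & Counter(gold)).values())
        if overlap == 0:
            return 0.0
        precision, recall = overlap / len(pred), overlap / len(gold)
        return 2 * precision * recall / (precision + recall)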

Slide 27

Where are we? (chart: number of parameters, in billions) › ELMo: 0.094B (2018.4) › GPT: 0.11B (2018.7) › BERT-Large: 0.34B (2018.10) › GPT-2: 1.5B (2019.2) › T-NLG: 17B (2020.1) › GPT-3: 175B (2020.6) › We are here: 0.018B

Slide 28

Conclusion › Meta-learning algorithms can be helpful for small-dataset problems › This is a huge point in the security area › A foreign domain can be used for meta-training › Structural similarity is required › The Transformer model is useful, but it requires lots of data

Slide 29

Future work › Longer-term dependency › Handling nested structures › Problem extension › Polyglot code, different kinds of bugs › Ensemble models › Better performance w/o increasing the number of parameters › Training is very expensive › Leveraging programming languages' grammar and structure

Slide 30

Thank you › Paper: "Cross-domain meta-learning for bug finding in the source codes with a small dataset" › Presented at EICC 2020, France